Web scraping encrypted HTML (I have this code that gets a page's HTML source; I want to scrape some content from it)

优采云 · Published: 2022-02-23 06:20


I have this code to get the HTML source of a page:

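```php
<?php
// Fetch the page's raw HTML as a string
$page = file_get_contents('http://example.com/page.html');
// Escape the markup (e.g. "<" becomes "&lt;") so it can be displayed as text
$page = htmlentities($page);
```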
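The second line passes the page through `htmlentities()`, which escapes the markup. That is useful for displaying the source as text, but any scraping that relies on tags should run on the unescaped string. Here is a small illustration. The `<b>` and `<br />` markup is hypothetical, because the question does not show the page's actual tags.

```php
<?php
// Hypothetical markup, used only to show what htmlentities() does to tags.
$raw = '<b>Pinging technorati.com</b>Connection failed<br />';
$escaped = htmlentities($raw);
echo $escaped, "\n";
// Prints: &lt;b&gt;Pinging technorati.com&lt;/b&gt;Connection failed&lt;br /&gt;

// html_entity_decode() turns the escaped text back into the original markup.
echo html_entity_decode($escaped), "\n";
```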

I want to scrape some content from it. For example, suppose the page's source contains:

```
Pinging technorati.com
Connection failed
Pinging icerocket.com
Connection failed
Pinging weblogs.com
Done
Pinging newsgator.com
Done
Pinging blo.gs
Done
Pinging feedburner.com
Done
Pinging blogstreet.com
Done
Pinging my.yahoo.com
Connection failed
Pinging moreover.com
Connection failed
Pinging newsisfree.com
Done
```

Is there a way to extract this from the source and store it in a variable, so that it looks like this:

Connection failed

Done

and so on.

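The page is dynamic, which is why I'm having trouble. Could I search the source for each site? But then how would I get the result that follows it (Connection failed / Done)?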
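One way to do this is to search the page text for each "Pinging <host>" and capture the status that follows it. The sketch below assumes that every host line is immediately followed by either "Connection failed" or "Done", as in the listing above. `extractPingStatuses()` is a hypothetical helper name, and the sample markup is invented because the page's real tags are not shown.

```php
<?php
// Sketch: pair each "Pinging <host>" with the status text that follows it.
// Assumption: the status is always "Connection failed" or "Done".
// extractPingStatuses() is a hypothetical helper, not a library function.
function extractPingStatuses(string $html): array
{
    // Drop the markup. Adjacent items may end up glued together,
    // e.g. "Pinging technorati.comConnection failed".
    $text = strip_tags($html);

    // Lazy host match (\S+?) so it stops right before the status text.
    preg_match_all(
        '/Pinging\s+(\S+?)\s*(Connection failed|Done)/',
        $text,
        $matches,
        PREG_SET_ORDER
    );

    $statuses = [];
    foreach ($matches as $m) {
        $statuses[$m[1]] = $m[2]; // host => status
    }
    return $statuses;
}

// Hypothetical markup; the question does not show the page's real tags.
$html = '<b>Pinging technorati.com</b>Connection failed<br />'
      . '<b>Pinging weblogs.com</b>Done<br />';

$statuses = extractPingStatuses($html);
// Only the results, one per line, as the question asks.
echo implode("\n", $statuses), "\n";
```

A host-keyed array keeps each result tied to its site, and `array_values($statuses)` returns just the list of results. If the same host can appear twice, collect `[$host, $status]` pairs in a plain list instead, because a repeated key would overwrite the earlier entry.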

Thanks for your help!

I've scraped several sites with the Simple HTML DOM PHP library, which is available here:

Then I use code like this:

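```php
<?php
include 'simple_html_dom.php'; // the Simple HTML DOM library

// ... (load the page into $html and define $pat / $rep) ...

foreach ($html->find('h2') as $heading) { // for each heading
    // find the first <a> inside a <span> and echo its plain text
    echo preg_replace($pat, $rep, $heading->find('span a', 0)->plaintext) . "\n";
}
?>
```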

This produces output like:

```
5.8 Earthquake Hits East Coast of the US
Origins of Lager Found In Argentina
Inside Oregon State University's Open Source Lab
WebAPI: Mozilla Proposes Open App Interface For Smartphones
Using Tablets Becoming Popular Bathroom Activity
The Syrian Government's Internet Strategy
Deus Ex: Human Revolution Released
Taken Over By Aliens? Google Has It Covered
The GIMP Now Has a Working Single-Window Mode
Zombie Cookies Just Won't Die
Motorola's Most Important 18 Patents
MK-1 Robotic Arm Capable of Near-Human Dexterity, Dancing
Evangelical Scientists Debate Creation Story
Android On HP TouchPad
Google Street View Gets Israeli Government's Nod
Internet Restored In Tripoli As Rebels Take Control
GA Tech: Internet's Mid-Layers Vulnerable To Attack
Serious Crypto Bug Found In PHP 5.3.7
Twitter To Meet With UK Government About Riots
EU Central Court Could Validate Software Patents
```
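For comparison, you can do roughly the same extraction with PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`) instead of Simple HTML DOM. This uses a different technique from the code above, not the answer's own method. `extractHeadlines()` is a hypothetical helper name. The XPath `.//span//a` is intended to approximate `find('span a', 0)`. The `$pat`/`$rep` clean-up step is left out because those patterns are not shown.

```php
<?php
// Sketch using PHP's built-in DOM extension (not Simple HTML DOM).
// extractHeadlines() is a hypothetical helper name.
function extractHeadlines(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $headlines = [];
    foreach ($xpath->query('//h2') as $h2) {
        // First <a> nested anywhere inside a <span> within this <h2>
        $links = $xpath->query('.//span//a', $h2);
        if ($links->length > 0) { // skip headings without a matching link
            $headlines[] = trim($links->item(0)->textContent);
        }
    }
    return $headlines;
}

// Usage with an inline sample; for a live page, pass the result of
// file_get_contents('http://example.com/page.html') instead.
$html = '<h2><span><a href="#">First story</a></span></h2>'
      . '<h2><span><a href="#">Second story</a></span></h2>';
echo implode("\n", extractHeadlines($html)), "\n";
```

Unlike the Simple HTML DOM version, which would fail on `->plaintext` when a heading has no matching link, this sketch simply skips such headings.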
