Writing an Automated Scraper (a crawler + equery method with phantomjs — did you know it?)
优采云 — Published 2022-01-02 02:01
Writing an automated scraper: this site is slow to crawl, but don't lose heart — I will keep working on it; what is worth pursuing is still worth pursuing. By the time you read this article, the list behind the title already holds more than 200 images. First, you need to install phantomjs; the setup code amounts to:

    var phantomjs = require('phantomjs'); // resolves the path to the installed PhantomJS binary
    var http = require('http');           // Node's built-in HTTP client
    // fetch the image list page; listUrl is a placeholder for the real address
    http.get(listUrl, function (res) { /* read the page source from res */ });

When image downloads slow down, the first step is to inspect the source of the image list page: [1,1,3,200,unkeymap] [1,1,3,200,2]. When the unkeymap entry is fetched, only 19 images come back. The more information the page code exposes, the more images there are to download and the more data each request carries. Fetching directly with phantomjs + equery did not raise that count, so it looks like grabbing the images requires intercepting the page's HTTP requests.
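As a rough illustration of that interception idea, here is a minimal PhantomJS sketch that opens the list page and logs every image request the rendered page issues. The URL, the file-extension filter, and the variable names are placeholders for illustration; equery is not involved at this step.

    // Minimal PhantomJS sketch: open the image list page and log each image
    // request the page makes. listUrl is a placeholder, not the real site.
    var page = require('webpage').create();
    var system = require('system');

    var listUrl = system.args[1] || 'http://example.com/image-list'; // placeholder address

    // fires for every outgoing request the rendered page issues
    page.onResourceRequested = function (requestData) {
      if (/\.(jpe?g|png|gif)(\?|$)/i.test(requestData.url)) {
        console.log('image request: ' + requestData.url);
      }
    };

    page.open(listUrl, function (status) {
      console.log('page load: ' + status);
      phantom.exit();
    });

Run it with the PhantomJS binary, for example "phantomjs intercept.js http://example.com/image-list"; each matching request URL is printed as the page renders.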
Now, here is a phantomjs + equery method; let's try it. The result of equery.parse looks like this: [1,1,3,19,2], which means that when we make the ajax request we add the parameter xh_xh_xh_yh_zh_jj_jj_jj_zh_jj_1. The addresses seen while debugging the image download code were: [1,1,3,200,2] [1,1,3,19,2]. With that in place, the image list you see may end up downloading the images numbered from 3 up into the 180s.
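For completeness, here is a hedged sketch of re-issuing that ajax request from Node with the extra parameter. Only the long xh_... parameter name comes from the article; the host, the endpoint path, the paging parameter, and the response handling are assumptions.

    // Hedged sketch: request the image list endpoint with the extra parameter.
    // Host and path are placeholders; only the xh_... name is taken from the text above.
    var http = require('http');

    var listUrl = 'http://example.com/ajax/image-list'
                + '?xh_xh_xh_yh_zh_jj_jj_jj_zh_jj_1=1' // parameter named in the article
                + '&page=1';                           // assumed paging parameter

    http.get(listUrl, function (res) {
      var body = '';
      res.on('data', function (chunk) { body += chunk; });
      res.on('end', function () {
        // the article suggests the response encodes counts such as [1,1,3,19,2];
        // print it so the structure can be inspected before parsing
        console.log(body);
      });
    }).on('error', function (err) {
      console.error('request failed: ' + err.message);
    });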