The second step, for when you have no collector or don't know how to collect

优采云 · Published: 2021-06-20 19:46

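Article collection is presented here as the second step of a crawler: if you have no collector, or don't know how to collect, an article-collection tool is the best choice. Strictly speaking, a crawler's first step is itself article collection, but from an engineering standpoint it need not be that much trouble.

Step one: set up collection. Taking a list as an example, a POST request sends a key to the file server and submits a user-agent, which yields the following content:

data={"user-agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.2739.132 Safari/537.36","content-type":"application/x-www-form-urlencoded;charset=utf-8","authorization":"zhangjnxcqdtvgxwdpfanf8kzuzgw,bvlzp9nfkgqhbwxzyzjf38ejebsi","imageurl":"[]"};
v={"content":{"header":{"content-type":"application/x-www-form-urlencoded;charset=utf-8","imgurl":"[]"}}};
v.post({username:"小二",password:"phd",data:{username:"xxxxxx",password:"xxxxxx"}});

This form is simply a POST request that submits a userid field and a password field.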
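The POST request described above can be sketched in Python with only the standard library. This is a minimal illustration, not the tool's actual implementation: the endpoint URL `FILE_SERVER_URL` and the helper `build_post_request` are hypothetical names, since the original does not say which server receives the form. The header and field values are taken from the text.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: the original does not name the file server's URL.
FILE_SERVER_URL = "http://example.com/list"

# Header values as they appear in the article's example.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/72.0.2739.132 Safari/537.36"
    ),
    "Content-Type": "application/x-www-form-urlencoded;charset=utf-8",
}


def build_post_request(url, fields, headers=None):
    """Build (but do not send) a form-encoded POST request."""
    body = urlencode(fields).encode("utf-8")  # e.g. b"username=...&password=..."
    return Request(url, data=body, headers=headers or HEADERS, method="POST")


# Placeholder credentials, as in the article.
req = build_post_request(
    FILE_SERVER_URL, {"username": "xxxxxx", "password": "xxxxxx"}
)

# Sending is left out on purpose; with a real endpoint it would be:
#   from urllib.request import urlopen
#   html = urlopen(req, timeout=10).read()
```

Building the request separately from sending it makes it easy to inspect the method, headers and body before any traffic reaches the server.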

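Once the crawler has finished running and the server has returned its response, you can see the URLs of all the pages.

Step two: request. If the site returned content for the request you just submitted, choose request here and set the request method (GET, POST or HEAD), configuring the three fields accordingly, together with the proxy's userid and password fields: r.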
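As a rough sketch of this step, the snippet below picks one of the three request methods and routes traffic through a proxy that needs a userid and password. The function names, the proxy host and the port are all assumptions made for illustration; the original does not specify them, and the collection tool may configure this differently.

```python
from urllib.parse import quote
from urllib.request import ProxyHandler, Request, build_opener

ALLOWED_METHODS = ("GET", "POST", "HEAD")


def make_request(url, method):
    """Build a request using one of the three methods named in the text."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    # POST carries a body (empty here); GET and HEAD do not.
    data = b"" if method == "POST" else None
    return Request(url, data=data, method=method)


def proxy_url(host, port, userid, password):
    """Embed proxy credentials in the URL, percent-encoding special characters."""
    return f"http://{quote(userid, safe='')}:{quote(password, safe='')}@{host}:{port}"


def make_opener(host, port, userid, password):
    """Return an opener that sends HTTP and HTTPS traffic through the proxy."""
    url = proxy_url(host, port, userid, password)
    return build_opener(ProxyHandler({"http": url, "https": url}))


# Hypothetical proxy settings, for illustration only.
opener = make_opener("proxy.example.com", 8080, "userid", "password")
head_request = make_request("http://example.com/", "HEAD")
# With a reachable proxy: response = opener.open(head_request, timeout=10)
```

A HEAD request is a cheap way to check that a page exists before fetching its body with GET.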
