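Python crawler self-study notes: collecting Douban movie ratings, with the essential functions for crawling and saving positive reviews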

Published by 优采云: 2021-06-16 04:03

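With an article collector, once the keywords have been collected, the next question is how to upload them. This tutorial is based on my Python crawler self-study notes. It covers collecting Douban movie ratings and the essential functions for crawling and saving positive reviews.

1. Finding the collection source with match

The first step in collection is to locate the source with match, since locating the source and then collecting from it can be slow and time-consuming. The match function has many syntax forms and is cumbersome to use, so I divide it into two major parts. The first is locating the source address: urlstring must specify the folder that collected files are uploaded to.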

The code is as follows:

    import time

    def match(path=None, url=None, name=None):
        """Find the collection source address (loop)."""
        with open(path + url, 'w') as f:
            matches = [f]
            links = [f]
            for line in matches:
                name = [line.strip() for line in links]
            suggestions = [matches]
        with open(file=url.write(name), 'r') as format:
            format = matches['name'] + [name for name in format]
        if name == url.split('\t'):
            url += '.' + url + '.'
        print('Time retrieved: ' + time.strftime("%Y-%m-%d %H:%M:%S"))
        get_path = urlstring.split(' ')[1]
        url = urlstring.split(' ')[2].split(' ')[0] + '.' + url.split(' ')[1][0]
        path = []
        if path is None:
            path.append(path)
        for line in matches:
            matches.append(line.strip())
        links = []
        for link in links:
            if matches[link].group() == '':
                suggestions.append('' + suggestions[link])
        try:
            matches = [matches[0] for matches[1] in matches if matches[1].group() == '']
            with open(path + url, 'w') as f:
                for matches in matches:
                    if matches[1] in url:
                        print('Time retrieved: ' + time.strftime("%Y-%m-%d %H:%M:%S"))
                        get_path = urlstring.split(' ')[1]
                    for link in links:
                        if matches[1] in url:
                            print('Time retrieved: ' + time.strftime("%Y-%m-%d %H:%M:%S"))
                            matches.append(link)
                        else:
                            print('Not found')
            with open(path + url, 'w') as f:
                for matches in matches:
                    if matches[1] in url:
                        print('Not found')
                    else:
                        print('Found')
                    if line.strip() in matches:
                        print('Text removed')
                        path.remove(matches[0])

    if __name__ == '__main__':
        urlstring = ""
        for i in mat
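For comparison, the idea described above (locate the source address with match, then save it to a folder) can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the original author's function: the names SOURCE_PATTERN, find_sources, save_sources and the file name sources.txt are hypothetical, the pattern simply treats any http(s) address on douban.com as a valid source, and no network request is made.

```python
import re
import tempfile
import time
from pathlib import Path

# Hypothetical pattern: any http(s) address on douban.com or a subdomain of it.
SOURCE_PATTERN = re.compile(r"https?://([\w-]+\.)*douban\.com(/\S*)?$")


def find_sources(lines):
    """Return the stripped lines that re.match accepts as source addresses."""
    sources = []
    for line in lines:
        candidate = line.strip()
        # re.match anchors at the start of the string; "$" anchors the end.
        if SOURCE_PATTERN.match(candidate):
            sources.append(candidate)
    return sources


def save_sources(sources, folder, name="sources.txt"):
    """Write one source address per line to folder/name and return the path."""
    target = Path(folder) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(sources) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    # Sample input only; a real run would read candidate addresses from a file.
    urlstring = "https://movie.douban.com/top250\nnot a url\nhttps://example.com/\n"
    found = find_sources(urlstring.splitlines())
    print("Time retrieved: " + time.strftime("%Y-%m-%d %H:%M:%S"))
    print("Found" if found else "Not found")
    with tempfile.TemporaryDirectory() as folder:
        saved = save_sources(found, folder)
        print(saved.read_text(encoding="utf-8"))
```

Keeping the matching step separate from the saving step means each part can be checked on its own before any real collection is attempted.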
