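The latest 2020 advanced Python materials and development tutorials, welcome to join

优采云 Published: 2021-05-07 19:16

Scraping website data with Python. This tutorial uses Scrapy spiders.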

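1. Install the Scrapy framework

Run on the command line:

    pip install scrapy

If Scrapy's dependencies conflict with other Python packages you already have installed, install it inside a virtualenv instead.

Once installation finishes, pick any folder and create the scraper project there:

    scrapy startproject <your project name>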

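Project layout

spiders/ - the crawl rules (spiders) are written in this directory

items.py - the data fields to be scraped

pipelines.py - saves the scraped data

settings.py - project configuration

middlewares.py - downloader middleware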
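For the pipeline in pipelines.py to be called at all, Scrapy expects it to be enabled in settings.py. A minimal sketch of such a fragment follows. The module path assumes the project package is named BookSpider (inferred from the class names used later, not stated in the original), and the delay value is only illustrative.

```python
# settings.py (fragment)

# Enable the pipeline so that process_item() is called for every scraped item.
# The module path assumes the project package is named "BookSpider";
# adjust it to the name that was passed to `scrapy startproject`.
ITEM_PIPELINES = {
    "BookSpider.pipelines.BookspiderPipeline": 300,
}

# Wait one second between requests to the same site (illustrative value).
DOWNLOAD_DELAY = 1
```

The number 300 is the pipeline's order. When several pipelines are enabled, lower numbers run first.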

Below is the source code for scraping a novel website.

First, define the data to scrape in items.py:

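# author 小白
import scrapy


class BookspiderItem(scrapy.Item):
    # define the fields for your item here like:
    i = scrapy.Field()                  # number prefixed to the chapter file name
    book_name = scrapy.Field()          # book title, used as the folder name
    book_img = scrapy.Field()           # cover image
    book_author = scrapy.Field()        # author
    book_last_chapter = scrapy.Field()  # latest chapter
    book_last_time = scrapy.Field()     # last update time
    book_list_name = scrapy.Field()     # chapter title, used in the file name
    book_content = scrapy.Field()       # chapter text written to the file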

Next, write the scraping rules in a spider under the spiders directory.

Then save the data in pipelines.py:

import os


class BookspiderPipeline(object):
    def process_item(self, item, spider):
        # Root output folder (小说 = "novels"); each book gets its own sub-folder
        curPath = 'E:/小说/'
        tempPath = str(item['book_name'])
        targetPath = curPath + tempPath
        if not os.path.exists(targetPath):
            os.makedirs(targetPath)
        # File name = number prefix + chapter title
        book_list_name = str(item['i']) + item['book_list_name']
        filename_path = targetPath + '/' + book_list_name + '.txt'
        print('------------')
        print(filename_path)
        # Append mode: crawling the same chapter twice appends its text again
        with open(filename_path, 'a', encoding='utf-8') as f:
            f.write(item['book_content'])
        return item
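Book and chapter titles scraped from a web page can contain characters that Windows does not allow in file names (for example `?`, `:` or `/`), in which case creating the folder or opening the file fails. One possible safeguard is sketched below. The helper `safe_filename` is hypothetical and not part of the original pipeline.

```python
import re

# Characters that Windows rejects in file names, plus ASCII control characters.
_INVALID = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(name, replacement="_"):
    """Return `name` with characters that are invalid in Windows file names replaced.

    Hypothetical helper for illustration; not part of the original pipeline.
    """
    # Windows also dislikes names that start or end with spaces or dots.
    cleaned = _INVALID.sub(replacement, str(name)).strip(" .")
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or "untitled"


print(safe_filename("第1章: 开始?"))
print(safe_filename("a/b\\c"))
```

In the pipeline, it could be applied to `item['book_name']` and to `book_list_name` before the paths are built.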

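Run it:

    scrapy crawl BookSpider

That gives you a complete novel scraper.

A tip: open the page you want to scrape in the Scrapy shell

    scrapy shell <URL of the page to scrape>

then call response.css('') with your selector to check that the rule is correct.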

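I also recommend the Python development study group I set up: 810735403. Everyone in the group is learning Python development, so if you are learning Python, you are welcome to join. The members are all software developers who share practical tips from time to time (Python development topics only), including a copy of the latest advanced Python materials and development tutorials that I put together in 2020. Advanced users and anyone who wants to go deeper into Python are welcome.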
