Real-time article collection (language: Python, framework: Scrapy; for this job it has to be Python)

Published by 优采云: 2022-02-24 12:25

　　Background

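　　A friend wants to expand his sales channels and plans to take orders on crowdsourcing platforms. His main product is WeChat mini programs, so he wants to see each client request as soon as it is posted and contact the client right away to close the deal. Speed is what secures the order: if he is slow, a competitor takes the job and the opportunity is gone.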

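　　Development environment

　　Language: Python. Framework: Scrapy. For data collection, Python is the natural choice. IDE: PyCharm.

　　Feature design

　　Real-time notification: new orders are announced by email, and the mailbox is bound to WeChat so that each email arrives as an instant notification.
　　Filter module: keywords are matched against both the title and the description; orders that do not match are discarded, and matching orders trigger a notification.
　　Configuration module: settings are kept in a JSON file.

　　Key code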

# -*- coding: utf-8 -*-
import re
import time

import scrapy
from scrapy import Selector

from .. import common

# Task links look like //task.zbj.com/16849389/ ; capture the numeric ID.
TASK_ID_PATTERN = re.compile(r"/(\d+)/$")


class ZbjtaskSpider(scrapy.Spider):
    name = 'zbjtask'
    allowed_domains = ['zbj.com']
    start_urls = ['https://task.zbj.com/?m=1111&so=1&ss=0&fee=1']

    def parse(self, response):
        # 30 items per page
        nodes = response.xpath('//div[@class="demand-card"]').getall()
        id_nodes = response.xpath('//a[@class="prevent-defalut-link"]/@href').getall()
        print(id_nodes)

        # Find the newest (largest) task ID on the page.
        max_id = 0
        for url in id_nodes:
            task_id = int(TASK_ID_PATTERN.findall(url).pop())
            if task_id > max_id:
                max_id = task_id
        print(max_id)

        for node in nodes:
            sel = Selector(text=node)  # parse each card once
            date = sel.xpath('//span[@class="card-pub-time flt"]/text()').get()
            url = "https:" + sel.xpath('//a[@class="prevent-defalut-link"]/@href').get()
            name = sel.xpath('//a[@class="prevent-defalut-link"]/text()').get()
            desc = sel.xpath('//div[@class="demand-card-desc"]/text()').get()
            price = sel.xpath('//div[@class="demand-price"]/text()').get()
            tag = sel.xpath('//span[@class="demand-tags"]/i/text()').get()

            id_str = TASK_ID_PATTERN.findall(url).pop()
            task_id = int(id_str)

            # Only notify for tasks newer than the last one already sent.
            sent_id = common.read_taskid()
            if task_id > sent_id:
                subject = "ZBJ " + id_str + " " + name
                # content = price + "\n" + desc + "\n" + url + "\n" + tag + "\n"
                content = '%s %s <a href="%s">%s</a> %s' % (price, desc, url, url, tag)
                if common.send_mail(subject, content):
                    print("ZBJ mail: send task success %d" % task_id)
                else:
                    print("ZBJ mail: send task fail %d" % task_id)
            else:
                print("mail: task is already sent %d" % task_id)
            # Pause between tasks to throttle outgoing mail.
            time.sleep(3)

        # Remember the newest ID so the next run skips everything seen so far.
        common.write_taskid(id=max_id)
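
The spider calls common.read_taskid() and common.write_taskid(id=max_id), but the article does not show them. The following is only a sketch of how such helpers might look, assuming the last notified ID is kept in a small text file. The file name last_taskid.txt and the path parameter are assumptions for illustration, not part of the original code.

```python
# Hypothetical location of the state file; the original does not say
# where the last task ID is stored.
TASKID_FILE = "last_taskid.txt"


def read_taskid(path=TASKID_FILE):
    """Return the last notified task ID, or 0 if nothing is stored yet."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        # Missing or unreadable state: treat every task as new.
        return 0


def write_taskid(id, path=TASKID_FILE):
    """Persist the newest task ID without ever moving it backwards.

    The parameter is named `id` only to match the keyword used by the
    spider's call, write_taskid(id=max_id).
    """
    if id > read_taskid(path):
        with open(path, "w", encoding="utf-8") as f:
            f.write(str(id))
```

Refusing to write a smaller ID means that a page with no new tasks leaves the stored value untouched.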

import smtplib
from email.mime.text import MIMEText


def send_mail(subject, content):
    sender = 'xxxxx@qq.com'     # sender address
    passwd = 'xxxxxx'           # sender's mailbox authorization code
    receivers = 'xxxxx@qq.com'  # recipient address
    # subject = '一品威客 development task'  # subject
    # content = 'This email was sent automatically with the smtplib and email modules'  # body
    try:
        # msg = MIMEText(content, 'plain', 'utf-8')
        # Send as HTML so the task link is clickable.
        msg = MIMEText(content, 'html', 'utf-8')
        msg['Subject'] = subject
        msg['From'] = sender
        msg['To'] = receivers
        # The with-block closes the connection even if sending fails.
        with smtplib.SMTP_SSL('smtp.qq.com', 465) as s:
            s.set_debuglevel(1)
            s.login(sender, passwd)
            s.sendmail(sender, receivers, msg.as_string())
        return True
    except Exception as e:
        print(e)
        return False
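
The feature design mentions a filter module and a JSON configuration file, neither of which appears in the key code. Below is a minimal sketch of one way they could fit together, assuming the config holds two keyword lists. The key names include_keywords and exclude_keywords, the sample keywords, and the function names load_config and is_wanted are all hypothetical.

```python
import json

# Hypothetical config layout; the original only says a JSON file is used.
EXAMPLE_CONFIG = """
{
    "include_keywords": ["小程序", "微信"],
    "exclude_keywords": ["兼职"]
}
"""


def load_config(text):
    """Parse the JSON configuration text into a dict."""
    return json.loads(text)


def is_wanted(title, desc, config):
    """Return True when an order passes the keyword filter.

    Both the title and the description are searched. Any excluded
    keyword rejects the order; otherwise at least one included
    keyword must be present.
    """
    text = "%s %s" % (title or "", desc or "")
    if any(word in text for word in config.get("exclude_keywords", [])):
        return False
    return any(word in text for word in config.get("include_keywords", []))
```

In the spider, a check such as is_wanted(name, desc, config) would sit just before the call to common.send_mail, so that only matching orders produce an email.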

　　Summary

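　　Since going live, the program has run stably and achieved the intended result. It has proved very effective for picking up orders.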

　　Appendix: 猪八戒 (ZBJ) platform architecture diagram

　　Appendix: Scrapy mind map
