Real-time article collection (development language: Python; framework: Scrapy; for this job nothing beats Python)
优采云 · Published: 2022-02-24 00:06
Background
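A friend of mine wants to expand his sales channels by taking orders on crowdsourcing platforms. His main product is WeChat mini programs, so he wants to see each client's requirements as soon as they are posted and contact the client right away in order to close the deal. Response speed is what makes the difference: if he is slow, the order is taken by a competitor and the opportunity is gone.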
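Development environment
Language: Python. Framework: Scrapy. For data collection, Python is the natural choice.
Development tool: PyCharm.

Feature design
Real-time notification: notifications are sent by email, and the mailbox is bound to WeChat, so new orders arrive as real-time alerts.
Filter module: keywords are matched against both the title and the content; orders that do not meet the requirements are discarded, and matching orders trigger a notification.
Configuration module: settings are stored in a JSON file.

Key code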
# -*- coding: utf-8 -*-
import json  # the standard-library json module is enough here
import time

import scrapy

from .. import common


class CodemarttaskSpider(scrapy.Spider):
    name = 'codemarttask'
    allowed_domains = ['codemart.com']
    start_urls = ['https://codemart.com/api/project?page=1&roleTypeId=&status=RECRUITING']

    # Important: the request has to ask for application/json (presumably via
    # the Accept request header); otherwise the API returns XML by default.
    def parse(self, response):
        # 30 items per page
        json_data = json.loads(response.text)
        rewards = json_data.get("rewards") or []
        url_prefix = "https://codemart.com/project/"

        # Highest task id for which a notification has already been sent
        sent_id = common.read_taskid()
        max_id = sent_id
        print("sent_id", sent_id)
        for node in rewards:
            task_id = node.get("id")
            id_str = str(task_id)
            name = node.get("name")
            description = node.get("description")
            price = node.get("price")
            roles = node.get("roles")  # roles being recruited
            status = node.get("status")
            pub_time = node.get("pubTime")
            url = url_prefix + id_str
            print(name)
            print(pub_time)
            print(price)

            if task_id > sent_id:
                if task_id > max_id:
                    max_id = task_id
                subject = "CodeMart %s %s" % (id_str, name)
                # HTML body: price, description, link to the task, status, roles
                content = '%s <p> %s <p> <a href="%s">%s</a> <p> %s <p> %s' % (
                    price, description, url, url, status, roles)
                if common.send_mail(subject, content):
                    print("CodeMart mail: send task %s success" % task_id)
                else:
                    print("CodeMart mail: send task %s fail" % task_id)
            else:
                print("mail: task %s is already sent" % task_id)
            time.sleep(3)

        # Record the largest id seen so far
        common.write_taskid(id=max_id)

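The spider relies on common.read_taskid() and common.write_taskid(id=...) to remember which tasks have already been reported, but the article does not show them. A minimal sketch of how such helpers might look follows; the file name, its location, and the extra path parameter are assumptions made for illustration, not the author's implementation.

```python
import os
import tempfile

# Hypothetical storage location; the article does not say where the id is kept.
TASKID_FILE = os.path.join(tempfile.gettempdir(), "codemart_taskid.txt")


def read_taskid(path=TASKID_FILE):
    """Return the last stored task id, or 0 if nothing has been stored yet."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        # Missing or unreadable file: treat every task as new.
        return 0


def write_taskid(id, path=TASKID_FILE):
    """Persist the highest task id handled so far.

    The parameter is named `id` only to match the spider's keyword call.
    """
    with open(path, "w", encoding="utf-8") as f:
        f.write(str(id))
```

Because only the largest id is stored, each run reports just the tasks whose id is greater than the stored value, which is the comparison the spider performs.
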
# Mail helper, called by the spider as common.send_mail
import smtplib
from email.mime.text import MIMEText


def send_mail(subject, content):
    sender = 'xxxxx@qq.com'     # sender address
    passwd = 'xxxxxx'           # authorization code of the sender's mailbox
    receivers = 'xxxxx@qq.com'  # recipient address

    try:
        # Send as HTML so that the task link in the body is clickable
        msg = MIMEText(content, 'html', 'utf-8')
        msg['Subject'] = subject
        msg['From'] = sender
        msg['To'] = receivers

        # The with-block closes the SMTP connection even if sending fails
        with smtplib.SMTP_SSL('smtp.qq.com', 465) as s:
            s.set_debuglevel(1)
            s.login(sender, passwd)
            s.sendmail(sender, receivers, msg.as_string())
        return True
    except Exception as e:
        print(e)
        return False

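The feature design mentions a filter module driven by keywords and a JSON configuration file, but neither appears in the key code. The sketch below shows one way the two could fit together. The config keys (include_keywords, exclude_keywords), the sample keywords, and the function names load_config and is_wanted are all hypothetical, and it assumes that a keyword hit in either the title or the content is enough to keep a task.

```python
import json

# Hypothetical config layout; the article only says that a JSON file is used.
EXAMPLE_CONFIG = """
{
  "include_keywords": ["小程序", "微信", "mini program"],
  "exclude_keywords": ["驻场"]
}
"""


def load_config(text):
    """Parse the JSON config and lower-case all keywords."""
    raw = json.loads(text)
    return {
        "include": [k.lower() for k in raw.get("include_keywords", [])],
        "exclude": [k.lower() for k in raw.get("exclude_keywords", [])],
    }


def is_wanted(name, description, config):
    """Return True if the title or content matches an include keyword
    and neither of them matches an exclude keyword."""
    text = ((name or "") + "\n" + (description or "")).lower()
    if any(k in text for k in config["exclude"]):
        return False
    return any(k in text for k in config["include"])
```

In the spider, a call such as is_wanted(name, description, config) could be placed just before common.send_mail so that only matching tasks produce a mail.
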
Summary
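Since going live, the program has run stably and achieved the expected result. It has been very effective at helping him win orders.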