Automatic Article Collection and Publishing (Building Your Own Article-Scraping Site with Python and WordPress)

Published on 优采云: 2021-10-14 06:21


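Many people who build sites with WordPress run into the same problem: once the site is up, they have no time to write articles and gradually give up on it. Others bookmark plenty of great blogs in their browser, but the list grows so long and messy that they rarely open any of them. In fact, with just a few lines of code, we can use Python and WordPress to build our own article-scraping site. The idea is to write a crawler in Python that extracts content from web pages (the script below finds article links with an XPath query and pulls out each article's title and body with the `newspaper` library) and then automatically publishes it, through the `wordpress_xmlrpc` package, to a WordPress site with XML-RPC enabled. Finally, crond runs the crawl on a schedule.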
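To make the link-collection step concrete without network access or lxml, here is a stdlib-only sketch. It is an assumption for illustration, not the author's code: it imitates the XPath `//li[@class='listing-item']/a/@href` used in the script by parsing HTML with `html.parser.HTMLParser`, keeping the `href` of each `<a>` that is a direct child of `<li class="listing-item">`, and resolving it against the page URL. `ListingLinkParser`, `extract_article_urls`, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements that never get a closing tag in HTML, so they must not stay open
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
             'input', 'link', 'meta', 'source', 'track', 'wbr'}


class ListingLinkParser(HTMLParser):
    """Hypothetical helper: collect href values of <a> elements that are direct
    children of <li class="listing-item">, roughly like the XPath
    //li[@class='listing-item']/a/@href."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # currently open elements as (tag, is_listing_item)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self.open_tags[-1] if self.open_tags else None
        if tag == 'a' and parent == ('li', True) and attrs.get('href'):
            self.hrefs.append(attrs['href'])
        if tag not in VOID_TAGS:
            is_item = tag == 'li' and attrs.get('class') == 'listing-item'
            self.open_tags.append((tag, is_item))

    def handle_endtag(self, tag):
        # Close back to the most recent matching open tag; ignore stray end tags
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                del self.open_tags[i:]
                break


def extract_article_urls(page_html, base_url):
    """Return absolute article URLs found in page_html."""
    parser = ListingLinkParser()
    parser.feed(page_html)
    parser.close()
    return [urljoin(base_url, href) for href in parser.hrefs]


sample = """
<ul>
  <li class="listing-item"><time>2016-05-01</time> <a href="/2016/05/01/post-a/">A</a></li>
  <li class="listing-item"><a href="/2016/04/20/post-b/">B</a></li>
  <li class="nav"><a href="/about/">About</a></li>
</ul>
"""
print(extract_article_urls(sample, 'http://whuhan2013.github.io/archive/'))
# ['http://whuhan2013.github.io/2016/05/01/post-a/', 'http://whuhan2013.github.io/2016/04/20/post-b/']
```

`urljoin` is used instead of plain string concatenation so that both root-relative (`/2016/...`) and page-relative (`post/`) links resolve correctly.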

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Scrape article links from a blog archive, extract each article with
newspaper, and publish it to WordPress over XML-RPC (Python 3).

Requires: pip install lxml newspaper3k python-wordpress-xmlrpc
"""
import time
import urllib.request
from urllib.parse import urljoin

from lxml import html
from newspaper import Article
from wordpress_xmlrpc import Client, WordPressPost
from wordpress_xmlrpc.methods.posts import NewPost

# Archive page that lists the articles to collect
ARCHIVE_URL = 'http://whuhan2013.github.io/archive/'

# WordPress XML-RPC endpoint and admin account (replace with your own)
# Example: Client('http://www.python-cn.com/xmlrpc.php', 'username', 'password')
WP_XMLRPC_URL = 'http://your-site.example/wordpress/xmlrpc.php'
WP_USERNAME = 'username'
WP_PASSWORD = 'password'

TAG = 'test'  # tag attached to every published post


def gethtml(url):
    """Return the raw HTML of url."""
    # Send a browser User-Agent so the site treats us like a normal visitor
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; '
                      'rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'
    }
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def get_target_urls():
    """Return absolute URLs of all articles listed on the archive page."""
    tree = html.fromstring(gethtml(ARCHIVE_URL))
    hrefs = tree.xpath("//li[@class='listing-item']/a/@href")
    return [urljoin(ARCHIVE_URL, href) for href in hrefs]


def sends():
    # Log in to WordPress once instead of once per article
    wp = Client(WP_XMLRPC_URL, WP_USERNAME, WP_PASSWORD)
    for url in get_target_urls():
        print(url)
        # Download the article and extract its title and body text
        a = Article(url, language='zh')
        a.download()
        a.parse()

        post = WordPressPost()
        post.title = a.title
        post.content = a.text
        post.terms_names = {'post_tag': [TAG]}
        post.post_status = 'publish'

        # Publish to WordPress, then pause so we don't hammer either site
        wp.call(NewPost(post))
        time.sleep(3)
    print('posts updated')


if __name__ == '__main__':
    sends()
```
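For readers curious what `wp.call(NewPost(post))` transmits: `NewPost` in python-wordpress-xmlrpc wraps WordPress's `wp.newPost` XML-RPC method, whose parameters are a blog ID, the username, the password, and a struct of post fields such as `post_title`, `post_content`, and `post_status`. The sketch below only builds and decodes that request body with the standard library's `xmlrpc.client.dumps`/`loads`; it sends nothing. The helper `build_new_post_request`, the blog ID `0`, and the placeholder credentials are assumptions for illustration.

```python
import xmlrpc.client


def build_new_post_request(title, content, username, password,
                           status='publish', tags=None, blog_id=0):
    """Hypothetical helper: serialize a wp.newPost call to XML without sending it."""
    post = {
        'post_title': title,
        'post_content': content,
        'post_status': status,
    }
    if tags:
        # terms_names maps a taxonomy to term names (rather than term IDs)
        post['terms_names'] = {'post_tag': list(tags)}
    # wp.newPost(blog_id, username, password, content_struct)
    return xmlrpc.client.dumps((blog_id, username, password, post),
                               methodname='wp.newPost')


request_xml = build_new_post_request('Hello', '正文内容', 'username', 'password',
                                     tags=['test'])
print(request_xml)

# Decoding it again shows the method name and parameters the server would see
params, method = xmlrpc.client.loads(request_xml)
print(method)  # wp.newPost
```

To actually send such a call with only the standard library, `xmlrpc.client.ServerProxy(url).wp.newPost(...)` would work as well; the `wordpress_xmlrpc` package mainly adds convenient post objects on top of this.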

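Finally, you can run the script on a schedule with crontab, so that the specified articles are collected and sent to WordPress automatically.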
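Because cron reruns the whole script each time, the script as written would publish every archived article again on every run. A crontab line such as `0 */6 * * * /usr/bin/python3 /path/to/autopost.py` (interpreter path, script path, and six-hour interval are placeholders) schedules it. The sketch below is one possible addition, not part of the original post: it records already-published URLs in a JSON file so that later runs only post new articles. The helper names and the `posted_urls.json` file are hypothetical.

```python
import json
import os
import tempfile


def load_posted(path):
    """Return the set of URLs already published (empty on the first run)."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding='utf-8') as f:
        return set(json.load(f))


def save_posted(path, posted):
    """Write the URL set via a temp file + rename, so an interrupted run
    cannot corrupt the existing file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    with os.fdopen(fd, 'w', encoding='utf-8') as f:
        json.dump(sorted(posted), f, ensure_ascii=False, indent=2)
    os.replace(tmp_path, path)


def new_urls(candidates, posted):
    """Return candidates not yet published, in order, without duplicates."""
    seen = set(posted)
    fresh = []
    for url in candidates:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


# Possible use in the publishing loop (cron usually starts jobs in the user's
# home directory, so give the state file an absolute path):
#   state = '/path/to/posted_urls.json'
#   posted = load_posted(state)
#   for url in new_urls(article_urls, posted):  # article_urls: links from the archive page
#       ...download, parse and wp.call(NewPost(post)) as before...
#       posted.add(url)
#       save_posted(state, posted)

print(new_urls(['/a/', '/b/', '/a/'], {'/b/'}))  # ['/a/']
```

Saving after each successful post (rather than once at the end) means a run that fails halfway still remembers what it already published.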

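Reference: Using Python to automatically publish articles to a WordPress site

Source code: WordPress auto-publishing

Visit: 梁又一夜的博客 (blog)

WordPress supports Markdown, code highlighting, rich post styles, post view-count plugins, and more.

Handy blog plugins

The result looks like this:

[Screenshot of the result]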

Posted by: AI时代内容工厂