Scraping web page data with Python (the shareable "WeChat Sports" data page)
优采云 Published: 2021-10-09 14:32
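"WeChat Sports" (微信运动) lets a user share a web page containing their exercise data with friends, and that page holds the data we want. The page URL carries the user's openid, an identifier used only by WeChat Sports. To capture the traffic, start Fiddler first, then open WeChat Sports and tap your own home page.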
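WeChat uses the request headers to decide whether a request was sent from the WeChat browser. If you open the link directly in an ordinary browser, WeChat intercepts the request and shows an error saying the page was not opened in the WeChat browser.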
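As a rough illustration of that check, the sketch below tests whether a User-Agent string contains the `MicroMessenger` token that appears in the WeChat browser's header. The function name `looks_like_wechat` is hypothetical, and this is only an assumption about how such a check could work; the real server-side logic is not documented here and may also inspect cookies or other headers.

```python
def looks_like_wechat(user_agent: str) -> bool:
    """Return True if the User-Agent contains the WeChat browser token."""
    # Assumption: the MicroMessenger token is what marks a WeChat client.
    return "micromessenger" in user_agent.lower()


wechat_ua = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 "
    "MicroMessenger/6.5.2.501 NetType/WIFI WindowsWechat"
)
desktop_ua = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
)

print(looks_like_wechat(wechat_ua))
print(looks_like_wechat(desktop_ua))
```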
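Using the data Fiddler captures, we can forge the request headers and fetch the data ourselves.
Fiddler captures the packets and displays them.
Use Postman to forge the request headers and imitate the WeChat browser. With the forged headers in place, the page content is retrieved successfully.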
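The same forged request can be put together in Python before sending anything. The sketch below uses only the standard library to build the request object and inspect it offline. The helper name `build_wechat_request` and the placeholder openid and cookie values are hypothetical; a real run needs the values from your own Fiddler capture.

```python
from urllib.parse import urlencode
from urllib.request import Request

# User-Agent carrying the MicroMessenger token, as seen in a WeChat capture.
WECHAT_UA = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 "
    "MicroMessenger/6.5.2.501 NetType/WIFI WindowsWechat"
)


def build_wechat_request(openid: str, cookie: str) -> Request:
    """Build (but do not send) a GET request that imitates the WeChat browser."""
    base_url = "http://hw.weixin.qq.com/steprank/step/personal"
    url = base_url + "?" + urlencode({"openid": openid})
    headers = {"User-Agent": WECHAT_UA, "Cookie": cookie}
    return Request(url, headers=headers, method="GET")


# Placeholder values; substitute the openid and cookie from your own capture.
req = build_wechat_request("EXAMPLE_OPENID", "hwstepranksk=EXAMPLE")
print(req.full_url)
print(req.get_header("User-agent"))  # urllib stores header names capitalized
```

Passing `req` to `urllib.request.urlopen` would actually send it; building the object first makes it easy to compare the headers against what Fiddler shows.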
Python implementation:
import json
import re

import requests


class WechatSprot(object):
    """Fetch a user's WeChat Sports step data by openid."""

    def __init__(self, openid):
        self.openid = openid

    def getInfo(self):
        url = "http://hw.weixin.qq.com/steprank/step/personal"
        querystring = {"openid": self.openid}
        # Headers copied from the Fiddler capture. The MicroMessenger token in
        # user-agent makes the request look like it comes from the WeChat browser.
        headers = {
            'host': "hw.weixin.qq.com",
            'connection': "keep-alive",
            'accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
            'user-agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 MicroMessenger/6.5.2.501 NetType/WIFI WindowsWechat QBCore/3.43.691.400 QQBrowser/9.0.2524.400",
            'accept-encoding': "gzip, deflate",
            'accept-language': "zh-CN,zh;q=0.8,en-us;q=0.6,en;q=0.5;q=0.4",
            # Session cookie from the capture; replace it with one from your own session.
            'cookie': "hwstepranksk=JxMBWw1sxQhxnMgsJnnLh-r0VFzLH6RtJWv5b_j3z8MPs6-J; pass_ticket=p9R%2FqjIh%2BlXt%2BoxP7GIWrqm3Sbf1Minisk%2FNUz5zra4ReETR2ATI8H57zkEERCvG",
        }
        response = requests.request("GET", url, headers=headers, params=querystring, timeout=10)
        # The page embeds its data as a JavaScript assignment: window.json = {...};
        res = re.findall(r'window\.json = (.+);', response.text)
        if not res:
            raise ValueError("window.json not found in the response")
        return json.loads(res[0])


if __name__ == "__main__":
    obj = WechatSprot("USER_OPENID")  # replace with the target user's openid
    print(obj.getInfo())
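The extraction step can be tried without any network access. The sketch below pulls the object assigned to `window.json` out of an HTML string and decodes it. The helper name `extract_window_json` is hypothetical, and the sample page and its `step` and `rank` fields are invented for illustration; the real page's fields are not shown in this article.

```python
import json
import re

# Matches: window.json = {...};  (non-greedy, so it stops at the first "};")
WINDOW_JSON = re.compile(r"window\.json\s*=\s*(\{.*?\});", re.DOTALL)


def extract_window_json(html: str) -> dict:
    """Return the object assigned to window.json in the page, decoded as a dict."""
    match = WINDOW_JSON.search(html)
    if match is None:
        raise ValueError("window.json assignment not found in page")
    return json.loads(match.group(1))


# Invented sample page; the field names are assumptions, not the real schema.
sample_html = """
<script>
window.json = {"step": 8421, "rank": 3};
</script>
"""

data = extract_window_json(sample_html)
print(data["step"])
```

Raising an error when the pattern is missing makes a blocked request easier to spot than an `IndexError` from an empty match list. The non-greedy pattern would cut the match short if a string value inside the JSON happened to contain `};`.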