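querylist: scraping WeChat official account articles (the key to a WeChat official account crawler is finding the request URL; note that the request requires cookies)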
优采云 · Published: 2022-01-01 21:03
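The key to crawling a WeChat official account is finding the URL that the article list is requested from. This article describes one way to do that.
Log in to the backend of your own official account on the WeChat Official Accounts Platform (mp.weixin.qq.com), open the article (图文消息) editor, click the hyperlink tool, and choose the option to link to an official account article.
Search for an official account such as 人民日报 (People's Daily), and a list of its latest articles pops up. The corresponding request now shows up in the browser's network panel. Paging through the article list reveals how the request parameters change from page to page.
Note: the request requires cookies, which you can copy from your logged-in browser session. The code is as follows (key values in the cookies have been redacted as "???"):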
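As a rough illustration of that parameter pattern: in the request captured for this post, `begin` looks like an offset that advances by `count` on every page, while `fakeid` and `token` stay fixed. The sketch below only builds the query parameters and sends nothing. `build_params` is a hypothetical helper name, and the parameter values are the ones from the author's own captured request, so treat them as placeholders.

```python
def build_params(page_index, fakeid, token, count=5):
    """Return the query parameters for one page of an account's article list.

    Assumption: `begin` is a zero-based offset, so page N starts at N * count.
    """
    return {
        'action': 'list_ex',
        'begin': page_index * count,   # the only value that changes between pages
        'count': str(count),
        'fakeid': fakeid,              # identifies the official account being listed
        'type': '9',
        'query': '',
        'token': token,                # tied to your own logged-in session
        'lang': 'zh_CN',
        'f': 'json',
        'ajax': '1',
    }


if __name__ == '__main__':
    for page_index in range(3):
        params = build_params(page_index, 'MjM5MjAxNDM4MA==', '2101561850')
        print(page_index, params['begin'])
```

Keeping the offset arithmetic in one place makes it easy to change the page size later without touching the request loop.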
'''The key is to send the browser's cookies with the request.'''
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'referer': 'https://mp.weixin.qq.com/cgi-bin/appmsg?t=media/appmsg_edit_v2&action=edit&isNew=1&type=77&createType=0&token=2101561850&lang=zh_CN',
}
# Cookie names are stripped of the leading space left over from splitting the browser's Cookie header on ';'.
# The values come from the author's session and are partly redacted; replace them with your own.
cookies = {'appmsglist_action_3207019504': 'card', 'pgv_info': 'ssid', 'pgv_pvid': '4778161886', 'RK': '1KphnCSeRK', 'ptcz': '4a3ad775ddc10f3d9f50479110d37c6f7d5c7e8b38ebdb1e90207808173ad942', 'rewardsn': '', 'wxtokenkey': '777', '_ga': 'GA1.2.1040497363.1629278077', 'o_cookie': '???????', 'pac_uid': '1_???????', 'tvfe_boss_uuid': 'a1a981cb70609f6e', 'vversion_name': '8.2.95', 'user_id': 'null', 'session_id': 'null',
'ua_id': 'ZpH4w1C3ipVIqGlHAAAAANiI5kknP2NeaIGodK9Opus', 'wxuin': '32385700599949', 'mm_lang': 'zh_CN', 'ptui_loginuin': '???????', 'verifysession': 'h014382c87bb4d29015296cbadb898e8e19aed8e594f786d35bf732285d003ed4c300f3ad957b8e52bf', 'video_omgid': '', 'uin': 'o0???????', 'iip': '0', 'mmad_session': 'db8fbb73a2b0a4*敏*感*词*1bc175f34a6ad7d79d7245bbd1ba1a04ed6f116b38c7c8b6*敏*感*词*a209839bd7378*敏*感*词*da98642d25827cce39f657a4b128eb2c6658eb64dad90d50adf1bdf73a0fae947e3047a489efc*敏*感*词*cd503f920e2c7f38ac8e4728189d5c2711de1c56c245721266e7088080fefde3', 'ts_uid': '8844190317', 'sig': 'h01ac912472130166d03e296461b8fba0d24e1a2bbe362cbae1470395802352c863c771017587fdabdb', 'uuid': '5d8752d7b10e69ca60b82d934f101a8c', 'rand_info': 'CAESIK4WkEF7objSg84LpN/56kispUPwx5XIFkZWGhEmjYpM', 'slave_bizuin': '3207019504', 'data_bizuin': '3226019316', 'bizuin': '3207019504', 'data_ticket': 'DaBODpqknEMzImuPqc7tT2ZR07to0GCNXX9WR2+lfcCOvPl/ZUTGnX5wAkd2yzQn', 'slave_sid': 'eFkzZVJOeXg5aHRIdHFuMlcyaUplT2JBbXVQZk5jYzB1aXM0bENQdFZUMmlwQWFvODVvX0V0MEM4cTdjWGN1NmJsYzFaTXI2YnpQZWNQNHluNjV6N1BMT3B1MWNHYU1kUWVPQU5oYTJ1eTJvb2dpU09oNG5rYk5JMGgyRFV0TnlYUFFMTDRabllhc0RLTXlL', 'slave_user': 'gh_90314c99dc76', 'xid': 'cb00dd5d681ce20868e0ffd778c1863f'}
url = 'https://mp.weixin.qq.com/cgi-bin/appmsg'
for page in range(0, 5):
    # Offset of the first article on this page; each request returns 5 articles.
    begin = page * 5
    # fakeid identifies the official account (here 人民日报, People's Daily); begin implements paging.
    # token belongs to the logged-in session (the same value appears in the referer URL above).
    data = {'action': 'list_ex', 'begin': begin, 'count': '5', 'fakeid': 'MjM5MjAxNDM4MA==', 'type': '9', 'query': '', 'token': '2101561850', 'lang': 'zh_CN', 'f': 'json', 'ajax': '1'}
    res = requests.get(url, headers=headers, params=data, cookies=cookies)
    print(res.status_code)
    app_msg_list = res.json()['app_msg_list']
    # print(app_msg_list)
    for message in app_msg_list:
        link = message['link']
        title = message['title']
        print(title, link)
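Two small helpers can make the script above less fragile. This is a minimal sketch under stated assumptions, not part of the original method. `parse_cookie_header` and `extract_articles` are hypothetical names. The first turns the Cookie header copied from the browser into a dict and strips the whitespace around each name. The second returns an empty list instead of raising `KeyError` when the response has no `app_msg_list`, which I would expect when the cookies or token are no longer valid, although the original post does not say so. The sample cookie string and payloads are made up for illustration.

```python
def parse_cookie_header(raw):
    """Turn a raw 'name=value; name2=value2' Cookie header into a dict."""
    cookies = {}
    for pair in raw.split(';'):
        pair = pair.strip()
        if not pair:
            continue  # skip empty fragments such as a trailing ';'
        # Split on the first '=' only: values (e.g. base64 strings) may contain '='.
        name, _, value = pair.partition('=')
        cookies[name.strip()] = value.strip()
    return cookies


def extract_articles(payload):
    """Return (title, link) pairs, or an empty list if 'app_msg_list' is missing."""
    return [(m['title'], m['link']) for m in payload.get('app_msg_list', [])]


if __name__ == '__main__':
    raw = 'wxtokenkey=777; mm_lang=zh_CN; data_ticket=abc+def/ghi=='
    print(parse_cookie_header(raw))

    sample = {'app_msg_list': [{'title': 'Example title', 'link': 'https://example.invalid/a'}]}
    print(extract_articles(sample))
    print(extract_articles({}))
```

In the script, `cookies = parse_cookie_header(raw)` would replace the hand-written dict, and `extract_articles(res.json())` would replace the direct `['app_msg_list']` lookup.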