Scraping Articles with Python and Converting Them into a PDF E-book

优采云 · Published: 2022-04-28 10:02

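　　Preface

　　A while ago I posted a video on a certain site whose name starts with B. It was a tutorial on collecting my own articles and converting them to PDF, and CSDN actually reported it!!!

　　So now I am writing up how to fetch my own articles and convert them into a PDF e-book, to see whether this one can get published.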

　　wkhtmltopdf [software]: this must be installed in advance, otherwise this example cannot be carried out.

　　Code for fetching the article content:

　　Send the request: request the target URL

　　Parse the data: extract the content

　　Save the data: first save it as an HTML file

　　Then convert the HTML file to PDF
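　　The four steps above can be sketched as one small pipeline function. This is only an illustrative outline: `run_pipeline` and its four callable parameters are hypothetical names, not part of the original code, and the stand-in lambdas in the demo do no real network or file work.

```python
from typing import Callable, Iterable, List, Tuple


def run_pipeline(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    parse: Callable[[str], Tuple[str, str]],
    save: Callable[[str, str], str],
    convert: Callable[[str], str],
) -> List[str]:
    """Run the four steps for every URL and return the resulting PDF paths."""
    pdf_paths = []
    for url in urls:
        html_text = fetch(url)                  # step 1: send the request
        title, article_html = parse(html_text)  # step 2: parse and extract
        html_path = save(title, article_html)   # step 3: save as an HTML file
        pdf_paths.append(convert(html_path))    # step 4: convert HTML to PDF
    return pdf_paths


if __name__ == "__main__":
    # Stand-in callables so the flow can be followed without any network access.
    result = run_pipeline(
        ["u1"],
        fetch=lambda url: "<h1>T</h1>",
        parse=lambda text: ("T", text),
        save=lambda title, html: f"html/{title}.html",
        convert=lambda path: path.replace("html", "pdf"),
    )
    print(result)
```

　　In a real run, each callable would wrap the corresponding piece of code from the sections that follow.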

　　Code implementation

　　Requesting the data

import requests  # HTTP request module

# URL of the article list page to request
url = 'https://blog.csdn.net/fei347795790/article/list/1'
# Request headers, mainly used to disguise the Python script
# so that the server does not recognise it as a program
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36'
}
# Send the request with the get method of the requests module
response = requests.get(url=url, headers=headers)
print(response.text)
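　　The URL above ends in `/article/list/1`, which looks like a page number. As a minimal sketch, assuming that later list pages follow the same pattern with the number incremented (this is an assumption, not something the original states), the list-page URLs could be generated as follows. `list_page_urls` is a hypothetical helper name.

```python
from typing import List


def list_page_urls(username: str, first_page: int = 1, last_page: int = 1) -> List[str]:
    """Build list-page URLs for one blog, assuming pages are numbered consecutively."""
    base = "https://blog.csdn.net/{user}/article/list/{page}"
    return [
        base.format(user=username, page=page)
        for page in range(first_page, last_page + 1)
    ]


if __name__ == "__main__":
    for page_url in list_page_urls("fei347795790", 1, 3):
        print(page_url)
```

　　Each generated URL could then be passed to `requests.get` together with the same `headers`.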

　　A response status code of 200 means the request succeeded.
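　　To make the meaning of the status code concrete, here is a small sketch that uses only the standard library. `describe_status` and `is_success` are hypothetical helper names. With requests, the number to pass in would be `response.status_code`.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return the code together with its standard reason phrase, if it has one."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Not a status code known to the standard library
        phrase = "Unknown status"
    return f"{code} {phrase}"


def is_success(code: int) -> bool:
    """Treat every 2xx code as success."""
    return 200 <= code <= 299


if __name__ == "__main__":
    for code in (200, 403, 404):
        print(describe_status(code), "->", "ok" if is_success(code) else "failed")
```

　　requests also offers `response.raise_for_status()`, which raises an exception for 4xx and 5xx responses instead of requiring a manual check.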

　　Parsing the data and extracting the content

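import parsel  # HTML parsing module

# NOTE: the original does not show how href and html_str are defined.
# The two definitions below are assumptions added so that the loop can run.
# Assumed selector: collect the article links from the list page fetched above.
href = parsel.Selector(response.text).css('.article-list a::attr(href)').getall()
# Assumed template: a minimal HTML page with an {article} placeholder.
html_str = '<!doctype html><html><head><meta charset="UTF-8"></head><body>{article}</body></html>'

for index in href:
    # Request each article's detail page
    html_data = requests.get(url=index, headers=headers).text
    selector_1 = parsel.Selector(html_data)
    title = selector_1.css('#articleContentId::text').get()  # article title
    content = selector_1.css('#content_views').get()  # article body as HTML
    article_content = html_str.format(article=content)
    print(title)
    print(article_content)
    break  # stop after the first article while testing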
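　　The loop above iterates over `href`, a list of article URLs. The idea of pulling article links out of a list page can be illustrated with the standard library alone. This is a sketch that swaps parsel for `html.parser`. It assumes that article links contain `/article/details/`, and the sample HTML is made up for the demonstration rather than taken from the real page. `ArticleLinkCollector` and `extract_article_links` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List


class ArticleLinkCollector(HTMLParser):
    """Collect href values of <a> tags whose URL contains a marker string."""

    def __init__(self, marker: str = "/article/details/") -> None:
        super().__init__()
        self.marker = marker
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        link = dict(attrs).get("href")
        # Keep only article links, and skip duplicates while preserving order
        if link and self.marker in link and link not in self.links:
            self.links.append(link)


def extract_article_links(html_text: str, marker: str = "/article/details/") -> List[str]:
    collector = ArticleLinkCollector(marker)
    collector.feed(html_text)
    return collector.links


# Made-up sample markup, only for demonstration
sample = """
<div>
  <a href="https://blog.csdn.net/someone/article/details/1">First</a>
  <a href="https://blog.csdn.net/someone/article/details/2">Second</a>
  <a href="https://blog.csdn.net/someone/article/details/1">First again</a>
  <a href="/about">About</a>
</div>
"""

if __name__ == "__main__":
    print(extract_article_links(sample))
```

　　Filtering on a URL fragment is less precise than a CSS selector, but it does not depend on the class names used by the page.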

　　Saving the data

# The html folder must already exist; open() does not create it
html_path = 'html\\' + title + '.html'
with open(html_path, mode='w', encoding='utf-8') as f:
    f.write(article_content)
    print(title, 'saved successfully')
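　　Article titles can contain characters that Windows does not allow in file names (such as `/`, `:`, `?` or `*`), in which case the `open` call above would fail. A minimal sketch of one way to guard against that is shown below. `safe_filename` and `html_path_for` are hypothetical helpers, and replacing the characters with an underscore is just one possible choice.

```python
import re
from pathlib import Path


def safe_filename(title: str, replacement: str = "_") -> str:
    """Replace characters that are invalid in Windows file names."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', replacement, title)
    cleaned = cleaned.strip(" .")  # Windows also rejects trailing dots and spaces
    return cleaned or "untitled"


def html_path_for(title: str, folder: str = "html") -> Path:
    """Build the output path for a title without touching the file system."""
    return Path(folder) / f"{safe_filename(title)}.html"


if __name__ == "__main__":
    path = html_path_for("What is a/b: testing?")
    # Create the output folder if it is missing before writing the file
    path.parent.mkdir(parents=True, exist_ok=True)
    print(path.name)
```

　　Using `pathlib` also avoids hard-coding the `\\` separator, so the same code works on other operating systems.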

　　Converting to a PDF file

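import pdfkit  # Python wrapper around wkhtmltopdf

# The html and pdf folders are assumed to exist already
html_path = 'html\\' + title + '.html'
pdf_path = 'pdf\\' + title + '.pdf'
with open(html_path, mode='w', encoding='utf-8') as f:
    f.write(article_content)
# Location of the wkhtmltopdf executable on the author's machine
config = pdfkit.configuration(wkhtmltopdf=r'C:\01-Software-installation\wkhtmltopdf\bin\wkhtmltopdf.exe')
pdfkit.from_file(html_path, pdf_path, configuration=config)
print(title, 'saved successfully')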
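　　The path to `wkhtmltopdf.exe` above is specific to one machine. As a hedged sketch, the executable could instead be looked up on the system `PATH` with the standard library, falling back to an explicit path when one is given. `find_wkhtmltopdf` is a hypothetical helper and assumes that wkhtmltopdf has been added to `PATH` during installation.

```python
import shutil
from typing import Optional


def find_wkhtmltopdf(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return the explicit path if given, else search PATH; None if not found."""
    if explicit_path:
        return explicit_path
    return shutil.which("wkhtmltopdf")


if __name__ == "__main__":
    location = find_wkhtmltopdf()
    if location is None:
        print("wkhtmltopdf was not found on PATH; install it or pass its path explicitly")
    else:
        print("using", location)
        # The result can then be passed on as in the code above:
        # config = pdfkit.configuration(wkhtmltopdf=location)
```

　　Checking for the executable before converting gives a clearer error message than letting the conversion fail halfway through.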
