Solution: exporting from an article-collection API to Excel requires matching fields one at a time

Published by 优采云: 2022-11-10 01:11

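Exporting from an article-collection API to Excel requires matching fields one at a time. I once wrote a crawler myself and added a database query box and a match box to it, and even that felt like a lot of trouble. What I use now is python3 + selenium + xpath, which is enough for the job. Because the page format is not known in advance, the source returned by the API request gets interpreted automatically by Python and the matches come out inaccurate, so first read the source that Python actually sees and write the match expressions against that.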
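The field-by-field matching step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample markup, the column names, and the XPath expressions are all hypothetical, and the standard library's `xml.etree.ElementTree` (which supports a limited XPath subset) stands in for Selenium so the example runs without a browser. With Selenium, the string passed to `extract_rows` would come from `driver.page_source`, and real-world HTML that is not well-formed would need a tolerant parser. The output is CSV, which Excel can open, not a native .xlsx file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup standing in for a fetched page.
SAMPLE_PAGE = """
<html><body>
  <div class="article">
    <h2>First post</h2>
    <span class="author">Alice</span>
    <span class="date">2022-11-10</span>
  </div>
  <div class="article">
    <h2>Second post</h2>
    <span class="author">Bob</span>
  </div>
</body></html>
"""

# Ordered mapping (assumed names): output column -> XPath relative to one article node.
FIELD_XPATHS = [
    ("title", "./h2"),
    ("author", "./span[@class='author']"),
    ("date", "./span[@class='date']"),
]


def extract_rows(page_source):
    """Match each column's XPath in order, once per article node."""
    root = ET.fromstring(page_source.strip())
    rows = []
    for article in root.findall(".//div[@class='article']"):
        row = {}
        for column, xpath in FIELD_XPATHS:
            node = article.find(xpath)
            # A field that does not match becomes an empty cell instead of an error.
            row[column] = (node.text or "").strip() if node is not None else ""
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with one header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[column for column, _ in FIELD_XPATHS],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(extract_rows(SAMPLE_PAGE)))
```

Keeping the column-to-XPath mapping in one ordered list means a mismatch can be fixed by editing a single entry after inspecting the source the script actually received.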

This API we believe works, but us might have thought we were software-based of a single pipeline. That was it too wasteful, and more retired to come into other middle-programming communications, so after all, or when we are always pressing on, we would like to launch a framework with IntelliSense.

I hope this is the best way we have to start in our web RESTful APIs.

Searching GitHub for superspider will also turn up related source code.

Google developed a package called parseform, which supports crawlers and text-based collection.

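Facebook open-sourced formspringapi, with formspring used as the SDK. You can develop schemas, templates, styles and so on, though the backend skill requirements are somewhat higher. Still, I think writing a JSON crawler and a parsing crawler directly is simpler. Getting started is easy, and formspring also supports Facebook's convenient authentication system.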

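Official projects generally use automation frameworks such as selenium and requests. There are some example analysis frameworks, so have a look at the official examples. mozilla/selenium-javascript:javascriptforwebapplicationswithseleniumandrequestslibrary-fedora-ppa-zh/packages/jqueryspringbootstrap.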

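It is quite simple, and you no longer need an interface you wrote yourself. Take a look at this: first work out which features you need, then read the official README.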

