Web News Scraping (How to Fetch News from Major Sites and Parse the Pages)
优采云, published 2022-03-21 18:34
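How can we fetch news from the major news sites, reformat it, and display it in our own news client?
I know of two ways to fetch and parse web pages from an Android client: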
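1. Using jsoup
I have not studied this approach closely. Similar write-ups exist online, and two other authors' posts on it are worth consulting.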
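For orientation, here is a minimal sketch of what the jsoup route could look like. It is not taken from the posts mentioned above. It assumes the jsoup library is on the classpath and that the headlines are links inside a `div` with `id="listZone"` (the same container the htmlparser code in the next section filters on). The class name `JsoupNewsSketch`, the method `extractHeadlines`, and the sample HTML are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupNewsSketch {

    /**
     * Returns one "title|absoluteUrl" string for every link found inside
     * div#listZone. The container id is an assumption about the page layout.
     */
    public static List<String> extractHeadlines(String html, String baseUrl) {
        // The base URL lets jsoup resolve relative hrefs to absolute URLs.
        Document doc = Jsoup.parse(html, baseUrl);
        List<String> headlines = new ArrayList<>();
        for (Element link : doc.select("div#listZone a[href]")) {
            headlines.add(link.text() + "|" + link.absUrl("href"));
        }
        return headlines;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup standing in for a downloaded channel page.
        String html = "<div id=\"listZone\">"
                + "<a href=\"/a/1.htm\">Sample headline</a>"
                + "</div>";
        for (String headline : extractHeadlines(html, "http://news.qq.com")) {
            System.out.println(headline);
        }
    }
}
```

To work against a live page instead of a string, the document could be obtained with `Jsoup.connect(url).get()`. On Android that call has to run off the main thread, since it performs network I/O.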
2. Using htmlparser
In my project I use htmlparser to parse Tencent News quickly. The code is as follows:
<p>Java code:
import java.util.ArrayList;
import java.util.List;

import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;

public class NetUtil {
    public static List DATALIST = new ArrayList();

    // One row per channel: { channel index page URL, site base URL }.
    public static String[][] CHANNEL_URL = new String[][] {
        new String[]{"http://news.qq.com/world_index.shtml","http://news.qq.com"},
        new String[]{"http://news.qq.com/china_index.shtml","http://news.qq.com"},
        new String[]{"http://news.qq.com/society_index.shtml","http://news.qq.com"},
        new String[]{"http://news.qq.com/china_index.shtml","http://news.qq.com"},
        new String[]{"http://news.qq.com/china_index.shtml","http://news.qq.com"},
        new String[]{"http://news.qq.com/china_index.shtml","http://news.qq.com"},
        new String[]{"http://news.qq.com/china_index.shtml","http://news.qq.com"},
        new String[]{"http://news.qq.com/china_index.shtml","http://news.qq.com"},
        new String[]{"http://news.qq.com/china_index.shtml","http://news.qq.com"},
        new String[]{"http://news.qq.com/china_index.shtml","http://news.qq.com"},
        new String[]{"http://news.qq.com/china_index.shtml","http://news.qq.com"},
    };

    public static int getTechNews(List techData, int cId) {
        int result = 0;
        try {
            // Match only <div id="listZone">, the container being targeted on the channel page.
            NodeFilter filter = new AndFilter(new TagNameFilter("div"),
                    new HasAttributeFilter("id", "listZone"));
            Parser parser = new Parser();
            parser.setURL(CHANNEL_URL[cId][0]);
            parser.setEncoding(parser.getEncoding());

            // Walk the page and collect every node that passes the filter.
            NodeList list = parser.extractAllNodesThatMatch(filter);
            for (int i = 0; i