Web News Scraping (How to Fetch News from Major Sites and Parse the Pages)

优采云  Published: 2022-03-21 18:34


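  How can we fetch news from the major news sites, clean up its format, and display it in our own news client?

  I know of two ways to fetch and parse web pages in an Android client:

  1. Using jsoup

  I have not looked into this one closely. Similar write-ups are available online, and you can refer to the posts by these two authors:

  2. Using htmlparser

  In my project I use htmlparser to parse Tencent News quickly. The code is as follows: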

    import java.util.ArrayList;
    import java.util.List;

    import org.htmlparser.NodeFilter;
    import org.htmlparser.Parser;
    import org.htmlparser.filters.AndFilter;
    import org.htmlparser.filters.HasAttributeFilter;
    import org.htmlparser.filters.TagNameFilter;
    import org.htmlparser.util.NodeList;

    public class NetUtil {
        public static List DATALIST = new ArrayList();

        // Each entry pairs a channel list page with the site root.
        public static String[][] CHANNEL_URL = new String[][] {
            new String[]{"http://news.qq.com/world_index.shtml","http://news.qq.com"},
            new String[]{"http://news.qq.com/china_index.shtml","http://news.qq.com"},
            new String[]{"http://news.qq.com/society_index.shtml","http://news.qq.com"},
            new String[]{"http://news.qq.com/china_index.shtml","http://news.qq.com"},
            new String[]{"http://news.qq.com/china_index.shtml","http://news.qq.com"},
            new String[]{"http://news.qq.com/china_index.shtml","http://news.qq.com"},
            new String[]{"http://news.qq.com/china_index.shtml","http://news.qq.com"},
            new String[]{"http://news.qq.com/china_index.shtml","http://news.qq.com"},
            new String[]{"http://news.qq.com/china_index.shtml","http://news.qq.com"},
            new String[]{"http://news.qq.com/china_index.shtml","http://news.qq.com"},
            new String[]{"http://news.qq.com/china_index.shtml","http://news.qq.com"},
        };

        public static int getTechNews(List techData, int cId) {
            int result = 0;
            try {
                // Match the <div id="listZone"> element on the channel page.
                NodeFilter filter = new AndFilter(new TagNameFilter("div"),
                        new HasAttributeFilter("id", "listZone"));
                Parser parser = new Parser();
                parser.setURL(CHANNEL_URL[cId][0]);
                parser.setEncoding(parser.getEncoding());

                NodeList list = parser.extractAllNodesThatMatch(filter);
                for (int i = 0; i
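
  Each CHANNEL_URL entry pairs a channel page with the site root http://news.qq.com. One plausible use for that second element is to turn relative links found inside the list zone into absolute article URLs. This is an assumption, since the listing is cut off before the element is used. A minimal sketch of that step, using only java.net.URI from the standard library, might look like this. LinkResolver and resolve are hypothetical names, and the sample hrefs are made up for illustration:

```java
import java.net.URI;

// Hypothetical helper, not part of the original NetUtil class.
final class LinkResolver {
    private LinkResolver() {}

    // Resolve an href (absolute, root-relative, or relative) against a base URL.
    public static String resolve(String baseUrl, String href) {
        URI base = URI.create(baseUrl);
        String path = base.getPath();
        // java.net.URI mis-joins relative paths when the base has an empty path,
        // so give a bare "http://host" base a trailing slash first.
        if (path == null || path.isEmpty()) {
            base = URI.create(baseUrl + "/");
        }
        return base.resolve(href.trim()).toString();
    }

    public static void main(String[] args) {
        String siteRoot = "http://news.qq.com"; // second element of a CHANNEL_URL entry
        // Sample hrefs are illustrative only.
        System.out.println(resolve(siteRoot, "/a/sample.htm"));
        System.out.println(resolve(siteRoot, "a/sample.htm"));
        System.out.println(resolve(siteRoot, "http://view.news.qq.com/sample.htm"));
    }
}
```

  An href that is already absolute is returned unchanged, so the same call can be applied to every link extracted from the page without checking its form first.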
