c httpclient: fetching web pages (scraping and parsing page source in Java; the search URL is built by mimicking the search engine's address)

Published by 优采云: 2021-10-08 02:18

　　Scraping and parsing web page source in Java

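　　1. The search URL is built by mimicking the search engine's own address: inspect the engine's query parameters (Baidu's, for example) to obtain the address pattern, then append the search term to it.

　　2. The function takes that constructed URL as its input parameter.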
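The two steps above can be sketched as a small helper that encodes the search term and appends it, together with a paging offset, to a fixed address prefix. This is only an illustration: the class name `SearchUrlBuilder` and the `BASE` prefix are hypothetical, while the `pn`, `tn` and `ie` parameters are copied from the URL string used in this article. It assumes 10 results per page and Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Builds paged search URLs by appending an encoded term to a fixed prefix. */
final class SearchUrlBuilder {
    // Hypothetical prefix for illustration only; the real one is obtained by
    // inspecting the address of the target search engine's result page.
    static final String BASE = "https://search.example.com/s?wd=";

    /**
     * @param term raw search term (may contain spaces or non-ASCII text)
     * @param page zero-based page index; pn is the offset at 10 results per page
     */
    static String build(String term, int page) {
        // Percent-encode the term so it is safe inside a query string.
        String query = URLEncoder.encode(term, StandardCharsets.UTF_8);
        return BASE + query + "&pn=" + (page * 10) + "&tn=baiduhome_pg&ie=utf-8";
    }

    public static void main(String[] args) {
        // Print the URLs for the first three result pages.
        for (int p = 0; p < 3; p++) {
            System.out.println(build("java httpclient", p));
        }
    }
}
```

Each generated string can then be passed as the single argument of the fetching function described in step 2.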

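    // Requires Apache Commons HttpClient 3.x and Jsoup on the classpath.
    import java.net.URLEncoder;
    import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.HttpStatus;
    import org.apache.commons.httpclient.methods.GetMethod;
    import org.apache.commons.httpclient.params.HttpMethodParams;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // URL-encode the search term before placing it in the query string.
    String query = URLEncoder.encode("潘珠婷", "UTF-8");
    // The leading string literal holds the mimicked search address;
    // p is the page index, so pn is the result offset (10 results per page).
    String url = "" + query + "&pn=" + p * 10 + "&tn=baiduhome_pg&ie=utf-8";

    public void MakeQuery(String domain) {
        try {
            HttpClient httpClient = new HttpClient();
            GetMethod getMethod = new GetMethod(domain);
            // Install the default retry handler before the request is executed.
            getMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, new DefaultHttpMethodRetryHandler());
            // Execute the request once and report network failures.
            int statusCode;
            try {
                statusCode = httpClient.executeMethod(getMethod);
            } catch (Exception e) {
                System.out.println("Network problem");
                return;
            }
            if (statusCode != HttpStatus.SC_OK) {
                System.err.println("Method failed: " + getMethod.getStatusLine());
            }
            // Read the raw bytes and decode them as UTF-8.
            byte[] responseBody = getMethod.getResponseBody();
            String response = new String(responseBody, "UTF-8");
            // System.out.println(response);
            // Parse the HTML with Jsoup.
            Document doc = Jsoup.parse(response);
            // System.out.println(doc);
            // Each search result sits in an element with class "f".
            Elements contents = doc.getElementsByClass("f");
            for (Element content : contents) {
                // The first <a> tag inside a result holds its link.
                Element link = content.getElementsByTag("a").first();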
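For readers without the two libraries at hand, the same fetch, check, decode and extract flow can be sketched with the JDK alone. Note the substitutions: this uses the built-in `java.net.http.HttpClient` (Java 11+) instead of Commons HttpClient, and a simple regular expression instead of Jsoup. The class name `PageFetcher` and its methods are hypothetical, and unlike the code above the sketch collects every `href` on the page rather than only the first link inside each element with class "f".

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PageFetcher {
    // Captures the href value of an <a> tag. Adequate for a sketch,
    // but not a substitute for a real HTML parser such as Jsoup.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Fetches the page and returns its body decoded as UTF-8. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        // Stop on any status other than 200 instead of parsing an error page.
        if (response.statusCode() != 200) {
            throw new IOException("Request failed with status " + response.statusCode());
        }
        return new String(response.body(), StandardCharsets.UTF_8);
    }

    /** Returns every href found in the given HTML, in document order. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        // With a URL argument the page is fetched; otherwise a tiny sample is parsed.
        String html = args.length > 0
                ? fetch(args[0])
                : "<div class=\"f\"><a href=\"https://example.com/a\">A</a></div>";
        extractLinks(html).forEach(System.out::println);
    }
}
```

Keeping `fetch` and `extractLinks` separate lets the parsing step be exercised on saved HTML without touching the network.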
