Fetching web pages with HttpClient

优采云 Published: 2021-10-20 15:04


  1. GET method

  Step 1: create a client, which is similar to opening a web page in a browser.

  HttpClient httpClient = new HttpClient();

  Step 2: create a GET method for the URL of the page you want to crawl.

  GetMethod getMethod = new GetMethod("");

  Step 3: execute the request and read the response status code; 200 means the request succeeded.

  int statusCode = httpClient.executeMethod(getMethod);

  Step 4: read the page source.

  byte[] responseBody = getMethod.getResponseBody();

  These four steps are the core. There are other details to handle as well, such as the page's character encoding.

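  import org.apache.commons.httpclient.HttpClient;
  import org.apache.commons.httpclient.HttpStatus;
  import org.apache.commons.httpclient.methods.GetMethod;

  public static String spiderHtml() throws Exception {
      HttpClient client = new HttpClient();
      GetMethod method = new GetMethod("http://top.baidu.com/buzz?b=1");
      try {
          int statusCode = client.executeMethod(method);
          if (statusCode != HttpStatus.SC_OK) {
              System.err.println("Method failed: " + method.getStatusLine());
          }
          // Read the raw bytes and decode them explicitly as GBK.
          byte[] body = method.getResponseBody();
          String html = new String(body, "gbk");
          return html;
      } finally {
          // Return the connection once the body has been read.
          method.releaseConnection();
      }
  }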
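The example above hard-codes "gbk" when decoding the response bytes. One way to handle the encoding issue more generally is to read the charset from the response's Content-Type header and decode with that. The following is a minimal sketch using only the JDK; the class name `CharsetSniffer`, its method names, and the ISO-8859-1 fallback are assumptions made for illustration and are not part of HttpClient. (Commons HttpClient 3.x also appears to expose the declared charset through `getResponseCharSet()` on the method object, which may make manual parsing unnecessary.)

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetSniffer {

    /**
     * Extracts the charset named in a Content-Type header value such as
     * "text/html; charset=GBK". Returns the fallback when the header is
     * missing, has no charset parameter, or names an unsupported charset.
     */
    public static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, separated by semicolons.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // Strip optional surrounding quotes, as in charset="utf-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    /** Decodes a response body using the header's charset (fallback: ISO-8859-1, an assumption). */
    public static String decode(byte[] body, String contentType) {
        return new String(body, charsetFrom(contentType, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(body, "text/html; charset=utf-8"));
    }
}
```

Pages that declare their encoding only in an HTML `<meta>` tag are not covered by this sketch; for those, a hard-coded charset like the one in the example above may still be needed.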

  

  2. POST method

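  import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
  import org.apache.commons.httpclient.HttpClient;
  import org.apache.commons.httpclient.HttpStatus;
  import org.apache.commons.httpclient.NameValuePair;
  import org.apache.commons.httpclient.methods.PostMethod;
  import org.apache.commons.httpclient.params.HttpMethodParams;

  HttpClient httpClient = new HttpClient();
  // UrlPath holds the target URL and is defined elsewhere.
  PostMethod postMethod = new PostMethod(UrlPath);
  // Install the default retry handler on this request.
  postMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, new DefaultHttpMethodRetryHandler());
  // Form fields to submit.
  NameValuePair[] postData = new NameValuePair[2];
  postData[0] = new NameValuePair("username", "xkey");
  postData[1] = new NameValuePair("userpass", "********");
  postMethod.setRequestBody(postData);
  try {
      int statusCode = httpClient.executeMethod(postMethod);
      if (statusCode == HttpStatus.SC_OK) {
          byte[] responseBody = postMethod.getResponseBody();
          // Note: this decodes with the platform default charset.
          String html = new String(responseBody);
          System.out.println(html);
      }
  } catch (Exception e) {
      System.err.println("Page could not be accessed");
  } finally {
      // Always release the connection, even on failure.
      postMethod.releaseConnection();
  }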
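The name/value pairs passed to `setRequestBody` are sent as a URL-encoded form body. To make that wire format concrete, here is a small JDK-only sketch of the encoding. The class `FormBody` and its `encode` method are hypothetical helpers written for this illustration, not HttpClient APIs, and the sample password is made up to show how special characters are escaped.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormBody {

    /** Encodes name/value pairs as an application/x-www-form-urlencoded body. */
    public static String encode(Map<String, String> fields, String charset) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                // Pairs are joined with '&'; name and value with '='.
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), charset))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), charset));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "xkey");
        fields.put("userpass", "p@ss word");
        System.out.println(encode(fields, "UTF-8"));
        // prints: username=xkey&userpass=p%40ss+word
    }
}
```

Seeing the encoded body can help when comparing what the program sends against what a browser submits for the same login form.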

Related links: http://blog.csdn.net/acceptedxukai/article/details/7030700

  http://www.cnblogs.com/modou/articles/1325569.html

   
