Fetching web pages with HttpClient

优采云 · Published: 2022-02-08 18:14


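  1. GET method

  Step 1: create a client, much like opening a browser before visiting a page.

  HttpClient httpClient = new HttpClient();

  Step 2: create a GET method, passing the URL of the page to fetch.

  GetMethod getMethod = new GetMethod("");

  Step 3: execute the request and read the response status code; 200 means the request succeeded.

  int statusCode = httpClient.executeMethod(getMethod);

  Step 4: read the page source as raw bytes.

  byte[] responseBody = getMethod.getResponseBody();

  Those are the four main steps. Real pages need more care, for example handling the page's character encoding, and the connection should be released with getMethod.releaseConnection() once the body has been read.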
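  The encoding issue comes up because step 4 returns bytes, and turning them into text requires knowing the charset the server used. The sketch below is one possible way to handle it using only the JDK: it reads the charset from a Content-Type header value and decodes the bytes with it. The class and method names (CharsetSketch, charsetFrom, decode) are hypothetical, and the ISO-8859-1 fallback is an assumption made for this example. HttpClient 3.x also offers getResponseCharSet() and getResponseBodyAsString() on the method object, which may be enough in simple cases.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of HttpClient: picks a charset for decoding a response body.
final class CharsetSketch {

    // Returns the charset named in a Content-Type value such as "text/html; charset=utf-8".
    // Returns the fallback when the header is null, has no charset parameter,
    // or names a charset this JVM does not know.
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Decodes the raw body bytes; ISO-8859-1 is the assumed fallback charset.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetFrom(contentType, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        byte[] body = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(body, "text/html; charset=utf-8"));
    }
}
```

  With the GET code above, the header value could come from getMethod.getResponseHeader("Content-Type"), and the bytes from getMethod.getResponseBody(). Pages that declare their encoding only in an HTML meta tag are not covered by this sketch.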

  

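2. POST method

import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.NameValuePair;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;

HttpClient httpClient = new HttpClient();
// UrlPath holds the target URL and is defined elsewhere
PostMethod postMethod = new PostMethod(UrlPath);
// Use the default retry handler for recoverable I/O errors
postMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, new DefaultHttpMethodRetryHandler());
// Form fields sent in the request body
NameValuePair[] postData = new NameValuePair[2];
postData[0] = new NameValuePair("username", "xkey");
postData[1] = new NameValuePair("userpass", "********");
postMethod.setRequestBody(postData);
try {
    int statusCode = httpClient.executeMethod(postMethod);
    if (statusCode == HttpStatus.SC_OK) {
        byte[] responseBody = postMethod.getResponseBody();
        // Decode with the charset reported by the response, not the platform default
        String html = new String(responseBody, postMethod.getResponseCharSet());
        System.out.println(html);
    }
} catch (Exception e) {
    System.err.println("Page could not be accessed: " + e.getMessage());
} finally {
    // Always release the connection, even when the request fails
    postMethod.releaseConnection();
}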
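To make the request body less opaque: when name/value pairs are passed to setRequestBody, they go out as an application/x-www-form-urlencoded string. The sketch below approximates that encoding with the JDK alone, so the result can be inspected or logged. FormBodySketch and encode are hypothetical names, and UTF-8 is an assumption made for this example; HttpClient picks its own request charset, so the bytes it actually sends may differ for non-ASCII values.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of HttpClient: builds a form-urlencoded body string.
final class FormBodySketch {

    // Joins the fields as name=value pairs separated by '&', percent-encoding
    // each name and value. UTF-8 is an assumption for this sketch.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                body.add(URLEncoder.encode(field.getKey(), "UTF-8")
                        + "=" + URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "xkey");
        fields.put("userpass", "p@ss word");
        System.out.println(encode(fields)); // prints username=xkey&userpass=p%40ss+word
    }
}
```

The field names match the login form used above; the password value here is made up to show how reserved characters and spaces are escaped.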

Related links: http://blog.csdn.net/acceptedxukai/article/details/7030700

  http://www.cnblogs.com/modou/articles/1325569.html

   
