Fetching web pages with HttpClient

Published by 优采云: 2021-10-20 15:04

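1. GET method

Step 1: create a client, which is like opening a browser.

    HttpClient httpClient = new HttpClient();

Step 2: create a GET method for the URL of the page you want to fetch.

    GetMethod getMethod = new GetMethod("");

Step 3: execute the request and read the response status code; 200 means the request succeeded.

    int statusCode = httpClient.executeMethod(getMethod);

Step 4: read the page source as raw bytes.

    byte[] responseBody = getMethod.getResponseBody();

Those are the four main steps. There is more to handle in practice, such as the page's character encoding.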
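On the encoding issue mentioned above: servers often declare the charset in the Content-Type response header, for example "text/html; charset=GBK". The following is a minimal sketch of pulling that value out with plain Java. The class name CharsetSniffer, the method charsetFrom, and the fallback argument are made up for illustration and are not part of HttpClient.

```java
import java.util.Locale;

final class CharsetSniffer {

    /**
     * Extracts the charset parameter from a Content-Type header value,
     * e.g. "text/html; charset=GBK". Returns fallback when the header is
     * missing or declares no charset.
     */
    static String charsetFrom(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, separated by semicolons
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding quotes, as in charset="utf-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetFrom("text/html; charset=GBK", "ISO-8859-1")); // GBK
        System.out.println(charsetFrom("text/html", "ISO-8859-1"));              // ISO-8859-1
    }
}
```

With Commons HttpClient 3.x, the header can be read via getMethod.getResponseHeader("Content-Type"), which returns null when the header is absent. Many pages declare their encoding only in an HTML meta tag, so a header-based lookup like this will not cover every case.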

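    // Imports for Commons HttpClient 3.x; place the method inside any class
    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.HttpStatus;
    import org.apache.commons.httpclient.methods.GetMethod;

    public static String spiderHtml() throws Exception {
        HttpClient client = new HttpClient();
        GetMethod method = new GetMethod("http://top.baidu.com/buzz?b=1");
        try {
            int statusCode = client.executeMethod(method);
            if (statusCode != HttpStatus.SC_OK) {
                System.err.println("Method failed: " + method.getStatusLine());
            }
            // Decode the raw bytes as GBK (the encoding this example assumes for the page)
            byte[] body = method.getResponseBody();
            return new String(body, "gbk");
        } finally {
            // Always release the connection when done
            method.releaseConnection();
        }
    }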
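The example above hardcodes "gbk" when turning the response bytes into a String. As a rough sketch of a more defensive approach, the helper below decodes with a given charset name and falls back to UTF-8 when the name is missing, malformed, or not supported by the JVM. BodyDecoder and decode are hypothetical names, and UTF-8 as the fallback is an assumption, not something the original article specifies.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BodyDecoder {

    /** Decodes body with charsetName, or with UTF-8 if that name cannot be used. */
    static String decode(byte[] body, String charsetName) {
        Charset charset = StandardCharsets.UTF_8; // assumed fallback
        try {
            if (charsetName != null && Charset.isSupported(charsetName)) {
                charset = Charset.forName(charsetName);
            }
        } catch (IllegalArgumentException e) {
            // Malformed charset name: keep the fallback
        }
        return new String(body, charset);
    }

    public static void main(String[] args) {
        byte[] body = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(body, "UTF-8"));           // decoded as declared
        System.out.println(decode(body, "no-such-charset")); // falls back to UTF-8
    }
}
```

A call such as decode(method.getResponseBody(), "gbk") would then behave like the original line whenever GBK is available on the JVM.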

2. POST method

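    import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.HttpStatus;
    import org.apache.commons.httpclient.NameValuePair;
    import org.apache.commons.httpclient.methods.PostMethod;
    import org.apache.commons.httpclient.params.HttpMethodParams;

    HttpClient httpClient = new HttpClient();
    // UrlPath is the target URL, defined elsewhere
    PostMethod postMethod = new PostMethod(UrlPath);
    // Use the default retry handler
    postMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, new DefaultHttpMethodRetryHandler());
    // Form fields to submit
    NameValuePair[] postData = new NameValuePair[2];
    postData[0] = new NameValuePair("username", "xkey");
    postData[1] = new NameValuePair("userpass", "********");
    postMethod.setRequestBody(postData);
    try {
        int statusCode = httpClient.executeMethod(postMethod);
        if (statusCode == HttpStatus.SC_OK) {
            byte[] responseBody = postMethod.getResponseBody();
            // Decodes with the platform default charset; pass a charset name if the page uses another
            String html = new String(responseBody);
            System.out.println(html);
        }
    } catch (Exception e) {
        System.err.println("Page cannot be accessed: " + e.getMessage());
    } finally {
        // Always release the connection when done
        postMethod.releaseConnection();
    }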
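To make the POST example a little less opaque: name/value pairs submitted as a form normally travel in the request body as a URL-encoded string. Below is a small plain-Java sketch of that encoding, not HttpClient's own implementation. FormBody and encode are hypothetical names, and UTF-8 is an assumed charset; the library may use a different default.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class FormBody {

    /** Joins fields as name=value pairs separated by '&', URL-encoding each part. */
    static String encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should not happen
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "xkey");
        fields.put("userpass", "p@ss word"); // made-up value for illustration
        System.out.println(encode(fields)); // username=xkey&userpass=p%40ss+word
    }
}
```

Special characters in a value are escaped ('@' becomes %40, a space becomes '+'), which is why credentials with punctuation still arrive intact on the server side.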

Related links:

http://blog.csdn.net/acceptedxukai/article/details/7030700

http://www.cnblogs.com/modou/articles/1325569.html
