Fetching web pages with HttpClient
优采云 Published: 2022-02-08 18:14
1. The GET method
Step 1: create a client, much like opening a browser before visiting a page.
HttpClient httpClient = new HttpClient();
Step 2: create a GET method for the URL of the page to fetch.
GetMethod getMethod = new GetMethod("");
Step 3: execute the request and read the response status code; 200 means the request succeeded.
int statusCode = httpClient.executeMethod(getMethod);
Step 4: read the page source as raw bytes.
byte[] responseBody = getMethod.getResponseBody();
These are the four main steps. There is of course more to handle in practice, such as the character encoding of the page.
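The encoding issue comes up because step 4 returns raw bytes, which still have to be decoded with the right charset. The sketch below shows one way to pick that charset from a Content-Type header value using only the JDK. The class and method names (CharsetSketch, charsetFrom, decode) are hypothetical, and falling back to ISO-8859-1 is an assumption, not something the original article specifies.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of HttpClient: picks a charset for decoding a response body.
class CharsetSketch {

    // Returns the charset named in a Content-Type value such as
    // "text/html; charset=UTF-8", or the fallback if none is usable.
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // unknown or malformed charset name
                }
            }
        }
        return fallback;
    }

    // Decodes the body bytes; ISO-8859-1 as the fallback is an assumption.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetFrom(contentType, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        // "\u7f51\u9875" is the Chinese word for "web page", encoded here as UTF-8 bytes.
        byte[] body = "\u7f51\u9875".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(body, "text/html; charset=UTF-8"));
    }
}
```

With Commons HttpClient 3.x the same information should be available more directly: getMethod.getResponseCharSet() reports the charset declared by the response, and getMethod.getResponseBodyAsString() decodes the body with it.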
2. The POST method
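import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.NameValuePair;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;

HttpClient httpClient = new HttpClient();
// UrlPath is defined elsewhere: the URL that receives the form
PostMethod postMethod = new PostMethod(UrlPath);
// Use the default retry handler for recoverable I/O errors
postMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, new DefaultHttpMethodRetryHandler());
// Form fields sent in the request body
NameValuePair[] postData = new NameValuePair[2];
postData[0] = new NameValuePair("username", "xkey");
postData[1] = new NameValuePair("userpass", "********");
postMethod.setRequestBody(postData);
try {
    int statusCode = httpClient.executeMethod(postMethod);
    if (statusCode == HttpStatus.SC_OK) {
        byte[] responseBody = postMethod.getResponseBody();
        // Decode with the charset declared by the response, not the platform default
        String html = new String(responseBody, postMethod.getResponseCharSet());
        System.out.println(html);
    }
} catch (Exception e) {
    System.err.println("Page could not be accessed");
} finally {
    // Always release the connection, whether or not the request succeeded
    postMethod.releaseConnection();
}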
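To make the role of the name/value pairs clearer, here is a minimal JDK-only sketch of how such pairs are typically serialized into an application/x-www-form-urlencoded request body. It only approximates what the library does when it is handed the pairs; the class and method names (FormBodySketch, formUrlEncode) and the sample password are hypothetical, and UTF-8 is an assumed encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not HttpClient code: builds a form-urlencoded body.
class FormBodySketch {

    // Joins the fields as name=value pairs separated by '&',
    // percent-encoding each name and value in the given charset.
    static String formUrlEncode(Map<String, String> fields, String charset) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), charset))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), charset));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "xkey");
        fields.put("userpass", "p@ss word"); // sample value, chosen to show escaping
        // prints username=xkey&userpass=p%40ss+word
        System.out.println(formUrlEncode(fields, "UTF-8"));
    }
}
```

Characters such as '@', '&' and spaces in a value are escaped so that they cannot be mistaken for field separators on the server side.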
Related links: http://blog.csdn.net/acceptedxukai/article/details/7030700
http://www.cnblogs.com/modou/articles/1325569.html