Scraping web pages with HttpUnit

Published by 优采云: 2021-10-22 14:09

Date: 2015-02-12, views: 742

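Lately I have been thinking about how to scrape the data I need from web pages. Doing it directly with the APIs that Java provides is too cumbersome. Mature web test-automation libraries such as HttpUnit, Watij, and Selenium may already offer what is needed. So far I have tried HttpUnit, and it is not very convenient: I can only locate table elements that have an id. If an element has no id, I have to process the response stream myself.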
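To give a sense of what processing the response yourself can involve, here is a minimal sketch that pulls table cell text out of a raw HTML string with regular expressions. The class name `RawTableCells`, the method `extractCells`, and the sample markup are all made up for illustration; nothing here comes from HttpUnit. Regex matching is fragile on real-world HTML (nested tables, unclosed tags), so treat this as a picture of the manual effort rather than a recommended approach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of HttpUnit or any library.
class RawTableCells {
    // Matches one <td>...</td> cell; DOTALL lets a cell span several lines.
    private static final Pattern CELL =
        Pattern.compile("<td[^>]*>(.*?)</td>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Matches any tag, so markup nested inside a cell can be stripped.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Returns the trimmed text of every table cell, in document order.
    static List<String> extractCells(String html) {
        List<String> cells = new ArrayList<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            cells.add(TAG.matcher(m.group(1)).replaceAll("").trim());
        }
        return cells;
    }

    public static void main(String[] args) {
        // Made-up markup standing in for a downloaded response body.
        String html = "<table><tr><td>Code</td><td><b>Price</b></td></tr>"
                    + "<tr><td>0001</td><td> 12.5 </td></tr></table>";
        System.out.println(extractCells(html)); // prints [Code, Price, 0001, 12.5]
    }
}
```

Even this small case needs one pattern for cells and another for nested tags, which is the kind of overhead that makes a library lookup attractive.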

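    // Note: WebClient, HtmlPage and the other classes used here belong to HtmlUnit
    // (package com.gargoylesoftware.htmlunit in 2.x releases), not HttpUnit.
    import java.io.IOException;
    import java.util.Iterator;
    import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.WebResponse;
    import com.gargoylesoftware.htmlunit.html.DomElement;
    import com.gargoylesoftware.htmlunit.html.HtmlDivision;
    import com.gargoylesoftware.htmlunit.html.HtmlElement;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    public class StockPageScraper {
        public static void main(String[] args) {
            WebClient webClient = new WebClient();
            HtmlPage page;
            try {
                page = (HtmlPage) webClient.getPage("http://biz.cn.yahoo.com/stock.html");
            } catch (FailingHttpStatusCodeException | IOException e) {
                // MalformedURLException is an IOException, so it is covered here too.
                e.printStackTrace();
                return; // without a page there is nothing to inspect
            }
            WebResponse wr = page.getWebResponse();
            System.out.println(wr.getStatusCode());
            // Look up the <div> by id; throws ElementNotFoundException if it is absent.
            HtmlDivision he = page.getHtmlElementById("stat1");
            // Print each direct child element of the <div>.
            for (DomElement child : he.getChildElements()) {
                System.out.println(child);
            }
            System.out.println(he.getAttribute("id"));
            //System.out.println(he.asXml());
            // Print the tag name of the first element in the page. The original
            // code called getAllHtmlChildElements(), the name used by older releases.
            Iterator<HtmlElement> all = page.getHtmlElementDescendants().iterator();
            if (all.hasNext()) {
                System.out.println(all.next().getNodeName());
            }
        }
    }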
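The lookup by id in the code above only helps when the target element carries an id. One possible alternative, not something the original post tried, is to address the element by its position in the document with an XPath expression. The sketch below uses only the JDK's `javax.xml` classes on a made-up, well-formed XHTML string; the class name `XPathTableLookup` and the method `cellTexts` are hypothetical. Real pages are often not well-formed XML, so in practice the page would need an HTML-aware parser first (HtmlUnit's DOM nodes offer a `getByXPath` method for this purpose).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; assumes the input is well-formed XHTML.
class XPathTableLookup {
    // Returns the trimmed text of every node selected by the XPath expression.
    static List<String> cellTexts(String xhtml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            // Parsing or XPath errors are wrapped so callers need no checked handling.
            throw new IllegalArgumentException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // Made-up page with two tables, neither of which has an id.
        String xhtml = "<html><body>"
            + "<table><tr><td>nav</td></tr></table>"
            + "<table><tr><td>0001</td><td>12.5</td></tr></table>"
            + "</body></html>";
        // Select the cells of the second table purely by position.
        System.out.println(cellTexts(xhtml, "(//table)[2]//td")); // prints [0001, 12.5]
    }
}
```

Positional expressions like `(//table)[2]` break when the page layout changes, so a more specific path (for example one anchored on a nearby heading or class attribute) is usually more robust.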
