vba抓取网页数据(DOM资源管理器中看到的相同信息是什么意思的?)

优采云 发布时间: 2022-02-07 01:14

  vba抓取网页数据(DOM资源管理器中看到的相同信息是什么意思的?)

  问题描述

  我在为这个特定网页抓取数据时遇到了可怕的事情......基本上,当我在浏览器中加载 URL 并手动按 F12 时,我可以在 DOM Explorer 中看到“我需要的信息,但是当我尝试以编程方式做同样的事情时(见下文),HTMLDoc 不收录我在 DOM Explorer 中看到的相同信息”...

  公共子 testCode()将 IE 调暗为 SHDocVw.InternetExplorer将 HTMLDoc 变暗为 MSHTML.HTMLDocument设置 IE = 新的 SHDocVw.InternetExplorer用 IE.navigate "https://www.wunderground.com/cgi-bin/findweather/getForecast?query=EIDW"而 .Busy = True 或 .ReadyState READYSTATE_COMPLETE:温德设置 HTMLDoc = .Document结束于结束子

  有人可以帮我访问 DOM Explorer 中的信息吗?我知道 HTML 并不总是您在浏览器中看到的,而是创建您在浏览器中看到的内容的说明,但必须有一种方法可以通过 HTML 以编程方式创建 DOM...

  另外,我相信我所追求的数据是由脚本或 iFrame 生成的,但我一直无法生成我正在寻找的数据,因为它都搞砸了......

  更新

  查看下面的 DOM Explorer 图片:

  解决方案

  大纲:

  实际上,每次打开该网页时,网络浏览器都会执行几乎相同的操作。

  您可以使用下面的 VBA 代码来解析响应和输出结果。将 JSON.bas 模块导入 VBA 项目进行 JSON 处理。

  Sub TestScrapeWunderground()

Dim sContent As String

Dim sKey As String

Dim sLocation As String

Dim vJSON As Variant

Dim sState As String

Dim oDays As Object

Dim oHours As Object

Dim vDay As Variant

Dim vHour As Variant

Dim aRows() As Variant

Dim aHeader() As Variant

' GET XHR to retrieve location and key

With CreateObject("MSXML2.ServerXMLHTTP")

.Open "GET", "https://www.wunderground.com/cgi-bin/findweather/getForecast?query=EIDW", False

.Send

sContent = .responseText

End With

' Extract location and key from HTML content

sLocation = Split(Split(sContent, "var query = 'zmw:' + '", 2)(1), "'", 2)(0)

sKey = Split(Split(sContent, vbTab & "k: '", 2)(1), "'", 2)(0)

' GET XHR to retrieve JSON data

With CreateObject("MSXML2.ServerXMLHTTP")

.Open "GET", "https://api-ak-aws.wunderground.com/api/" & sKey & "/forecast10day/hourly10day/labels/conditions/astronomy10day/lang:en/units:metric/v:2.0/bestfct:1/q/zmw:" & sLocation & ".json", False

.Send

sContent = .responseText

End With

' Parse JSON response to data structure

JSON.Parse sContent, vJSON, sState

' Populate dictionaries with daily and hourly forecast data

Set oDays = CreateObject("Scripting.Dictionary")

Set oHours = CreateObject("Scripting.Dictionary")

For Each vDay In vJSON("forecast")("days")

oDays(vDay("summary")) = ""

For Each vHour In vDay("hours")

oHours(vHour) = ""

Next

Next

' Convert daily forecast data to arrays

JSON.ToArray oDays.Keys(), aRows, aHeader

' Output daily forecast data to table

With Sheets(1)

.Cells.Delete

OutputArray .Cells(1, 1), aHeader

Output2DArray .Cells(2, 1), aRows

.Columns.AutoFit

End With

' Convert hourly forecast data to arrays

JSON.ToArray oHours.Keys(), aRows, aHeader

' Output hourly forecast data to table

With Sheets(2)

.Cells.Delete

OutputArray .Cells(1, 1), aHeader

Output2DArray .Cells(2, 1), aRows

.Columns.AutoFit

End With

' Convert response data to arrays

JSON.ToArray Array(vJSON("response")), aRows, aHeader

' Output response transposed data to table

With Sheets(3)

.Cells.Delete

Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader)

Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows)

.Columns.AutoFit

End With

' Convert current data to arrays

JSON.ToArray Array(vJSON("current_observation")), aRows, aHeader

' Output current transposed data to table

With Sheets(4)

.Cells.Delete

Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader)

Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows)

.Columns.AutoFit

End With

' Populate dictionary with daily astronomy data

Set oDays = CreateObject("Scripting.Dictionary")

For Each vDay In vJSON("astronomy")("days")

oDays(vDay) = ""

Next

' Convert daily astronomy data to arrays

JSON.ToArray oDays.Keys(), aRows, aHeader

' Output daily astronomy transposed data to table

With Sheets(5)

.Cells.Delete

Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader)

Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows)

.Columns.AutoFit

End With

' Convert hourly history data to arrays

JSON.ToArray vJSON("history")("days")(0)("hours"), aRows, aHeader

' Output hourly history data to table

With Sheets(6)

.Cells.Delete

OutputArray .Cells(1, 1), aHeader

Output2DArray .Cells(2, 1), aRows

.Columns.AutoFit

End With

MsgBox "Completed"

End Sub

Sub OutputArray(oDstRng As Range, aCells As Variant)

With oDstRng

.Parent.Select

With .Resize( _

1, _

UBound(aCells) - LBound(aCells) + 1)

.NumberFormat = "@"

.Value = aCells

End With

End With

End Sub

Sub Output2DArray(oDstRng As Range, aCells As Variant)

With oDstRng

.Parent.Select

With .Resize( _

UBound(aCells, 1) - LBound(aCells, 1) + 1, _

UBound(aCells, 2) - LBound(aCells, 2) + 1)

.NumberFormat = "@"

.Value = aCells

End With

End With

End Sub

  第二个 XHR 返回 JSON 数据,为了说明如何从中提取必要的数据,您可以将 JSON 保存到文件中,复制内容并将其粘贴到任何 JSON 查看器中以供进一步研究。我使用在线工具,根元素结构如下图:

  有6个主要部分,数据的相关部分被提取并输出到6个工作表(必须在运行前手动创建):

  Sheet1 - Daily forecast

Sheet2 - Horly forecast

Sheet3 - Response data (transposed)

Sheet4 - Current data (transposed)

Sheet5 - Astronomy (transposed)

Sheet6 - Hourly history data

  通过该示例,您可以从该 JSON 响应中提取所需的数据。

0 个评论

要回复文章请先登录注册


官方客服QQ群

微信人工客服

QQ人工客服


线