excelvba抓取网页数据(ExcelCookie的网页抓取问题-,-scrapingExcel,Vba,)

优采云 发布时间: 2022-03-28 21:20

  excelvba抓取网页数据(ExcelCookie的网页抓取问题-,-scrapingExcel,Vba,)

  Excel cookie 的 Web 抓取问题 - 仅在一次手动浏览器调用后才抓取值

  excelvbacookies网络抓取

  Excel cookie 的 Web 抓取问题 - 仅在一次手动浏览器调用、excel、vba、cookie、web-scraping、setcookie、Excel、Vba、Cookie、Web Scraping、Setcookie 后才抓取值,我正在尝试从stock 获取市场上的数据 网站,目前我使用以下方法来获取数据纯 URL:此代码工作正常但需要更长的时间 Sub OI_Pull() Application.EnableEvents = FalseApplication.ScreenUpdating = FalseApplication.DisplayAlerts = FalseDim即作为新的 InternetExplorerSet 即 = CreateObject("InternetExplorer.Ap

  我正在尝试从股票市场 网站 获取数据,目前我正在使用以下方法获取数据

  普通网址:

  此代码工作正常,但需要更长的时间

  Sub OI_Pull()

Application.EnableEvents = False

Application.ScreenUpdating = False

Application.DisplayAlerts = False

Dim ie As New InternetExplorer

Set ie = CreateObject("InternetExplorer.Application")

Dim Link As String

Link = Sheet5.Range("B5").Value

ie.Visible = True

ie.Top = 0

ie.Left = 0

ie.Width = 1000

ie.Height = 750

ie.AddressBar = 0

ie.StatusBar = 0

ie.Toolbar = 0

ie.navigate "https://www.nseindia.com/products/content/derivatives/equities/historical_fo.htm"

Do

DoEvents

Loop Until ie.readyState = READYSTATE_COMPLETE

Dim doc As HTMLDocument

Set doc = ie.Document

Dim objElement As HTMLObjectElement

Dim sDD As String

doc.Focus

'instrumentType

doc.getElementById("instrumentType").Focus 'Setting Instrument Type

doc.getElementById("instrumentType").selectedIndex = 2

doc.getElementById("instrumentType").FireEvent ("onchange")

doc.getElementById("year").Focus 'Setting Year

doc.getElementById("year").selectedIndex = 5 '2020

doc.getElementById("year").FireEvent ("onchange")

doc.getElementById("expiryDate").Focus 'Setting ExpiryDate

doc.getElementById("expiryDate").selectedIndex = 3

doc.getElementById("expiryDate").FireEvent ("onchange")

doc.getElementById("dateRange").Focus 'Date Range

doc.getElementById("dateRange").selectedIndex = 1 ' 7 days

doc.getElementById("dateRange").FireEvent ("onchange")

Dim i As Integer

For i = 1 To 428 '1 to 428

doc.getElementById("Symbol").Focus

doc.getElementById("Symbol").selectedIndex = i

doc.getElementById("Symbol").FireEvent ("onchange")

Set objElement = doc.getElementsByClassName("getdata-button")(0)

'Application.Wait Now + TimeValue("00:00:02")

objElement.Click

Application.Wait Now + TimeValue("00:00:03")

'------------------

Dim hTable As Object, hBody As Object, hTR As Object, hTD As Object

Dim tb As Object, bb As Object, tr As Object, td As Object

y = 1 'Column A in Excel

Z = 1 'Row 1 in Excel

Set hTable = doc.getElementsByTagName("Table")

'Set hTable = doc.getElementById("historicalData")

For Each tb In hTable

Set hBody = tb.getElementsByTagName("tbody")

For Each bb In hBody

Set hTR = bb.getElementsByTagName("tr")

For Each tr In hTR

Set hTD = tr.getElementsByTagName("td")

y = 1 ' Resets back to column A

For Each td In hTD

Sheets("OI Pull").Cells(Z, y).Value = td.innerText

y = y + 1

Next td

DoEvents

Z = Z + 1

Next tr

Exit For

Next bb

Exit For

Next tb

'-------------------

Application.Wait Now + TimeValue("00:00:01")

Next i

Application.Wait Now + TimeValue("00:00:01")

'ie.Quit

ie.Visible = True

Set doc = Nothing

Set ie = Nothing

Application.DisplayAlerts = True

Application.ScreenUpdating = True

Application.EnableEvents = True

End Sub

  我设置的值如下面的屏幕截图所示

  网站参数

  大约有 250 个值要抓取,目前在浏览器打开的情况下需要 20 分钟才能抓取,直到抓取完成

  所以我想让它更快,所以我尝试使用 MSXML2.XMLHTTP 方法

  如果我使用 MSXML2.XMLHTTP 方法,则接收到错误的 URL 响应文本,如果我在浏览器中打开 URL,则只有 MSXML2.XMLHTTP 方法可以正常工作

  所以这里的问题出在 cookie 上,首先传递 URL 并获取响应 cookie,然后再次传递 URL 和接收到的 cookie。我尝试使用 getallresponseheaders 获取 cookie,但始终无法获取 URL 的 cookie

0 个评论

要回复文章请先登录注册


官方客服QQ群

微信人工客服

QQ人工客服


线