excelvba抓取网页数据(ExcelCookie的网页抓取问题-,-scrapingExcel,Vba,)
优采云 发布时间: 2022-03-28 21:20excelvba抓取网页数据(ExcelCookie的网页抓取问题-,-scrapingExcel,Vba,)
Excel cookie 的 Web 抓取问题 - 仅在一次手动浏览器调用后才抓取值
excelvbacookies网络抓取
Excel cookie 的 Web 抓取问题 - 仅在一次手动浏览器调用、excel、vba、cookie、web-scraping、setcookie、Excel、Vba、Cookie、Web Scraping、Setcookie 后才抓取值,我正在尝试从stock 获取市场上的数据 网站,目前我使用以下方法来获取数据纯 URL:此代码工作正常但需要更长的时间 Sub OI_Pull() Application.EnableEvents = FalseApplication.ScreenUpdating = FalseApplication.DisplayAlerts = FalseDim即作为新的 InternetExplorerSet 即 = CreateObject("InternetExplorer.Ap
我正在尝试从股票市场 网站 获取数据,目前我正在使用以下方法获取数据
普通网址:
此代码工作正常,但需要更长的时间
Sub OI_Pull()
Application.EnableEvents = False
Application.ScreenUpdating = False
Application.DisplayAlerts = False
Dim ie As New InternetExplorer
Set ie = CreateObject("InternetExplorer.Application")
Dim Link As String
Link = Sheet5.Range("B5").Value
ie.Visible = True
ie.Top = 0
ie.Left = 0
ie.Width = 1000
ie.Height = 750
ie.AddressBar = 0
ie.StatusBar = 0
ie.Toolbar = 0
ie.navigate "https://www.nseindia.com/products/content/derivatives/equities/historical_fo.htm"
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Dim doc As HTMLDocument
Set doc = ie.Document
Dim objElement As HTMLObjectElement
Dim sDD As String
doc.Focus
'instrumentType
doc.getElementById("instrumentType").Focus 'Setting Instrument Type
doc.getElementById("instrumentType").selectedIndex = 2
doc.getElementById("instrumentType").FireEvent ("onchange")
doc.getElementById("year").Focus 'Setting Year
doc.getElementById("year").selectedIndex = 5 '2020
doc.getElementById("year").FireEvent ("onchange")
doc.getElementById("expiryDate").Focus 'Setting ExpiryDate
doc.getElementById("expiryDate").selectedIndex = 3
doc.getElementById("expiryDate").FireEvent ("onchange")
doc.getElementById("dateRange").Focus 'Date Range
doc.getElementById("dateRange").selectedIndex = 1 ' 7 days
doc.getElementById("dateRange").FireEvent ("onchange")
Dim i As Integer
For i = 1 To 428 '1 to 428
doc.getElementById("Symbol").Focus
doc.getElementById("Symbol").selectedIndex = i
doc.getElementById("Symbol").FireEvent ("onchange")
Set objElement = doc.getElementsByClassName("getdata-button")(0)
'Application.Wait Now + TimeValue("00:00:02")
objElement.Click
Application.Wait Now + TimeValue("00:00:03")
'------------------
Dim hTable As Object, hBody As Object, hTR As Object, hTD As Object
Dim tb As Object, bb As Object, tr As Object, td As Object
y = 1 'Column A in Excel
Z = 1 'Row 1 in Excel
Set hTable = doc.getElementsByTagName("Table")
'Set hTable = doc.getElementById("historicalData")
For Each tb In hTable
Set hBody = tb.getElementsByTagName("tbody")
For Each bb In hBody
Set hTR = bb.getElementsByTagName("tr")
For Each tr In hTR
Set hTD = tr.getElementsByTagName("td")
y = 1 ' Resets back to column A
For Each td In hTD
Sheets("OI Pull").Cells(Z, y).Value = td.innerText
y = y + 1
Next td
DoEvents
Z = Z + 1
Next tr
Exit For
Next bb
Exit For
Next tb
'-------------------
Application.Wait Now + TimeValue("00:00:01")
Next i
Application.Wait Now + TimeValue("00:00:01")
'ie.Quit
ie.Visible = True
Set doc = Nothing
Set ie = Nothing
Application.DisplayAlerts = True
Application.ScreenUpdating = True
Application.EnableEvents = True
End Sub
我设置的值如下面的屏幕截图所示
网站参数
大约有 250 个值要抓取,目前在浏览器打开的情况下需要 20 分钟才能抓取,直到抓取完成
所以我想让它更快,所以我尝试使用 MSXML2.XMLHTTP 方法
如果我使用 MSXML2.XMLHTTP 方法,则接收到错误的 URL 响应文本,如果我在浏览器中打开 URL,则只有 MSXML2.XMLHTTP 方法可以正常工作
所以这里的问题出在 cookie 上,首先传递 URL 并获取响应 cookie,然后再次传递 URL 和接收到的 cookie。我尝试使用 getallresponseheaders 获取 cookie,但始终无法获取 URL 的 cookie