Scraping real-time web page data (I am trying to get live commodity values from this web page; it has an iframe)

优采云 · Published: 2021-10-11 13:27

I am trying to get live commodity values from this web page. It has an iframe at the address: :8000/
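When the data lives in an iframe, the first step is usually to find the iframe's own URL, since fetching the outer page does not include the framed document. Below is a minimal sketch of that step. It uses the standard library's html.parser rather than BeautifulSoup so that it runs without extra packages. The sample page, the example.com host, and the names IframeFinder and iframe_urls are placeholders made up for illustration, not taken from the real site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeFinder(HTMLParser):
    """Collect the src attribute of every <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(page_url, html):
    """Return the absolute URL of each iframe found in html."""
    finder = IframeFinder()
    finder.feed(html)
    # urljoin resolves relative and scheme-relative src values against the page URL
    return [urljoin(page_url, src) for src in finder.sources]


# Hypothetical outer page; the host and path are placeholders.
SAMPLE_PAGE = '<html><body><iframe src="//example.com:8000/"></iframe></body></html>'
print(iframe_urls("http://example.com/rates.html", SAMPLE_PAGE))
```

The resulting URL is the one to request and parse, in place of the outer page.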

This is what I tried with BeautifulSoup:

from bs4 import BeautifulSoup
#import time
import urllib

data = []

# Python 2 API; in Python 3 this call is urllib.request.urlopen
url = urllib.urlopen("http://213.136.84.136:8000/")
html = url.read()
url.close()

soup = BeautifulSoup(html, "html.parser")
# Locate the quotes table and walk its rows
span = soup.find('table', attrs={'class': 'table2'})
table_body = span.find('tbody')
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])  # Get rid of empty values

print([data])

This is the output I get:

　　[[[], [u'INTERNATIONAL MARKET'], [u'SPOT Gold'], [u'SPOT Silver'], [u'CrudeOil'], [u'Copper'], [u'NaturalGas'], [u'Dow Jones'], [u'Bank Nifty'], [u'INDIAN MARKET'], [u'MCXGold'], [u'MCXSilver'], [u'MCXCrudeOil'], [u'MCXCopper'], [u'MCXLead'], [u'MCXNickel'], [u'MCXZinc'], [u'MCXNaturalGas'], [u'MCXAluminium'], [u'MCXMenthaOil'], [u'USDINR'], [], [u"Disclaimer: We can't assure any guarantee about the accuaracy of the data."]]]

It returns only the symbol names and none of the quotes (the price values). Any idea why?
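One way to narrow this down is to keep the empty cells instead of filtering them out, which shows whether the value columns are blank in the HTML the server actually returned. The sketch below does that on a small hypothetical fragment modelled on the SYMBOL / LTP / HIGH / LOW table. The markup, the cell ids, and the TableRows class are assumptions for illustration, and html.parser stands in for BeautifulSoup so the example has no dependencies.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, row by row, keeping empty cells."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical markup: the ids and the empty value cells are assumptions,
# not the page's real HTML.
SAMPLE = """
<table class="table2"><tbody>
<tr><td>SPOT Gold</td><td id="ltp1"></td><td id="high1"></td><td id="low1"></td></tr>
<tr><td>SPOT Silver</td><td id="ltp2"></td><td id="high2"></td><td id="low2"></td></tr>
</tbody></table>
"""

parser = TableRows()
parser.feed(SAMPLE)
for row in parser.rows:
    kept = [cell for cell in row if cell]  # the same filter as in the question
    print(row, "->", kept)
```

If the real page behaves like this sample, each row holds one symbol followed by empty strings, and the filter leaves only the symbol. That would match the output above and would suggest the values are filled in after the page loads, which a plain HTTP fetch does not see.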

Here is the HTML code:

SYMBOL    LTP    HIGH    LOW

INTERNATIONAL MARKET
SPOT Gold
SPOT Silver
CrudeOil
Copper
NaturalGas
Dow Jones
Bank Nifty

INDIAN MARKET
MCXGold
MCXSilver
MCXCrudeOil
MCXCopper
MCXLead
MCXNickel
MCXZinc
MCXNaturalGas
MCXAluminium
MCXMenthaOil
USDINR

Disclaimer: We can't assure any guarantee about the accuaracy of the data.

Does anyone know another way to get these live quotes?
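The question does not say where the live values come from. If they turn out to be loaded from a separate data URL that can be requested directly (an assumption that would need checking in the browser's network tools), one possible approach is to poll that URL and report only the values that changed. The sketch below shows just the polling logic. poll_changes, fake_fetch, and the numbers are placeholders made up for illustration, and fake_fetch stands in for a real request.

```python
import time


def poll_changes(fetch, rounds, interval_s=1.0, sleep=time.sleep):
    """Call fetch() `rounds` times and yield (symbol, value) whenever a value changes."""
    last = {}
    for i in range(rounds):
        for symbol, value in fetch().items():
            if last.get(symbol) != value:
                last[symbol] = value
                yield symbol, value
        if i < rounds - 1:
            sleep(interval_s)  # wait between requests, but not after the last one


# Stand-in for a real request: canned snapshots with placeholder numbers.
_snapshots = iter([
    {"SPOT Gold": "100.0"},
    {"SPOT Gold": "100.0"},
    {"SPOT Gold": "100.5"},
])


def fake_fetch():
    return next(_snapshots)


for symbol, value in poll_changes(fake_fetch, rounds=3, interval_s=0):
    print(symbol, value)
```

In real use, fetch would download and parse the data source and return a dict mapping each symbol to its latest value. The interval should be chosen with the site's load and terms in mind.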
