Web page table scraping (question from Ulises: "I am scraping the following web page")

Published by 优采云: 2022-04-05 08:06


Question from Ulises:

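I am scraping the following page: COVID (https://saluddigital.ssch.gob.mx/covid/). I need to generate a CSV file from a table shown on that page. The table's data is loaded dynamically, so I am using Selenium. The problem is that even with Selenium, the following code does not find the table: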

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

# url of the page we want to scrape
url = "https://saluddigital.ssch.gob.mx/covid/"

# initiating the webdriver. Parameter includes the path of the webdriver.
driver = webdriver.Firefox()
driver.get(url)

# this is just to ensure that the page is loaded
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")
print(len(soup.find_all("table")))

driver.close()
driver.quit()

When I print the number of tables, I get 0, because no table is found.

Answer from Samsul:

I also tried extracting the data and generating a CSV file. I hope this helps.

import csv
import time

from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://saluddigital.ssch.gob.mx/covid/"

# Start the browser. Selenium must be able to locate the Chrome driver.
driver = webdriver.Chrome()
driver.get(url)
time.sleep(5)  # give the dynamically loaded content time to render

# Take the rendered HTML, then release the browser.
html = driver.page_source
driver.quit()

soup = BeautifulSoup(html, "html.parser")
table = soup.select_one('div.contenedor-general')

# Header row(s): the text of each row, split on whitespace.
header = [a.get_text(strip=True, separator=' ').split() for a in table.find_all('tr', {'class': 'header-table'})]
# Data rows carry the class ringlon-1 or ringlon-2.
text1 = [t.text.strip().split() for t in soup.find_all('tr', {'class': 'ringlon-1'})]
text2 = [t.text.strip().split() for t in soup.find_all('tr', {'class': 'ringlon-2'})]

# newline='' stops the csv module from writing blank lines on Windows.
with open('outz.csv', 'w', newline='') as f:
    wr = csv.writer(f, delimiter=',')
    wr.writerow(header[0][1:])
    for row in text1:
        wr.writerow(row)
    for row in text2:
        wr.writerow(row)
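Two properties of the approach above are worth noting: splitting each row's text on whitespace breaks any cell that contains a space into several columns, and all `ringlon-1` rows are written before all `ringlon-2` rows, so the CSV does not follow the order on the page. A minimal sketch of a variant that reads cell by cell and keeps document order is given below. The function names `extract_rows` and `rows_to_csv` are hypothetical, and the sketch assumes the cells are ordinary `th`/`td` elements inside the rows the answer selects by class.

```python
import csv
import io

from bs4 import BeautifulSoup

# Row classes taken from the answer above.
ROW_CLASSES = {"header-table", "ringlon-1", "ringlon-2"}


def extract_rows(html):
    """Return the table rows as lists of cell texts, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # bs4 exposes the class attribute as a list of class names.
        if ROW_CLASSES.intersection(tr.get("class", [])):
            cells = tr.find_all(["th", "td"])
            rows.append([cell.get_text(" ", strip=True) for cell in cells])
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text; cells with commas or quotes are escaped."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()
```

Given the `html` string obtained from `driver.page_source`, the result can be saved with `open('outz.csv', 'w', newline='').write(rows_to_csv(extract_rows(html)))`, or by passing the rows to a `csv.writer` opened on the file as in the answer.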
