Scraping a web page table: generating a CSV from a dynamically loaded table with Selenium

优采云 Published: 2022-04-05 08:06


  Ulises:

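  I am scraping the following page: COVID. What I need is to generate a CSV of a table that is shown on the page. Its data is loaded dynamically, which is why I am using Selenium. The problem is that even so, I cannot find the table with the following code: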

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

# url of the page we want to scrape
url = "https://saluddigital.ssch.gob.mx/covid/"

# initiating the webdriver. Parameter includes the path of the webdriver.
driver = webdriver.Firefox()
driver.get(url)

# this is just to ensure that the page is loaded
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")
print(len(soup.find_all("table")))

driver.close()
driver.quit()

  When I print the number of tables, I get 0, because the table is not found.

  Samsul:

  I also tried extracting the data and generating a CSV file. Hope this helps.

import csv
import time

from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://saluddigital.ssch.gob.mx/covid/"

# start the webdriver (no driver path is passed, so Selenium must locate chromedriver itself)
driver = webdriver.Chrome()
driver.get(url)
time.sleep(5)  # fixed delay so the dynamic content has time to load

html = driver.page_source
driver.quit()  # the browser is no longer needed once the HTML is captured

soup = BeautifulSoup(html, "html.parser")
table = soup.select_one('div.contenedor-general')

# each row's text is split on whitespace into a list of fields
header = [a.get_text(strip=True, separator=' ').split() for a in table.find_all('tr', {'class': 'header-table'})]
text1 = [t.text.strip().split() for t in soup.find_all('tr', {'class': 'ringlon-1'})]
text2 = [t.text.strip().split() for t in soup.find_all('tr', {'class': 'ringlon-2'})]

# newline='' keeps the csv module from writing blank lines between rows on Windows
with open('outz.csv', 'w', newline='') as f:
    wr = csv.writer(f, delimiter=',')
    wr.writerow(header[0][1:])
    for row in text1:
        wr.writerow(row)
    for row in text2:
        wr.writerow(row)
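  One caveat with splitting each row's text on whitespace: any cell whose value contains a space ends up spread over several CSV columns. A possible alternative is to read one value per `th`/`td` cell. The sketch below is an assumption-laden illustration, not the method from the answer: the sample HTML is invented and only borrows the class names used above, the real page's markup may differ, and `rows_from_html` and `write_csv` are hypothetical helper names.

```python
import csv

from bs4 import BeautifulSoup

# Invented sample markup for illustration only; not data from the real page.
SAMPLE_HTML = """
<div class="contenedor-general">
  <table>
    <tr class="header-table"><th>Name</th><th>Count</th></tr>
    <tr class="ringlon-1"><td>North Zone</td><td>1</td></tr>
    <tr class="ringlon-2"><td>South Zone</td><td>2</td></tr>
  </table>
</div>
"""


def rows_from_html(html):
    """Return every <tr> as a list of cell strings, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # One list entry per cell, so spaces inside a cell are preserved.
        cells = [cell.get_text(separator=" ", strip=True)
                 for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows that contain no cells
            rows.append(cells)
    return rows


def write_csv(path, rows):
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

  For example, `write_csv("outz.csv", rows_from_html(html))` would write the header and data rows in the order they appear on the page, rather than all `ringlon-1` rows followed by all `ringlon-2` rows.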
