Web table scraping
优采云 · Published: 2022-04-05 08:06
Asked by: Ulises
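I am scraping the following page: COVID. I need to generate a CSV of the table shown on the page, but its data is loaded dynamically, so I am using selenium. The problem is that even then I cannot find the table with the following code: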
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
#url of the page we want to scrape
url = "https://saluddigital.ssch.gob.mx/covid/"
# initiating the webdriver. Parameter includes the path of the webdriver.
driver = webdriver.Firefox()
driver.get(url)
# read the rendered HTML (no explicit wait is performed before this)
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")
print(len(soup.find_all("table")))
driver.close()
driver.quit()
When I print the number of tables, I get 0, because no table is found.
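A count of 0 is what you would see if `driver.page_source` were read before the page's scripts had added the table, although nothing in the thread confirms that this is the cause. Below is a minimal sketch of polling for a condition instead of reading the page immediately. `wait_until` and its parameters are hypothetical names for illustration and are not part of Selenium; Selenium ships its own `WebDriverWait` (in `selenium.webdriver.support.ui`) for the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed. Hypothetical helper for illustration, not a Selenium API.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

With a live driver, this could be called as `wait_until(lambda: driver.find_elements(By.CSS_SELECTOR, "table"))`, with `By` imported from `selenium.webdriver.common.by`, before reading `driver.page_source`. Unlike a fixed sleep, it returns as soon as the element shows up and fails loudly if it never does.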
Answered by: Samsul
I also tried extracting the data and generating a CSV file. Hope this helps.
import csv
import time

from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://saluddigital.ssch.gob.mx/covid/"

# Start Chrome through Selenium and open the page.
driver = webdriver.Chrome()
driver.get(url)
time.sleep(5)  # crude fixed delay so the dynamic content has time to load
html = driver.page_source
driver.quit()

soup = BeautifulSoup(html, "html.parser")
table = soup.select_one("div.contenedor-general")

# Header row: split its text into whitespace-separated tokens.
header = [tr.get_text(strip=True, separator=" ").split()
          for tr in table.find_all("tr", {"class": "header-table"})]
# Data rows carry one of two classes; select() returns them in page order,
# so the CSV keeps the order shown on the page.
rows = [tr.get_text(strip=True, separator=" ").split()
        for tr in soup.select("tr.ringlon-1, tr.ringlon-2")]

# newline="" stops the csv module from writing blank lines on Windows.
with open("outz.csv", "w", newline="") as f:
    wr = csv.writer(f, delimiter=",")
    wr.writerow(header[0][1:])  # the first header token is dropped, as in the original
    wr.writerows(rows)
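Splitting each row's text on whitespace, as above, breaks any cell that contains a space into several CSV columns. Here is a sketch of collecting one value per `<td>`/`<th>` instead. It uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra packages. `RowCollector`, `html_rows_to_csv` and the sample markup are made up for illustration, and the real page's structure may differ.

```python
import csv
import io
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td>/<th>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the <tr> being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_rows_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Invented sample markup; only the class names are taken from the thread.
sample = (
    "<table>"
    "<tr class='header-table'><th>Name</th><th>Count</th></tr>"
    "<tr class='ringlon-1'><td>North District</td><td>10</td></tr>"
    "</table>"
)
print(html_rows_to_csv(sample))
```

In the sample, "North District" stays in a single column, whereas a whitespace split would turn it into two. The same per-cell idea carries over to BeautifulSoup by iterating over each row's `td` and `th` elements.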