Scraping web page data with Python (the actual string I am trying to scrape, and a solution)

Published by 优采云: 2021-12-14 03:00


import requests
from bs4 import BeautifulSoup
from pprint import pprint

url = 'https://www.namus.gov/MissingPersons/Case#/53061'

# Download the page and parse the static HTML
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
# print(soup.prettify())

# Print the ng-href attribute of every anchor tag
findall = soup.find_all("a")
for link in findall:
    pprint(link.get("ng-href"))

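When I run the code, I get an unrendered template expression instead of the link. I have tried reading href, src, and ng-href, and none of them work. What I really need is the Google Maps link as a string, but all I can pull is the subSection placeholder.

# I get this: u'{{subSection.mapLink()}}'
# when I really need this: "http://www.google.com/maps/place/35.9467011,-84.03260329999999"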

The actual link I am trying to scrape appears on the page as:

Map

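Solution

Since this is an Angular website, much of the information is loaded dynamically with JavaScript, so it is not present in the HTML that BeautifulSoup receives. You can open the Network tab of the browser's developer tools to see where the data is retrieved from. In this case, it comes from a JSON API with the following URL template: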

　　https://www.namus.gov/api/CaseSets/NamUs/MissingPersons/Cases/{CASE_ID}
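As a small illustration of the template above, the case ID can be read from the fragment of the page URL and substituted into the API URL. This is only a sketch: the helper name `api_url_for_case_page` is hypothetical, and it assumes the page URL always ends in `#/<CASE_ID>` with a numeric ID, as in the question.

```python
from urllib.parse import urlsplit

# URL template taken from the answer; {} is the case ID
API_TEMPLATE = "https://www.namus.gov/api/CaseSets/NamUs/MissingPersons/Cases/{}"


def api_url_for_case_page(page_url):
    """Hypothetical helper: map a case page URL to its JSON API URL."""
    # For '...Case#/53061' the fragment is '/53061'
    fragment = urlsplit(page_url).fragment
    case_id = fragment.strip("/")
    if not case_id.isdigit():
        raise ValueError("no numeric case ID in URL fragment: {!r}".format(page_url))
    return API_TEMPLATE.format(case_id)


print(api_url_for_case_page("https://www.namus.gov/MissingPersons/Case#/53061"))
```

Raising `ValueError` when the fragment is missing keeps a malformed page URL from silently producing a request to the wrong endpoint.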

The API returns all the information shown on the page, so the map URL can be built dynamically, for example:

import requests

case_id = '53061'  # case number from the page URL

resp = requests.get('https://www.namus.gov/api/CaseSets/NamUs/MissingPersons/Cases/{}'.format(case_id))
resp.raise_for_status()  # stop early on an HTTP error
body = resp.json()

# Location of the sighting as returned by the API
loc = body['sighting']['publicGeolocation']
coord = loc['coordinates']

print("address : " + loc['formattedAddress'])
print("google map link : " + "http://www.google.com/maps/place/{},{}".format(coord['lat'], coord['lon']))
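The snippet above assumes every case has a `sighting` entry with a `publicGeolocation`. A slightly more defensive sketch, which returns `None` when any of those keys is missing, might look like the following. The function name `map_link_from_case` and the sample dictionary are hypothetical; the key names are the ones used in the answer, and the sample coordinates are those from the link in the question.

```python
def map_link_from_case(body):
    """Hypothetical helper: build a Google Maps link from a parsed case dict.

    Returns None if the expected keys are absent.
    """
    try:
        coord = body["sighting"]["publicGeolocation"]["coordinates"]
        return "http://www.google.com/maps/place/{},{}".format(coord["lat"], coord["lon"])
    except (KeyError, TypeError):
        # KeyError: a key is missing; TypeError: an intermediate value is None
        return None


# Sample data shaped like the fields used above (values are illustrative only)
sample = {
    "sighting": {
        "publicGeolocation": {
            "formattedAddress": "example address",
            "coordinates": {"lat": 35.9467011, "lon": -84.03260329999999},
        }
    }
}

print(map_link_from_case(sample))
print(map_link_from_case({}))
```

Keeping the link construction separate from the HTTP request also makes it possible to check the logic against saved JSON without calling the site.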
