php抓取网页json数据(json数据1.json的作用让不同编程语言之间可以进行有效的数据交流 )

优采云发布时间: 2021-11-11 18:01

　　php抓取网页json数据(json数据1.json的作用让不同编程语言之间可以进行有效的数据交流

)

　　json 数据 1. json 的作用

　　实现不同编程语言之间的有效数据通信（几乎所有高级语言都支持json数据格式）

　　2. 什么是 json

　　一个有效的json只有一个数据，唯一的数据必须是json支持的数据类型。

　　json 支持的数据类型：

　　直接写数字，例如：100、+23、-45、12.5、3e4

　　String-双引号，只有双引号，例如："yuting"、"Chongqing"、"abc\n123"

　　Boolean-only 两个值，true 和 false

　　空值-null

　　数组（列表）——相当于一个 Python 列表：[元素 1，元素 2，元素 3，...]

　　Dictionary-key 必须是字符串：{key 1: value 1, key 2: value 2,...}

　　3. json 数据分析

　　1）如果通过requests发送请求得到的数据是json数据，可以直接使用response.json()将json数据转换成Python数据。

　　2）非json接口提供的json数据解析。

　　系统模块：json -> 提供两个功能：将json数据转换为Python数据和将Python数据转换为json数据。

　　json.loads(json格式的字符串)-将json转换为Python，返回值的类型会因字符串的内容不同而不同。

　　json.dumps(python data)-将python数据转为json格式的字符串，返回值为字符串。

　　说明：json格式的字符串——字符串的内容为json格式的数据，例如：‘100’、‘"abc"’、‘{"a": 100}’

　　int、float、str、bool、list、dict、None、tuple-可以转成json数据

　　import json

# json转换python

result = json.loads('100')

print(result, type(result)) # 100

result = json.loads('"abc"')

print(result, type(result)) # abc

result = json.loads('{"a": 100}')

print(result, type(result)) # {'a': 100}

result = json.loads('[100, "abc", true, null]')

print(result, type(result)) # [100, 'abc', True, None]

# python转json

result = json.dumps(100)

print(result, type(result)) # 100

result = json.dumps('abc')

print(result, type(result)) # "abc"

result = json.dumps([100, 12.4, True, None, 'hello', [10, 'a'], {10: 20}, (100, 200)])

print(result, type(result)) # '[100, 12.4, true, null, "hello", [10, "a"], {"10": 20}, [100, 200]]'

　　获取多页数据

　　# https://movie.douban.com/top250?start=0&filter=

# https://movie.douban.com/top250?start=25&filter=

# https://movie.douban.com/top250?start=50&filter=

import requests

def get_html(url):

headers = {

'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'

}

r = requests.get(url, headers=headers)

r.encoding = r.apparent_encoding

return r.text

def get_all_data():

for x in range(0, 226, 25):

url = f'https://movie.douban.com/top250?start={x}&filter='

get_html(url)

　　xpath 数据分析

　　运行环境：from lxml import etree

　　lxml：可以通过xpath方法解析html数据和xml数据

　　1. 创建树结构并获取根节点

　　etree.HTML（html代码）-将html页面创建成树状结构并返回树的根节点。

　　html_node = etree.HTML(open('files/网页.html', encoding='utf-8').read())

print(html_node) #

　　2.根据路径获取指定标签

　　node object.xpath(path)-返回值是一个列表，列表中的元素都是根据路径找到的结果

　　1）绝对路径-/从根节点写全路径

　　注意：绝对写法和搜索方式与谁去了前面的.xpath无关

　　h1_node = html_node.xpath('/html/body/div/h1')[0]

print(h1_node) #

p_nodes = html_node.xpath('/html/body/div/p')

print(p_nodes) # [, ]

div_node_2 = html_node.xpath('/html/body/div')[-1]

result = div_node_2.xpath('/html/body/div/p')

print(result)

　　2）相对路径：./path,. ./路径

　　使用。表示当前节点（谁去.xpath，谁是当前节点）

　　用..表示当前节点的上层节点

　　result = html_node.xpath('./body/div/p')

print(result) # [, ]

result = div_node_2.xpath('./p/text()')

print(result) # []

result = div_node_2.xpath('../div/h1/text()')

print(result)

　　3）全局搜索：//

　　//路径

<p>result = html_node.xpath('//img')

print(result) # [, , , , ]

result = html_node.xpath('//div/img')

print(result) # [,

0

2021-11-11

php抓取网页json数据

0 个评论

要回复文章请先登录或注册

AI时代内容工厂

php抓取网页json数据(json数据1.json的作用让不同编程语言之间可以进行有效的数据交流 )

0 个评论

发起人