Scraping web page data with Python (setting up the tools for pages that load their data dynamically through JavaScript functions)

优采云 · Published: 2021-12-08 03:10


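　　Some web pages are not loaded statically; their content is filled in dynamically by JavaScript functions. On the page scraped in this article, for example, the call-contract and put-contract data in the table is loaded from the backend by a JavaScript function. BeautifulSoup alone cannot capture the data in this table.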
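　　A minimal sketch of why parsing the downloaded HTML is not enough. The markup in STATIC_HTML and the loadQuotes function name are made up for illustration, and the standard library's html.parser stands in for BeautifulSoup so that the example has no dependencies. Neither parser executes JavaScript, so the outcome should be the same with either one.

```python
from html.parser import HTMLParser

# Hypothetical page source as a plain HTTP client would receive it:
# the table body is empty and a script is expected to fill it in later.
STATIC_HTML = """
<html><body>
  <table id="quotes"><tbody></tbody></table>
  <script>loadQuotes("quotes");</script>
</body></html>
"""


class CellCounter(HTMLParser):
    """Counts the <td> cells that are present in the raw markup."""

    def __init__(self):
        super().__init__()
        self.cells = 0

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.cells += 1


def count_cells(html):
    """Return the number of <td> cells found in an HTML string."""
    parser = CellCounter()
    parser.feed(html)
    parser.close()
    return parser.cells


# The script is never run, so the parser sees no table cells at all.
print(count_cells(STATIC_HTML))  # prints 0
```

　　The table only gains rows after a browser has run the script, which is the job a headless browser takes over.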

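　　Some research showed that PhantomJS can scrape this kind of page. PhantomJS itself, however, is a headless browser that is scripted in JavaScript, so to use it from Python you have to drive it through Selenium. The code here was written mainly with reference to this page: Is there a way to use PhantomJS in Python?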

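　　Selenium is a browser automation tool that can simulate all kinds of user actions in a variety of browsers. To fetch dynamic page data in Python with PhantomJS through Selenium, the following need to be installed: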

　　1. BeautifulSoup, for parsing the page content

　　2. Node.js

　　3. PhantomJS, installed through Node.js once Node.js is in place: run npm -g install phantomjs in the Mac terminal (the same command works in cmd on Windows)

　　4. Selenium

　　Once these four steps are done, PhantomJS can be used from Python.

　　The code is as follows:
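　　Before running the scraper it can help to confirm that the four prerequisites are visible to the Python interpreter. The sketch below is one way to do that with the standard library only; check_environment is a hypothetical helper name, and it assumes the packages import as bs4 and selenium and that the executables are named node and phantomjs on the PATH.

```python
import importlib.util
import shutil


def check_environment(modules=("bs4", "selenium"),
                      executables=("node", "phantomjs")):
    """Report which prerequisites this Python interpreter can see."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        status[name] = importlib.util.find_spec(name) is not None
    for name in executables:
        # shutil.which returns None when the program is not on the PATH.
        status[name] = shutil.which(name) is not None
    return status


for name, found in check_environment().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

　　Anything reported as MISSING points back to the corresponding installation step in the list.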

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import urllib2  # Python 2 only; on Python 3 the equivalent is urllib.request
import time

baseUrl = "http://stock.finance.sina.com.cn/option/quotes.html"
csvPath = "FinanceData.csv"
csvFile = open(csvPath, 'w')

def is_chinese(uchar):
    # Check whether a unicode character is a Chinese character.
    # The upper bound is missing from the source; U+9FA5, the end of the
    # commonly used CJK range, is assumed here.
    if uchar >= u'\u4e00' and uchar <= u'\u9fa5':
        return True
    return False
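　　The listing above breaks off before any page is fetched. As a rough sketch of how the remaining steps could look (not the author's code), the block below waits until the rendered page contains table rows, extracts them, and writes them out as CSV. RowExtractor, extract_rows, wait_for_rows and write_csv are hypothetical names. The wait polls the driver's page_source attribute instead of using WebDriverWait, and html.parser stands in for BeautifulSoup, so the sketch runs on Python 3 without Selenium installed.

```python
import csv
import time
from html.parser import HTMLParser


class RowExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row, self._cell = None, None


def extract_rows(html):
    """Return the table rows in an HTML string as lists of cell text."""
    parser = RowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def wait_for_rows(driver, timeout=10.0, poll=0.5):
    """Poll driver.page_source until at least one table row is present."""
    deadline = time.monotonic() + timeout
    while True:
        rows = extract_rows(driver.page_source)
        if rows:
            return rows
        if time.monotonic() >= deadline:
            raise TimeoutError("table was still empty after %.1f s" % timeout)
        time.sleep(poll)


def write_csv(rows, out):
    """Write the rows to an open text file object in CSV format."""
    csv.writer(out).writerows(rows)


# Possible usage with a real browser (requires Selenium 3.x or earlier,
# because webdriver.PhantomJS was removed in Selenium 4):
#
#   from selenium import webdriver
#   driver = webdriver.PhantomJS()
#   try:
#       driver.get("http://stock.finance.sina.com.cn/option/quotes.html")
#       with open("FinanceData.csv", "w", newline="", encoding="utf-8") as f:
#           write_csv(wait_for_rows(driver), f)
#   finally:
#       driver.quit()
```

　　Passing the driver in as an argument keeps the waiting and parsing logic independent of the browser, so it can be exercised with a stand-in object before a real PhantomJS session is involved.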
