PHP: scraping a web page's source (using my Python code from a PHP script)

Posted by 优采云 on 2022-03-14 15:24

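I'm having a problem with web scraping. I do my scraping in PHP, but there is one particular website for which I have to use Python instead. Because my main script has to be written in PHP, my solution is to call a Python script from PHP to fetch the page source.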

The following Python code works fine when I run it directly with Python:

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Connect to the Selenium server listening on port 4444 and request a Chrome session
driver = webdriver.Remote("http://127.0.0.1:4444/wd/hub", DesiredCapabilities.CHROME)
link = 'websitelink'
driver.get(link)
# HTML of the page as rendered by the browser
s = driver.page_source
print(s.encode("utf-8"))
driver.quit()
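A side note on the output step: in Python 3, `print(s.encode("utf-8"))` prints the repr of a bytes object, so whoever reads stdout (here, PHP) receives one line such as `b'<html>\n...'` with escape sequences rather than raw HTML. If raw markup is what the PHP side expects, a minimal sketch is to write the encoded bytes to `sys.stdout.buffer` instead. The helper names `what_print_emits` and `emit_html` are hypothetical, not part of the original script.

```python
import sys


def what_print_emits(html):
    """Return the text that print(html.encode("utf-8")) writes: the bytes repr, b'...'."""
    return str(html.encode("utf-8"))


def emit_html(html, stream=None):
    """Write the page source as raw UTF-8 bytes, without the b'...' wrapper.

    `stream` defaults to the binary layer of stdout, which is what a caller
    such as PHP's shell_exec reads; any binary file object also works.
    """
    out = stream if stream is not None else sys.stdout.buffer
    out.write(html.encode("utf-8"))
    out.flush()
```

In the script, `emit_html(driver.page_source)` would take the place of the `print(...)` line; newlines and non-ASCII characters then reach the caller as real UTF-8 text instead of `\n` and `\xc3\xa9` escapes.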

Below is the Python code I call from the PHP script; it takes the website URL as a command-line argument:

import sys
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# URL passed on the command line by the PHP caller
link = sys.argv[1]
driver = webdriver.Remote("http://127.0.0.1:4444/wd/hub", DesiredCapabilities.CHROME)
driver.get(link)
s = driver.page_source
print(s.encode("utf-8"))
driver.quit()
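`sys.argv[1]` raises `IndexError` when no argument arrives and silently uses only the first word when the URL was split into several arguments. A small hardening sketch, with hypothetical helper names, checks the argument count and logs the exact URL the script received to stderr, so it can be compared with `$url` on the PHP side. Since `shell_exec` returns only stdout, the log line does not end up in `$page_html`.

```python
import sys
from typing import List


def url_from_argv(argv: List[str]) -> str:
    """Return the single URL argument, or exit with a message showing what arrived.

    An unquoted URL containing spaces would arrive as several arguments,
    so anything other than exactly one argument is rejected.
    """
    if len(argv) != 2:
        raise SystemExit("expected exactly one URL argument, got %r" % (argv[1:],))
    return argv[1]


def log_received(url: str) -> None:
    """Echo the received URL to stderr so it stays out of the HTML on stdout."""
    print("test.py received: %r" % url, file=sys.stderr)
```

In `test.py`, `link = url_from_argv(sys.argv)` followed by `log_received(link)` would replace `link = sys.argv[1]`. Where stderr ends up depends on the setup; typically it is the terminal for PHP CLI or the web server's error log.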

This is the PHP code that calls the Python script:

$page_html = shell_exec('python3.6 test.py '.$url);

So the expected result is that $page_html holds the page content from the website.

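The problem is that the site's URLs are paginated: page 1, page 2, page 3, and so on. From PHP I only ever get the first page, but when I run the Python script by itself I get the correct result. I have spent two days on this. Does anyone know why I don't get the right page when I launch the Python script from PHP?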
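One frequent cause of this exact symptom, offered as something to check rather than a confirmed diagnosis, is shell quoting. `shell_exec` hands the whole string to `/bin/sh`, so if `$url` contains an unescaped `&` (as paginated URLs like `?cat=1&page=2` often do), the shell ends the command at the `&` and `test.py` receives a truncated URL, which the site may serve as page 1. In PHP the usual remedy is to wrap the argument with `escapeshellarg($url)`. The Python sketch below mimics that call on a POSIX system; the URL is hypothetical, and a one-line `python -c` program stands in for `test.py`.

```python
import shlex
import subprocess
import sys

# Hypothetical paginated URL; the "&" is what the shell treats specially.
URL = "https://example.com/list?cat=1&page=2"

# Stand-in for test.py: print the argument the script actually received.
ECHO_ARG = "import sys; print(sys.argv[1])"


def run_like_shell_exec(url, quote):
    """Build one command string and run it through /bin/sh -c, as shell_exec does."""
    arg = shlex.quote(url) if quote else url
    cmd = "%s -c %s %s" % (shlex.quote(sys.executable), shlex.quote(ECHO_ARG), arg)
    result = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE,
                            universal_newlines=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print("unquoted:", run_like_shell_exec(URL, quote=False))
    print("quoted:  ", run_like_shell_exec(URL, quote=True))
```

On a POSIX system the unquoted run should show only `https://example.com/list?cat=1`, while the quoted run shows the full URL; `shlex.quote` plays the same role here that `escapeshellarg` plays in PHP.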
