PHP web page scraping blocked (a normal PHP curl page-fetch routine, and what to do if you get a 302 status)

优采云 Published: 2021-10-29 01:06

A normal page fetch with PHP's curl looks like this:

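$url = 'http://www.baidu.com';

$ch = curl_init();

// Option constants are case-sensitive in PHP and must be written in upper case.
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_VERBOSE, true);         // write the exchange to STDERR for debugging
curl_setopt($ch, CURLOPT_HEADER, true);          // include response headers in the output
curl_setopt($ch, CURLOPT_NOBODY, true);          // do not download the response body
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');  // send GET as the method instead of the HEAD that NOBODY implies
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the result from curl_exec() instead of printing it
curl_setopt($ch, CURLOPT_TIMEOUT, 20);           // give up after 20 seconds
curl_setopt($ch, CURLOPT_AUTOREFERER, true);     // set the Referer header automatically on redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow Location headers

$ret = curl_exec($ch);      // response text, or false on failure
$info = curl_getinfo($ch);  // status code, final URL, redirect count, and so on

curl_close($ch);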
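Whether the request really ended on a redirect can be read from the array that curl_getinfo() returns. The following is a minimal sketch, not part of the original routine: the helper name summarize_fetch is hypothetical, and it relies only on the http_code, redirect_count and redirect_url keys of that array.

```php
<?php
// Hypothetical helper: summarises the array returned by curl_getinfo().
function summarize_fetch(array $info): string
{
    $code = (int)($info['http_code'] ?? 0);
    $hops = (int)($info['redirect_count'] ?? 0);

    if ($code === 0) {
        // curl never received an HTTP status line (DNS failure, timeout, ...).
        return 'no HTTP response received';
    }
    if ($code >= 300 && $code < 400) {
        // Still a redirect: curl did not (or could not) follow it.
        $next = $info['redirect_url'] ?? '';
        return "unfollowed redirect $code" . ($next !== '' ? " to $next" : '');
    }
    return "final status $code after $hops redirect(s)";
}
```

Calling summarize_fetch($info) right after curl_getinfo($ch) gives a one-line description that can be logged next to the URL being fetched.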

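If you get a 302 status, it is because some redirects during the fetch need to pass parameters on to the next link, and that next link is configured to treat any request that arrives without those parameters as illegal access.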

With these options set, the response should come back normally.
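The article does not say which parameters the redirect target expects. As one possible reading, the sketch below assumes they are cookies set by the first response plus the Referer header, and lets curl carry both across the redirect. The helper name redirect_friendly_options and the cookie file path are hypothetical; the option constants are standard PHP curl options.

```php
<?php
// Hypothetical helper: builds an option array that lets state set by the
// first response (cookies, Referer) reach the redirect target.
function redirect_friendly_options(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,        // follow Location headers
        CURLOPT_MAXREDIRS      => 5,           // stop after 5 hops to avoid loops
        CURLOPT_AUTOREFERER    => true,        // send Referer on each redirect
        CURLOPT_COOKIEFILE     => $cookieFile, // enables the cookie engine
        CURLOPT_COOKIEJAR      => $cookieFile, // saves cookies when the handle closes
        CURLOPT_TIMEOUT        => 20,
    ];
}

// Usage (performs a real request, so it is left commented out):
// $ch = curl_init();
// curl_setopt_array($ch, redirect_friendly_options($url, sys_get_temp_dir() . '/cookies.txt'));
// $ret = curl_exec($ch);
// curl_close($ch);
```

If the target checks something else, such as a query-string token or a User-Agent header, the same array is the place to add the matching option.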

The code above is a general-purpose fetch routine and should work in almost all cases. For details, see the documentation for CURLOPT_CUSTOMREQUEST.

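CURLOPT_CUSTOMREQUEST sets a custom request method to use instead of "GET" or "HEAD" when making an HTTP request. It is useful for issuing "DELETE" or other, more obscure HTTP requests. Valid values are things like "GET", "POST", "CONNECT" and so on. In other words, do not enter a whole HTTP request line here: entering "GET /index.html HTTP/1.0\r\n\r\n" would be incorrect.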
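The rule above, a bare method name and never a full request line, can be enforced with a small guard. This is a sketch under assumptions: the helper name set_request_method is hypothetical, and the letters-only check is a deliberately strict simplification rather than the full definition of a valid HTTP method.

```php
<?php
// Hypothetical helper: accepts only a bare method token such as "DELETE".
function set_request_method($ch, string $method): void
{
    // Spaces, slashes, CR or LF mean the caller passed a whole request line
    // (for example "GET /index.html HTTP/1.0"), which is not allowed here.
    if (!preg_match('/^[A-Za-z]+$/', $method)) {
        throw new InvalidArgumentException(
            'Not a bare HTTP method: ' . json_encode($method)
        );
    }
    curl_setopt($ch, CURLOPT_CUSTOMREQUEST, strtoupper($method));
}

$ch = curl_init('http://www.baidu.com');
set_request_method($ch, 'DELETE'); // accepted; nothing is sent until curl_exec()
```

Normalising the method to upper case keeps calls such as set_request_method($ch, 'get') from sending a lower-case method that some servers reject.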
