PHP web scraping (how to fetch only a page's HTTP headers, including with PHP's cURL functions)

优采云 · Published: 2021-10-21 05:44


  // Fetch the page; as a side effect PHP fills $http_response_header with the response headers.
  $thisurl = "http://www.lao8.org";
  $html = file_get_contents($thisurl);
  print_r($http_response_header);

  The result is:

  Array
  (
      [0] => HTTP/1.1 200 OK
      [1] => Cache-Control: max-age=86400
      [2] => Content-Length: 76102
      [3] => Content-Type: text/html
      [4] => Content-Location: http://www.lao8.org/index.html
      [5] => Last-Modified: Fri, 19 Jul 2013 03:52:30 GMT
      [6] => Accept-Ranges: bytes
      [7] => ETag: "50bc48643384ce1:5cb3"
      [8] => Server: Microsoft-IIS/6.0
      [9] => X-Powered-By: ASP.NET
      [10] => Date: Fri, 19 Jul 2013 09:06:41 GMT
      [11] => Connection: close
  )
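  Note that file_get_contents() as used above downloads the whole page body just to read its headers. If only the headers are wanted, one option is to send a HEAD request through a stream context. The sketch below assumes allow_url_fopen is enabled; build_head_context() and fetch_headers_only() are hypothetical helper names, not PHP built-ins.

```php
<?php
// Sketch: ask the server for headers only by sending HEAD instead of GET.
// Both function names are illustrative, not part of PHP.

function build_head_context(int $timeoutSeconds = 10)
{
    return stream_context_create([
        'http' => [
            'method'        => 'HEAD',          // headers only, no body
            'timeout'       => $timeoutSeconds, // give up on slow servers
            'ignore_errors' => true,            // still expose headers on 4xx/5xx
        ],
    ]);
}

function fetch_headers_only(string $url): array
{
    // With HEAD the returned body is an empty string; the call is made so that
    // PHP populates $http_response_header in this function's local scope.
    $body = @file_get_contents($url, false, build_head_context());
    if ($body === false || empty($http_response_header)) {
        return []; // request failed or no headers were received
    }
    return $http_response_header;
}
```

  Calling print_r(fetch_headers_only("http://www.lao8.org")) should then print an array shaped like the one above, although some servers answer HEAD slightly differently from GET.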

  Method 3: Use the stream_get_meta_data() function

  Recommendation: ★★★

  With stream_get_meta_data(), the code takes only three lines:

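  $thisurl = "http://www.lao8.org/";
  $fp = fopen($thisurl, 'r');           // open the URL through the http:// stream wrapper
  print_r(stream_get_meta_data($fp));   // the response headers are under the wrapper_data key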

  The result is:

  Array
  (
      [wrapper_data] => Array
          (
              [0] => HTTP/1.1 200 OK
              [1] => Cache-Control: max-age=86400
              [2] => Content-Length: 76102
              [3] => Content-Type: text/html
              [4] => Content-Location: http://www.lao8.org/index.html
              [5] => Last-Modified: Fri, 19 Jul 2013 03:52:30 GMT
              [6] => Accept-Ranges: bytes
              [7] => ETag: "50bc48643384ce1:5cb3"
              [8] => Server: Microsoft-IIS/6.0
              [9] => X-Powered-By: ASP.NET
              [10] => Date: Fri, 19 Jul 2013 09:06:41 GMT
              [11] => Connection: close
          )

      [wrapper_type] => http
      [stream_type] => tcp_socket
      [mode] => r+
      [unread_bytes] => 1086
      [seekable] =>
      [uri] => http://www.lao8.org/
      [timed_out] =>
      [blocked] => 1
      [eof] =>
  )
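  The wrapper_data entries are plain "Name: value" strings, so looking up a single header means splitting each line. A minimal sketch of one way to do that follows; parse_header_lines() is a hypothetical helper (not a PHP built-in), and the sample lines are copied from the output above.

```php
<?php
// Sketch: turn raw header lines into a status line plus a name => value map.
// parse_header_lines() is an illustrative name, not part of PHP.

function parse_header_lines(array $lines): array
{
    $parsed = ['status' => null, 'headers' => []];
    foreach ($lines as $line) {
        if (strpos($line, 'HTTP/') === 0) {
            // A status line starts a new response (for example after a
            // redirect), so keep only the headers of the last response.
            $parsed = ['status' => $line, 'headers' => []];
            continue;
        }
        $parts = explode(':', $line, 2); // split on the first colon only
        if (count($parts) === 2) {
            // Header names are case-insensitive, so normalise to lower case.
            $parsed['headers'][strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $parsed;
}

$sample = [
    'HTTP/1.1 200 OK',
    'Content-Length: 76102',
    'Content-Type: text/html',
    'Server: Microsoft-IIS/6.0',
];
$result = parse_header_lines($sample);
echo $result['status'], "\n";                    // HTTP/1.1 200 OK
echo $result['headers']['content-length'], "\n"; // 76102
```

  With a live stream, the same helper could be called as parse_header_lines(stream_get_meta_data($fp)['wrapper_data']).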

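  Method 4: Use PHP's cURL functions

  Recommendation: ★★★★

  All three methods above retrieve the common response headers. If you need more detailed header information, such as whether the page has GZip compression enabled, you can use PHP's cURL extension instead. A server normally sends Content-Encoding: gzip only when the request carries an Accept-Encoding header, and cURL makes it easy to set one.

  Using cURL to fetch the headers and detect GZip compression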

  Here is the code first:
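  No listing follows at this point in this copy of the article. As a stand-in, here is a minimal sketch of what such a request could look like. It is an assumption, not the author's original code: it presumes the curl extension is installed, and fetch_headers_with_curl() is a hypothetical helper name.

```php
<?php
// Sketch: fetch only the response headers with cURL and advertise gzip support.
// fetch_headers_with_curl() is an illustrative name, not part of PHP.

function fetch_headers_with_curl(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,   // send HEAD, so no body is downloaded
        CURLOPT_HEADER         => true,   // include the response headers in the output
        CURLOPT_RETURNTRANSFER => true,   // return the output instead of printing it
        CURLOPT_ENCODING       => 'gzip', // sends "Accept-Encoding: gzip"
        CURLOPT_TIMEOUT        => 10,     // seconds
    ]);

    $headers = curl_exec($ch);
    if ($headers === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $headers; // the handle is released when $ch goes out of scope
}
```

  A call such as echo fetch_headers_with_curl("http://www.lao8.org"); would print the raw header block. Some servers compress GET responses but not HEAD responses, so the result may differ from what a browser sees.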

  The output is as follows:

  HTTP/1.1 200 OK
  Cache-Control: max-age=86400
  Content-Length: 15189
  Content-Type: text/html
  Content-Encoding: gzip
  Content-Location: http://www.lao8.org/index.html
  Last-Modified: Fri, 19 Jul 2013 03:52:28 GMT
  Accept-Ranges: bytes
  ETag: "0268633384ce1:5cb3"
  Vary: Accept-Encoding
  Server: Microsoft-IIS/6.0
  X-Powered-By: ASP.NET
  Date: Fri, 19 Jul 2013 09:27:21 GMT

  The headers fetched with cURL include the line Content-Encoding: gzip, which shows that the page has GZip compression enabled.
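  Rather than reading the header block by eye, the check can be automated. The sketch below works on a raw header string; is_gzip_enabled() is a hypothetical helper name, and the sample string is abridged from the output above.

```php
<?php
// Sketch: report whether a raw header block declares gzip content encoding.
// is_gzip_enabled() is an illustrative name, not part of PHP.

function is_gzip_enabled(string $rawHeaders): bool
{
    // Look for a line that starts with "Content-Encoding:" (case-insensitive)
    // and whose value on that same line contains the token "gzip".
    return preg_match('/^Content-Encoding:[ \t]*[^\r\n]*\bgzip\b/mi', $rawHeaders) === 1;
}

$sample = "HTTP/1.1 200 OK\r\n"
        . "Content-Length: 15189\r\n"
        . "Content-Encoding: gzip\r\n"
        . "Vary: Accept-Encoding\r\n\r\n";

var_dump(is_gzip_enabled($sample)); // bool(true)
```

  The pattern is anchored to the start of a line so that headers such as Vary: Accept-Encoding are not mistaken for a Content-Encoding header.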


  That covers how to fetch only a web page's headers in PHP.
