QueryList rules(array): the rule fields explained

优采云 Published: 2021-06-07 07:37


  QueryList rules(array $rules)

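    // Scraping rules
    $rules = array(
        'rule name'   => array('jQuery selector', 'attribute to scrape'[, "tag filter list"][, "callback function"]),
        'rule name 2' => array('jQuery selector', 'attribute to scrape'[, "tag filter list"][, "callback function"]),
        ..........
    );
    // Note: parameters in square brackets are optional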

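    // Scraping rules
    $rules = [
        // Scrape the src attribute of img tags, i.e. the image links on the page
        'name1' => ['img', 'src'],
        // Scrape the plain text of the div with class "content",
        // removing a tags and the element with id "footer" together with their content,
        // and keeping img tags
        'name2' => ['div.content', 'text', '-a -#footer img'],
        // Scrape the HTML of the second div and append some custom content to it
        'name3' => ['div:eq(1)', 'html', '', function($content) {
            // PHP concatenates strings with ".=", not "+="
            $content .= 'some str...';
            return $content;
        }]
    ];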
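  The callback in the optional fourth slot is an ordinary PHP callable: as in the name3 rule, it receives the scraped value and returns the value to use in its place. The following is a minimal sketch in plain PHP, runnable without QueryList. The closure name $appendNote and the sample input are assumptions made for illustration only.

```php
<?php
// Hypothetical callback for illustration; the name $appendNote is an assumption.
// It receives the scraped content and returns the value to use instead.
$appendNote = function ($content) {
    // Strings are concatenated with "." in PHP; "+" is arithmetic addition.
    return $content . 'some str...';
};

// The closure goes into the optional fourth slot of a rule.
$rules = [
    'name3' => ['div:eq(1)', 'html', '', $appendNote],
];

// Calling the callback directly shows its effect on a sample value.
echo $rules['name3'][3]('<p>hello</p>'), PHP_EOL;
```

  Keeping the callback in a named variable makes it easy to reuse across rules and to check on its own before running a scrape.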

  Rule field descriptions

  The more complex fields are explained below.

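  1. Attribute to scrape

  The value can be one of three kinds: text (the plain-text content of the element), html (the HTML content of the element), or the name of an HTML attribute such as src or href.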

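  2. Tag filter list

  Set this option to filter out unwanted content. Separate multiple values with spaces. Two rules apply: a value prefixed with a minus sign (-), which may be a selector such as -#footer or -.ad1, removes the matching element together with its content; a tag name without a minus sign is a tag to keep when the attribute to scrape is text, and a tag to strip (leaving its inner text) when the attribute to scrape is html.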

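  For example, suppose the element with class article contains a sentence with a link, followed by two ads in elements with class ad1 and class ad2:

    This is the Chinese content, <a href="…">here is a link</a> Here is an ad Here is another ad

  To get the inner content of the element with class article without the ad text, set the scraping rule to:

    // Scraping rules
    $rules = [
        'content' => ['.article', 'html', '-.ad1 -.ad2']
    ];

  This means: scrape the HTML content inside the element with class article, and remove the elements with class ad1 and class ad2 together with their content.

  The content obtained is now:

    This is the Chinese content, <a href="…">here is a link</a>

  In real scraping we usually do not want to keep other sites' outbound links and would rather strip the links from the content. If the filter is changed to -.ad1 -.ad2 -a, the scraped content becomes:

    This is the Chinese content,

  The link is gone, but what we actually want is to keep the link's text. The filter should therefore be -.ad1 -.ad2 a, and the scraped content is then:

    This is the Chinese content, here is a link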
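  To make the two kinds of filter values concrete, here is a rough sketch in plain PHP that splits a filter string into the selectors to remove (minus prefix) and the bare tag names. The helper splitFilter is hypothetical and only illustrates how such a string reads; it is not QueryList's own parsing code.

```php
<?php
// Hypothetical helper, not part of QueryList: classify the values of a
// space-separated tag filter list by whether they carry a minus prefix.
function splitFilter(string $filter): array
{
    $parts = ['remove' => [], 'tags' => []];
    // Split on runs of whitespace and skip empty pieces.
    foreach (preg_split('/\s+/', $filter, -1, PREG_SPLIT_NO_EMPTY) as $value) {
        if ($value[0] === '-') {
            // "-selector": the matching element is removed with its content.
            $parts['remove'][] = substr($value, 1);
        } else {
            // Bare tag name: kept when scraping text, stripped (inner text
            // preserved) when scraping html.
            $parts['tags'][] = $value;
        }
    }
    return $parts;
}

print_r(splitFilter('-.ad1 -.ad2 a'));
```

  For the filter -.ad1 -.ad2 a, this yields .ad1 and .ad2 as selectors to remove and a as the bare tag name, which matches the reading given above.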

  Usage

    use QL\QueryList;

    $html = <<<STR
    <div class="content">
        <div>
            <a href="https://querylist.cc/1.html">This is link one</a>
            <span>This is text one</span>
        </div>

        <div>
            <a href="https://querylist.cc/2.html">This is link two</a>
            <span>This is text two</span>
        </div>

        <div>
            <a href="https://querylist.cc/1.html">This is link three</a>
            <span>This is text three</span>
        </div>
    </div>
    STR;

    // Scraping rules
    $rules = [
        // Scrape the href attribute of a tags
        'link' => ['a', 'href'],
        // Scrape the text of a tags
        'link_text' => ['a', 'text'],
        // Scrape the text of span tags
        'txt' => ['span', 'text']
    ];

    $ql = QueryList::html($html)->rules($rules)->query();
    $data = $ql->getData();
    print_r($data->all());

  Scraped result:

    Array
    (
        [0] => Array
            (
                [link] => https://querylist.cc/1.html
                [link_text] => This is link one
                [txt] => This is text one
            )

        [1] => Array
            (
                [link] => https://querylist.cc/2.html
                [link_text] => This is link two
                [txt] => This is text two
            )

        [2] => Array
            (
                [link] => https://querylist.cc/1.html
                [link_text] => This is link three
                [txt] => This is text three
            )

    )
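  The result is a list of rows keyed by rule name, so ordinary PHP array functions apply to it. The sketch below collects the distinct links. It assumes the rows have been copied into a plain array named $rows (a hypothetical variable, reduced here to the link field); in the usage example they would come from $data->all().

```php
<?php
// Hypothetical sample rows, reduced to the link field of the result above.
$rows = [
    ['link' => 'https://querylist.cc/1.html'],
    ['link' => 'https://querylist.cc/2.html'],
    ['link' => 'https://querylist.cc/1.html'],
];

// Pull out the "link" column, drop duplicates, and reindex from 0.
$links = array_values(array_unique(array_column($rows, 'link')));

print_r($links);
```

  Because the first and third rows share the same href, only two links remain after deduplication.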
