Tips: 前嗅 ForeSpider Script Tutorial: Channel Script Use Cases and Configuring Keyword Search

Published by 优采云: 2022-11-27 09:18

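Today's tutorial covers two topics from the ForeSpider script tutorial: when to use channel scripts, and a hands-on walkthrough of configuring keyword search with a script. Details follow.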

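I. When to use channel scripts

When you need to build the list of collection sources by hand, or collect data entirely by script, you can use the extractor and result classes in the "Channel Script".

You can either define an object of one of these classes and call its member methods, or use the two global objects EXTRACT and RESULT.

For a detailed description of the two classes and more examples for this scenario, see "Tutorial -> Script Tutorial -> Script Examples -> Channel Script".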

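II. Configuring keyword search with a script

1. Keywords without a CAPTCHA

Scenario: when a class of links differs only in one part of the string, replacing that part is enough to obtain each target link. Use that part as the keyword and add a channel script to extract the links.

Example: none yet.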
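The scenario can still be illustrated outside ForeSpider. The Python snippet below is only a sketch of the idea, not ForeSpider script: it assumes the link pattern `showLicence-<keyword>.html` that appears in the JD script later in this tutorial, and the keyword values are hypothetical.

```python
from urllib.parse import quote

# Assumed pattern: only the part marked {keyword} differs between target links.
URL_TEMPLATE = "https://mall.jd.com/showLicence-{keyword}.html"


def build_links(keywords, template=URL_TEMPLATE):
    """Return one target link per keyword by filling in the variable part."""
    links = []
    for kw in keywords:
        kw = kw.strip()
        if not kw:
            continue  # skip empty keywords
        links.append(template.format(keyword=quote(kw, safe="")))
    return links


if __name__ == "__main__":
    # Hypothetical keywords, for illustration only.
    for link in build_links(["123", "456"]):
        print(link)
```

Each keyword produces exactly one link, which is the same substitution a channel script would perform for every entry in the keyword list.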

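2. Keywords with a CAPTCHA

Scenario: when the CAPTCHA refresh event cannot be captured visually, find the CAPTCHA refresh request by hand and enter it in the corresponding input box. Also add the CAPTCHA parameter to the parameter list.

Example: collecting merchant information from 京东 (JD.com) store pages.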

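Capture the traffic with the browser's developer tools (using Sogou Browser as an example, press F12 to open them): right-click the target page and choose "Inspect Element", select "Network", clear all cached entries first, then click the CAPTCHA image and find the CAPTCHA refresh request:

"".

Each time the CAPTCHA is refreshed, the random parameter in the request URL has a different value, so you need to find the JS event that generates random. The following code can be found in the page source.

Enter the JS refresh event in the text box. Note that only the value of this.src needs to be modified.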
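To make the refresh mechanism concrete: the refresh event only swaps the image's src for the same address with a new random value. Here is a minimal Python sketch of that idea. The CAPTCHA address in the usage line is hypothetical (the real one is not shown above), and the helper name `refresh_captcha_url` is an assumption made for illustration.

```python
import random
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def refresh_captcha_url(src, rng=random):
    """Return `src` with its `random` query parameter replaced by a new value."""
    parts = urlsplit(src)
    # Keep every parameter except the old `random` one.
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != "random"
    ]
    query.append(("random", str(rng.random())))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical address; substitute the request found in the Network panel.
    print(refresh_captcha_url("https://example.com/captcha?uid=7&random=0.1"))
```

Everything except the random value stays the same between refreshes, which is why only this.src has to change.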

Script example:

var key = EXTRACT.GetSearch(this);
var form = key.Search();            // first keyword entry
url u;
var postData;
while (form) {
    var ocrCode = form.verifyCode;  // CAPTCHA value for this entry
    u.urlname = "https://mall.jd.com/" + "showLicence-" + form.text + ".html";
    u.title = ocrCode;
    u.entryid = this.id;
    u.tmplid = 1;
    postData = "verifyCode=" + ocrCode;
    var d = EXTRACT.OpenDoc(this, u.urlname, postData, 0);
    if (d) {
        this.Run(d, 1);
        EXTRACT.CloseDoc(d);
    }
    form = key.Search();            // next keyword entry
}
key.End();

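Solution: [Wallpaper mini program] Build your own wallpaper mini program for both WeChat and Douyin

The front end uses uni-app and the back end uses WordPress.

I. Front-end display

II. How it works, in brief

1. WordPress back end

(1) In WordPress, create the categories (top-level categories are enough) and tags first.

(2) Create posts in WordPress whose content is images, usually 3 to 5 images per post, then assign a category.

(3) Publish the posts.

(4) Set which categories are displayed in Geek API.

You can edit line 86 of jike-api-controller.php and change the 6 in "by ID desc limit 6" to 3; each category then returns fewer posts, which leaves room to show more categories.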

　　$sql="SELECT ID,post_title,post_content FROM wp_posts,wp_term_relationships,wp_term_taxonomy WHERE ID=object_id and wp_term_relationships.term_taxonomy_id = wp_term_taxonomy.term_taxonomy_id and post_type='post' and post_status = 'publish' and wp_term_relationships.term_taxonomy_id = $CID and taxonomy = 'category' order by ID desc limit 3";

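2. uni-app front end

(1) Change the domain name. The front end fetches the categories and their content through the API and then takes care of displaying them.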
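The fetch-then-display flow can be sketched as follows. This is written in Python rather than uni-app, and it assumes the WordPress core REST routes `/wp-json/wp/v2/categories` and `/wp-json/wp/v2/posts` as a stand-in, because the routes exposed by the Geek API plugin are not shown here. `BASE_URL` is a placeholder and the helper names are hypothetical.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://your-own-domain"  # placeholder: replace with your site

# Matches the src attribute of each <img> tag in rendered post HTML.
IMG_SRC_RE = re.compile(r'<img[^>]+src=["\']([^"\']+)["\']', re.IGNORECASE)


def extract_image_urls(html):
    """Return the image URLs found in a post's rendered HTML."""
    return IMG_SRC_RE.findall(html)


def get_json(url, params):
    """GET a JSON document from the REST API."""
    with urlopen(f"{url}?{urlencode(params)}", timeout=10) as resp:
        return json.load(resp)


def fetch_categories(base_url=BASE_URL):
    """Return (id, name) pairs for the site's categories."""
    data = get_json(f"{base_url}/wp-json/wp/v2/categories", {"per_page": 20})
    return [(c["id"], c["name"]) for c in data]


def fetch_wallpapers(category_id, base_url=BASE_URL, limit=3):
    """Return the image URLs from the newest posts of one category."""
    posts = get_json(
        f"{base_url}/wp-json/wp/v2/posts",
        {"categories": category_id, "per_page": limit},
    )
    urls = []
    for post in posts:
        urls.extend(extract_image_urls(post["content"]["rendered"]))
    return urls
```

A front end would call `fetch_categories()` once to build the category tabs and `fetch_wallpapers(category_id)` when a tab is opened.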

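3. How to publish posts automatically

Publishing posts by hand is labor-intensive. Programmers are entitled to be lazy, so you can use a collection tool such as 优采云 to collect and publish posts automatically, or use the WordPress REST API with Python.

Install the JWT Authentication for WP-API plugin

(1) Configure the server according to the JWT documentation.

(2) Obtain a token.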
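Step (2) might look like the sketch below. It assumes the plugin's token route is `/wp-json/jwt-auth/v1/token` and that the response carries the token in a `token` field, as the plugin's documentation describes; check both against the version you install. The base URL and credentials are placeholders.

```python
import json
from urllib.request import Request, urlopen


def request_token(base_url, username, password):
    """Ask the JWT plugin for a token (assumed route: /wp-json/jwt-auth/v1/token)."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    req = Request(
        f"{base_url}/wp-json/jwt-auth/v1/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)["token"]


def auth_header(token):
    """Build the Authorization header sent with later REST API requests."""
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    # Placeholders: use your own domain and WordPress account.
    token = request_token("https://your-own-domain", "your-user", "your-password")
    print(auth_header(token))
```

The value of the Authorization header, including the `Bearer ` prefix, is what the publishing script expects in `my_token`.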

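Prepare the images

The rule here is that every 3 images make up one post.

All images in a folder share the same category and the same tag; each category gets its own folder.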
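The three-images-per-post rule can be sketched as a small grouping helper. The function names are hypothetical, and dropping an incomplete final group is an assumption made for this sketch; the rule above does not say what to do with leftover images.

```python
import os

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def list_images(folder):
    """Return the image file names in one category folder, sorted by name."""
    return sorted(
        name
        for name in os.listdir(folder)
        if os.path.splitext(name)[1].lower() in IMAGE_EXTS
    )


def group_in_threes(names, size=3):
    """Split names into consecutive groups of `size`; a short final group is dropped."""
    full = len(names) - len(names) % size
    return [names[i:i + size] for i in range(0, full, size)]


if __name__ == "__main__":
    for group in group_in_threes(list_images("./")):
        print(group)  # each group becomes one post
```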

Publish automatically with a Python script

#!/usr/bin/python3
# -*- coding: utf-8 -*-
import datetime
import json
import os

import requests


def post_3_image_format(img1, img2, img3):
    """Concatenate three image tags, separated by blank lines, into one post body."""
    line1 = "\n\n"
    line2 = ""
    img_line1 = img1
    endline2 = "\n\n\n\n"
    line3 = ""
    img_line2 = img2
    endline3 = "\n\n\n\n"
    line4 = ""
    img_line3 = img3
    endline4 = "\n\n\n\n"
    endline1 = "\n"
    return (line1 + line2 + img_line1 + endline2 + line3 + img_line2 + endline3
            + line4 + img_line3 + endline4 + endline1)


def file_name(file_dir):
    """Map each image file name in file_dir to its MIME type."""
    D = {}
    for file in os.listdir(file_dir):
        ext = os.path.splitext(file)[1]
        if ext in ('.jpeg', '.jpg', '.png', '.webp'):
            # image/jpg is not a registered MIME type, so .jpg maps to image/jpeg
            subtype = 'jpeg' if ext == '.jpg' else ext[1:]
            D[file] = "image/" + subtype
    return D


end_point_url = "https://your-own-domain/wp-json/wp/v2/posts"
upload_img_url = "https://your-own-domain/wp-json/wp/v2/media"
my_token = ""  # replace with your own, e.g. "Bearer <your JWT token>"

# 1. Create a draft first to obtain the post_id
p_title = str(int(datetime.datetime.now().timestamp()))
p_content = "null"
p_categories = 6  # look up the category ID in WordPress, then fill it in here
# For example, when you edit a category the URL looks like
# https:///term.php?taxonomy=category&tag_ID=6 ; the number after tag_ID= is
# the category ID. The tag ID below is found the same way.
p_tags = 5

pre_post_payload = {
    'title': p_title,
    'content': p_content,
    'categories': p_categories,
    'tags': p_tags,
}
pre_post_header = {'content-type': "Application/json",
                   'Authorization': my_token,
                   'cache-control': "no-cache"}
r = requests.post(end_point_url, data=json.dumps(pre_post_payload),
                  headers=pre_post_header)
pre_post_id = r.json()["id"]

d = file_name("./")
up_load_img_list = []
up_load_img_id = []

# 2. Upload the images; the 'post' parameter comes from pre_post_id in step 1
for img_file, img_type in d.items():
    img_file_name = str(datetime.datetime.now().timestamp()) + os.path.splitext(img_file)[1]
    header = {'content-type': img_type,
              'Authorization': my_token,
              'cache-control': "no-cache",
              'Content-Disposition': 'attachment;filename=%s' % img_file_name}
    post = {
        'post': pre_post_id
    }
    with open(img_file, 'rb') as f:
        data = f.read()
    print(img_file + " vs " + img_file_name)
    r = requests.post(upload_img_url, data=data,
                      headers=header)
    json_r = r.json()
    print(json_r)

    # Cut the <img ...> tag out of the rendered description and add a data-id
    my_str = json_r["description"]["rendered"]
    img_start_tag_index = my_str.find('<img')
    img_end_tag_index = my_str.find('/>', img_start_tag_index)
    data_id = " data-id=%s " % json_r["id"]
    up_load_img_id.append(json_r["id"])
    new_str = my_str[img_start_tag_index:img_end_tag_index] + data_id + '/>'
    print(new_str)
    up_load_img_list.append(new_str)

    # 3. Attach the uploaded image to the draft post
    modify_post_header = {'content-type': "Application/json",
                          'Authorization': my_token,
                          'cache-control': "no-cache",
                          'Content-Disposition': 'attachment;filename=%s' % img_file_name}
    modify_url = upload_img_url + "/" + str(json_r["id"])
    r = requests.post(modify_url, headers=modify_post_header, json=post)

# The folder must contain at least three images
p_content = post_3_image_format(up_load_img_list[0], up_load_img_list[1], up_load_img_list[2])
modify_point_url = end_point_url + "/%s" % pre_post_id
wp_link = {
    'wp:attachment': [
        {'href': upload_img_url + "?parent=%s" % pre_post_id}
    ]
}

# Publish for real
payload = {
    'id': pre_post_id,
    'status': "publish",
    'title': p_title,
    'content': p_content,
    'categories': p_categories,
    'tags': p_tags,
    '_links': wp_link
}
header = {'content-type': "Application/json",
          'Authorization': my_token,
          'cache-control': "no-cache"}
r = requests.post(modify_point_url, data=json.dumps(payload),
                  headers=header)
# print(r.text)

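One small issue remains: images are cropped automatically after upload, but downloads should serve the original image. This still needs to be improved.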
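One possible direction, sketched here under assumptions rather than taken from the author's code: the media object returned by `/wp/v2/media` normally carries a `source_url` field for the uploaded file and a `media_details.sizes.full` entry, so the download link could be read from there instead of from the `<img>` tag in `description.rendered`. The helper name `original_image_url` is hypothetical.

```python
def original_image_url(media):
    """Return the full-size image URL from a /wp/v2/media response object."""
    sizes = media.get("media_details", {}).get("sizes", {})
    full = sizes.get("full", {})
    # Prefer the explicit "full" size; fall back to the top-level source_url.
    return full.get("source_url") or media["source_url"]


if __name__ == "__main__":
    # Hypothetical response fragment, for illustration only.
    sample = {
        "source_url": "https://your-own-domain/wp-content/uploads/a.jpg",
        "media_details": {"sizes": {}},
    }
    print(original_image_url(sample))
```

WordPress may still downscale very large uploads, so it is worth comparing the returned file with the one you uploaded before relying on this.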
