Collecting a Keyword Library from 站长工具 (Chinaz Webmaster Tools) with Python
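优采云 Published: 2020-08-24 13:07
Building a keyword library is one of the core tasks in SEO. How do you get more keywords? The usual approach is to find a batch of seed words, use them as stems, and run them through a keyword expansion tool to produce more long-tail keywords.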
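So where do the seed words come from? One of the better sources is a competitor's keyword library. This post looks at how to use Python to collect a competitor's keyword library from 站长工具 (Chinaz webmaster tools).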
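The expansion step described above (seed words used as stems for long-tail keywords) can be illustrated with a minimal sketch. A real expansion tool draws on search data; this is only a local stand-in that combines seeds with modifiers. The function name `expand_keywords` and the sample words are hypothetical and not from the original post.

```python
from itertools import product


def expand_keywords(seeds, modifiers):
    """Combine every seed with every modifier, as a prefix and as a suffix.

    Hypothetical helper for illustration only; it does not query any
    keyword tool.
    """
    candidates = []
    seen = set()
    for seed, mod in product(seeds, modifiers):
        for phrase in (f"{mod} {seed}", f"{seed} {mod}"):
            if phrase not in seen:  # drop duplicates, keep first-seen order
                seen.add(phrase)
                candidates.append(phrase)
    return candidates


# Sample seed and modifiers are made up for the example
print(expand_keywords(["seo tool"], ["free", "online"]))
```

The candidates produced this way would still need to be checked against a search index before they count as usable keywords.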
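In testing, a visitor who is not logged in to the site can view at most the first 10 pages of the keyword list. A logged-in user can view at most the first 57 pages. Anything beyond that requires a VIP subscription.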
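The access limits above can be captured in a small helper that decides which page numbers are worth requesting. This is a sketch under the limits reported in the text (10 pages anonymous, 57 pages logged in); the names `pages_to_fetch`, `MAX_PAGES_ANONYMOUS` and `MAX_PAGES_LOGGED_IN` are hypothetical.

```python
# Limits as reported in the text; they may change on the site's side.
MAX_PAGES_ANONYMOUS = 10   # not logged in
MAX_PAGES_LOGGED_IN = 57   # logged in, without VIP


def pages_to_fetch(total_pages, logged_in):
    """Return the page numbers that can be requested without VIP.

    total_pages is the page count the site reports for a domain;
    at least one page is always attempted.
    """
    cap = MAX_PAGES_LOGGED_IN if logged_in else MAX_PAGES_ANONYMOUS
    last = min(max(total_pages, 1), cap)
    return list(range(1, last + 1))


print(len(pages_to_fetch(200, logged_in=True)))
print(len(pages_to_fetch(200, logged_in=False)))
```

Keeping the caps in named constants makes it a one-line change if the limits differ for another account type.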
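I do not have a VIP subscription at the moment, so the script only collects the first 57 pages available when logged in: 1140 keywords with a Baidu Index value in total, which works out to 20 per page. The sample code is as follows: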
<p>import random
import re
import time

import browsercookie
import requests


class ZhanZhang(object):
    def __init__(self):
        self.headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36'}
        # One domain per line; utf-8-sig drops a leading BOM if the file has one
        with open('domains.txt', encoding='utf-8-sig') as f:
            self.domains = [domain.rstrip() for domain in f]

    def get_cookies(self):
        # Read cookies from the local Chrome profile and keep only chinaz.com ones
        cookies = browsercookie.chrome()
        cookies = [cookie for cookie in cookies if 'chinaz.com' in str(cookie)]
        cookies = requests.utils.dict_from_cookiejar(cookies)
        return cookies

    def curl(self, url, retries=3, num=1):
        # Fetch a page with the login cookies; returns None once all retries fail
        cookies = self.get_cookies()
        try:
            response = requests.get(url, cookies=cookies, headers=self.headers, timeout=5)
            response.encoding = 'utf-8'
            html = response.text
        except requests.RequestException:
            html = None
            if retries > 0:
                print('Request failed, retry #%s' % num)
                return self.curl(url, retries - 1, num + 1)
        return html

    def page(self, host):
        html = self.curl('http://rank.chinaz.com/?host=%s' % host)
        time.sleep(random.uniform(1, 2))
        # curl() may return None, so fall back to an empty string before matching
        data = re.search(r'共(\d+?)页', html or '')
        if data:
            # Pages beyond 57 are not accessible without VIP
            num = min(int(data.group(1)), 57)
        else:
            num = 1
        page = ['http://rank.chinaz.com/%s-0--%s' % (host, i) for i in range(1, num + 1)]
        return page

    def words(self, host):
        words = []
        pages = self.page(host)
        for url in pages:
            html = self.curl(url)
            time.sleep(random.uniform(1, 3))
            if html is None:
                # Skip pages that could not be fetched
                continue
            kws = re.findall(r'([\s\S]*?)', html)
            for kw in kws:
                keys = re.findall('class="ellipsis block">(.*?)</a>