Topic: scraping web page content with JS - Automatic Article Collector - 优采云 official site

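Scraping web page content with JS: a simple data-download case study

Website Optimization • posted by 优采云 • 0 comments • 174 views • 2022-09-17 00:00 • from related topics

　　Scraping web page content with JS is simple, and when the target pages are small it is also fast. The post further claims that it keeps your crawler from being detected by search engines and avoids leaking your own information. If the same page keeps coming back while you scrape, the crawl gets faster. As with a search engine, once a site decides that a page is being fetched by a crawler it may redirect the request, and the page you wanted no longer appears. Using html4j to scrape page content resembles the older favicon extraction approach, but it supports more conditions; for example, both mobile and PC pages can be scraped.
　　I. A simple data-download case. This case uses a site where the same page appears repeatedly.
　　1. Enter the address in the browser: 百度();url=index#
　　
　　2. Press Enter; the query page appears.
　　3. After selecting an image, hover over the image in the url and choose to expand it.
　　4. Finally press Enter; the scraped-content page opens. This download page is the page where the image is saved.
　　When you crawl a site's pages you need a browser, because extracting locally with a crawler may lose private data, and using a browser is the simplest option. Crawlers that speak the HTTP protocol are worth mentioning here: they interact with the server through GET or POST requests. If an admin account and password are involved, you additionally have to supply the password and log in through the browser so that the page can be recorded. How do you use an HTTP crawler?
　　
　　1. In IE or another browser, enter the address 百度();url=index# and click search. The results page appears; enter a phone number and verification code and click login to sign in. Open a page and press Enter, and an official-account entry is inserted into the page. Click that link; if the url has not changed, it redirects automatically (you can also define a custom redirect path), and the url we need is wrapped in a token tag.
　　2. Enter the url in the browser: '/'/info/thumb-xxx/get-post""&s=30328482&xxx=re:ehr00141"&yxx=re:ehr00141"&zxx=re:ehr00141"#''. Here xxx is a full-width colon and the url is a standard request url; as a tool, you can define a function that formats the output, which is convenient.
　　Note: set the text:"" string of the url parameter only when url parameters are present. -alias-ip_abcd' replaces the set-ip rule, or xxx.xxx.xxx.xxx;ip, or xxx.xxx.xxx.xxx'.
　　3. Enter the url in the browser: "",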
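
The text above mentions crawlers that talk HTTP directly through GET or POST. A minimal sketch of the two request styles, using only Python's standard library, might look like this. The endpoint `https://example.com/search` and the parameter names are placeholders chosen for illustration, not taken from the post, and the requests are only constructed, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint used purely for illustration.
BASE_URL = "https://example.com/search"


def build_get(params):
    """Build a GET request with the parameters encoded in the query string."""
    return Request(f"{BASE_URL}?{urlencode(params)}")


def build_post(form):
    """Build a POST request with the form fields encoded in the body."""
    body = urlencode(form).encode("utf-8")
    return Request(BASE_URL, data=body)


get_req = build_get({"q": "js scraping", "page": 1})
post_req = build_post({"user": "admin", "password": "secret"})
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

Passing either object to `urllib.request.urlopen` would actually send it; a login flow would normally also need cookie handling, which this sketch leaves out.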


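The classic approach to scraping web page content with JS is the extract function: what the function does

Website Optimization • posted by 优采云 • 0 comments • 117 views • 2022-07-30 14:05 • from related topics

　　The classic approach to scraping web page content with JS is the extract function, which looks up data in the columns of an Excel sheet and returns a txt file. When converting to PDF, extract often fails because the file is not an Excel file; in that case you must use pdftex-particle, an open-source script tool that can read the data in Excel and turn it into PDF-format data. To install extract in Python, use the command: pip install extract.
　　I. To read Excel you need the dataframe.excel provided by excelhome. Its basic usage is dataframe.excel(). There are two ways to read Excel and convert it to PDF:
    >>> from numpy import *
    >>> import numpy as np
    >>> from pandas import dataframe
    >>> from pdftext import particle
    >>> ss = particle('upward', {'type': 'horizontal', 'data': np.to_dtype(dataframe)})
　　
　　II. Parsing Excel data. Reading Excel data with pandas is usually covered among the common methods for parsing Excel data.
　　III. Converting Excel formats and data types to PDF. We convert three Excel columns into one cell each.
　　1. Converting an Excel column into a PDF column. First convert the data into a PDF cell, then continue with the next step.
    # convert from url format to html format
    >>> excel = pd.excelfile('excel.xlsx')
    >>> list
　　Then, following the formulas in the html format, format the html columns as PDF and convert out the cells; this can be done in IPython Notebook.
　　
    >>> excel.append(html, 'form')
    >>> excel.append(html, 'sheet')
    >>> excel.append(html, 'table')
    >>> excel.append(excel.html_split_placement_format.perm)
　　There are many examples of converting html format into PDF-format Excel; they are not all covered here.
　　2. Converting an Excel column into a list made up entirely of PDF cells. This uses the function excellib.spreadsheet(), which handles data in various PDF formats.
    >>> excellib.spreadsheet(1, {'state': 0, 'education': 15, 'statment': 2})  # current object has 1 column
    excellib.spreadsheet({'state': 0, 'education': 15, 'statment': 2})  # current object has 15 columns
    excellib.spreadsheet({'state': 0, 'education': 15, 'statment': 2})  # current object has 2 columns
　　The function size() is also used; it returns the number of Excel columns. It does not operate on the data file; use size() from pdftext.
　　3. Converting the columns in Excel into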


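JS web page scraping: JavaScript startup performance, bottleneck analysis and solutions

Website Optimization • posted by 优采云 • 0 comments • 100 views • 2022-07-15 00:18 • from related topics

　　English original: Addy Osmani; translation: 王下邀月熊_Chevalier
　　/a/40139
　　
　　In web development, as requirements grow and the codebase expands, the pages we ship gradually bloat. This bloat means more than extra transfer bandwidth; it can also mean a worse performance experience for users browsing the page. After the browser downloads the scripts a page depends on, it still has to parse, interpret, and run them. This article takes a close look at how browsers process JavaScript, identifies the main culprits that slow down your application's startup, and offers solutions based on my own experience. Looking back, we never gave much thought to optimizing the JavaScript parse/compile steps; we expected that the parser, upon encountering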


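Scraping web page content with JS is nothing more than a crawler's ASP language interface.

Website Optimization • posted by 优采云 • 0 comments • 87 views • 2022-07-02 00:06 • from related topics

　　Scraping web page content with JS is nothing more than a crawler's ASP language interface. Pick an e-commerce site, for example 凡客大商城; the page structure you can scrape is roughly: imgurlurlredirecturlthirdpageurlimgurlurl. Search is just an HTML parser, so being familiar with HTML is enough. For Ajax url requests, see this article. Also, you should not go to 百度 for such a simple question; answers are everywhere. Why not try 小木虫?
　　
　　小木虫 should have plenty of experts; read through their Q&A. Spotting patterns on a forum is fairly easy. Whether sites like 知乎 have regularly structured questions is something a search engine can also tell you: try using the search engine to sort the questions into categories, order them by length, and write a few short answers. I tried a few 小木虫 questions on 知乎 and it worked well. If you want to learn this method quickly, it is still best to write a crawler yourself.
　　
　　What are the advantages of 小木虫 at present?
　　1. Its outbound links are comprehensive, so you can reach papers in many disciplines, various forums, and specialised QQ groups, and crawl several of them.
　　2. The best answerers on 小木虫 generally hold a master's or doctoral degree or higher. Ranked by follower count, what they post is of relatively high quality, and frequently asked questions usually show up on the sites of senior students in each field.
　　3. 知乎: its big-V accounts are mainly in IT, and people with related questions follow them on their own initiative.
　　4. UC, 搜狐, 爱奇艺, 腾讯 and 优酷 have very rich content, so you can find people in related fields.
　　5. 新浪微博 has a great many trending topics, and its search gives you another entry point.
　　6. 豆瓣 still has many interesting questions, and the people you follow there may be from other specialisms in the same discipline.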


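JS web page scraping [Issue 2030]: JavaScript startup performance, bottleneck analysis and solutions

Website Optimization • posted by 优采云 • 0 comments • 71 views • 2022-06-30 05:51 • from related topics

　　Foreword
　　
　　Find the meaning of its existence. Today's morning-read article is shared with permission from its translator, @王下邀月熊.
　　
　　The main text starts here~~
　　In web development, as requirements grow and the codebase expands, the pages we ship gradually bloat. This bloat means more than extra transfer bandwidth; it can also mean a worse performance experience for users browsing the page. After the browser downloads the scripts a page depends on, it still has to parse, interpret, and run them. This article takes a close look at how browsers process JavaScript, identifies the main culprits that slow down your application's startup, and offers solutions based on my own experience. Looking back, we never gave much thought to optimizing the JavaScript parse/compile steps; we expected that the parser, upon encountering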


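Scraping 360,000 records with Selenium: what patterns do 网易云音乐 (NetEase Cloud Music) hot comments really follow?

Website Optimization • posted by 优采云 • 0 comments • 138 views • 2022-06-28 18:00 • from related topics

　　By 沐沐
　　Source: GOGO数据 (ID: mu_science)
　　Hi, friend! I'm 沐沐.
　　Welcome to this treasure trove for learning Python~~
　　I can't say how popular 网易云音乐 itself is, but its comments certainly are. I have seen plenty of posts that scrape 网易云音乐 comments, so today let's try it ourselves.
　　This article shows how to scrape 网易云音乐 hot comments with Python and Selenium, with detailed example code that should be a useful reference for study or work. Follow along below.
　　Installing Selenium
　　Before starting, we need to install and configure Selenium as follows.
　　Selenium can be installed directly with pip.
　　pip install selenium
　　Installing chromedriver
　　Note that the chromedriver version must match your Chrome version, otherwise it will not work.
　　There are two download locations:
　　1、
　　2、
　　First check your Chrome version by entering the following in the browser:
　　chrome://version shows the browser's version information.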
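
The version requirement above can be checked before launching the browser. The sketch below compares only the major version number, which is an assumption about how strict the match needs to be; the helper names and the sample version strings are illustrative and not part of the article.

```python
def major_version(version):
    """Return the major component of a dotted version string as an int."""
    return int(version.strip().split(".")[0])


def driver_matches_browser(chrome_version, driver_version):
    """True when Chrome and chromedriver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


# Sample version strings, for illustration only.
same_major = driver_matches_browser("103.0.5060.114", "103.0.5060.53")
different_major = driver_matches_browser("103.0.5060.114", "102.0.5005.61")
print(same_major, different_major)
```

In practice the first argument would come from the chrome://version page and the second from the downloaded chromedriver.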
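　　Choosing a target
　　You can pick any song you like to collect comments from. Here I use 岁月神偷 as the example, collecting 360,000+ comments and then visualising them.
　　
　　Importing the required modules
　　These are the Python libraries needed for scraping 网易云音乐 hot comments:
    import csv
    import random
    import time

    from icecream import ic
    from selenium import webdriver
　　Target URL
　　The 网易云音乐 link we want is shown below. For each comment under the song we collect the author, the time, and the text.
　　https://music.163.com/#/song?id=28285910
　　Opening the browser and loading the page
　　Running the code below opens the 网易云音乐 page we want to scrape.
　　Unlike many other sites, 网易云音乐 nests its content inside an iframe, which is like one extra door. To reach the content we must first enter the iframe.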
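    # Load the driver
    driver = webdriver.Chrome()
    # Open the site
    driver.get('https://music.163.com/#/song?id=28285910')
    # Implicit wait: element lookups poll for up to 10 s instead of sleeping for a fixed time
    driver.implicitly_wait(10)
    # Locate the iframe. 'css selector' is the value of By.CSS_SELECTOR;
    # the find_element_by_* helpers were removed in newer Selenium 4 releases.
    iframe = driver.find_element('css selector', '.g-iframe')
    # Switch into the iframe first
    driver.switch_to.frame(iframe)
　　The comment div tags are only fully loaded once the page has been scrolled to the very bottom, so we hand that step to JS:
    # Scroll to the bottom of the page
    js = 'document.documentElement.scrollTop = document.documentElement.scrollHeight'
    driver.execute_script(js)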
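　　Getting the page data
　　As analysed above, all comment data sits in the page's corresponding div tags.
　　So the plan is clear: we are already inside the iframe, and next we collect all the div tags and extract the information we need from each one.
    # Get all div tags in the comment list
    divs = driver.find_elements('css selector', '.itm')
    print(len(divs))
    '''
    35
    '''
　　The first page's 15 hot comments plus 20 regular comments were fetched successfully. The next step is to extract the comment content we need.
　　Extracting the data
　　Next we extract the fields we need from each div tag.
　　If you know a little about CSS selectors you can use the id (#) and class (.) syntax when extracting.
　　If not, right-clicking the element and choosing copy XPath or copy selector works just as well.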
    for div in divs:
        user_name = div.find_element('css selector', '.cnt.f-brk a').text
        # Split on the first full-width colon only, so colons inside the comment are kept
        hot_cmts = div.find_element('css selector', '.cnt.f-brk').text.split('：', 1)[-1]
        cmts_time = div.find_element('css selector', '.time.s-fc4').text
        ic(user_name, hot_cmts, cmts_time)
    '''
    ic| user_name: '什么事都让我分心'
        hot_cmts: '上个月你结婚了，新娘和你很般配，嗯。你从当年的小男生长成了大男孩。亲她的时候，我突然想起高二那个中午，你偷亲我，你不知道的是，其实当时我没有睡着。现在我也有了女朋友，准备明年结婚了，祝彼此幸福。'
        cmts_time: '2016年4月13日'
    ic| user_name: '吴繁繁'
        hot_cmts: '枕在奶奶腿上听这首歌，奶奶七十多，像个好奇宝宝一样用手指小心地划着我的手机屏幕，看看歌词看看封面，把手机凑近耳朵听。时间是让人猝不及防的东西。'
        cmts_time: '2015年7月12日'
    ic| user_name: 'jjjkkklllmmm'
        hot_cmts: '刚进大学寝室的时候，发现床板上有人用记号笔画了一张请假条，请假原因是毕业，离校时间是6.20，返校时间是永不。其实老师唯一没骗我们的一句话就是'
        cmts_time: '2016年5月13日'
    ic| user_name: '南说哦'
        hot_cmts: '大家都说我的性子很慢，其实我也可以很快比如，后面有狗追我或者，你在前面等我'
        cmts_time: '2017年5月21日'
    ic| user_name: '_时光慢点_VI'
        hot_cmts: '听歌的时候，旋律永远是第一感觉，然后才是歌词，歌词过后才是细节。 就像读小说，一开始只对剧情感兴趣，慢慢你开始琢磨小说中的人物，最后才发掘小说的内涵。'
        cmts_time: '2015年2月9日'
    ic| user_name: '刘家鑫很蠢'
        hot_cmts: ('逛留言板上看到的一句话 "我对你这么好你却总这样不冷不热的可我毫无办法谁叫一开始主动的人是我偶尔也会想想当我终于消失在追逐你的长途里 '
                   '某个夜里你的手机微微一震你会不会恍然地以为还是我给你的温柔"一个恍惚瞬间戳到泪点。')
        cmts_time: '2016年4月26日'
    '''
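
Splitting the element text on the full-width colon is fragile when a comment contains no colon at all or more than one. A small helper along these lines, assuming (as the article's code implies) that the text has the form "user：comment", keeps that logic testable outside the browser. The function name is hypothetical.

```python
def split_comment(raw_text, sep="："):
    """Split 'user：comment' into (user, comment) on the first separator only."""
    user, found, comment = raw_text.partition(sep)
    if not found:
        # No separator present: treat the whole text as the comment body.
        return "", raw_text.strip()
    return user.strip(), comment.strip()


print(split_comment("南说哦：大家都说我的性子很慢"))
print(split_comment("某用户：他说：你好"))
print(split_comment("没有分隔符的文本"))
```

Using `str.partition` means a second colon inside the comment stays in the comment, and text without a colon does not raise an error.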
　　
　　Saving the data
　　With the data extracted, we save it to a CSV file for the visualisation steps that follow.
    f = open('suiyue.csv', mode='a', encoding='utf-8-sig', newline='')
    csv_writer = csv.DictWriter(f, fieldnames=['用户名称', '评论时间', '评论内容'])
    dit = {
        '用户名称': user_name,
        '评论时间': cmts_time,
        '评论内容': hot_cmts,
    }
    csv_writer.writerow(dit)
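
As a variation on the saving step, the sketch below writes a header row once and then all rows in one call. It writes to an in-memory buffer so it can be run without touching disk; the sample rows are invented for illustration, while the field names are the ones the article uses.

```python
import csv
import io

FIELDS = ["用户名称", "评论时间", "评论内容"]


def write_rows(handle, rows):
    """Write a header line followed by one CSV line per comment dict."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Invented sample rows, for illustration only.
rows = [
    {"用户名称": "user_a", "评论时间": "2016年4月13日", "评论内容": "first, with a comma"},
    {"用户名称": "user_b", "评论时间": "2015年7月12日", "评论内容": "second"},
]

buffer = io.StringIO(newline="")
write_rows(buffer, rows)
print(buffer.getvalue())
```

With a real file, passing the handle from `with open('suiyue.csv', 'w', encoding='utf-8-sig', newline='') as f:` to `write_rows` would also make sure the file is closed when the block ends.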
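　　Fetching multiple pages
　　Let's set a small goal: fetch 300 pages of data first.
    for page in range(1, 300 + 1):
        print(f'-------------Scraping page {page}-------------')
        time.sleep(random.random() * 3)  # random delay to avoid anti-scraping measures
        spider_page()  # the author's function; its definition is not shown in the article
        # Click through to the next page
        driver.find_element('css selector', '.znxt').click()
　　In total 3,000 test records were collected; if you have the time and interest you can fetch more.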
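
The paging loop above can be sketched without a browser by injecting the page-fetching and next-page checks as plain functions. Everything here is hypothetical scaffolding (`crawl_pages`, `fetch_page`, `has_next` are not from the article); the point is the random pause between pages and stopping early when there is no next page.

```python
import random
import time


def crawl_pages(fetch_page, has_next, max_pages, max_delay=3.0, sleep=time.sleep):
    """Call fetch_page(page) for up to max_pages pages, pausing randomly between them."""
    collected = []
    for page in range(1, max_pages + 1):
        collected.extend(fetch_page(page))
        if not has_next(page):
            break  # stop early instead of clicking a missing "next" button
        sleep(random.uniform(0, max_delay))
    return collected


# Stand-ins for the browser calls: three pages of two fake comments each.
def fake_fetch(page):
    return [f"comment {page}-{i}" for i in (1, 2)]


def fake_has_next(page):
    return page < 3


result = crawl_pages(fake_fetch, fake_has_next, max_pages=300, sleep=lambda seconds: None)
print(len(result), result[-1])
```

In a real run, `fetch_page` would wrap the extraction code and `has_next` would check whether the next-page control is still clickable; the `sleep` parameter is only replaced here so the example finishes instantly.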
　　Data processing
　　Next we remove duplicate and empty records, then show five randomly sampled rows:
    import pandas as pd

    # Read the data
    rcv_data = pd.read_csv('./岁月神偷.csv', encoding='gbk')
    # Drop duplicate records
    rcv_data = rcv_data.drop_duplicates()
    # Drop rows with missing values
    rcv_data = rcv_data.dropna()
    # Show a sample of 5 rows
    print(rcv_data.sample(5))
    '''
          用户名称                 评论时间         评论内容
    153   清风不识字何故乱翻书_2027  11月25日 22:21  时间是让人猝不及防的东西，我的青春，随着这首歌结束了。。。
    1796  小花不快乐               9月21日 22:34   对不起是对我自己说的
    610   烟非烟雨亦雨             11月9日 04:23   [多多比耶]
    1817  气氕氘氚氙               9月21日 11:02   3Q
    1048  颜颜柒柒柒               10月21日 00:38  还好嘛，现在是21年10月21日了
    '''
　　Word frequency
　　The ten most frequent words in the comments are:
    from collections import Counter

    # Word frequency: keep words longer than one character that are not stop words
    all_words = [word for word in result.split(' ') if len(word) > 1 and word not in stop_words]
    wordcount = Counter(all_words).most_common(10)
    '''
    ('我们', '时间', '一个', '喜欢', '现在', '没有', '真的', '自己', '一起', '知道')
    (187, 168, 163, 156, 150, 142, 130, 115, 104, 95)
    '''
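
The word-frequency step can be tried on its own with a toy input. The sketch below assumes the text has already been segmented into space-separated tokens (the article uses 结巴 for that); the sample sentence and stop-word set are invented. The two printed tuples mirror the shape of the output shown above, which looks like the (word, count) pairs unzipped into labels and counts, though that is a guess about the author's code.

```python
from collections import Counter


def top_words(segmented_text, stop_words, n=10):
    """Count space-separated tokens, skipping one-character tokens and stop words."""
    words = [
        word for word in segmented_text.split()
        if len(word) > 1 and word not in stop_words
    ]
    return Counter(words).most_common(n)


# Toy input standing in for segmented comment text.
text = "我们 喜欢 时间 的 我们 时间 我们 啊 真的"
pairs = top_words(text, stop_words={"真的"}, n=2)
labels, counts = zip(*pairs)
print(labels, counts)
```

`most_common` returns a list of (word, count) pairs; `zip(*pairs)` is only needed when a plotting library wants labels and values as separate sequences.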
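　　Next we use a bubble chart and a pie chart to show this visually:
　　Word cloud
　　We use 结巴 (jieba) for word segmentation.
　　Finally we use stylecloud to draw an attractive word cloud:
    from stylecloud import gen_stylecloud

    gen_stylecloud(text=result,
                   icon_name='fas fa-comment',
                   font_path='msyh.ttc',
                   background_color='white',
                   output_name=pic,
                   custom_stopwords=stop_words)
    print('Word cloud drawn successfully!')
　　Sentiment analysis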


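JS web page scraping: isomorphic JavaScript applications, the future of the web?

Website Optimization • posted by 优采云 • 0 comments • 87 views • 2022-06-28 17:59 • from related topics

　　The web world has a famous maxim, coined for Java: "Write once, run everywhere". But does it apply only to Java? Can we also use it to describe JavaScript? The answer is yes.
　　In this article I introduce the concept of isomorphic JavaScript applications and recommend some resources to help you build them.
　　How we got here
　　Years ago the web was just static pages built from HTML and CSS, with little interactivity. Every user action required the server to create and return a complete page. Thanks to JavaScript, developers began creating great effects, but it was the arrival of Ajax that truly started the revolution. Web developers began writing pages that could communicate with the server, sending and receiving data without reloading the page.
　　Over time client-side code could do more and more, giving rise to a class of applications called single-page applications (SPAs). An SPA fetches all the required resources on the first page load, or loads them dynamically on demand and renders them into the page. Gmail and the StackEdit editor are great examples of SPAs.
　　SPAs allow heavily interactive designs because almost all operations run on the client, keeping communication with the server to a minimum. Unfortunately they also have some serious problems; we discuss a few of them here.
　　
　　Performance
　　Compared with static pages, SPAs need more client-side code, so more data has to be downloaded. This makes loading slow on mobile phones and can lead to serious consequences such as a poor user experience and lost revenue. According to an article from Microsoft:
　　A Bing study found that every 10 ms increase in page load time reduces the site's annual revenue by $250K.
　　SEO
　　Because single-page applications depend on JavaScript execution, the server does not deliver any of the HTML content they eventually display. As a result, web crawlers have a hard time indexing these pages. A crawler is a program that sends requests to a web server and analyses the result as raw text, without interpreting and executing client-side content the way a browser runs JavaScript. Not long ago Google improved its search crawler so that it can also crawl pages built with client-side JavaScript. But what about Bing, Yahoo, and other search engines? Good indexing is vital for any company; it usually brings more traffic and higher returns.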
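
To make the SEO point concrete, here is a minimal sketch of what a crawler that does not execute JavaScript "sees". It uses Python's standard `html.parser`; the two HTML snippets are invented, and the class and function names are hypothetical. A server-rendered page yields readable text, while an SPA shell yields nothing until its script runs.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a non-JavaScript crawler could read from the markup."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


server_rendered = "<html><body><h1>Products</h1><p>Red shoes</p></body></html>"
spa_shell = '<html><body><div id="app"></div><script>render("Red shoes")</script></body></html>'

print(repr(visible_text(server_rendered)))
print(repr(visible_text(spa_shell)))
```

The second result is empty because the product text only exists after the script has run in a browser, which is exactly the indexing problem described above.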
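　　Isomorphic JavaScript applications
　　
　　Isomorphic JavaScript applications are written in JavaScript and can run on both the client and the server. Because of this, you write the code once and can render static pages on the server while still handling complex interactions on the client. This approach bridges the two worlds and avoids the two problems described above.
　　Many frameworks can now help you build this kind of application. Probably the best known is Meteor, an open-source JavaScript framework written on top of Node.js and focused on real-time web applications. Another project worth mentioning is Rendr, a lightweight library developed by Airbnb that lets Backbone.js run on both the client and the server.
　　More and more companies are using Node.js in their products. Sharing code between client and server is becoming a more common and natural choice, and in my view it will be the future of web development. Some libraries, such as React, reinforce this trend by sharing templates.
　　Conclusion
　　This article introduced the concept of isomorphic JavaScript applications, a new approach to application development that combines server-side and client-side applications as far as possible. We also discussed the problems this approach tries to solve and some projects you can start experimenting with right away.
　　Have you heard of isomorphic JavaScript applications? Have you built one? Do you have experience developing them?
　　Original: Isomorphic JavaScript Applications - the Future of the Web?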


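JS web page scraping [Issue 2030]: JavaScript startup performance, bottleneck analysis and solutions

Website Optimization • posted by 优采云 • 0 comments • 85 views • 2022-06-25 01:23 • from related topics

　　Foreword
　　Find the meaning of its existence. Today's morning-read article is shared with permission from its translator, @王下邀月熊.
　　The main text starts here~~
　　In web development, as requirements grow and the codebase expands, the pages we ship gradually bloat. This bloat means more than extra transfer bandwidth; it can also mean a worse performance experience for users browsing the page. After the browser downloads the scripts a page depends on, it still has to parse, interpret, and run them. This article takes a close look at how browsers process JavaScript, identifies the main culprits that slow down your application's startup, and offers solutions based on my own experience. Looking back, we never gave much thought to optimizing the JavaScript parse/compile steps; we expected that the parser, upon encountering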


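JS web page scraping [Issue 2030]: JavaScript startup performance, bottleneck analysis and solutions

Website Optimization • posted by 优采云 • 0 comments • 88 views • 2022-06-23 18:48 • from related topics

　　Foreword
　　Find the meaning of its existence. Today's morning-read article is shared with permission from its translator, @王下邀月熊.
　　The main text starts here~~
　　In web development, as requirements grow and the codebase expands, the pages we ship gradually bloat. This bloat means more than extra transfer bandwidth; it can also mean a worse performance experience for users browsing the page. After the browser downloads the scripts a page depends on, it still has to parse, interpret, and run them. This article takes a close look at how browsers process JavaScript, identifies the main culprits that slow down your application's startup, and offers solutions based on my own experience. Looking back, we never gave much thought to optimizing the JavaScript parse/compile steps; we expected that the parser, upon encountering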


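JS web page scraping [Issue 2030]: JavaScript startup performance, bottleneck analysis and solutions

Website Optimization • posted by 优采云 • 0 comments • 107 views • 2022-06-22 04:28 • from related topics

　　Foreword
　　Find the meaning of its existence. Today's morning-read article is shared with permission from its translator, @王下邀月熊.
　　The main text starts here~~
　　In web development, as requirements grow and the codebase expands, the pages we ship gradually bloat. This bloat means more than extra transfer bandwidth; it can also mean a worse performance experience for users browsing the page. After the browser downloads the scripts a page depends on, it still has to parse, interpret, and run them. This article takes a close look at how browsers process JavaScript, identifies the main culprits that slow down your application's startup, and offers solutions based on my own experience. Looking back, we never gave much thought to optimizing the JavaScript parse/compile steps; we expected that the parser, upon encountering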


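For scraping web page content with JS, use open-source tools and CSS

Website Optimization • posted by 优采云 • 0 comments • 88 views • 2022-06-21 01:00 • from related topics

　　To scrape web page content with JS, use open-source tools:
　　1. requestspider, a commonly used library for scraping web pages, works much like scraping with JS.
　　2. dll scraping supports scraping JS files and requires a JS library. It can currently scrape popular front-end frameworks such as flask, struts2 and springboot; on the Java side there is springmvc and others. Expect to spend time studying scraping techniques.
　　3. The Chrome extension extractallocator scrapes the finest details of a page as well as JS files, similar to the two above, although everything after that has to be configured manually. For simple front-end scraping you can follow this route; it spares you a lot of scraping practice, and once you have the basic techniques you can apply them quickly to real scenarios. extractallocator download instructions - xs互联网 - welcome to the html5挖掘与分析站 - 知乎专栏.
　　JavaScript and CSS are HTML elements: JavaScript can recognise characters in a page, and CSS can recognise colours and the like. CSS written against the es6api can recognise points, colours, and similar information in a page. Frameworks and tools such as bootstrap can be used to componentise pages. So frameworks and tools are useful for designing and writing pages, but for a web crawler they are unnecessary.
　　Building a page crawler calls for some JavaScript/CSS background, but web scraping mostly does not care about the page as the user sees it, so this is not a hard requirement. Writing pages with the es6api is somewhat faster, but given how elements are commonly used today, hand-written CSS is clearly more dependable.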


JS web page scraping: how to do JavaScript SEO

Website Optimization • posted by 优采云 • 0 comments • 108 views • 2022-06-19 21:38 • from related topics

　　Hi. I'm Charles from Cross Border Digital. Are you using JavaScript to build your website? Or maybe you're thinking about using one of the modern JavaScript frameworks like Ember, or Node, or React, or Angular. Great frameworks for building very app-like interfaces on the web.
　　大家好，我是来自Cross Border Digital的Charles. 你是否在用JavaScript建站？或者你正在考虑用使用当前比较流行的JavaScript框架：Ember、Node、React 或者是Angular建站。这些都是非常优秀的框架，给网站访客提供类似App的操作体验。
　　But there are some specific challenges to using these JavaScript frameworks when it comes to SEO. Today, I want to share some insights and advice to help you use these frameworks in a way that will work with Google and ensure that you can rank your website.
　　但是，当涉及到SEO时，使用这些JavaScript框架存在一些特定的挑战。今天，我想分享一些见解和建议，以帮助您以与Google兼容的方式使用这些框架，确保网站在搜索引擎可以获得排名。
　　So when it comes to understanding the implications of using JavaScript to build your website, there's really two key things that we need to talk about.
　　因此，在理解使用JavaScript开发网站的意味着什么时，实际上需要讨论两个关键问题。
　　The first is how Google actually works, how they treat JavaScript, how they understand JavaScript.
　　首先是Google的实际工作方式，如何对待JavaScript，如何理解JavaScript。
　　And the second is understanding the difference between client-side and server-side rendering of the content and HTML on your website.
　　其次是了解网站上内容和HTML在客户端和服务器端渲染的区别。
　　These two things together if we can understand these, then we can understand how Google is going to interpret our JavaScript-powered website.
　　如果我们能理解这两点，那么就能了解Google如何读取JavaScript网站。
　　OK, so let's first talk about Google. There are in fact two parts to Google's crawling and indexing of the web.
　　首先来讨论一下谷歌运作方式。谷歌爬取收录网站可以分为两部分：
　　The first part, the crawler known as Google bot is the part of the infrastructure that follows every link it can find on the web to uncover every URL on your website and on every other website on the web. The Google bot crawler can't render JavaScript. It's simply visits a page. It will do a very quick pass of any HTML that it finds on the page to see if there are any other links it can follow, but otherwise, it simply passes the URLs that it does find back to the indexer.
　　第一部分，被称为Googlebot的爬虫，是搜索引擎基础结构的一部分，它通过页面上的链接，去发现更多的页面。Google bot爬虫无法渲染JavaScript。它只是访问页面。爬虫会快速查找HTML页面上的链接，继续抓取新的页面，没有新页面就会将爬取到的链接返回谷歌索引（indexer）。
　　The indexer is the part of Google's infrastructure that completely renders a page, all of the content, all of the CSS, all of the layout, to try and understand what that page is about. So that when someone does a query, it can return that page if it is relevant. Now the indexer can render JavaScript.
　　索引是Google搜索引擎基础结构的一部分，它负责渲染页面的所有内容，CSS，布局以了解该页面的核心信息。当用户在搜索引擎搜索时，才能返回客户搜索相关的页面。Indexer可以渲染JavaScript。
　　So it is true to say that Google can render JavaScript-powered web pages.
　　因此，谷歌支持渲染基于JavaScript开发的网页。
　　But to completely understand the implications of using JavaScript, it is also important to understand the difference between client-side and server-side rendering of JavaScript, because that makes a big difference to the way that Google will interpret your website.
　　但是要完全了解使用JavaScript对网站的影响，理解客户端和服务器端JavaScript渲染的区别也很重要，因为会导致Google读取网站的方式有很大的不同。
　　So generally speaking, these modern JavaScript frameworks like React, Node, and Ember, and Angular, they render in the client side. That means that when someone visits your web page that's built on one of these frameworks, the JavaScript application is delivered to the browser and then it renders everything in the browser. It calls the CSS, it calls the content, it calls the images, and any other resources required to lay out your web page into the browser and renders them on the client side.
　　因此，一般而言，这些现代JavaScript框架（例如React，Node和Ember和Angular）在客户端渲染。这意味着，当用户访问基于这些框架之一开发的网页时，JavaScript应用程序将交付给浏览器，然后在浏览器渲染所有内容。将CSS，文本，图片以及其他与页面展示有关的资源推送到浏览器，在客户端渲染。
　　A server side rendering is when all of that work is done on the server and the HTML of the complete page and all of the content is delivered to the browser.
　　服务器端渲染是指在服务器端完成所有渲染工作并将完整页面的HTML和所有内容都交付给浏览器。
　　Now this has big implications for Google because as we said, the Google bot can't render JavaScript. So that means that when Google bot visits your home page or your JavaScript-powered website, if you're rendering client side, it means that the Google bot will get that JavaScript application, but it can't render any content or any link. So it won't find any other links to crawl. It will have to send that single URL back to the indexer. The indexer will then render that page and it will find any links and content on that page. It will send any links back to the crawler so that they can then continue to follow those URLs to see if they can find any other links.
　　这就对Google收录网站产生了重大影响，因为前面所说，Google bot无法渲染JavaScript。因此，这意味着当Googlebot访问您的主页或基于JavaScript的网站时，如果网站是客户端渲染，则意味着Googlebot将获取该JavaScript应用程序，但无法渲染任何内容或任何链接。因此Google bot无法找到页面上的链接。只会返回一个URL给谷歌indexer。谷歌Indexer将渲染页面并发现页面上的更多链接和内容。发现新的链接则会返回给Google bot去继续抓取进一步发现更多的链接。
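
The link-discovery step described above can be illustrated with a small sketch: a crawler that only reads raw HTML finds links in a static page but none in a client-rendered shell. This is not Googlebot's actual implementation, just an illustration with Python's standard library; the class name, function name, URLs, and HTML snippets are all invented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags in raw HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def discover_links(base_url, html):
    """Return the links visible to a crawler that does not run JavaScript."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


static_page = '<a href="/about">About</a> <a href="https://example.com/blog">Blog</a>'
client_rendered = '<div id="root"></div><script src="/app.js"></script>'

print(discover_links("https://example.com/", static_page))
print(discover_links("https://example.com/", client_rendered))
```

With the client-rendered shell the list is empty, so further URLs can only be found after a separate rendering pass, which is the slowdown the speaker describes.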
　　And in this way, it really slows down the process of crawling your website in a very big way.And this is particularly relevant for large websites. So the other thing that I have to say about client-side rendering is that Google's indexer uses a version of Chrome for rendering JavaScript-powered websites that is about three years old.
　　这就极大地减慢了网站被谷歌抓取的速度，这对于大型网站影响尤为重大。另外，关于客户端渲染，Google的indexer用来来渲染JavaScript网站所使用的Chrome版本已有大约三年的历史。
　　And that means that it can't support all the latest features of these modern JavaScript frameworks. So if you're building a JavaScript-powered website that renders on the client side, it is very important that you ensure that it is set up in such a way that it degrades nicely for older versions of browsers so that Google can render your content and your HTML completely.
　　这意味着它不能支持那些现代JavaScript框架的所有最新功能。因此，如果您要构建一个以JavaScript为基础的网站，并在客户端进行渲染，那么确保Google可以完全渲染您的内容和HTML。
　　I've seen many cases where, for a client-side rendered JavaScript page, Google is able to crawl or index part of it but it doesn't see all of the content because of some of the features that have been included in that page. So it is very important that your JavaScript-powered web page degrades nicely for all the versions of the browser.
　　我看过很多客户端渲染的JavaScript页面，Google能够对其进行抓取和编制索引，但是页面中包含的某些功能谷歌无法完整抓取。因此要确保你基于JavaScript开发网页对于所有版本的浏览器都可以很好地渲染。
　　So we know that Google bot can't render JavaScript. And whilst the Google indexer can render JavaScript, we also know that it is limited to older versions of Chrome. So it doesn't support all the latest features of these modern JavaScript frameworks. So if we are doing client-side rendering, it is more difficult to get all of our content indexed and it really does slow down the entire process of crawling and indexing your website, which includes any changes you make. So if you make changes to your website, particularly if you have a large website, it can take weeks or months for those changes to be reflected in the Google index.
　　我们知道Google bot无法渲染JavaScript。Google indexer可以呈现JavaScript，但我们也知道它仅限于旧版Chrome。因此，它不支持这些现代JavaScript框架的所有最新功能。因此，如果在进行客户端渲染，则很难对所有的内容建立索引，这确实会减慢对网站被抓取和索引的整个过程，其中包括您所做的任何更改。因此，如果您对网站进行更改，尤其大型网站，则这些更改可能需要数周或数月才能反映在Google索引中。
　　So what to do? Well, our strong recommendation is to use old-fashioned content management systems for public facing web sites, first and foremost. If you haven't yet started the build of your website, then we would strongly recommend you look at one of the content management systems like Umbraco, or Wordpress, or Drupal. Or if you're doing e-commerce, use one of the platforms like Shopify, or BigCommerce.
　　那么该怎么办？我们强烈建议普通网站使用老式的内容管理系统。如果您还没有开始网站的搭建，那么我们强烈建议您考虑内容管理系统，例如Umbraco，Wordpress或Drupal。如果是电商网站，可以考虑用Shopify, BigCommerce电商建站系统。
　　These content management applications render everything on the server side and deliver the complete HTML to the browser and to Google, which makes it much easier and faster for Google to completely crawl and index your website.
　　If you have already built a JavaScript-powered website, or if you really want to for whatever reason, then our strong recommendation is to configure that application to render everything on the server side and deliver the complete HTML and content to the browser and to Google, in the same way that one of these older, traditional content management systems does. If you can do that properly, it will make it a lot easier and faster for Google to index your JavaScript-powered website.
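As a rough sketch of what "render everything on the server side" means, the handler below builds the whole HTML document before responding, so the content and links are already in the response. The `article` data, `renderArticlePage`, and `handleRequest` are hypothetical names. The data is assumed to be trusted, so no HTML escaping is done here.

```javascript
// Hypothetical page data; in a real site this would come from a database or an API.
const article = {
  title: 'JavaScript SEO',
  body: 'Content that should be visible without running any script.',
  related: ['/seo-basics', '/rendering'],
};

// Build the complete HTML document on the server.
function renderArticlePage(data) {
  const links = data.related
    .map((href) => '<li><a href="' + href + '">' + href + '</a></li>')
    .join('');
  return (
    '<!DOCTYPE html><html><head><title>' + data.title + '</title></head>' +
    '<body><h1>' + data.title + '</h1><p>' + data.body + '</p>' +
    '<ul>' + links + '</ul></body></html>'
  );
}

// Request handler with the (req, res) shape that Node's http.createServer accepts.
function handleRequest(req, res) {
  res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
  res.end(renderArticlePage(article));
}
```

In Node.js such a handler could be passed to `http.createServer(handleRequest)` and started with `.listen(3000)`; the port is an arbitrary choice for the example.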
　　Perhaps you've already launched a JavaScript-powered website that renders everything on the client side, and you're not sure how to configure it to render on the server side. There are third-party services like Prerender.io which might help. They might be worth looking at as well.
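The general idea behind prerendering services can be sketched as follows: detect search crawlers by their User-Agent and hand them HTML that was rendered ahead of time, while ordinary visitors get the client-side application. This is not Prerender.io's actual API. `isSearchCrawler`, `chooseResponse`, and the list of crawler names are assumptions made for illustration only.

```javascript
// Assumed, non-exhaustive list of crawler names to look for in the User-Agent.
const CRAWLER_PATTERN = /googlebot|bingbot|baiduspider/i;

// True when the User-Agent string looks like one of the listed crawlers.
function isSearchCrawler(userAgent) {
  return CRAWLER_PATTERN.test(userAgent || '');
}

// Pick the response body: prerendered HTML for crawlers, the app shell for everyone else.
function chooseResponse(userAgent, prerenderedHtml, appShellHtml) {
  return isSearchCrawler(userAgent) ? prerenderedHtml : appShellHtml;
}

const shell = '<div id="app"></div>';
const prerendered = '<div id="app"><h1>Products</h1></div>';

console.log(chooseResponse('Mozilla/5.0 (compatible; Googlebot/2.1)', prerendered, shell));
console.log(chooseResponse('Mozilla/5.0 (Windows NT 10.0) Chrome/100.0', prerendered, shell));
```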
　　So in a nutshell, a lot of SEO is about making it easy for Google and not putting hurdles in Google's way. And when it comes to using JavaScript on your website, there are quite a few pitfalls and traps that are easy to fall into if you're not aware of them.
　　So if you have any questions, we would love to help. Hit us up here on WeChat. We'd be happy to discuss your specific case, and we look forward to talking to you. Thanks.


JS web scraping: Isomorphic JavaScript Applications, the Future of the Web?

Website optimization • posted by 优采云 • 0 comments • 88 views • 2022-06-19 20:54 • from related topics

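　　The Web world has a well-known maxim, coined for Java: “Write once, run everywhere”. But does it apply only to Java? Can we also use it to describe JavaScript? The answer is yes.
　　In this article I will introduce the concept of isomorphic JavaScript applications and recommend some resources to help you build them.
　　How we got here
　　Years ago, the web was just static pages built from HTML and CSS, without much interactivity. Every user action required the server to create and return a complete page. Thanks to JavaScript, developers began creating nice effects, but it was the arrival of Ajax that really started the revolution. Web developers began writing pages that could talk to the server, sending and receiving data without reloading the page.
　　Over time, client-side code could do more and more, giving rise to a class of applications called single-page applications (SPAs). An SPA either fetches all the necessary resources when the page first loads, or loads them dynamically on demand and renders them into the page. Gmail and the StackEdit editor are great examples of SPAs.
　　SPAs allow rich interaction design, because almost everything happens on the client and communication with the server is kept to a minimum. Unfortunately, they also have some serious problems; we will discuss a few of them.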
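　　Performance
　　Because an SPA needs more client-side code than a static page, there is more data to download. This makes loading slow on mobile phones and can have serious consequences, such as a poor user experience and lost revenue. According to a Microsoft article:
　　A Bing study found that every 10 ms increase in page load time reduces a site's annual revenue by $250K.
　　SEO
　　Because single-page applications depend on JavaScript execution, the server does not deliver the HTML content they eventually display. As a result, web crawlers have a hard time indexing these pages. A crawler is a program that sends requests to a web server and analyzes the response as raw text, without interpreting and executing client-side code the way a browser running JavaScript would. Not long ago, Google improved its search crawler so that it can now also crawl pages built with client-side JavaScript. But what about Bing, Yahoo, and other search engines? Good indexing is crucial for any company; it usually brings more traffic and higher returns.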
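　　Isomorphic JavaScript applications
　　An isomorphic JavaScript application is written in JavaScript and can run on both the client and the server. Because of this, you write the code once, render static pages on the server, and still handle complex interactions on the client. This approach bridges the two worlds and avoids the two problems described above.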
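A minimal sketch of the idea, not tied to any particular framework: one pure rendering function is shared, the server uses it to produce the initial HTML, and the client reuses it to update the page. All names here (`renderTodoList`, `renderPage`, `mount`) are hypothetical.

```javascript
// Shared helper: escape text so it is safe to place inside HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Shared view: a pure function from data to an HTML string.
// The same code runs on the server and in the browser.
function renderTodoList(todos) {
  const items = todos
    .map((todo) => '<li>' + escapeHtml(todo.title) + '</li>')
    .join('');
  return '<ul id="todos">' + items + '</ul>';
}

// Server side: embed the rendered view in a complete page.
function renderPage(todos) {
  return '<!DOCTYPE html><html><body>' + renderTodoList(todos) + '</body></html>';
}

// Client side: re-render into an existing container after the data changes.
function mount(container, todos) {
  container.innerHTML = renderTodoList(todos);
}

console.log(renderPage([{ title: 'Write once' }, { title: 'Run on client & server' }]));
```

In a browser, `mount(document.body, todos)` would reuse the same `renderTodoList`, which is the code sharing the paragraph above describes.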
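　　Today, many frameworks can help you develop this kind of application. Perhaps the best known is Meteor, an open source JavaScript framework built on Node.js that focuses on real-time web applications. Another project I want to mention is Rendr, a lightweight library developed by Airbnb that allows Backbone.js to run on both the client and the server.
　　More and more companies are using Node.js in their products. Sharing code between client and server is becoming a more common and natural choice. In my opinion, this practice is the future of web development. Some libraries, such as React, reinforce the trend by sharing templates.
　　Conclusion
　　This article introduced the concept of isomorphic JavaScript applications, a new way of developing applications that makes the most of combining server-side and client-side code. We also discussed the problems this approach tries to solve, and some projects you can start experimenting with right now.
　　Have you heard of isomorphic JavaScript applications? Have you built one? What was your experience?
　　Original article: Isomorphic JavaScript Applications — the Future of the Web?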


A step-by-step guide to getting image links from any website with JavaScript

Website optimization • posted by 优采云 • 0 comments • 68 views • 2022-06-18 06:43 • from related topics

　　Preface
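　　Hello everyone, I'm IT共享者, also known as Pipi (皮皮).
　　I'm sure everyone is familiar with the photo site 妹纸图. After my first visit I decided I had better exercise some self-restraint and keep away from it. Enough said: today I'll show you how to get the links of the images on that site.
　　I. Preparation
　　The 360 browser, nothing else.
　　II. Goal
　　Get every image on the page.
　　III. Steps
　　1. Open the browser and search for images; we use portrait photos as the example.
　　2. Open the browser console
　　Press F12 to open the browser's developer tools. What we want to do today is get all the image links and, along the way, look at the images.
　　This is where we will get all the image links. If you have never touched front-end development this may be completely new to you, but it should not be by the end of this walkthrough.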
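　　3. What the console can do
　　You might think this panel is not much use, since it starts out empty, and that the Elements and Network panels are far more useful. Those two are indeed very useful for analyzing page structure and network requests, but never underestimate the console: it lets you see results quickly while developing. Say you have written some code and want to check whether it runs. The usual approach is to write an HTML+CSS page and embed the JavaScript in it, which is cumbersome: after every change you must refresh the browser to see the effect, and you end up switching constantly between browser and editor, which slows development and wastes system resources. The console removes that overhead: you can run JavaScript directly, without any accompanying HTML or CSS. Let's look at what it offers:
　　As you can see, it offers autocompletion, and because it runs inside the browser it knows about the objects on the current page, which a third-party IDE generally cannot match. So if a method you expect is not suggested, the most likely reason is that you are using it wrong.
　　1). Make the page editable
　　Enter in the console:
　　document.body.contentEditable = true
　　In editable mode, clicking no longer triggers the page's normal behavior; you can only edit the content. To restore the page, just refresh the browser.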
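　　2). Locate specific elements
　　Let's first look at the image elements on the page by printing all of them. A special symbol is used here:
　　We can see that this shorthand prints information about every image on the current page. It shows 70, meaning the page has 70 images. When I scrolled further, the number grew to 136, which indicates that the images are loaded via Ajax.
　　Besides this approach, you can also get the images like this:
　　document.images
　　The result is the same as above. With these pieces in place, we can now easily get all the image links.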
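As a small sketch of where this is heading, the helper below turns a `document.images` collection into a de-duplicated list of URLs. `collectImageUrls` is a hypothetical name. It takes the document as a parameter so that the same code can be tried with a stand-in object outside the browser.

```javascript
// Collect the src of every image in a document-like object, without duplicates.
function collectImageUrls(doc) {
  // document.images is array-like, so Array.from can copy and transform it.
  const urls = Array.from(doc.images, (img) => img.src).filter(Boolean);
  // A Set removes repeated URLs while keeping first-seen order.
  return Array.from(new Set(urls));
}

// Stand-in for a page: an object with an `images` list, for running outside a browser.
const fakeDocument = {
  images: [{ src: 'a.jpg' }, { src: '' }, { src: 'a.jpg' }, { src: 'b.jpg' }],
};

console.log(collectImageUrls(fakeDocument)); // [ 'a.jpg', 'b.jpg' ]
```

In the browser console the call would be `collectImageUrls(document)`. Because more images appear as you scroll, running it again after scrolling returns a longer list.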
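　　4. Get the image links and image names
　　Here we add the images to an array and then iterate over it to print them all.
　　1). Create an array to hold all the images
　　var ab = document.images; // all images on the current page
　　var aa = [];              // array that will hold them
　　for (const y of ab) {     // const: y cannot be reassigned inside the loop body
　　  aa.push(y);             // put each image into the array
　　}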
　　
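　　2). Iterate over the array and print the image links
　　Several approaches work here; I'll go through them one by one.
　　1)). for...in
　　for (const a in aa) { console.log(aa[a]) }
　　2)). for...of
　　for (const a of aa) { console.log(a) }
　　3)). forEach
　　aa.forEach(function (val, index, array) { console.log(val) });
　　4)). map
　　aa.map(function (val, index, array) { console.log(val) });
　　The fourth method looks much like the third, but they differ: forEach has no return value, whereas map returns a new array built from whatever the callback returns. This prints the images together with their links (each element's src), but the image names are still missing, so I went looking for the names: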
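The differences between the four loops are easier to see on a plain array. The sketch below uses made-up file names instead of real image elements: `for...in` visits the keys (as strings), `for...of` visits the values, `forEach` returns `undefined`, and `map` returns a new array.

```javascript
const urls = ['a.jpg', 'b.jpg'];

// for...in iterates over property keys, which for an array are index strings.
const keys = [];
for (const key in urls) {
  keys.push(key);
}

// for...of iterates over the values themselves.
const values = [];
for (const value of urls) {
  values.push(value);
}

// forEach runs the callback for each item and returns undefined.
const forEachResult = urls.forEach((url) => url.toUpperCase());

// map runs the callback for each item and collects the return values in a new array.
const mapped = urls.map((url) => url.toUpperCase());

console.log(keys);          // [ '0', '1' ]
console.log(values);        // [ 'a.jpg', 'b.jpg' ]
console.log(forEachResult); // undefined
console.log(mapped);        // [ 'A.JPG', 'B.JPG' ]
```

When the goal is only to print, `for...of` or `forEach` is the natural choice. `map` fits when the transformed values are needed afterwards, for example `aa.map((img) => img.src)`.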
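　　It turns out the name sits in a div, so I looked for the divs that match:
　　document.querySelectorAll('div.img_tit') // exactly the divs whose class is img_tit
　　document.getElementsByClassName('img_tit') // every element whose class is img_tit
　　Then we print the image name followed by the image link, which can be done with a loop and a parity check: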
var a=0;do{ a++; if(a%2==0){ console.log(aa[a])} else{ console.log(ac[a]) } }while(a
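The loop above is cut off in the source, so its end condition is unknown. As an alternative sketch, and not the author's original code, the names and links can be paired by position. This assumes that the i-th `div.img_tit` element belongs to the i-th image, which may not hold on every page. `pairNamesWithLinks` is a hypothetical helper.

```javascript
// Pair each title element with the image at the same position.
// Assumption: titles and images appear in the same order on the page.
function pairNamesWithLinks(titleNodes, imageNodes) {
  const count = Math.min(titleNodes.length, imageNodes.length);
  const pairs = [];
  for (let i = 0; i < count; i++) {
    pairs.push({
      name: titleNodes[i].textContent.trim(), // visible text of the title div
      url: imageNodes[i].src,                 // link of the matching image
    });
  }
  return pairs;
}

// Stand-in data shaped like DOM nodes, for running outside a browser.
const titles = [{ textContent: ' Sunset ' }, { textContent: 'Beach' }];
const images = [{ src: '1.jpg' }, { src: '2.jpg' }, { src: '3.jpg' }];

console.log(pairNamesWithLinks(titles, images));
```

In the console the arguments would be `document.querySelectorAll('div.img_tit')` and `document.images`.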


JS web scraping: JavaScript startup performance bottlenecks, analysis and solutions

Website optimization • posted by 优采云 • 0 comments • 72 views • 2022-06-14 03:20 • from related topics

　　English original: Addy Osmani. Translation: 王下邀月熊_Chevalier
　　/a/40139
　　
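　　In web development, as requirements grow and codebases expand, the pages we ship keep getting bigger. This bloat means more than extra bandwidth: it can also mean a worse performance experience for users browsing the page. After the browser has downloaded the scripts a page depends on, it still has to parse, interpret, and run them. This article takes a close look at how browsers process JavaScript, tracks down the culprits that slow your application's startup, and proposes solutions based on my own experience. Looking back, we never gave much thought to optimizing the JavaScript parse/compile steps; we expected that when the parser found ...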
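To get a rough feel for the parse step mentioned above, the sketch below times how long the engine takes to turn source text into a function without running it. This is an illustrative measurement, not the article's own method. `buildSource` and `timeParse` are hypothetical helpers, and because engines parse lazily and cache code, the numbers are only indicative.

```javascript
// Build a large but trivial script: many small function declarations.
function buildSource(functionCount) {
  const parts = [];
  for (let i = 0; i < functionCount; i++) {
    parts.push('function f' + i + '(x) { return x + ' + i + '; }');
  }
  return parts.join('\n');
}

// Time how long the engine needs to accept the source, without executing it.
function timeParse(source) {
  const start = Date.now();
  new Function(source); // the body is parsed here, but the function is never called
  return Date.now() - start;
}

const smallScript = buildSource(100);
const largeScript = buildSource(50000);

console.log('small script:', smallScript.length, 'chars,', timeParse(smallScript), 'ms');
console.log('large script:', largeScript.length, 'chars,', timeParse(largeScript), 'ms');
```

The larger script usually takes noticeably longer, which is the cost that remains even after the download has finished.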


JS web scraping: [Issue 2030] JavaScript startup performance bottlenecks, analysis and solutions

Website optimization • posted by 优采云 • 0 comments • 81 views • 2022-06-14 03:14 • from related topics

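　　Preface
　　Find the meaning of its existence. Today's morning read is shared with permission from the translator, @王下邀月熊.
　　The main text starts here~~
　　In web development, as requirements grow and codebases expand, the pages we ship keep getting bigger. This bloat means more than extra bandwidth: it can also mean a worse performance experience for users browsing the page. After the browser has downloaded the scripts a page depends on, it still has to parse, interpret, and run them. This article takes a close look at how browsers process JavaScript, tracks down the culprits that slow your application's startup, and proposes solutions based on my own experience. Looking back, we never gave much thought to optimizing the JavaScript parse/compile steps; we expected that when the parser found ...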


JS web scraping: how to do JavaScript SEO

Website optimization • posted by 优采云 • 0 comments • 82 views • 2022-06-12 13:22 • from related topics


js web page scraping: How to Do JavaScript SEO

Website optimization • Article posted by 优采云 • 0 comments • 87 views • 2022-06-05 05:12 • From related topics

　　Hi, I'm Charles from Cross Border Digital. Are you using JavaScript to build your website? Or maybe you're thinking about using one of the modern JavaScript frameworks like Ember, Node, React, or Angular. These are great frameworks for building very app-like interfaces on the web.
　　But there are some specific challenges to using these JavaScript frameworks when it comes to SEO. Today, I want to share some insights and advice to help you use these frameworks in a way that works with Google and ensures that your website can rank.
　　So when it comes to understanding the implications of using JavaScript to build your website, there are really two key things that we need to talk about.
　　The first is how Google actually works: how it treats JavaScript and how it understands JavaScript.
　　And the second is the difference between client-side and server-side rendering of the content and HTML on your website.
　　If we can understand these two things together, then we can understand how Google is going to interpret our JavaScript-powered website.
　　OK, so let's first talk about Google. There are in fact two parts to Google's crawling and indexing of the web.
　　The first part, the crawler known as Googlebot, is the part of the infrastructure that follows every link it can find on the web to uncover every URL on your website and on every other website. The Googlebot crawler can't render JavaScript. It simply visits a page and does a very quick pass of any HTML it finds there to see if there are any other links it can follow. Otherwise, it simply passes the URLs that it does find back to the indexer.
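　　To make the "quick pass of the HTML" concrete, here is a minimal sketch of a link extractor that reads raw HTML without executing any JavaScript. It is only an illustration of the idea, not how Googlebot is implemented; the class name, the function name, and the two sample pages are assumptions made up for this example.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags in raw HTML, without running any JavaScript."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the links a non-rendering crawler could see in the given HTML."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical server-rendered page: the links are already in the HTML.
SERVER_RENDERED = '<html><body><a href="/products">Products</a> <a href="/about">About</a></body></html>'

# Hypothetical client-rendered shell: the links only exist after app.js runs in a browser.
CLIENT_RENDERED = '<html><body><div id="app"></div><script src="/app.js"></script></body></html>'

print(extract_links(SERVER_RENDERED))  # ['/products', '/about']
print(extract_links(CLIENT_RENDERED))  # []
```

　　The second call returns an empty list because the markup that would contain the links has not been generated yet. That is the situation the rest of the talk is concerned with.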
　　The indexer is the part of Google's infrastructure that completely renders a page, with all of the content, all of the CSS, and all of the layout, to try and understand what that page is about, so that when someone does a query, it can return that page if it is relevant. Now, the indexer can render JavaScript.
　　So it is true to say that Google can render JavaScript-powered web pages.
　　But to completely understand the implications of using JavaScript, it is also important to understand the difference between client-side and server-side rendering, because that makes a big difference to the way that Google will interpret your website.
　　Generally speaking, these modern JavaScript frameworks like React, Node, Ember, and Angular render on the client side. That means that when someone visits a web page built on one of these frameworks, the JavaScript application is delivered to the browser and then renders everything in the browser. It calls the CSS, the content, the images, and any other resources required to lay out your web page, and renders them on the client side.
　　Server-side rendering is when all of that work is done on the server, and the HTML of the complete page and all of the content is delivered to the browser.
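　　As a rough sketch of what "the work is done on the server" means, the function below assembles the complete HTML, including every link, before anything is sent to the browser. The function name, its parameters, and the sample data are hypothetical; a real site would normally use its framework's or CMS's templating layer rather than string formatting.

```python
from html import escape


def render_page(title, items):
    """Build the complete HTML for a page on the server.

    `items` is a list of (url, label) pairs; both names are illustrative.
    """
    links = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(label)}</a></li>'
        for url, label in items
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        "  <ul>\n"
        f"{links}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>"
    )


page = render_page("Catalogue", [("/shoes", "Shoes"), ("/hats", "Hats & caps")])
print(page)
```

　　Everything a visitor or a crawler needs is already present in the response body, so nothing has to be executed on the client before the content and links become visible.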
　　Now this has big implications for Google because, as we said, Googlebot can't render JavaScript. That means that when Googlebot visits the home page of your JavaScript-powered website, if you're rendering on the client side, Googlebot will get that JavaScript application, but it can't render any content or any links. So it won't find any other links to crawl. It will have to send that single URL back to the indexer. The indexer will then render that page and find any links and content on it. It will send any links back to the crawler, which can then continue to follow those URLs to see if it can find any other links.
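　　The back-and-forth described above can be illustrated with a toy model. This is a deliberate simplification and not a description of Google's actual pipeline: the site map, the function name, and the idea of counting one "render hand-off" per page are all assumptions chosen to show why discovery slows down.

```python
from collections import deque

# Hypothetical site: each URL maps to the links that appear on that page.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/b": [],
    "/a/1": [],
}


def discover(site, start, client_rendered):
    """Return (urls found, pages that had to be rendered to find their links).

    Toy model: with server rendering the crawler reads links from the raw
    HTML; with client rendering it sees none and must wait for a render.
    """
    found = [start]
    queue = deque([start])
    renders = 0
    while queue:
        url = queue.popleft()
        if client_rendered:
            renders += 1  # hand-off to the renderer before links are visible
        for link in site[url]:
            if link not in found:
                found.append(link)
                queue.append(link)
    return found, renders


print(discover(SITE, "/", client_rendered=False))  # (['/', '/a', '/b', '/a/1'], 0)
print(discover(SITE, "/", client_rendered=True))   # (['/', '/a', '/b', '/a/1'], 4)
```

　　Both runs end up finding the same URLs. The difference is that in the client-rendered case every page needs a separate rendering step before its links are known, and on a large site those extra steps add up.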
　　And in this way, it really slows down the process of crawling your website in a very big way, which is particularly relevant for large websites. The other thing that I have to say about client-side rendering is that Google's indexer renders JavaScript-powered websites with a version of Chrome that is about three years old.
　　And that means that it can't support all the latest features of these modern JavaScript frameworks. So if you're building a JavaScript-powered website that renders on the client side, it is very important to ensure that it is set up in such a way that it degrades nicely for older versions of browsers, so that Google can render your content and your HTML completely.
　　I've seen many cases where Google is able to crawl or index part of a client-side rendered JavaScript page but doesn't see all of the content because of some of the features that have been included in that page. So it is very important that your JavaScript-powered web page degrades nicely for older versions of the browser.
　　So we know that Googlebot can't render JavaScript. And whilst the Google indexer can render JavaScript, we also know that it is limited to an older version of Chrome, so it doesn't support all the latest features of these modern JavaScript frameworks. If we are doing client-side rendering, it is more difficult to get all of our content indexed, and it really does slow down the entire process of crawling and indexing your website, which includes any changes you make. So if you make changes to your website, particularly if you have a large website, it can take weeks or months for those changes to be reflected in the Google index.
　　So what to do? Well, first and foremost, our strong recommendation is to use old-fashioned content management systems for public-facing websites. If you haven't yet started the build of your website, then we would strongly recommend you look at one of the content management systems like Umbraco, WordPress, or Drupal. Or if you're doing e-commerce, use one of the platforms like Shopify or BigCommerce.
　　These content management applications render everything on the server side and deliver the complete HTML to the browser and to Google, which makes it much easier and faster for Google to completely crawl and index your website.
　　If you have already built a JavaScript-powered website, or if you really want to for whatever reason, then our strong recommendation is to configure that application to render everything on the server side and deliver the complete HTML and content to the browser and to Google, in the same way that one of these older traditional content management systems does. If you can do that properly, then it will make it a lot easier and faster for Google to index your JavaScript-powered website.
　　Or perhaps you have already launched a JavaScript-powered website that renders everything on the client side, and you are not sure how to configure it to render on the server side. Third-party services such as Prerender.io might help and are worth a look.
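　　The talk does not explain how such a service is wired in, so the following is only a sketch of one general approach: detect crawler requests by their User-Agent header and answer them with a pre-built HTML snapshot. It does not use or represent the Prerender.io API; the token list, function names, and sample HTML are assumptions for illustration.

```python
# Assumed, non-exhaustive list of crawler user-agent substrings.
CRAWLER_TOKENS = ("googlebot", "bingbot", "baiduspider")


def is_crawler(user_agent):
    """Return True if the User-Agent header looks like a known crawler."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def choose_response(user_agent, prerendered_html, app_shell_html):
    """Serve the prerendered snapshot to crawlers and the JavaScript shell to everyone else."""
    return prerendered_html if is_crawler(user_agent) else app_shell_html


SNAPSHOT = "<html><body><h1>Shoes</h1><a href='/hats'>Hats</a></body></html>"
SHELL = "<html><body><div id='app'></div><script src='/app.js'></script></body></html>"

googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36"

print(choose_response(googlebot_ua, SNAPSHOT, SHELL) == SNAPSHOT)  # True
print(choose_response(browser_ua, SNAPSHOT, SHELL) == SHELL)       # True
```

　　The snapshot has to contain the same content that regular visitors eventually see. This arrangement is a stopgap for a site that cannot yet render on the server, which is the case the speaker is addressing here.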
　　So in a nutshell, a lot of SEO is about making things easy for Google and not putting hurdles in Google's way. When it comes to using JavaScript on your website, there are quite a few pitfalls and traps that are easy to fall into if you're not aware of them.
　　So if you have any questions, we would love to help. Hit us up here on WeChat to discuss your specific case. We look forward to talking to you. Thanks.

js web page scraping: Isomorphic JavaScript Applications, the Future of the Web?

Website optimization • Article posted by 优采云 • 0 comments • 122 views • 2022-06-04 22:24 • From related topics

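　　The web world has a famous maxim, coined for Java: "Write once, run anywhere". But does it apply only to Java? Can we also use it to describe JavaScript? The answer is yes.
　　In this article I will introduce the concept of isomorphic JavaScript applications and recommend some resources to help you build them.
　　How we got here
　　Many years ago, the web was just a set of static pages built with HTML and CSS, without much interactivity. Every user action required the server to create and return a complete page. Thanks to JavaScript, developers began to create great effects, but it was the arrival of Ajax that truly started the revolution. Web developers began writing pages that could communicate with the server, sending and receiving data without reloading the page.
　　As time went on, client-side code could do more and more, giving rise to a class of applications known as single-page applications (SPAs). An SPA fetches all the necessary resources on the first page load, or loads them dynamically on demand and renders them into the page. Gmail and the StackEdit editor are great examples of SPAs.
　　SPAs allow heavily interactive designs because almost all operations run on the client, keeping communication with the server to a minimum. Unfortunately, they also suffer from some serious problems. Let's discuss a few of them.
　　Performance
　　Because SPAs need more client-side code than static pages, the amount of data to download is also larger. This makes loading slow on mobile phones and can lead to serious consequences, such as a poor user experience and lost revenue. According to an article from Microsoft:
　　A study at Bing found that every 10 ms increase in page load time reduces the site's total annual revenue by $250K.
　　SEO
　　Because single-page applications depend on JavaScript execution, the server does not provide any of the HTML content they might use. As a result, web crawlers have a hard time indexing these pages. A crawler is a program that sends requests to a web server and parses the results as raw text, without interpreting and executing client-side content the way a browser running JavaScript does. Not long ago, Google improved its search engine's web crawler so that it can now also crawl pages built with client-side JavaScript. But what about Bing, Yahoo, and other search engines? Good indexing is crucial for any company, as it usually brings more traffic and higher returns.
　　Isomorphic JavaScript applications
　　Isomorphic JavaScript applications are written in JavaScript and can run on both the client and the server. Because of this, you write the code only once, then render static pages on the server and still handle complex interactions on the client. This approach bridges the two worlds and avoids the two problems mentioned above.
　　Today, many frameworks can help you develop this kind of application. Perhaps the best known is Meteor. Meteor is an open-source JavaScript framework, written on top of Node.js, that focuses on real-time web applications. Another project I want to mention is Rendr, a lightweight library developed by Airbnb that allows Backbone.js to run on both the client and the server.
　　More and more companies are using Node.js in their products. Sharing code between client and server is becoming a more common and natural choice. In my opinion, this approach will be the future of web development. Some libraries, such as React, reinforce this trend by sharing templates.
　　Conclusion
　　This article introduced the concept of isomorphic JavaScript applications, a new approach to application development that combines the best of server-side and client-side applications. We also discussed the problems this approach tries to solve, and some projects you can start experimenting with today.
　　Have you heard of isomorphic JavaScript applications? Have you developed one? Do you have any experience to share?
　　Original article: Isomorphic JavaScript Applications — the Future of the Web?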

js web page scraping: [Issue 2030] JavaScript Startup Performance: Bottleneck Analysis and Solutions

Website optimization • Article posted by 优采云 • 0 comments • 94 views • 2022-05-31 16:00 • From related topics

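　　Foreword
　　Find the meaning of its existence. Today's morning-read article was translated by @王下邀月熊 and is shared with permission.
　　The main text starts here.
　　In web development, as requirements grow and the codebase expands, the web pages we ultimately ship swell as well. This bloat means far more than extra transfer bandwidth: it also means a potentially worse performance experience for users browsing the page. After the browser has downloaded the scripts a page depends on, it still has to parse, interpret, and run them. This article takes a deep look at how browsers process JavaScript, digs out the main culprits that hurt your application's startup time, and proposes corresponding solutions based on my own experience. Looking back, we have not specifically considered how to optimize the JavaScript parse/compile steps; what we expected was that the parser, upon finding ...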
