Capturing web pages that use Flash (xvfb, CutyCapt, and Qt)
优采云 Published: 2021-10-30 20:14
install.sh:
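#!/bin/bash
# Install CutyCapt and its dependencies on a Debian/Ubuntu system.
echo "now installing cutycapt"
sudo apt-get update -y
# Compiler toolchain
sudo apt-get install build-essential -y
# Virtual X server, fonts, and Mesa drivers for headless rendering
sudo apt-get install xvfb -y
sudo apt-get install xfs xfonts-scalable xfonts-100dpi -y
sudo apt-get install libgl1-mesa-dri -y
# Subversion client plus the Qt 4 and QtWebKit development packages
sudo apt-get install subversion libqt4-webkit libqt4-dev g++ -y
# Fetch and build CutyCapt
mkdir -p ~/scripts
cd ~/scripts || exit 1
svn co https://cutycapt.svn.sourceforge.net/svnroot/cutycapt
cd cutycapt/CutyCapt || exit 1
qmake
make
# Smoke test: render a page inside a virtual framebuffer
xvfb-run --server-args="-screen 0, 1024x768x24" ./CutyCapt --url=http://www.google.com --out=example.png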
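The tools involved:
xvfb (emulates an X server from the command line and buffers the rendered graphics): provides image rendering in environments where no X server is installed
CutyCapt (emulates a browser: downloads the page, renders HTML and CSS, runs JavaScript, and captures a snapshot of the final rendered page): the workhorse
Qt (the framework CutyCapt is built on)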
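Since all of these pieces have to be present before a capture can work, a quick preflight check can save some guessing. The following is only a minimal sketch: missing_tools is a hypothetical helper name introduced here, and the tool names in the usage comment assume the packages listed in install.sh.

```shell
#!/bin/bash
# Print each named command that is not found on PATH.
# Returns non-zero if at least one command is missing.
missing_tools() {
    local tool
    local rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example use (commented out so the file can be sourced safely):
# missing_tools xvfb-run svn qmake make || echo "install the packages first" >&2
```

Any name it prints points at a package that still needs to be installed.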
Practice:
1. Install CutyCapt, Qt, and related packages:
sudo apt-get install subversion libqt4-webkit libqt4-dev g++
svn co https://cutycapt.svn.sourceforge.net/svnroot/cutycapt
cd cutycapt/CutyCapt
qmake
make
2. Install xvfb:
sudo apt-get install xvfb
3. Test a capture:
xvfb-run --server-args="-screen 0, 1024x768x24" ./CutyCapt --url=http://www.zol.com.cn --out=zol.png
or:
cutycapt --url="http://google.com" --out=./google.jpg
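Once a single capture works, the same command can be wrapped in a loop to capture several pages. The sketch below is one possible way to do it, not part of CutyCapt itself: url_to_filename, capture, and capture_all are hypothetical helper names, CUTYCAPT is an assumed variable pointing at the binary built in step 1, and the URL list file is assumed to hold one URL per line.

```shell
#!/bin/bash
# Assumed location of the CutyCapt binary; override via the environment.
CUTYCAPT="${CUTYCAPT:-./CutyCapt}"

# Turn a URL into a safe file name: drop the scheme and a trailing slash,
# then replace every character outside [A-Za-z0-9._-] with an underscore.
url_to_filename() {
    local name="${1#*://}"
    name="${name%/}"
    name="$(printf '%s' "$name" | tr -c 'A-Za-z0-9._-' '_')"
    printf '%s.png\n' "$name"
}

# Capture one URL. xvfb-run -a picks a free display number;
# stdin is redirected so the capture cannot swallow the URL list.
capture() {
    xvfb-run -a --server-args="-screen 0, 1024x768x24" \
        "$CUTYCAPT" --url="$1" --out="$(url_to_filename "$1")" </dev/null
}

# Read URLs from a file, skipping blank lines and lines starting with '#'.
capture_all() {
    local url
    while IFS= read -r url; do
        case "$url" in ''|'#'*) continue ;; esac
        capture "$url" || echo "failed: $url" >&2
    done < "$1"
}

# Example use (commented out): capture_all urls.txt
```

Deriving the file name from the URL keeps captures from overwriting each other, at the cost of long names for URLs with many query parameters.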
Reference:
You can also use wkhtmltopdf.
Usage:
#To convert a remote HTML file to PDF:
wkhtmltopdf http://www.google.com google.pdf
#To convert a local HTML file to PDF:
wkhtmltopdf my.html my.pdf
#You can also convert to PS files if you like:
wkhtmltopdf my.html my.ps
#The eler2.pdf sample file
wkhtmltopdf http://geekz.co.uk/lovesraymond/archive/eler-highlights-2008 eler2.pdf -H --outline
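The local-file form above extends naturally to a whole directory. This is a small sketch under assumptions: pdf_name and convert_dir are hypothetical helper names, and the input files are assumed to end in .html.

```shell
#!/bin/bash
# Derive the output name by swapping a trailing .html for .pdf.
pdf_name() {
    printf '%s.pdf\n' "${1%.html}"
}

# Convert every .html file in the given directory, writing each PDF
# next to its source file. Does nothing if the directory has no .html files.
convert_dir() {
    local f
    for f in "$1"/*.html; do
        [ -e "$f" ] || continue
        wkhtmltopdf "$f" "$(pdf_name "$f")" || echo "failed: $f" >&2
    done
}

# Example use (commented out): convert_dir ./pages
```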
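4. Install Chinese fonts. Many systems ship without any Chinese fonts, so they must be installed; otherwise Chinese text on the page renders as empty boxes.
sudo apt-get install ttf-arphic-ukai ttf-arphic-uming
sudo apt-get install ttf-wqy-zenhei
sudo fc-cache -v
5. Install the Flash plugin. Many websites still use Flash; install the plugin so that Flash content does not show up as an empty box.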
sudo apt-get install flashplugin-nonfree
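With the fonts and the plugin installed, it may help to confirm that fontconfig actually sees a Chinese font and to ask CutyCapt to run plugins. The sketch below assumes CutyCapt's --plugins and --delay options; has_chinese_font and flash_args are hypothetical helper names, and the 3000 ms delay is an arbitrary starting value, not a recommendation from the original post.

```shell
#!/bin/bash
# Succeeds if fontconfig lists at least one font covering Chinese.
# fc-list prints one line per matching font, so empty output means none.
has_chinese_font() {
    [ -n "$(fc-list :lang=zh 2>/dev/null)" ]
}

# Print CutyCapt arguments for a Flash-heavy page, one per line.
# --plugins=on enables browser plugins; --delay waits (in milliseconds)
# after the page loads so the plugin has time to draw.
flash_args() {
    local url="$1"
    local out="$2"
    local delay_ms="${3:-3000}"
    printf '%s\n' "--url=$url" "--out=$out" "--plugins=on" "--delay=$delay_ms"
}

# Example use (commented out; mapfile needs bash 4 or later):
# has_chinese_font || echo "no Chinese font found; text may render as boxes" >&2
# mapfile -t args < <(flash_args http://www.zol.com.cn zol.png)
# xvfb-run -a --server-args="-screen 0, 1024x768x24" ./CutyCapt "${args[@]}"
```

If the Flash areas still come out blank, increasing the delay is a reasonable first thing to try.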