C# crawler: scraping web page data (2019-03-24)

优采云 · Published: 2021-09-10 05:06


Implementing a web crawler in C#: fetching a page's HTML source

Date: 2019-03-24

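This article shows how to implement a simple web crawler in C# that fetches a page's HTML source, covering the basic techniques and a few points to watch out for. It may be a useful reference for readers who need this.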

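I recently finished a simple web crawler. At first I was confused and did not know where to start. After reading a lot of material I found approaches that met my needs, but useful information, especially working code, was hard to find. So I decided to post this article to help anyone who wants to build the same feature avoid some detours.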

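First, fetch the HTML source and extract the href values of the link nodes. Add using System.IO; and using System.Net; (the code also needs using System.Text; for Encoding):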

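using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

// The post shows only the class members; "Crawler" is a hypothetical wrapper class.
public class Crawler
{
    // lab is not declared in the original post. It is assumed here (hypothetically) to map
    // each page URL to the links found on it; adapt it to your own storage.
    private readonly Dictionary<string, List<string>> lab = new Dictionary<string, List<string>>();

    private void Search(string url)
    {
        // WebRequest is marked obsolete (SYSLIB0014) from .NET 6 on; it still works on .NET Framework.
        WebRequest request = WebRequest.Create(url.Trim());
        using (WebResponse response = request.GetResponse())
        using (Stream resStream = response.GetResponseStream())
        // Encoding.Default is the system ANSI code page on .NET Framework but UTF-8 on .NET Core;
        // use the encoding the target page is actually served in.
        using (StreamReader sr = new StreamReader(resStream, Encoding.Default))
        {
            // Lower-casing lets "HREF=" match "href=", but it also lower-cases the URLs themselves.
            string str = sr.ReadToEnd().ToLowerInvariant();

            // The start/end markers of the region to scan were lost from the published post
            // (most likely stripped HTML tags). With empty markers nothing is scanned; fill in your own.
            string str_get = mid(str, "", "");

            if (!lab.ContainsKey(url))
                lab[url] = new List<string>();

            while (str_get != null)
            {
                int start;
                string strResult = mid(str_get, "href=\"", "\"", out start);
                if (strResult == null)
                    break;
                lab[url].Add(strResult);
                str_get = str_get.Substring(start); // continue scanning after the closing quote
            }
        }
    }

    // Returns the text between the first startString and the next endString,
    // or null if either marker is missing.
    private string mid(string istr, string startString, string endString)
    {
        int ignored;
        return mid(istr, startString, endString, out ignored);
    }

    // Same as above; iBodyEnd receives the index just past endString so the caller can keep scanning.
    private string mid(string istr, string startString, string endString, out int iBodyEnd)
    {
        iBodyEnd = 0; // an out parameter must be assigned before any return
        int iBodyStart = istr.IndexOf(startString, 0, StringComparison.Ordinal); // start of startString
        if (iBodyStart == -1)
            return null;
        iBodyStart += startString.Length; // first character after startString
        int endIndex = istr.IndexOf(endString, iBodyStart, StringComparison.Ordinal); // next endString
        if (endIndex == -1)
            return null;
        iBodyEnd = endIndex + endString.Length; // first character after endString
        // Text strictly between the two markers.
        return istr.Substring(iBodyStart, endIndex - iBodyStart);
    }
}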
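The href-scanning loop above can also be tried without any network access. The following is a minimal sketch, not the author's code: a hypothetical HrefExtractor class that applies the same IndexOf-based technique to a literal HTML string. It searches case-insensitively with StringComparison.OrdinalIgnoreCase instead of lower-casing the whole page, so the extracted URLs keep their original case.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original post: a network-free version of
// the href-scanning loop, so the string logic can be tried on a literal HTML string.
public static class HrefExtractor
{
    // Returns the values of every href="..." attribute, in document order.
    public static List<string> ExtractHrefs(string html)
    {
        var links = new List<string>();
        const string marker = "href=\"";
        int pos = 0;
        while (true)
        {
            // Case-insensitive search, so HREF="..." also matches without
            // lower-casing (and thereby altering) the URLs themselves.
            int start = html.IndexOf(marker, pos, StringComparison.OrdinalIgnoreCase);
            if (start == -1)
                break;
            start += marker.Length;             // first character of the value
            int end = html.IndexOf('"', start); // closing quote
            if (end == -1)
                break;                          // unterminated attribute: stop scanning
            links.Add(html.Substring(start, end - start));
            pos = end + 1;                      // continue after the closing quote
        }
        return links;
    }
}
```

For example, HrefExtractor.ExtractHrefs("<a href=\"/Docs/A.html\">A</a>") yields a single entry, "/Docs/A.html". Values in single quotes (href='...') or without quotes are not matched by this scan; for real-world pages an HTML parser such as HtmlAgilityPack is usually more robust than string searching.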

That's all the code. To run it, you will have to adapt a few details yourself.
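Two details commonly need adapting. WebRequest is marked obsolete from .NET 6 on, with HttpClient as the replacement, and href values are often relative (such as /docs/a.html), so a crawler that follows them needs absolute URLs. The sketch below is written under those assumptions and is not the author's code; PageFetcher, FetchHtmlAsync and ResolveLinks are hypothetical names. It relies only on HttpClient.GetStringAsync and Uri.TryCreate from the .NET base class library.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not from the original post.
public static class PageFetcher
{
    // One HttpClient shared by the whole program; the .NET documentation advises
    // reusing a single instance rather than creating one per request.
    private static readonly HttpClient Client = new HttpClient();

    // Downloads the page body as a string. GetStringAsync uses the charset from
    // the response's Content-Type header when the server supplies one.
    public static Task<string> FetchHtmlAsync(string url) =>
        Client.GetStringAsync(url.Trim());

    // Turns relative hrefs such as "/docs/a.html" into absolute URLs, using the
    // page's own address as the base. Values Uri.TryCreate rejects are skipped.
    public static List<string> ResolveLinks(string pageUrl, IEnumerable<string> hrefs)
    {
        var baseUri = new Uri(pageUrl);
        var result = new List<string>();
        foreach (string href in hrefs)
        {
            if (Uri.TryCreate(baseUri, href, out var absolute))
                result.Add(absolute.AbsoluteUri);
        }
        return result;
    }
}
```

A caller would await FetchHtmlAsync(url), extract the href values from the returned HTML (for example with the scanning loop in the Search method), and pass them together with the same url to ResolveLinks before queuing the results for crawling.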
