c# - Extracting names with HttpWebRequest in C#

Question

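I'm an anime fan and I want a complete list of all anime characters, so I came across this site: http://www.animevice.com/characters/?page=1 My goal is to extract all the names and add them to listBox1. This is my current code: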

try
{
    while (true)
    {
        HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create("http://www.animevice.com/characters/?page=" + n);
        req.Method = "GET";
        req.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20100101 Firefox/15.0";
        req.KeepAlive = true;

        HttpWebResponse response = (HttpWebResponse)req.GetResponse();
        Stream responseData = response.GetResponseStream();
        StreamReader reader = new StreamReader(responseData);
        string responseFromServer = reader.ReadToEnd();
        string m = "<a href=\"(.*)\" class=\"name\">(.*)</a>";
        Match match = Regex.Match(responseFromServer, m, RegexOptions.IgnoreCase);
        if (match.Success)
        {
            listBox1.Items.Add(match.Groups[2].Value.ToString());
        }
        if (listBox1.Items.Count % 50 == 0)
        {
            n++;
        }
    }
}
catch { }

However, this adds the first name on the list (Monkey D. Luffy) over and over again. Is there a fix? Cheers

score 1 · Accepted Answer

I would use a real HTML parser such as HtmlAgilityPack to parse the HTML rather than a regular expression.

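// Needs the HtmlAgilityPack package plus System.Linq and System.Collections.Generic.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(responseFromServer);

// SelectNodes returns null when nothing matches, so guard before calling Select.
var nodes = doc.DocumentNode.SelectNodes("//a[@class='name']");
var names = nodes == null
                ? new List<string>()
                : nodes.Select(a => a.InnerText).ToList();

listBox1.DataSource = names;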

score 0 · Accepted Answer

You are only reading one name per page: Regex.Match returns just the first match.

Instead of:

Match match = Regex.Match(responseFromServer, m, RegexOptions.IgnoreCase);
if (match.Success)
{
    listBox1.Items.Add(match.Groups[2].Value.ToString());

}
if (listBox1.Items.Count % 50 == 0)
{
    n++;
}

Try this:

var matches = Regex.Matches(responseFromServer, m, RegexOptions.IgnoreCase);
foreach (var item in matches)
{
    var match = item as Match;
    if (match.Success)
    {
        listBox1.Items.Add(match.Groups[2].Value.ToString());
    }
    if (listBox1.Items.Count % 50 == 0)
    {
        n++;
    }
}
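
One caveat with the pattern from the question: `(.*)` is greedy, so if the page puts several links on a single line, one match can swallow everything from the first link to the last. Below is a minimal sketch of the same Regex.Matches idea with a tighter pattern. The class and method names (`NameExtractor`, `ExtractNames`) are made up for illustration, and it assumes the markup looks like the `<a href="..." class="name">` links in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original code.
public static class NameExtractor
{
    // [^"]* keeps the href inside its own quotes, and the lazy (.*?)
    // stops at the first </a>, so adjacent links are matched separately.
    private static readonly Regex NameLink = new Regex(
        "<a href=\"([^\"]*)\" class=\"name\">(.*?)</a>",
        RegexOptions.IgnoreCase);

    // Returns the link text of every class="name" anchor found in html.
    public static List<string> ExtractNames(string html)
    {
        var names = new List<string>();
        foreach (Match match in NameLink.Matches(html))
        {
            names.Add(match.Groups[2].Value);
        }
        return names;
    }
}
```

You could then fill the list box with something like `listBox1.Items.AddRange(NameExtractor.ExtractNames(responseFromServer).ToArray());`.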

score 0 · Accepted Answer

using (StreamReader reader = new StreamReader(responseData))
  {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
             string m = "<a href=\"(.*)\" class=\"name\">(.*)</a>";
             Match match = Regex.Match(line, m, RegexOptions.IgnoreCase);
             if (match.Success)
             {
                 listBox1.Items.Add(match.Groups[2].Value.ToString());
             }
         }
  }
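
For the paging part of the question, here is a rough sketch of a loop that moves to the next page after every page instead of relying on `Items.Count % 50`. `CharacterPager`, `CollectNames` and the `fetchPage` delegate are hypothetical names. `fetchPage` stands in for the HttpWebRequest code above, and the sketch assumes that a page past the end contains no `class="name"` links.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original code.
public static class CharacterPager
{
    // fetchPage: given a page number, returns that page's HTML
    // (for example by wrapping the HttpWebRequest code from the question).
    // maxPages: a safety cap so the loop always terminates.
    public static List<string> CollectNames(Func<int, string> fetchPage, int maxPages)
    {
        var pattern = new Regex(
            "<a href=\"[^\"]*\" class=\"name\">(.*?)</a>",
            RegexOptions.IgnoreCase);
        var all = new List<string>();

        for (int page = 1; page <= maxPages; page++)
        {
            MatchCollection matches = pattern.Matches(fetchPage(page));
            if (matches.Count == 0)
            {
                // Assumption: a page without any names means we are past the last page.
                break;
            }
            foreach (Match match in matches)
            {
                all.Add(match.Groups[1].Value);
            }
        }
        return all;
    }
}
```

Passing the download step in as a delegate keeps the loop independent of the network, so it can be tried out with canned HTML strings before pointing it at the real site.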
