c# - Find text between known patterns

Question

I have the source code of a web page that contains multiple occurrences of:

<div class="detName">some unpredictable text</div>

I want to collect every one of those "some unpredictable text" values.

I tried something like this:

var match = Regex.Match(pageSourceCode, @"<div class='detName'>/(A-Za-z0-9\-]+)\</div>", RegexOptions.IgnoreCase);

but it didn't work. What is a good way to solve this?

score 2 · Accepted Answer

Don't use regular expressions to parse HTML. Use the HTML Agility Pack instead:

using System;
using HtmlAgilityPack;

string html = "<div class=\"detName\">some unpredictable text</div>";
var doc = new HtmlDocument();
doc.LoadHtml(html);
// SelectNodes returns null (not an empty collection) when nothing matches
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[contains(@class,'detName')]");
if (nodes != null)
{
    foreach (var node in nodes)
    {
        Console.WriteLine(node.InnerText);
    }
}

score 0 · Accepted Answer

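using System.Text.RegularExpressions;
var matches = Regex.Matches(pageSourceCode, @"(?<=<div class=[""']detName[""']>)(.*?)(?=</div>)", RegexOptions.IgnoreCase); // Matches finds every occurrence; [""'] accepts either quote style; lazy .*? stops at the first </div>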
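For reference, here is a self-contained sketch of the regex approach that collects every match instead of only the first. `DetNameExtractor` and `ExtractDetNames` are hypothetical names, and the pattern assumes simple markup: no nested `<div>` inside the target element and a `class` attribute whose value is exactly `detName`. If the real page is messier than that, the HTML Agility Pack answer is the safer route.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DetNameExtractor
{
    // Hypothetical helper pattern: matches <div class="detName">...</div> with either
    // quote style and captures the inner text in the named group "text".
    // Assumes no nested <div> inside the target element.
    private static readonly Regex DetNamePattern = new Regex(
        @"<div\s+class\s*=\s*[""']detName[""']\s*>(?<text>.*?)</div>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the inner text of every matching div, in document order.
    public static List<string> ExtractDetNames(string pageSourceCode)
    {
        var results = new List<string>();
        // Regex.Matches (plural) yields every occurrence, not just the first one.
        foreach (Match m in DetNamePattern.Matches(pageSourceCode))
        {
            results.Add(m.Groups["text"].Value);
        }
        return results;
    }

    public static void Main()
    {
        string html =
            "<div class=\"detName\">first item</div>\n" +
            "<p>ignored</p>\n" +
            "<div class='detName'>second item</div>";

        foreach (string name in ExtractDetNames(html))
        {
            Console.WriteLine(name); // prints "first item", then "second item"
        }
    }
}
```

The lazy `.*?` matters here: with a greedy `.*` and `Singleline`, a single match would run from the first `<div class="detName">` to the last `</div>` on the page.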
