c# - Get the text between 2 HTML tags in C#

Question

I am trying to get the data (31 in this example) between the given HTML tags (a span).

This is the original markup (from Inspect Element in Chrome):

<span id="point_total" class="tooltip" oldtitle="Note: If the number is black, your points are actually a little bit negative.  Don't worry, this just means you need to start subbing again." aria-describedby="ui-tooltip-0">31</span>

I have a rich text box containing the page source. Here is the same markup as it appears there, on line 51 of the rich text box:

<DIV id=point_display>You have<BR><SPAN id=point_total class=tooltip jQuery16207621750175125325="23" oldtitle="Note: If the number is black, your points are actually a little bit negative.  Don't worry, this just means you need to start subbing again.">17</SPAN><BR>Points </DIV><IMG style="FLOAT: right" title="Gain subscribers" border=0 alt="When people subscribe to you, you lose a point" src="http://static.subxcess.com/images/page/decoration/remove-1-point.png"> </DIV>

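How would I go about doing this? I have tried several approaches, but none of them seem to work for me.

I am trying to retrieve the point value from this page: http://www.subxcess.com/sub4sub.php. The number changes depending on your subscribers.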

score 11 · Accepted Answer

You can be very specific about this:

var regex = new Regex(@"<span id=""point_total"" class=""tooltip"" oldtitle="".*?"" aria-describedby=""ui-tooltip-0"">(.*?)</span>");

var match = regex.Match(@"<span id=""point_total"" class=""tooltip"" oldtitle=""Note: If the number is black, your points are actually a little bit negative.  Don't worry, this just means you need to start subbing again."" aria-describedby=""ui-tooltip-0"">31</span>");

var result = match.Groups[1].Value;
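The pattern above matches the Chrome markup exactly, so it would not match the rich text box version, where the tag is upper case and the attributes are unquoted and in a different order. A more tolerant variant, offered only as a sketch, keys on the id alone. The class and method names (`PointRegex`, `ExtractPoints`) are hypothetical, and the pattern assumes no attribute value contains a `>` character.

```csharp
using System.Text.RegularExpressions;

public static class PointRegex
{
    // Hypothetical helper, not part of the original answer.
    // Matches <span ... id=point_total ...>VALUE</span> whether or not the id
    // value is quoted, in any letter case.
    // Assumption: no attribute value inside the tag contains '>'.
    private static readonly Regex SpanById = new Regex(
        @"<span\b[^>]*\bid\s*=\s*[""']?point_total\b[""']?[^>]*>(.*?)</span>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the trimmed text inside the span, or null when nothing matches.
    public static string ExtractPoints(string html)
    {
        Match match = SpanById.Match(html);
        return match.Success ? match.Groups[1].Value.Trim() : null;
    }
}
```

Checking `match.Success` before reading the group avoids silently getting an empty string when the page layout changes.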

score 11 · Accepted Answer

You should use HtmlAgilityPack for this; it is quite simple:

using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load("filepath");

// "//span" selects the first span in the document. To target a specific span, filter on an attribute, e.g. "//span[@id='point_total']".
HtmlNode node = doc.DocumentNode.SelectSingleNode("//span");

string value = node.InnerText; // the text inside the span, i.e. <span>***value***</span>

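Regular expressions are a workable option, but you generally want to avoid them for parsing HTML where possible (see here).

For maintainability, make sure you understand the page source: refresh a few times and check whether your target span is nested under the same parents after each refresh, make sure the page keeps the same general format, and so on. Then navigate to the span using the principles above.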
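When the page source is already in a string (for example, the text of a rich text box) rather than in a file, the same idea can be sketched as follows. This assumes the HtmlAgilityPack NuGet package is referenced. The class and method names (`PointReader`, `ReadPoints`) are hypothetical, and selecting by `id` assumes the id stays stable across page loads.

```csharp
using HtmlAgilityPack;

public static class PointReader
{
    // Hypothetical helper; html is assumed to hold the full page source.
    public static string ReadPoints(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file path

        // SelectSingleNode returns null when no node matches the XPath,
        // so check before reading InnerText.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//span[@id='point_total']");
        return node == null ? null : node.InnerText.Trim();
    }
}
```

The null check matters here: if the site changes its layout, the caller gets null instead of a NullReferenceException.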

score 1 · Accepted Answer

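There are several possibilities.

Regular expressions
Have the HTML parsed as XML and get the value via XPath
Iterate over all the elements. Once you are on the span tag, skip all characters until you find the closing '>'. The value you need is then everything before the next opening '<'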
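The second and third options above can be sketched roughly as follows, using only the .NET base class library. The names (`SpanText`, `ViaXPath`, `ViaScan`, `marker`) are hypothetical. The XML route only works on well-formed markup such as the Chrome snippet; the rich text box version, with unquoted attributes and unclosed `<BR>` tags, would make the parser throw. The scan assumes the first occurrence of the marker text lies inside the opening tag of the target span.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class SpanText
{
    // Option 2: treat the markup as XML and query it with XPath.
    // XDocument.Parse throws an XmlException on markup that is not well-formed.
    public static string ViaXPath(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);
        XElement span = doc.XPathSelectElement("//span[@id='point_total']");
        return span?.Value;
    }

    // Option 3: scan by hand. Find the marker inside the opening tag, skip to
    // the closing '>', then take everything up to the next '<'.
    public static string ViaScan(string html, string marker)
    {
        int markerPos = html.IndexOf(marker, StringComparison.OrdinalIgnoreCase);
        if (markerPos < 0) return null;

        int tagEnd = html.IndexOf('>', markerPos);
        if (tagEnd < 0) return null;

        int nextTag = html.IndexOf('<', tagEnd + 1);
        if (nextTag < 0) return null;

        return html.Substring(tagEnd + 1, nextTag - tagEnd - 1);
    }
}
```

For the rich text box markup, a call such as `SpanText.ViaScan(source, "point_total")` would be the one to try, since that markup is not valid XML.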

Also take a look at System.Windows.Forms.HtmlDocument
