c# - SyndicationFeed - Item Summary (RSS description) - extract only the text from it

Question

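I'm using the SyndicationFeed class to consume the RSS feeds of some articles. I'd like to know how to get only the text from an item's Summary field, without the HTML tags. Sometimes (not always) the summary contains HTML tags such as div, img, h and p, for example: </a></div> , <img src='http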

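I want to get rid of all the tags. Also, I'm not sure whether the RSS feed provides the full description.

Should I use a regular expression for this? Or is there another approach?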

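// Requires: using System.ServiceModel.Syndication; using System.Xml;
XmlReader reader = XmlReader.Create(response.GetResponseStream());
SyndicationFeed feed = SyndicationFeed.Load(reader);

foreach (SyndicationItem item in feed.Items)
{
    // Summary is a TextSyndicationContent; its Text property holds the raw string,
    // which may contain HTML tags in addition to the article text.
    string description = item.Summary.Text;
}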

score 3 · Accepted Answer

Yes, I suppose a regular expression is the easiest built-in way to achieve this...

// Get rid of the tags
description = Regex.Replace(description, @"<.+?>", String.Empty);

// Then decode the HTML entities
description = WebUtility.HtmlDecode(description);
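
The two steps above can be wrapped in a small helper. This is only a minimal sketch, not part of the original answer: the class name SummaryText and the method name StripHtml are hypothetical. It assumes that the pattern <[^>]+> is an acceptable substitute for <.+?> (unlike ".", the class [^>] also matches newlines, so tags that span several lines are removed as well), and that replacing each tag with a space and then collapsing whitespace is wanted so that text from adjacent block elements does not run together.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper (name chosen for illustration only).
public static class SummaryText
{
    // Anything that looks like a tag. [^>] matches newlines too,
    // so a tag broken across lines is still removed.
    private static readonly Regex TagPattern =
        new Regex(@"<[^>]+>", RegexOptions.Compiled);

    // Runs of whitespace left behind after the tags are gone.
    private static readonly Regex WhitespacePattern =
        new Regex(@"\s+", RegexOptions.Compiled);

    public static string StripHtml(string html)
    {
        if (string.IsNullOrEmpty(html))
            return string.Empty;

        // Assumption: replace each tag with a space rather than nothing,
        // so "<p>a</p><p>b</p>" yields "a b" instead of "ab".
        string text = TagPattern.Replace(html, " ");

        // Decode entities such as &amp; only after the tags are removed,
        // so an encoded "&lt;" is not mistaken for the start of a tag.
        text = WebUtility.HtmlDecode(text);

        // Collapse whitespace runs into single spaces and trim the ends.
        return WhitespacePattern.Replace(text, " ").Trim();
    }
}
```

Inside the foreach loop from the question it could be called as SummaryText.StripHtml(item.Summary?.Text), which also covers items that have no summary. A regex like this is a heuristic rather than a real HTML parser: it will not handle a ">" inside an attribute value, and it keeps the contents of script or style elements. It also inserts a space where a tag sits in the middle of a word.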
