c# - Getting specific elements inside another element using HtmlAgilityPack in C#

Question

I'm working on a project that needs to parse a large number of HTML files. I need to get every <p> from inside a <div class="story-body">.

So far I have the code below, which does what I want, but I'd like to know how to do this with an XPath expression. I tried this:

textBody.SelectNodes ("What to put here? I tried //p but it gives every p in document not inside the one div")

but it didn't work. Any ideas?

public void Parse(){
   HtmlNode title = doc.DocumentNode.SelectSingleNode ("//h1[(@class='story-header')]");
   HtmlNode textBody = doc.DocumentNode.SelectSingleNode ("//div[(@class='story-body')]");

   XmlText textT;
   XmlText textS;

   string story = "";

   if(title != null){
     textT = xmlDoc.CreateTextNode(title.InnerText);
     titleElement.AppendChild(textT);
     Console.WriteLine(title.InnerText);
   }

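   // SelectSingleNode returns null when no story-body div exists, so guard before iterating.
   if(textBody != null){
      foreach (HtmlNode node in textBody.ChildNodes) {
         // Keep direct-child paragraphs and cross-head spans; a missing class attribute defaults to "".
         if(node.Name == "p" || (node.Name == "span" && node.GetAttributeValue("class", "") == "cross-head")){
            story += node.InnerText + "\n\n";
            Console.WriteLine(node.InnerText);
         }
      }
   }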

   textS = xmlDoc.CreateTextNode (story);

   storyElement.AppendChild (textS);

   try
   {
        xmlDoc.Save("test.xml");            
   }
   catch (Exception e)
   {
        Console.WriteLine(e.Message);
   }
}

score 0 · Accepted Answer

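This is fairly simple: put a . in front of the expression, as in .//p, so that the search starts from the current node instead of the document root. Note that .//p matches every p descendant of the current node; use ./p if you only want its direct children.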
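As a rough sketch of the relative form, the helper below runs either ./p or .//p against the story-body node. The class name, method name, and sample HTML are made up for illustration; only SelectSingleNode, SelectNodes, LoadHtml, and InnerText come from HtmlAgilityPack. It assumes the div's class attribute is exactly "story-body".

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of the original question's code.
public static class RelativeXPathDemo
{
    // Returns the text of the <p> elements under the first div.story-body.
    // directChildrenOnly = true uses "./p"; false uses ".//p" (any depth).
    public static List<string> ParagraphsInStoryBody(string html, bool directChildrenOnly)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();
        HtmlNode textBody = doc.DocumentNode.SelectSingleNode("//div[@class='story-body']");
        if (textBody == null) return result; // no story-body div in this file

        // The leading dot anchors the query at textBody instead of the document root.
        string xpath = directChildrenOnly ? "./p" : ".//p";
        HtmlNodeCollection nodes = textBody.SelectNodes(xpath);
        if (nodes == null) return result; // SelectNodes returns null when nothing matches

        foreach (HtmlNode p in nodes) result.Add(p.InnerText);
        return result;
    }

    public static void Main()
    {
        // Illustrative input: one nested <p> and one <p> outside the div.
        const string html =
            "<div class='story-body'><p>One</p><div><p>Nested</p></div><p>Two</p></div><p>Outside</p>";

        foreach (string text in ParagraphsInStoryBody(html, directChildrenOnly: false))
            Console.WriteLine(text);
    }
}
```

Checking the result of SelectNodes for null matters here, because HtmlAgilityPack returns null rather than an empty collection when the expression matches nothing.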

Another way is to call SelectNodes on the document node with the full path:

doc.DocumentNode.SelectNodes("//div[(@class='story-body')]/p");
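The question's loop also collects span elements with class "cross-head", which the expression above does not cover. One possible way to fold both conditions into a single query is the XPath union operator (|), sketched below. The class name, method name, and sample HTML are hypothetical, and the exact-match test on @class assumes each element carries only that one class.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of the original answer.
public static class StoryBodyUnionDemo
{
    // Direct-child <p> elements plus direct-child <span class="cross-head">
    // elements of any div.story-body, selected in one query.
    private const string StoryXPath =
        "//div[@class='story-body']/p | //div[@class='story-body']/span[@class='cross-head']";

    public static List<string> ExtractStoryParts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var parts = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(StoryXPath);
        if (nodes == null) return parts; // null means no match, not an empty collection

        foreach (HtmlNode node in nodes) parts.Add(node.InnerText);
        return parts;
    }

    public static void Main()
    {
        // Illustrative input: the plain <span> and the outer <p> should be skipped.
        const string html =
            "<div class='story-body'><p>A</p><span class='cross-head'>H</span>" +
            "<span>skip</span><p>B</p></div><p>Out</p>";

        Console.WriteLine(string.Join("\n\n", ExtractStoryParts(html)));
    }
}
```

If the real pages put several classes on the div (for example class="story-body wide"), the equality test will not match; a contains(@class, 'story-body') predicate is a common, if looser, substitute.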
