c# - Get all <li> elements from inside a certain <div> with C#

Question

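I have a web page consisting of several <div> elements.

I want to write a program that prints all the <li> elements inside a <div> that come after a certain <h4> header. Can anyone give me some help or sample code?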

<div id="content">
    <h4>Header</h4>
    <ul>
        <li><a href...></a> THIS IS WHAT I WANT TO GET</li>
    </ul>
</div>

score 2 · Accepted Answer

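When parsing HTML in C#, don't try to write your own parser. The HTML Agility Pack is almost certainly capable of doing what you want!

Which parts are constant:

The "id" on the DIV?
The H4?

Searching a complete HTML document and reacting to the H4 alone is likely to be a mess, whereas if you know the DIV has the ID "content", then just look for that!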

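// Requires: using System.Linq;
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(yourHtml);

// SelectNodes returns null when nothing matches, so check the result before using it.
var allDivs = doc.DocumentNode.SelectNodes("//div");

if ( allDivs != null )
{
   // The inner lambda needs its own parameter name; reusing 'e' does not compile.
   var divs = allDivs.Where(div => div.Descendants().Any(d => d.Name == "h4"));

   // You now have all of the divs with an 'h4' inside of it.

   // The rest of the element structure, if constant, needs to be examined to get
   // the rest of the content you're after.
}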
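
The snippet above leaves the rest of the structure to be examined. As a rough sketch of one way to finish the job with LINQ, assuming the markup looks like the sample in the question (the <ul> is the next element after the <h4>) and that the HtmlAgilityPack NuGet package is referenced: the NextElement helper and the "#" link target are made up for illustration and are not part of the library or the original page.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Sample markup modeled on the question; the "#" link target is a placeholder.
        const string html = @"
<div id=""content"">
    <h4>Header</h4>
    <ul>
        <li><a href=""#""></a> THIS IS WHAT I WANT TO GET</li>
    </ul>
</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Look the div up by its id instead of scanning every div.
        HtmlNode content = doc.GetElementbyId("content");
        if (content == null)
        {
            Console.WriteLine("No element with id 'content' was found.");
            return;
        }

        // For each h4, take the next element sibling; keep it if it is a <ul>
        // and collect its direct <li> children.
        var items = content.Descendants("h4")
            .Select(h4 => NextElement(h4))
            .Where(list => list != null && list.Name == "ul")
            .SelectMany(list => list.Elements("li"));

        foreach (HtmlNode li in items)
        {
            Console.WriteLine(li.InnerText.Trim());
        }
    }

    // Hypothetical helper: skips whitespace text nodes and comments
    // until it reaches the next element sibling (or runs out of siblings).
    static HtmlNode NextElement(HtmlNode node)
    {
        HtmlNode next = node.NextSibling;
        while (next != null && next.NodeType != HtmlNodeType.Element)
        {
            next = next.NextSibling;
        }
        return next;
    }
}
```

Skipping non-element siblings matters because the whitespace between </h4> and <ul> is parsed as a text node, so NextSibling on its own would usually not land on the list.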

score 0 · Accepted Answer

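If it is your own web page, why do you need to parse the HTML at all? Doesn't the technology you use to build the page give you access to all of its elements? For example, if you are using ASP.NET, you can assign ids to your UL and LI (with the runat="server" attribute) and they will be available in the code-behind.

Can you explain what you are trying to do? If you are making a web request and downloading the HTML as a string, then scraping the HTML would make sense.

EDIT: I think this should work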

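HtmlDocument doc = new HtmlDocument();
doc.Load(myHtmlFile);

// Descendants() and Elements() return empty sequences instead of null when nothing matches.
foreach (HtmlNode div in doc.DocumentNode.Descendants("div"))
{
    // GetAttributeValue avoids a null reference on divs that have no id.
    if (div.GetAttributeValue("id", "") == "content")
    {
        foreach (HtmlNode ul in div.Elements("ul"))
        {
            // Skip whitespace text nodes to reach the element before the list.
            HtmlNode prev = ul.PreviousSibling;
            while (prev != null && prev.NodeType != HtmlNodeType.Element)
                prev = prev.PreviousSibling;

            // InnerText is a property, not a method.
            if (prev != null && prev.Name == "h4" && prev.InnerText.Trim() == "Header")
            {
                foreach (HtmlNode li in ul.Elements("li"))
                {
                    // each li is an <li> from the list that follows the header
                }
            }
        }
    }
}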
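
For the web-request case mentioned earlier in this answer (download the HTML as a string, then scrape it), a minimal sketch might look like the following. It assumes the HtmlAgilityPack NuGet package is referenced; the URL is a placeholder, and the XPath simply assumes the page has a div with id "content" as in the question.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

class Program
{
    static async Task Main()
    {
        // Placeholder address; replace it with the page you actually want to read.
        const string url = "https://example.com/page.html";

        string html;
        using (var client = new HttpClient())
        {
            // Download the page body as a string.
            html = await client.GetStringAsync(url);
        }

        // LoadHtml parses markup that is already in memory.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // All <li> elements anywhere under the div with id "content".
        HtmlNodeCollection items = doc.DocumentNode.SelectNodes("//div[@id='content']//li");
        if (items == null)
        {
            // SelectNodes returns null rather than an empty collection when nothing matches.
            Console.WriteLine("No matching <li> elements were found.");
            return;
        }

        foreach (HtmlNode li in items)
        {
            Console.WriteLine(li.InnerText.Trim());
        }
    }
}
```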

score 0 · Accepted Answer

If all you want is the content between the <li></li> tags that sit underneath the <div id="content"> tag and come right after an <h4> tag, then this should be enough:

//Load your document first.
//Load() accepts a Stream, a TextReader, or a string path to the file on your computer
//If the entire document is loaded into a string, then use .LoadHtml() instead.
HtmlDocument mainDoc = new HtmlDocument();
//A verbatim string (@"...") keeps "\f" from being read as a form-feed escape.
mainDoc.Load(@"c:\foobar.html");


//Select all the <li> nodes that are inside of an element with the id of "content"
// and come directly after an <h4> tag.
//The leading "." keeps the search relative to the "content" element;
// a bare "//" would search the whole document.
HtmlNodeCollection processMe = mainDoc.GetElementbyId("content")
                                      .SelectNodes(".//h4/following-sibling::*[1]//li");

//SelectNodes returns null when nothing matches, so check before iterating.
if (processMe != null)
{
    //Iterate through each <li> node and print the inner text to the console
    foreach (HtmlNode listElement in processMe)
    {
        Console.WriteLine(listElement.InnerText);
    }
}
