
I have this HTML:

<div class="postrow first">
        <h2 class="title icon">
            This is the title
        </h2>
        <div class="content">
            <div id="post_message_1668079">
                <blockquote class="postcontent restore ">
                <div>Category</div>
                                         <div>Authour: Kim</div>
                    line 1<br /> line2
                </blockquote>
            </div>
        </div>
    </div>
    <div class="postrow">
        <h2 class="title icon">
            This is the title
        </h2>
        <div class="content">
            <div id="post_message_1668079">
                <blockquote class="postcontent restore ">
                <div>Category</div>
                    line 1<br /> line2
                </blockquote>
            </div>
        </div>
    </div>

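I want to extract the following from every div that has the class "postrow". The div may carry other classes as well, for example <div class="postrow first">. I don't care about the "first" class; the class attribute only needs to start with "postrow".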

  1. The content inside the tag with the class "title".
  2. The HTML inside the "blockquote" tag, but not any div inside that tag.

The code I have tried:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml("http://localhost/vanilla/");
List<string> facts = new List<string>();
foreach (HtmlNode li in doc.DocumentNode.SelectNodes("//div[@class='postrow']"))
{
    facts.Add(li.InnerHtml);
    foreach (String s in facts)
    {
        textBox1.Text += s + "/n";
    }
}

1 Answer


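There is a problem in your code: LoadHtml expects the HTML itself as a string, not a path or URL. This call parses the literal URL text rather than the page behind it: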

doc.LoadHtml("http://localhost/vanilla/");

Instead, download the page first and pass its contents to LoadHtml:

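// GetResponse() returns a WebResponse, so read its stream into a string (needs System.IO, System.Net).
var request = (HttpWebRequest)WebRequest.Create("http://localhost/vanilla/");
string html;
using (var response = request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
    html = reader.ReadToEnd();

doc.LoadHtml(html);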

Now iterate over the parsed HTML.
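One way the iteration could look is sketched below. It is only an illustration, not tested against the real page: the class name PostRowExtractor and the helpers HasClass and Extract are made up for this example, and it assumes the title and blockquote sit somewhere inside each postrow div as in the sample markup. The XPath matches "postrow" as a whole class token, so a div with class="postrow first" is picked up too, and the blockquote's direct div children are skipped when its HTML is rebuilt.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class, named for this example only.
static class PostRowExtractor
{
    // Builds an XPath test for "the class attribute contains this whole token".
    static string HasClass(string name)
    {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')";
    }

    // Returns one (title text, blockquote HTML) pair per postrow div.
    public static List<Tuple<string, string>> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var results = new List<Tuple<string, string>>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection rows =
            doc.DocumentNode.SelectNodes("//div[" + HasClass("postrow") + "]");
        if (rows == null) return results;

        foreach (HtmlNode row in rows)
        {
            HtmlNode title = row.SelectSingleNode(".//*[" + HasClass("title") + "]");
            HtmlNode quote = row.SelectSingleNode(".//blockquote");

            string titleText = title == null ? "" : title.InnerText.Trim();

            // Keep every direct child of the blockquote except <div> elements.
            string quoteHtml = quote == null ? "" : string.Concat(
                quote.ChildNodes
                     .Where(n => n.Name != "div")
                     .Select(n => n.OuterHtml)).Trim();

            results.Add(Tuple.Create(titleText, quoteHtml));
        }
        return results;
    }
}
```

With the downloaded page in a string, each pair returned by PostRowExtractor.Extract can then be appended to textBox1.Text, using Environment.NewLine as the separator rather than "/n".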

Answered 2013-07-15T13:16:16.140