我有这个 html
<div class="postrow firs">
<h2 class="title icon">
This is the title
</h2>
<div class="content">
<div id="post_message_1668079">
<blockquote class="postcontent restore ">
<div>Category</div>
<div>Authour: Kim</div>
line 1<br /> line2
</blockquote>
</div>
</div>
</div> <div class="postrow">
<h2 class="title icon">
This is the title
</h2>
<div class="content">
<div id="post_message_1668079">
<blockquote class="postcontent restore ">
<div>Category</div>
line 1<br /> line2
</blockquote>
</div>
</div>
</div>
我想从每个具有“postrow”类的 div 中提取以下内容,并且可能还有其他类,例如<div class="postrow first">
. 所以“第一”课不是我关心的,只需要在开始时有“后排”。
- 带有类标题的标签内的内容
- “blockquote”标签中的 HTML。但不是任何带有此标签的 div。
我试过的代码:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml("http://localhost/vanilla/");
List<string> facts = new List<string>();
foreach (HtmlNode li in doc.DocumentNode.SelectNodes("//div[@class='postrow']"))
{
facts.Add(li.InnerHtml);
foreach (String s in facts)
{
textBox1.Text += s + "/n";
}
}