I'm trying to figure out how to grab DOM elements from a webpage. Here is the function I'm using:

private void processHTML(String htmlContent)
{
    IHTMLDocument2 htmlDocument = (IHTMLDocument2)new mshtml.HTMLDocument();
    htmlDocument.write(htmlContent);

    IHTMLElementCollection allElements = htmlDocument.all;

    webBrowser1.DocumentText = allElements.item("storytext").innerHTML;
    textBox2.Text = allElements.item("chap_select").length.ToString();
}

If I set a breakpoint on either of the last two lines and inspect the allElements collection, I can find the SELECT element. Its ID is correctly shown as chap_select, and its length property is 13 for the document being passed in. However, the value written to textBox2 is 2.

Any suggestions on what I'm doing wrong here? I've spent several hours trying to figure this out, but have not been able to find any code samples of somebody trying to grab this property of a SELECT.

1 Answer

Instead of using IHTMLDocument2 and mshtml.HTMLDocument, I would suggest using the HTML Agility Pack, which is much easier to work with.

What is Html Agility Pack (HAP)?

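This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you don't actually have to understand XPATH or XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to what System.Xml offers, but for HTML documents (or streams).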

Something like this (untested):

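// Requires the HtmlAgilityPack package (using HtmlAgilityPack;)
var doc = new HtmlDocument();
doc.LoadHtml(htmlContent);
// SelectNodes returns null when nothing matches, so guard before counting
var options = doc.DocumentNode.SelectNodes("//select[@id='chap_select']/option");
textBox2.Text = (options?.Count ?? 0).ToString();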
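For trying the approach outside the WinForms app, here is a minimal, self-contained sketch. The helper name CountOptions, the class name, and the sample markup are made up for illustration. It narrows the query to the first SELECT with the given id before counting, since a document-wide XPath would add up the options of every SELECT that shares that id (whether the page in the question contains such duplicates is an assumption, not something the question states).

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

static class SelectOptionCounter
{
    // Hypothetical helper: counts the <option> children of the first
    // <select> whose id equals selectId. The id is assumed to contain no
    // quote characters, because it is spliced into the XPath string.
    public static int CountOptions(string htmlContent, string selectId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(htmlContent);

        // SelectSingleNode returns null when no element matches.
        HtmlNode select = doc.DocumentNode.SelectSingleNode(
            "//select[@id='" + selectId + "']");
        if (select == null)
        {
            return 0;
        }

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection options = select.SelectNodes("./option");
        return options == null ? 0 : options.Count;
    }

    static void Main()
    {
        // Made-up sample markup, standing in for the real page.
        string html =
            "<html><body><select id='chap_select'>" +
            "<option value='1'>One</option><option value='2'>Two</option>" +
            "</select></body></html>";

        Console.WriteLine(CountOptions(html, "chap_select"));
    }
}
```

In the original function, the call would look like textBox2.Text = SelectOptionCounter.CountOptions(htmlContent, "chap_select").ToString();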
answered 2012-09-02T16:12:47.017