
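I have a simple (and strange) problem. When I manually set the WebBrowser.DocumentText property to an HTML string, the document gets cut off at a seemingly random character. The HTML I use is the plain HTML of another page, downloaded with HtmlAgilityPack (in the real application I do some processing on it, but the bug occurs even without any processing). When I load the same page in Internet Explorer, the whole page renders correctly.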

Here is a minimal example:

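using System;
using System.Drawing;
using System.IO;
using System.Threading;
using System.Windows.Forms;
using HtmlAgilityPack;

const string url = "http://www.zip-codes.com/county/IL-COOK.asp";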
var doc = new HtmlWeb().Load(url);

// Insert a <base> tag so relative URLs resolve against the original site.
HtmlNode basehref = new HtmlNode(HtmlNodeType.Element, doc, 0) { Name = "base" };
basehref.Attributes.Add("href", url.Substring(0, url.LastIndexOf("/") + 1));
doc.DocumentNode.SelectSingleNode("//head").ChildNodes.Insert(0, basehref);

string html;
using (var writer = new StringWriter()) {
    doc.Save(writer);
    html = writer.ToString();
}

// WebBrowser must run on an STA thread with its own message loop.
var thread = new Thread(() => {
    var browser = new WebBrowser {
        Location = new Point(0, 0),
        Size = new Size(1920, 1080),
        ScriptErrorsSuppressed = true,
        AllowNavigation = true,
        DocumentText = html
    };
    browser.DocumentCompleted += (sender, e) => {
        Console.WriteLine(html.Length);
        Console.WriteLine(browser.DocumentText.Length);
        Application.ExitThread();
    };
    Application.Run();
});
thread.SetApartmentState(ApartmentState.STA);
thread.Start();
thread.Join();

It outputs (the original HTML length, then the length the browser reports):

35259
20477

2 Answers


I tried your code without Application.ExitThread() and, as it turns out, DocumentCompleted gets fired twice; the second time, the length appears to be correct. So the website you're trying to load probably has some dynamic content or is refreshing itself. I haven't dug into what exactly it does; instead, I went ahead and removed all scripts, styles and iframes:

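    using System.Linq;
    using HtmlAgilityPack;

    const string url = "http://www.zip-codes.com/county/IL-COOK.asp";
    var doc = new HtmlWeb().Load(url);

    // Strip elements that can load extra content or rewrite the page.
    // ToList() materializes the matches before any node is removed.
    doc.DocumentNode.Descendants()
                    .Where(n => n.Name == "script" || n.Name == "style" || n.Name == "iframe")
                    .ToList()
                    .ForEach(n => n.Remove());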

Now DocumentCompleted gets fired once, and the document length is consistent.

answered 2013-08-15T02:43:20.837

I solved it this way, by exiting the thread only once the browser's ReadyState is Complete:

const string url = "http://www.zip-codes.com/county/IL-COOK.asp";
var doc = new HtmlWeb().Load(url);

HtmlNode basehref = new HtmlNode(HtmlNodeType.Element, doc, 0) { Name = "base" };
basehref.Attributes.Add("href", url.Substring(0, url.LastIndexOf("/") + 1));
doc.DocumentNode.SelectSingleNode("//head").ChildNodes.Insert(0, basehref);

string html;
using (var writer = new StringWriter()) {
    doc.Save(writer);
    html = writer.ToString();
}

var thread = new Thread(() => {
    var browser = new WebBrowser {
        Location = new Point(0, 0),
        Size = new Size(1920, 1080),
        ScriptErrorsSuppressed = true,
        AllowNavigation = true,
        DocumentText = html
    };
    browser.DocumentCompleted += (sender, e) => {
        Console.WriteLine(html.Length);
        Console.WriteLine(browser.DocumentText.Length);

        // DocumentCompleted can fire more than once; only stop once the
        // browser reports that the whole document has finished loading.
        if (browser.ReadyState == WebBrowserReadyState.Complete)
        {
            Application.ExitThread();   // Stops the thread's message loop
        }
    };
    Application.Run();
});
thread.SetApartmentState(ApartmentState.STA);
thread.Start();
thread.Join();
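A slightly more defensive version of the same ReadyState check, offered as a sketch under some assumptions: `DocumentTextProbe` and `LoadAndRead` are hypothetical names, and the timeout is an addition (not part of the answer above) so that the STA thread cannot hang forever if the page never reaches `ReadyState.Complete`, for example a page that keeps navigating itself. It needs Windows and a reference to System.Windows.Forms, and it has not been checked against the zip-codes.com page, so treat it as a starting point rather than a tested fix.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

// Hypothetical helper (not from the original answers): loads an HTML string
// into a WebBrowser on its own STA thread and returns DocumentText once the
// browser reports ReadyState.Complete, or null if the timeout expires first.
static class DocumentTextProbe
{
    public static string LoadAndRead(string html, TimeSpan timeout)
    {
        string result = null;

        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser { ScriptErrorsSuppressed = true })
            using (var timer = new System.Windows.Forms.Timer())
            {
                // Safety net (assumption): stop the message loop if loading never completes.
                timer.Interval = Math.Max(1, (int)timeout.TotalMilliseconds);
                timer.Tick += (s, e) =>
                {
                    timer.Stop();
                    Application.ExitThread();
                };

                browser.DocumentCompleted += (s, e) =>
                {
                    // DocumentCompleted can fire more than once (e.g. for frames),
                    // so only read DocumentText once the whole page is complete.
                    if (browser.ReadyState != WebBrowserReadyState.Complete)
                        return;

                    result = browser.DocumentText;
                    timer.Stop();
                    Application.ExitThread();
                };

                browser.DocumentText = html;
                timer.Start();
                Application.Run(); // pumps messages until ExitThread is called
            }
        });

        // The WebBrowser control requires a single-threaded apartment.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();
        return result;
    }
}
```

Usage would look like `var text = DocumentTextProbe.LoadAndRead(html, TimeSpan.FromSeconds(30));`, with `html` built as in the code above; a `null` result means the timeout expired before the document finished loading.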

answered 2017-04-27T15:15:14.577