
I want to use the HTML Agility Pack on a WebBrowser control that has already loaded everything I need. The control loads a YouTube channel, and then my code clicks a button on the page so that every video on the channel gets loaded. I already have working code that reads the first 30 videos of a channel into a ListView. The problem is that when I try to get the details of all the videos, it still returns only the first 30, even though the WebBrowser page shows every video. I am using this to get what is currently loaded in the WebBrowser:

(image not available)

but it still only loads the first 30 videos instead of all the videos loaded in the WebBrowser.


1 Answer


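If the target website uses AJAX heavily (as YouTube does), it is hard, if not impossible, to determine when the page has finished loading and executing all of its dynamic scripts. You can get close, though, by handling the window.onload event and then allowing an extra second or two for non-deterministic AJAX calls. After that, read webBrowser.Document.DomDocument.documentElement.outerHTML via dynamic to get the currently rendered HTML.

Example: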

private void Form1_Load(object sender, EventArgs e)
{
    DownloadAsync("http://www.example.com").ContinueWith(
        (task) => MessageBox.Show(task.Result),
        TaskScheduler.FromCurrentSynchronizationContext());
}

async Task<string> DownloadAsync(string url)
{
    TaskCompletionSource<bool> onloadTcs = new TaskCompletionSource<bool>();
    WebBrowserDocumentCompletedEventHandler handler = null;

    handler = delegate
    {
        this.webBrowser.DocumentCompleted -= handler;

        // subscribe to the DOM window's onload event
        this.webBrowser.Document.Window.AttachEventHandler("onload", delegate
        {
            // each navigation has its own TaskCompletionSource
            if (onloadTcs.Task.IsCompleted)
                return; // this should not be happening
            // signal the completion of the page loading
            onloadTcs.SetResult(true);
        });
    };

    // register DocumentCompleted handler
    this.webBrowser.DocumentCompleted += handler;

    // Navigate to url
    this.webBrowser.Navigate(url);

    // continue upon onload
    await onloadTcs.Task;

    // artificial delay for AJAX
    await Task.Delay(1000);

    // the document has been fully loaded, can access DOM here
    return ((dynamic)this.webBrowser.Document.DomDocument).documentElement.outerHTML;
}

[EDITED] Here is the final piece of code that helped solve the OP's problem:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(((dynamic)this.webBrowser1.Document.DomDocument).documentElement.outerHTML);
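
Once the rendered HTML is in an HtmlAgilityPack document, the remaining step is to pull the items out of it. Below is a minimal sketch of that step, not the OP's actual code: the class name RenderedHtmlParser, the method ExtractLinks and the default XPath "//a[@href]" are all hypothetical, and the selector that really matches the video entries depends on the page's current markup. The file deliberately avoids importing System.Windows.Forms, so HtmlDocument here is unambiguously the HtmlAgilityPack type.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper for illustration only.
static class RenderedHtmlParser
{
    // Returns the distinct href values of every node matched by the XPath.
    // The default XPath is an assumption; replace it with a selector that
    // matches the video links in the markup you actually get back.
    public static List<string> ExtractLinks(string renderedHtml, string xpath = "//a[@href]")
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(renderedHtml);

        var links = new List<string>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
            return links;

        foreach (HtmlNode node in nodes)
        {
            string href = node.GetAttributeValue("href", string.Empty);
            if (href.Length > 0 && !links.Contains(href))
                links.Add(href);
        }

        return links;
    }
}
```

From the form, one could pass the outerHTML string obtained above to RenderedHtmlParser.ExtractLinks and add each returned value to the ListView. If the result still stops at 30 entries, the likely cause is that the HTML was captured before the extra items finished loading, so the capture should happen after the delay rather than right after navigation.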
answered 2013-09-15T09:57:32.970