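If the target website uses AJAX heavily (as YouTube does), it is hard, if not impossible, to determine when the page has finished loading and executing all of its dynamic scripts. You can get close, though, by handling the window.onload event and then allowing another second or two for non-deterministic AJAX calls. After that, call webBrowser.Document.DomDocument.documentElement.outerHTML via dynamic to get the currently rendered HTML.

Example: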
    private void Form1_Load(object sender, EventArgs e)
    {
        DownloadAsync("http://www.example.com").ContinueWith(
            (task) => MessageBox.Show(task.Result),
            TaskScheduler.FromCurrentSynchronizationContext());
    }

    async Task<string> DownloadAsync(string url)
    {
        TaskCompletionSource<bool> onloadTcs = new TaskCompletionSource<bool>();
        WebBrowserDocumentCompletedEventHandler handler = null;

        handler = delegate
        {
            this.webBrowser.DocumentCompleted -= handler;

            // attach to subscribe to DOM onload event
            this.webBrowser.Document.Window.AttachEventHandler("onload", delegate
            {
                // each navigation has its own TaskCompletionSource
                if (onloadTcs.Task.IsCompleted)
                    return; // this should not be happening

                // signal the completion of the page loading
                onloadTcs.SetResult(true);
            });
        };

        // register DocumentCompleted handler
        this.webBrowser.DocumentCompleted += handler;

        // Navigate to url
        this.webBrowser.Navigate(url);

        // continue upon onload
        await onloadTcs.Task;

        // artificial delay for AJAX
        await Task.Delay(1000);

        // the document has been fully loaded, can access DOM here
        return ((dynamic)this.webBrowser.Document.DomDocument).documentElement.outerHTML;
    }

[EDITED] Here is the final piece of code that helped solve the OP's problem:

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(((dynamic)this.webBrowser1.Document.DomDocument).documentElement.outerHTML);