
I have a program (C#, Visual Studio 2010) that uses the WebBrowser control to perform browser-like activities.

The purpose of the program is to crawl web pages. The first problem is that after about 50 pages I get a JavaScript error: out of memory (see the screenshot below).

[Screenshot: script error dialog]

To ignore this error, I use the following:

ScriptErrorsSuppressed = true

The setting above gets rid of the script errors, but it creates another problem:

I also use Links.InvokeMember("click"); to scroll the page or to click Ajax links.

So the program hits the error, ScriptErrorsSuppressed hides it, but then InvokeMember stops clicking through the page... and the crawl stops.
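For reference, this is roughly how the relevant pieces fit together (a minimal sketch; the form, start URL, and handler names are placeholders, not my actual code):

    using System;
    using System.Windows.Forms;

    public class CrawlerForm : Form
    {
        private readonly WebBrowser browser = new WebBrowser();

        public CrawlerForm()
        {
            // Suppressing script errors hides the "out of memory" dialog...
            browser.ScriptErrorsSuppressed = true;
            browser.DocumentCompleted += OnDocumentCompleted;
            Controls.Add(browser);
            browser.Navigate("http://example.com/start"); // placeholder URL
        }

        private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            // ...but after enough pages the scripted clicks stop working.
            foreach (HtmlElement link in browser.Document.Links)
            {
                link.InvokeMember("click"); // used to scroll / follow Ajax links
                break;
            }
        }
    }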

Does anyone know how to solve this?


2 Answers


The problem is that you are at the mercy of whatever client-side code gets downloaded and run inside the browser control. If that code is not well behaved, you end up with leaks and these memory problems.

The only thing I can think of is to try disposing of your browser control after every few pages and recreating it, to see if that helps.
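Something along these lines; the counter, threshold, and handler name below are only illustrative, not a drop-in fix:

    private WebBrowser browser;
    private int pagesSinceReset;
    private const int ResetEvery = 25; // arbitrary threshold, tune it

    private void RecycleBrowserIfNeeded()
    {
        if (++pagesSinceReset < ResetEvery)
            return;
        pagesSinceReset = 0;

        // Tear down the old control to release whatever memory its scripts leaked...
        Controls.Remove(browser);
        browser.Dispose();

        // ...then create a fresh one with the same settings and handlers.
        browser = new WebBrowser { ScriptErrorsSuppressed = true };
        browser.DocumentCompleted += OnDocumentCompleted; // your existing DocumentCompleted handler
        Controls.Add(browser);
    }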

Answered 2012-08-21T10:16:18.823

If you want to crawl webpages, you really shouldn't be using the WebBrowser control. Use the HttpWebRequest class to make the request and get the HTML back as a string, then hand that string to MSHTML. MSHTML turns it into a proper DOM object, so you can loop through the links and other elements directly instead of trying to parse them out with string manipulation.

Of course, this way none of the JavaScript runs, and you don't waste bandwidth and time downloading images or drawing elements to the screen when they aren't needed.
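Here is a rough sketch of what I mean (the URL is a placeholder, and you need a COM reference to the Microsoft HTML Object Library so the mshtml namespace is available):

    using System;
    using System.IO;
    using System.Net;
    using mshtml;

    class HtmlCrawler
    {
        static void Main()
        {
            // Download the raw HTML without rendering anything.
            var request = (HttpWebRequest)WebRequest.Create("http://example.com"); // placeholder URL
            string html;
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                html = reader.ReadToEnd();
            }

            // Hand the string to MSHTML so it becomes a DOM you can walk.
            var doc = (IHTMLDocument2)new HTMLDocument();
            doc.write(html);
            doc.close();

            // Loop over the links as objects instead of parsing the string yourself.
            foreach (IHTMLAnchorElement link in doc.links)
            {
                Console.WriteLine(link.href);
            }
        }
    }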

Get me? Let me know if you need more help.

Answered 2012-09-01T21:18:35.603