
I've been using Selenium as a scraper/crawler because I need a page's content after its JavaScript has been evaluated. I have five EC2 machines, each running Selenium and a couple of instances of the scraper I wrote.

However, I'm noticing some really odd behavior. After a couple of hours, Selenium stops on all the machines at around the same time. Since I start Selenium and the scrapers at the same time on all servers, this leads me to believe there's an issue with Selenium that only shows up after it has been running for a long time.

Here's Selenium's log:

14:34:58.628 INFO - RemoteWebDriver instances should connect to: http://127.0.0.1:4444/wd/hub
14:34:58.629 INFO - Version Jetty/5.1.x
14:34:58.630 INFO - Started HttpContext[/selenium-server/driver,/selenium-server/driver]
14:34:58.631 INFO - Started HttpContext[/selenium-server,/selenium-server]
14:34:58.631 INFO - Started HttpContext[/,/]
14:34:58.753 INFO - Started org.openqa.jetty.jetty.servlet.ServletHandler@6a669053
14:34:58.753 INFO - Started HttpContext[/wd,/wd]
14:34:58.764 INFO - Started SocketListener on 0.0.0.0:4444
14:34:58.765 INFO - Started org.openqa.jetty.jetty.Server@2ef36617
21:24:41.031 INFO - Shutting down...

Another interesting thing I noticed: on each cluster, I always have at least one scraper instance with this error:

File "SiteScraper.py", line 238, in _add_rendered_html
    self.browser.get(url)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 168, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 156, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 147, in check_response
    raise exception_class(message, screen, stacktrace)
WebDriverException: Message: u'Modal dialog present'

I think this means that Selenium or Firefox (the browser I'm using with WebDriver) is popping up a modal dialog after a certain period of time.
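One way to test this guess would be to catch the error, print the dialog's text, dismiss it, and retry the page load. This is only a minimal sketch: `get_with_dialog_retry` is a hypothetical helper, not something from my scraper, and it catches `Exception` broadly so it runs without importing Selenium (real code should narrow this to `WebDriverException`). It assumes the `browser.switch_to.alert` accessor; older 2.x Python bindings use `browser.switch_to_alert()` instead.

```python
def get_with_dialog_retry(browser, url, retries=1):
    """Load url; if a modal dialog blocks the call, report it, dismiss it, retry.

    Hypothetical helper for diagnosis only. `browser` is expected to behave
    like a Selenium WebDriver (get() and switch_to.alert).
    """
    for attempt in range(retries + 1):
        try:
            browser.get(url)
            return True
        except Exception as exc:  # assumption: narrow to WebDriverException in real code
            # Re-raise anything that is not the modal-dialog error,
            # and give up once the retry budget is used.
            if "Modal dialog present" not in str(exc) or attempt == retries:
                raise
            alert = browser.switch_to.alert
            # The dialog text should reveal what Firefox is asking about.
            print("Dismissing dialog: %r" % alert.text)
            alert.dismiss()
```

Calling `get_with_dialog_retry(self.browser, url)` in place of `self.browser.get(url)` should at least show what the dialog says, even if dismissing it turns out not to be the right fix.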

Has anyone had a similar problem, or any insight into how to fix this?


1 Answer


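When you start the Selenium standalone server on the machines, try passing the -timeout option on the java command line. First set it to a very small value to verify whether it is what causes the problem. Then, if it is, increase it to a very high value.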
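A minimal sketch of that experiment, written in Python since the scraper is Python. The jar filename and the helper names are placeholders of mine, not taken from the question, and I am assuming the value passed to -timeout is in seconds; check the server's help output for your version.

```python
import subprocess


def selenium_server_command(jar_path, timeout_seconds):
    """Build the java command line for the standalone server with -timeout set.

    jar_path is a placeholder; point it at your own standalone server jar.
    """
    return ["java", "-jar", jar_path, "-timeout", str(timeout_seconds)]


def start_selenium_server(jar_path, timeout_seconds):
    """Launch the server as a child process and return the Popen handle."""
    return subprocess.Popen(selenium_server_command(jar_path, timeout_seconds))
```

For the first test you might call `start_selenium_server("selenium-server-standalone.jar", 30)` and watch whether the server now shuts down much sooner; if it does, restart it with a much larger value.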

Answered 2012-08-01T18:32:22.663