
I want to screen-scrape a website that has multiple pages. The pages are loaded dynamically without the URL changing, so I'm using Selenium to scrape it. But I'm getting an exception from this simple program.

import re
from contextlib import closing
from selenium.webdriver import Firefox 

url="http://www.samsung.com/in/consumer/mobile-phone/mobile-phone/smartphone/"

with closing(Firefox()) as browser:
    browser.get(url)
    n = 2
    link = browser.find_element_by_link_text(str(n))
    link.click()
    #web_page=browser.page_source
    #print type(web_page)

The error is as follows:

raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: u'Unable to locate element: {"method":"link text","selector":"2"}' ; Stacktrace: Method FirefoxDriver.prototype.findElementInternal_ threw an error in file:///tmp/tmpMJeeTr/extensions/fxdriver@googlecode.com/components/driver_component.js 

Is the problem with the URL given, or with the Firefox browser? Any help would be greatly appreciated.


2 Answers


I'm working on a Python module that may cover your (or anyone else's) use case:

https://github.com/cmwslw/selenium-crawler

It converts recorded Selenium scripts into crawler functions, so you can avoid writing any of the code above. It works well with pages that load content dynamically. I hope someone finds it useful.

Answered 2013-05-05T19:11:01.003

I think your main problem is that the page itself takes a while to load, and you are trying to access the link immediately, before it has been rendered (hence the stack trace). One thing you can try is an implicit wait on your browser, which tells the browser to wait a set amount of time for elements to appear before timing out. In your case, you could try the following, which will wait up to 10 seconds while polling the DOM for the particular item (here, the link with text 2):

browser.implicitly_wait(10)  # poll the DOM for up to 10 seconds before raising NoSuchElementException
n = 2
link = browser.find_element_by_link_text(str(n))
link.click()
#web_page=browser.page_source
#print type(web_page)
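
As a related option (not part of the original answer), an explicit wait targets just the element you need instead of setting a global timeout. A minimal sketch, assuming the page numbers are rendered as plain link text:

from contextlib import closing
from selenium.webdriver import Firefox
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "http://www.samsung.com/in/consumer/mobile-phone/mobile-phone/smartphone/"

with closing(Firefox()) as browser:
    browser.get(url)
    # Block for up to 10 seconds until a link with text "2" is present in the DOM,
    # then click it; a TimeoutException is raised if it never appears.
    link = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.LINK_TEXT, "2"))
    )
    link.click()

The explicit wait only pauses at this one lookup, whereas implicitly_wait applies to every subsequent find_element call on the driver.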
Answered 2013-01-25T06:38:37.147