python - StaleElementException when iterating with Python

Question

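I am trying to build a basic web scraper for Amazon results. As I iterate through the results, I sometimes get to page 5 (sometimes only page 2) and then a StaleElementException is thrown. When I look at the browser after the exception is thrown, I can see that the driver/page did not scroll down to where the page numbers are (the bottom bar).

My code: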

driver.get('https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush')

for page in range(1,last_page_number +1):

    driver.implicitly_wait(10)

    bottom_bar = driver.find_element_by_class_name('pagnCur')
    driver.execute_script("arguments[0].scrollIntoView(true);", bottom_bar)

    current_page_number = int(driver.find_element_by_class_name('pagnCur').text)

    if page == current_page_number:
        next_page = driver.find_element_by_xpath('//div[@id="pagn"]/span[@class="pagnLink"]/a[text()="{0}"]'.format(current_page_number+1))
        next_page.click()
        print('page #',page,': going to next page')
    else:
        print('page #: ', page,'error')

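I have looked at this question, and I guess a similar fix could be applied, but I am not sure how to find something on the page that disappears. Also, judging by how quickly the print statements appear, I can see that implicitly_wait(10) does not actually wait the full 10 seconds.

The exception points to the line that begins with "driver.execute_script". This is the exception: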

StaleElementReferenceException: Message: The element reference of <span class="pagnCur"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed

Sometimes I get a ValueError instead:

ValueError: invalid literal for int() with base 10: ''

These errors/exceptions lead me to believe that something is wrong with how I wait for the page to fully refresh.

score 3 · Accepted Answer

This error message...

StaleElementReferenceException: Message: The element reference of <span class="pagnCur"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed

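...implies that the previous reference to the element is now stale, and that the referenced element is no longer present in the page's DOM.

The common causes of this issue are:

The element's position within the HTML has changed.
The element is no longer attached to the DOM tree.
The web page containing the element was refreshed.
The previous instance of the element was refreshed by JavaScript or an Ajax call.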
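
Whatever the cause, the usual remedy is to locate the element again instead of reusing the old reference. The following is a minimal sketch of that idea, not part of the original answer: `retry_on_stale` is a hypothetical helper name, and the `ImportError` fallback class exists only so the sketch runs where Selenium is not installed.

```python
import time

try:
    from selenium.common.exceptions import StaleElementReferenceException
except ImportError:
    # Stand-in so the sketch runs without Selenium (illustration only).
    class StaleElementReferenceException(Exception):
        pass


def retry_on_stale(action, attempts=3, delay=0.5):
    """Call action() until it succeeds, retrying only on a stale reference.

    `action` must locate the element itself, so that every attempt
    works with a freshly found reference rather than a cached one.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except StaleElementReferenceException:
            if attempt == attempts:
                raise  # give up and surface the original exception
            time.sleep(delay)


# Hypothetical usage with the locator from the question:
# text = retry_on_stale(lambda: driver.find_element_by_class_name('pagnCur').text)
```

The important detail is that the lookup happens inside the callable; retrying an action on an element object that was found earlier would keep hitting the same stale reference.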

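This use case

Keeping your concept of scrolling with scrollIntoView() and printing some helpful debug messages, I made a few small tweaks to introduce WebDriverWait. You can use the following solution:

Code block: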

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
options.add_argument("--disable-extensions")
# Selenium 3 style constructor; Selenium 4 takes options=... and a Service object instead.
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush")
current_page_number = "?"  # placeholder in case the very first wait times out
while True:
    try:
        # Locate the current-page marker again on every iteration so a fresh reference is used after each page load.
        current_page_number_element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.pagnCur")))
        driver.execute_script("arguments[0].scrollIntoView(true);", current_page_number_element)
        current_page_number = current_page_number_element.get_attribute("innerHTML")
        # Wait until the "next" arrow is clickable, then click it.
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "span.pagnNextArrow"))).click()
        print("page # {} : going to next page".format(current_page_number))
    except TimeoutException:
        # Nothing to click within the timeout: treat this as the last page.
        print("page # {} : error, no more pages".format(current_page_number))
        break
driver.quit()

Console output:

page # 1 : going to next page
page # 2 : going to next page
page # 3 : going to next page
page # 4 : going to next page
page # 5 : going to next page
page # 6 : going to next page
page # 7 : going to next page
page # 8 : going to next page
page # 9 : going to next page
page # 10 : going to next page
page # 11 : going to next page
page # 12 : going to next page
page # 13 : going to next page
page # 14 : going to next page
page # 15 : going to next page
page # 16 : going to next page
page # 17 : going to next page
page # 18 : going to next page
page # 19 : going to next page
page # 20 : error, no more pages

score 3 · Accepted Answer

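If you just want your script to iterate over all the result pages, you don't need any complicated logic - just click the Next button for as long as that is possible: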

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()

driver.get('https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush')

while True:
    try:
        wait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'a > span#pagnNextString'))).click()
    except TimeoutException:
        break

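P.S. Also note that implicitly_wait(10) is not supposed to wait the full 10 seconds; it waits up to 10 seconds for the element to appear in the HTML DOM. So if the element is found within 1 or 2 seconds, the wait is over and you will not wait the remaining 8-9 seconds.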
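
The "up to N seconds" behaviour can be illustrated with a plain polling loop. This is a simplified model written for this explanation, not Selenium's actual implementation; `wait_until` and its `poll` parameter are hypothetical names.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Return condition()'s first truthy result.

    Raises TimeoutError if nothing truthy is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # stop as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(poll)


# Pretend the element shows up 0.2 s after we start looking for it.
start = time.monotonic()
ready_at = start + 0.2
wait_until(lambda: time.monotonic() >= ready_at, timeout=10.0, poll=0.05)
elapsed = time.monotonic() - start
print("waited {:.2f} s of a 10 s budget".format(elapsed))
```

The loop returns shortly after the condition first holds, which is why the print statements in the question appear much faster than one every 10 seconds.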
