
I am trying to use Selenium to scrape a table from the following site: https://web.archive.org/web/20120220031809/http://simcentral.net/ibaf/games/1

This is the code I am using:

from selenium import webdriver as wd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup as bs
from pandas.io.html import read_html
import pandas as pd
import numpy as np
import os  # needed for os.chdir below
import re
os.chdir('c:/Users/Owner')
bat=pd.DataFrame()

driver = wd.Chrome()
wait = WebDriverWait(driver,15)
driver.get('https://web.archive.org/web/20120220031809/http://simcentral.net/ibaf/games/1')
page=driver.find_element_by_xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "regtext", " " ))] | //*[contains(concat( " ", @class, " " ), concat( " ", "normal", " " ))]')
table_html=page.get_attribute('innerHTML')

driver.quit()

I get the following error:

StaleElementReferenceException: stale element reference: element is not attached to the page document

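I looked this up online and I understand what the error means, but I don't know what to do about it. The other questions I found seem to locate their elements by something other than XPath. I know the failure happens on the table_html= line, because if I remove that line, everything above it runs fine and the browser closes as expected.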

Thanks for your help.


1 Answer


Try this:

import pandas as pd
data = pd.read_html("https://web.archive.org/web/20120220031809/http://simcentral.net/ibaf/games/1")
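
Note that read_html returns a list of DataFrames, one per table found on the page, so you still have to pick out the one you want. Here is a minimal sketch of how that selection might look. The SAMPLE_HTML string, its column names, and the "AB" match text are made up for illustration and are only assumptions about what the archived page contains, so adjust them to the real markup.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the archived page; the real markup may differ.
SAMPLE_HTML = """
<table class="regtext">
  <tr><th>Player</th><th>AB</th><th>H</th></tr>
  <tr><td>Smith</td><td>4</td><td>2</td></tr>
  <tr><td>Jones</td><td>3</td><td>1</td></tr>
</table>
<table class="normal">
  <tr><th>Team</th><th>Runs</th></tr>
  <tr><td>Home</td><td>5</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(SAMPLE_HTML))

# `match` keeps only the tables whose text matches the given string or regex.
batting = pd.read_html(StringIO(SAMPLE_HTML), match="AB")[0]

print(len(tables))
print(batting)
```

Passing the URL instead of the StringIO object should work the same way. Since this never touches a live browser DOM, there is no element reference that could go stale.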
Answered 2021-01-28T16:45:57.073