python - Isolating data from a dynamic table using BeautifulSoup

Source: https://stackoverflow.com/questions/30239820 2015-05-14T14:23:00.133

Viewed 821 times

I am trying to extract data from a table (1) that has several filter options. I am using BeautifulSoup and fetching the page with Requests. Code excerpt:

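from bs4 import BeautifulSoup

# Contact_page is the response object returned by requests for the page
tt = Contact_page.content  # raw HTML of the page that holds the table
soup = BeautifulSoup(tt, "html.parser")  # name the parser explicitly
R_tables = soup.find('div', {'class': 'responsive-table'})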

Calling find_all("tr") and find_all("th") returns empty results. R_tables.findChildren only descends as far as "formrow", which has no children. I cannot get from formrow to my tr/th tags through BS4.

R_tables results in table (3). The XPath for the file is

//*[@id="kronos_body"]/div[3]/div[2]/div[3]/script/text()

How can I get the information for each row of data? soup.find("r") and soup.find("f") also come back empty.
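The XPath above ends in script/text(), which suggests the rows are not in the HTML as tr/th tags at all but sit inside an inline script as text. That would explain the empty results. Below is a minimal sketch of reading that script text with BeautifulSoup and parsing an array out of it. Everything about the page here is an assumption: SAMPLE_HTML, the variable name tableData, the helper rows_from_inline_script, and the "r"/"f" keys are hypothetical stand-ins, since the real page source is not shown. It also assumes the array is written as valid JSON.

```python
import json
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in for Contact_page.content; the real page is not shown.
SAMPLE_HTML = """
<div id="kronos_body">
  <div class="responsive-table">
    <div class="formrow"></div>
    <script>
      var tableData = [{"r": "row 1", "f": "a"}, {"r": "row 2", "f": "b"}];
      renderTable(tableData);
    </script>
  </div>
</div>
"""


def rows_from_inline_script(html, var_name="tableData"):
    """Return the list assigned to `var_name` in an inline <script>, or []."""
    soup = BeautifulSoup(html, "html.parser")
    # Match `var <name> = [ ... ];` and capture the array literal.
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*(\[.*?\]);", re.DOTALL
    )
    for script in soup.find_all("script"):
        text = script.string or ""  # inline JavaScript is plain text to bs4
        match = pattern.search(text)
        if match:
            # Only works if the literal is valid JSON (double-quoted keys,
            # no trailing commas, no nested "];" inside the data).
            return json.loads(match.group(1))
    return []


rows = rows_from_inline_script(SAMPLE_HTML)
for row in rows:
    print(row["r"], row["f"])
```

If the script builds the table from an Ajax call instead of an embedded literal, this approach finds nothing, and the data would have to come from that request or from a tool that actually executes the JavaScript.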

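Please forgive me in advance if this post is sloppy; it is my first. I will link the most similar thread I found in the comments, since I cannot post more than 2 links.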

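Edit 1: Apparently BS does not recognize any JavaScript other than variables (please correct me if I am wrong, I am still relatively new). Are there other modules that could help me? Ghost and Selenium have been suggested to me, but I will not be using Selenium.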
