beautifulsoup - html5lib makes BeautifulSoup miss an element

Source: https://stackoverflow.com/questions/37052097 2016-05-05T13:36:03.213

Viewed 139 times

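Continuing my attempt to extract transcripts from presidential debates, I have now started using html5lib as the parser for BeautifulSoup.

However, when I now run the (previously working) code that finds the element containing the actual transcript, it raises an error and claims that no such span was found.

Here is the code: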

from bs4 import BeautifulSoup
import html5lib
import urllib

file = urllib.urlopen('http://www.presidency.ucsb.edu/ws/index.php?pid=111395')
soup = BeautifulSoup(file, "html5lib")
transcript = soup.find_all("span", class_="displaytext")[0]

Here is the error:

IndexError                                Traceback (most recent call last)
<ipython-input-5-2c227e8c4a25> in <module>()
  1 file = urllib.urlopen('http://www.presidency.ucsb.edu/ws/index.php?pid=111395')
  2 soup = BeautifulSoup(file, "html5lib")
----> 3 transcript = soup.find_all("span", class_="displaytext")[0]

IndexError: list index out of range
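
The IndexError by itself only says that find_all returned an empty list, so indexing it with [0] failed. A minimal sketch of checking for that case explicitly follows; the inline `markup` string is a hypothetical stand-in for the parsed page, not the real document, and "html.parser" is used only so the example runs without extra packages.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with no matching span, standing in for the real page.
markup = "<html><body><p>no transcript here</p></body></html>"
soup = BeautifulSoup(markup, "html.parser")

# find_all returns an empty list when nothing matches,
# so indexing it with [0] would raise IndexError.
matches = soup.find_all("span", class_="displaytext")
transcript = matches[0] if matches else None

if transcript is None:
    print("no span with class 'displaytext' in the parsed tree")
```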

Here is the relevant part of the page I am requesting, proving I am not crazy: there is a span with class "displaytext".

 <span class="displaytext">
           <b>
            PARTICIPANTS:
           </b>
           <br/>
           Former Governor Jeb Bush (FL);

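What am I missing? If I run it without passing "html5lib" in the BeautifulSoup call, it works fine (but I then get later errors caused by bogus tags that have no corresponding end tags).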
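
One way to narrow this down is to parse the same markup with each installed parser and count the matching spans. The sketch below is only a diagnostic aid under stated assumptions: `SAMPLE_HTML` and `count_matches` are hypothetical names, and the inline markup is a well-formed stand-in for the real page, so it does not reproduce the failure by itself.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical, well-formed stand-in for the downloaded page.
SAMPLE_HTML = """
<html><body>
<span class="displaytext"><b>PARTICIPANTS:</b><br/>
Former Governor Jeb Bush (FL);</span>
</body></html>
"""


def count_matches(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Map each parser name to the number of span.displaytext elements it
    yields, or to None when that parser is not installed."""
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(markup, name)
        except FeatureNotFound:
            # bs4 raises this when the requested parser library is missing.
            counts[name] = None
            continue
        counts[name] = len(soup.find_all("span", class_="displaytext"))
    return counts


if __name__ == "__main__":
    for name, n in count_matches(SAMPLE_HTML).items():
        print(name, n)
```

Passing the text of the downloaded page instead of `SAMPLE_HTML` would show whether html5lib is the only parser that loses the span; if so, the markup that precedes the span in the page source may be worth inspecting.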
