html - BeautifulSoup error in Python 3.3

Question

I'm just trying to parse this website, but I keep getting errors when using BeautifulSoup. Can someone help me find the problem?

import urllib
import urllib.request
import beautifulsoup




html = urllib.request.urlopen('http://yugioh.wikia.com/wiki/Card_Tips:Blue-Eyes_White_Dragon').read()
soup = beautifulsoup.bs4(html)
texts = soup.findAll(text=True)

def visible(element):
    if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
        return False
    elif re.match('<!--.*-->', str(element)):
        return False
    return True

visible_texts = filter(visible, texts)

score 0 · Accepted Answer

You've mixed up the module name and the class name. Instead of:

import beautifulsoup

you need:

import bs4

And instead of:

beautifulsoup.bs4(...)

you need:

bs4.BeautifulSoup(...)

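Also, in recent versions of Beautiful Soup, the underscore variants of method names are preferred over the camelCase variants (find_all rather than findAll), because they fit better with other Python naming conventions: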

soup.find_all(...)
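
Put together, the corrected import and call might look like the sketch below. It assumes Beautiful Soup 4 is installed under the package name bs4. The SAMPLE_HTML string is a made-up stand-in for the downloaded page so the snippet runs without network access, and 'html.parser' is just the standard-library parser picked for this illustration.

```python
import bs4  # the package is called bs4; BeautifulSoup is a class inside it

# Hypothetical inline HTML used in place of the real page download
SAMPLE_HTML = (
    "<html><head><title>Tips</title></head>"
    "<body><p>First tip</p><p>Second tip</p></body></html>"
)

# Build the soup from the class, not from a module named "beautifulsoup"
soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all is the underscore spelling of the older findAll
paragraphs = [p.get_text() for p in soup.find_all("p")]
print(paragraphs)
```

For the real page, the bytes returned by urllib.request.urlopen(...).read() would be passed in place of SAMPLE_HTML.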

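Also, depending on what you do with visible_texts, you may want a list rather than a lazy filter, since in Python 3 filter returns an iterator that can only be consumed once: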

visible_texts = list(filter(visible, texts))
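
To illustrate why the list() wrapper can matter, here is a small standalone sketch. The is_even function and the numbers list are hypothetical stand-ins for visible and texts, and nothing here depends on Beautiful Soup.

```python
def is_even(n):
    # Stand-in predicate playing the role of visible()
    return n % 2 == 0

numbers = [1, 2, 3, 4, 5, 6]

# In Python 3, filter() returns a lazy iterator
lazy = filter(is_even, numbers)
first_pass = list(lazy)    # consumes the iterator
second_pass = list(lazy)   # nothing left to yield the second time

# Wrapping in list() up front gives a reusable sequence
eager = list(filter(is_even, numbers))
print(first_pass, second_pass, len(eager))
```

If visible_texts is only looped over once, the lazy filter is enough. Indexing, len(), or iterating a second time calls for the list.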
