python - isinstance does not work properly with BeautifulSoup (NameError)

Question

I am using isinstance to select some HTML tags and pass them to a Beautiful Soup function. The problem is that I keep getting NameErrors from code that should be perfectly executable.

def horse_search(tag):
    return (tag.has_attr('href') and isinstance(tag.previous_element, span))

...

for tag in soup.find_all(horse_search):
   print (tag)

NameError: global name 'span' is not defined

In addition, I get an error from the example code in the Beautiful Soup documentation that uses isinstance and tag.previous_element:

def surrounded_by_strings(tag):
    return (isinstance(tag.next_element, NavigableString)
            and isinstance(tag.previous_element, NavigableString))

for tag in soup.find_all(surrounded_by_strings):
    print tag.name

NameError: global name 'NavigableString' is not defined

What is wrong? Thanks!

score 0 · Accepted Answer

To find all anchors that have a span parent and an href attribute, do the following:

# Print the href of every <a> found anywhere inside a <span>
for span in soup.find_all('span'):
    for a in span.find_all('a'):
        if a.has_attr('href'):
            print(a['href'])
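
As for the NameError itself: `span` and `NavigableString` are ordinary Python names, and neither one was defined in your script. `NavigableString` has to be imported from bs4, and bs4 has no `span` class, so the tag name has to be compared as a string instead. A minimal sketch of both predicates, assuming the built-in html.parser backend; the sample markup and the names `span_links` and `string_wrapped` are made up for illustration:

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical sample markup, invented for this illustration.
html = ('<p>Odds: <span><a href="/horse/1">Red Rum</a></span>'
        ' and <a href="/horse/2">Arkle</a> too</p>')
soup = BeautifulSoup(html, "html.parser")

def horse_search(tag):
    # previous_element is whatever was parsed immediately before this tag.
    # Check that it is a Tag and compare its name to the string 'span'.
    prev = tag.previous_element
    return tag.has_attr("href") and isinstance(prev, Tag) and prev.name == "span"

def surrounded_by_strings(tag):
    # Works once NavigableString has been imported (see the first line).
    return (isinstance(tag.next_element, NavigableString)
            and isinstance(tag.previous_element, NavigableString))

span_links = [tag["href"] for tag in soup.find_all(horse_search)]
string_wrapped = [tag.name for tag in soup.find_all(surrounded_by_strings)]
print(span_links)
print(string_wrapped)
```

Note that previous_element is not the same thing as the parent: it only happens to be the span here because the anchor is the first thing inside it.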

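The nested find_all loop works fine, but in most cases you would probably be better off with a tool that supports XPath. With lxml and XPath, for example, your code can be as tidy as this: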

from lxml import etree
etree.parse(url, etree.HTMLParser()).xpath('//span/a/@href')
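
For trying the expression out without fetching anything, here is a small sketch that parses an inline string with `etree.HTML` instead of a URL. The markup and the variable names are hypothetical. It also shows one difference worth knowing: `//span/a` matches only anchors that are direct children of a span, while `find_all` in Beautiful Soup searches all descendants by default, which corresponds to `//span//a`.

```python
from lxml import etree

# Hypothetical sample markup, invented for this illustration.
root = etree.HTML(
    '<div>'
    '<span><a href="/horse/1">Red Rum</a></span>'
    '<span><b><a href="/horse/2">Arkle</a></b></span>'
    '<a href="/horse/3">Desert Orchid</a>'
    '</div>'
)

# Anchors that are direct children of a span.
direct_children = root.xpath('//span/a/@href')

# Anchors nested at any depth inside a span.
any_descendant = root.xpath('//span//a/@href')

print(direct_children)
print(any_descendant)
```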
