python - BeautifulSoup Tag Removal unexpected result

Question

I wrote some code to extract only the content inside the <p> tags of some HTML. Here is my code:

soup = BeautifulSoup(my_string, 'html')
no_tags=' '.join(el.string for el in soup.find_all('p', text=True))

For most of the examples I ran it on, it works the way I want, but I noticed that for an example such as

<p>hello, how are you <code>other code</code> my name is joe</p>

it returns nothing. I guess this is because there are other tags inside the <p> tag. To be clear, what I want it to return is

hello, how are you my name is joe

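That is, I want everything inside the <p> tag, but only the first level. I want to ignore everything contained in other tags nested inside the <p> tag. Can someone help me handle examples like this?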
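The guess above can be checked with a small sketch. It assumes the `html.parser` backend, and the variable names are illustrative only. As far as I understand it, `.string` is only set when a tag has a single child, so a <p> with mixed content yields `None` and the `text=True` / `string=True` filter skips it:

```python
from bs4 import BeautifulSoup

html = "<p>hello, how are you <code>other code</code> my name is joe</p>"
soup = BeautifulSoup(html, "html.parser")

# <p> has three children here (text, <code>, text), so .string is None.
mixed_string = soup.p.string

# string=True only keeps tags whose .string is not None, so nothing matches.
matches = soup.find_all("p", string=True)

# For comparison, a <p> with a single text child does have a .string.
plain_string = BeautifulSoup("<p>plain text</p>", "html.parser").p.string

print(mixed_string, len(matches), plain_string)
```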

score 1 · Accepted Answer

Hi, I think you can use this to extract the text inside the p tag.

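from bs4 import BeautifulSoup

my_string = "<p>hello, how are you <code>other code</code> my name is joe</p>"
soup = BeautifulSoup(my_string, 'html')

# Remove the first <code> element from the tree, then read the remaining text.
soup.code.extract()
text = soup.p.get_text()
print(text)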
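Note that `soup.code.extract()` only removes the first <code> element and leaves a double space where it was. As an alternative sketch (not the answerer's method), the direct text children of every <p> can be collected without modifying the tree. The helper name `first_level_text` is hypothetical, and the `html.parser` backend and the whitespace collapsing are assumptions on my part:

```python
from bs4 import BeautifulSoup, NavigableString


def first_level_text(html):
    """Join the text found directly inside each <p>, ignoring nested tags."""
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = []
    for p in soup.find_all("p"):
        # Keep only direct children that are text nodes; nested tags such as
        # <code> are Tag objects and are skipped together with their content.
        pieces = [child for child in p.children if isinstance(child, NavigableString)]
        # split() + join collapses the gap left where the nested tag was.
        paragraphs.append(" ".join("".join(pieces).split()))
    return " ".join(paragraphs)


print(first_level_text(
    "<p>hello, how are you <code>other code</code> my name is joe</p>"
))
# hello, how are you my name is joe
```

Comments inside a <p> are also `NavigableString` instances, so they would be included; filter them out separately if that matters.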
