python - Extracting data from a website list without superfluous tags

Question

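Working code: a Google dictionary lookup via Python and Beautiful Soup -> simply execute it and enter a word.

I have quite simply extracted the first definition from a specific list item. However, to get plain data I have had to split the data at the line break and then strip it to remove the superfluous list tag.

My question is: is there a way to extract the data contained in a specific list item without the string manipulation above, perhaps a function in Beautiful Soup that I have yet to see?

This is the relevant section of code: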

# Retrieve HTML and parse with BeautifulSoup.
doc = userAgentSwitcher().open(queryURL).read()
soup = BeautifulSoup(doc)

# Extract the first list item -> and encode it.
definition = soup('li', limit=2)[0].encode('utf-8')

# Format the return as word:definition removing superfluous data.
print word + " : " + definition.split("<br />")[0].strip("<li>")

score 1 · Accepted Answer

I think you are looking for findAll(text=True), which extracts the text from the tags:

definitions = soup('ul')[0].findAll(text=True)

This returns a list of all the text content, split at tag boundaries.
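As a rough illustration of how this could replace the split/strip step from the question, here is a minimal sketch. It assumes the newer bs4 package, where the same call is spelled find_all(string=True), and it uses a made-up HTML snippet in place of the real dictionary page, whose markup is not shown in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the dictionary page (an assumption;
# the real page structure is not shown in the question).
html = """
<ul>
  <li>a domesticated carnivorous mammal<br />example: "the dog barked"</li>
  <li>a second definition</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
first_item = soup.find("li")

# Collect every text node under the first <li>; the list is split
# wherever a tag (here the <br />) interrupts the text.
pieces = first_item.find_all(string=True)

# The first text node is the text before the <br />, so no manual
# split("<br />") or strip("<li>") is needed.
definition = pieces[0].strip()
print(definition)
```

Because only text nodes are returned, the list tags never appear in the result; picking the first element plays the role of the split at the line break in the original code.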
