
I'm using BeautifulSoup for the first time and can't figure out how to extract the text from a specific node. Here is my code:

HTML:

...
<p class="dsm">...</p>
<ul class="also">
    <li>once as the adjective <i class="ab">abdrea</i> (<span class="at">groups</span>)</li>
    <li>twice as the noun <i class="ab">shokdia</i> (<span class="at">techs</span>)</li>
</ul>
...

Python:

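from urllib.request import urlopen  # Python 3; on Python 2 use urllib2
from bs4 import BeautifulSoup

current_page = urlopen(url)
current_soup = BeautifulSoup(current_page, 'html.parser')
# <li> items of the <ul class="also"> that directly follows <p class="dsm">
derivative_list = current_soup.select('p.dsm + ul.also li')
for li in derivative_list:
    print(li)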

Output:

<li>once as the adjective <i class="ab">abdrea</i> (<span class="at">groups</span>)</li>
<li>twice as the noun <i class="ab">shokdia</i> (<span class="at">techs</span>)</li>

It prints the correct list items, but what I actually want is the text of i.ab and span.at, something like this:

Desired output:

abdrea, groups
shokdia, techs

2 Answers

Once you have the list of all <li> tags, just iterate over them and get the text of the <i class="ab"> and <span class="at"> tags in each one.

for li in soup.select('p.dsm + ul.also li'):
    print(li.i.text, li.span.text)
# abdrea groups
# shokdia techs
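A minimal self-contained sketch of the same loop, assuming the page can be reduced to the fragment shown in the question (the "..." parts are omitted here) and formatting each pair with a comma to match the desired output exactly:

```python
from bs4 import BeautifulSoup

# Assumption: a trimmed copy of the question's HTML; the "..." parts are omitted.
html = """
<p class="dsm">...</p>
<ul class="also">
    <li>once as the adjective <i class="ab">abdrea</i> (<span class="at">groups</span>)</li>
    <li>twice as the noun <i class="ab">shokdia</i> (<span class="at">techs</span>)</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for li in soup.select("p.dsm + ul.also li"):
    # li.i / li.span return the first <i> / <span> descendant of this <li>
    rows.append(f"{li.i.text}, {li.span.text}")

for row in rows:
    print(row)
```

Parsing from a string avoids the network call, so the selector and the text extraction can be tried in isolation.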

If the <li> tags also contain other <i> and <span> tags, use find() on the li variable with a class filter:

for li in soup.select('p.dsm + ul.also li'):
    print(li.find('i', class_='ab').text, li.find('span', class_='at').text)
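To illustrate why the class filter matters, here is a sketch on hypothetical markup (not from the question) where an extra <i> and <span> come before the wanted ones. Attribute access such as li.i simply returns the first matching descendant, while find() with class_ skips tags that lack that class:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an extra <i> and <span> precede the ones we want.
html = """
<p class="dsm">...</p>
<ul class="also">
    <li><i>see also</i> <span>note</span> once as the adjective <i class="ab">abdrea</i> (<span class="at">groups</span>)</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
li = soup.select_one("p.dsm + ul.also li")

# Attribute access returns the first matching descendant, whatever its class
shorthand = (li.i.text, li.span.text)
# find() with class_ filters on the CSS class, so the extra tags are skipped
filtered = (li.find("i", class_="ab").text, li.find("span", class_="at").text)

print(shorthand)  # ('see also', 'note')
print(filtered)   # ('abdrea', 'groups')
```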
answered 2018-05-23T07:53:41.967

The exact answer you are looking for:

data = """<ul class="also">
    <li>once as the adjective <i class="ab">abdrea</i> (<span class="at">groups</span>)</li>
    <li>twice as the noun <i class="ab">shokdia</i> (<span class="at">techs</span>)</li>
</ul>"""

from bs4 import BeautifulSoup
page_soup = BeautifulSoup(data, "html.parser")
i_data, span_data= zip([x.text for x in page_soup.find_all("i")], [y.text for y in page_soup.find_all("span")])
 
print(i_data )
print(span_data)

Output:

(u'abdrea', u'groups')
(u'shokdia', u'techs')
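Zipping two flat find_all() lists pairs items purely by position, so it assumes every <li> has exactly one <i> and one <span>. A sketch on hypothetical markup (first <li> missing its <span>) shows how the pairs drift, compared with pairing inside each <li>:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <li> has no <span class="at">
data = """<ul class="also">
    <li>once as the adjective <i class="ab">abdrea</i></li>
    <li>twice as the noun <i class="ab">shokdia</i> (<span class="at">techs</span>)</li>
</ul>"""

page_soup = BeautifulSoup(data, "html.parser")

# Flat lists: zip pairs items by position and stops at the shorter list
flat = list(zip([x.text for x in page_soup.find_all("i")],
                [y.text for y in page_soup.find_all("span")]))

# Per-<li> pairing: each tuple comes from a single list item
per_li = []
for li in page_soup.select("ul.also li"):
    span = li.find("span", class_="at")
    per_li.append((li.find("i", class_="ab").text, span.text if span else None))

print(flat)    # [('abdrea', 'techs')]
print(per_li)  # [('abdrea', None), ('shokdia', 'techs')]
```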
answered 2018-05-23T07:22:14.823