python - Python/BeautifulSoup - How to extract text between <li> and <dl> tags

Question

I have the following HTML code:

<ol>
<li>If someone is <b>able</b> to do something, they <a href="/wiki/can" title="can">can</a> do it.
<dl>
<dd><i>I'm busy today, so I won't be <b>able</b> to see you.</i></dd>
</dl>
</li>
</ol>

How can I extract the text between the <li> and <dl> tags?

I tried this:

from bs4 import BeautifulSoup

s = """<ol>
    <li>If someone is <b>able</b> to do something, they <a href="/wiki/can" title="can">can</a> do it.
    <dl>
    <dd><i>I'm busy today, so I won't be <b>able</b> to see you.</i></dd>
    </dl>
    </li>
    </ol>
"""

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(s, 'html.parser')

for line in soup.find_all('ol'):
    print(line.li.get_text())

This prints:

If someone is able to do something, they can do it.

I'm busy today, so I won't be able to see you.

I only want the first line:

If someone is able to do something, they can do it.

score 4 · Accepted Answer

Loop over the descendants of the line.li object, collect all NavigableString text objects, and stop as soon as you reach the <dl> tag:

from bs4 import NavigableString

for line in soup.find_all('ol'):
    result = []
    for descendant in line.li.descendants:
        if isinstance(descendant, NavigableString):
            # Text node: keep it, trimmed of surrounding whitespace
            result.append(str(descendant).strip())
        elif descendant.name == 'dl':
            # Reached the nested <dl>; ignore everything after it
            break

    print(' '.join(result))

Demo:

>>> for line in soup.find_all('ol'):
...     result = []
...     for descendant in line.li.descendants:
...         if isinstance(descendant, NavigableString):
...             result.append(str(descendant).strip())
...         elif descendant.name == 'dl':
...             break
...     print(' '.join(result))
... 
If someone is able to do something, they can do it.

If you want to do this for all <li> tags (not just the first one), you need to loop over the <li> tags found with .find_all():

for line in soup.find_all('ol'):
    for item in line.find_all('li'):
        result = []
        for descendant in item.descendants:
            if isinstance(descendant, NavigableString):
                # Text node: keep it, trimmed of surrounding whitespace
                result.append(str(descendant).strip())
            elif descendant.name == 'dl':
                # Stop at the nested <dl> inside this <li>
                break

        print(' '.join(result))
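
The loops above can also be wrapped in a small helper so the same logic is reusable for any tag. This is only a minimal Python 3 sketch and not part of the original answer: the function name `text_before` and its `stop_tag` parameter are hypothetical names chosen for illustration, and it assumes the built-in `html.parser` backend.

```python
from bs4 import BeautifulSoup, NavigableString


def text_before(tag, stop_tag="dl"):
    """Collect the text inside `tag` up to its first `stop_tag` descendant.

    `text_before` and `stop_tag` are illustrative names, not part of bs4.
    """
    parts = []
    for descendant in tag.descendants:
        if isinstance(descendant, NavigableString):
            text = descendant.strip()
            if text:  # skip whitespace-only strings between tags
                parts.append(text)
        elif descendant.name == stop_tag:
            break  # everything from here on belongs to the nested block
    return " ".join(parts)


html = """<ol>
<li>If someone is <b>able</b> to do something, they <a href="/wiki/can" title="can">can</a> do it.
<dl>
<dd><i>I'm busy today, so I won't be <b>able</b> to see you.</i></dd>
</dl>
</li>
</ol>"""

soup = BeautifulSoup(html, "html.parser")
results = [text_before(item) for item in soup.find_all("li")]
for line in results:
    print(line)
```

Note that stripping each fragment and joining with a single space works for this input, but it would also put a space before punctuation that directly follows an inline tag (for example `<b>able</b>.`), so it may need adjusting for other markup.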
