python - How to skip over identical tags in BeautifulSoup - Python

Question

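I'm currently writing code for scrapers, and I'm growing more and more fond of Python, especially BeautifulSoup.

Still... while parsing through the HTML I ran into a tricky part that I could only handle in a not-so-pretty way.

I want to scrape the HTML, in particular the following snippet: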

<div class="title-box">
    <h2>
        <span class="result-desc">
            Search results <strong>1</strong>-<strong>10</strong> out of <strong>10,009</strong> about <strong>paul mccartney</strong><a href="alert/settings" class="title-email-alert-promo x-title-alerts-promo">Create email Alert</a>
        </span>
    </h2>
</div>

So what I did was locate the div using:

comment = TopsySoup.find('div', attrs={'class' : 'title-box'})
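For reference, here is a minimal sketch of that lookup with the bs4 package. It assumes bs4 with the built-in "html.parser"; the question calls its soup TopsySoup and does not say which BeautifulSoup version it uses, so the soup here is simply built from the snippet above. In bs4 the same search can also be written with the class_ keyword argument:

```python
from bs4 import BeautifulSoup

# The snippet from the question.
html = """
<div class="title-box">
    <h2>
        <span class="result-desc">
            Search results <strong>1</strong>-<strong>10</strong> out of <strong>10,009</strong> about <strong>paul mccartney</strong><a href="alert/settings" class="title-email-alert-promo x-title-alerts-promo">Create email Alert</a>
        </span>
    </h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Form used in the question: match the class through an attrs dict.
by_attrs = soup.find("div", attrs={"class": "title-box"})

# Equivalent keyword form; class_ has a trailing underscore because
# "class" is a reserved word in Python.
by_keyword = soup.find("div", class_="title-box")

# Both searches return the very same node from the parse tree.
print(by_attrs is by_keyword)  # True
```

Because class is a multi-valued attribute in bs4, by_attrs["class"] comes back as the list ['title-box'] rather than a plain string.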

Then comes the ugly part. To get the number I want, 10,009, I use:

catcher = comment.strong.next.next.next.next.next.next.next

Can someone tell me whether there is a better way?
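To see why the chain above works, and why it is fragile, the sketch below walks the same path step by step. It assumes bs4, where the attribute is spelled next_element (the bs4 name for what Beautiful Soup 3 called .next). Each step moves to whatever was parsed immediately afterwards, text nodes included:

```python
from bs4 import BeautifulSoup

html = """
<div class="title-box">
    <h2>
        <span class="result-desc">
            Search results <strong>1</strong>-<strong>10</strong> out of <strong>10,009</strong> about <strong>paul mccartney</strong><a href="alert/settings" class="title-email-alert-promo x-title-alerts-promo">Create email Alert</a>
        </span>
    </h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
comment = soup.find("div", attrs={"class": "title-box"})

# Start at the first <strong> (the "1") and advance one parsed element
# at a time, as each .next in the question does.
node = comment.strong
for step in range(1, 8):
    node = node.next_element
    print(step, repr(node))

# The seven steps visit: "1", "-", <strong>10</strong>, "10",
# " out of ", <strong>10,009</strong>, and finally its text.
print(node)  # 10,009
```

Every tag and every text node counts as one step, so a single extra word or tag earlier in the sentence shifts the result, which is why indexing the <strong> tags directly is more robust.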

score 3 · Accepted Answer

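How about comment.find_all('strong')[2].text? That takes the third <strong> element inside the div (index 2).

It can actually be shortened to comment('strong')[2].text, because calling a Tag object as if it were a function is the same as calling find_all on it.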

>>> comment('strong')[2].text
u'10,009'
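A runnable sketch of the answer, assuming bs4 under Python 3 (where .text and get_text() return a plain str such as '10,009' rather than the Python 2 u'10,009' shown above). The label-based lookup at the end is my own variation, not part of the accepted answer: it anchors on the " out of " text and takes the next <strong> after it, using the standard find(string=...) and find_next methods:

```python
import re

from bs4 import BeautifulSoup

html = """
<div class="title-box">
    <h2>
        <span class="result-desc">
            Search results <strong>1</strong>-<strong>10</strong> out of <strong>10,009</strong> about <strong>paul mccartney</strong><a href="alert/settings" class="title-email-alert-promo x-title-alerts-promo">Create email Alert</a>
        </span>
    </h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
comment = soup.find("div", attrs={"class": "title-box"})

# The accepted answer: the third <strong> inside the div (index 2).
by_index = comment.find_all("strong")[2].get_text()

# Calling a Tag is shorthand for find_all, so this is the same search.
by_call = comment("strong")[2].get_text()

# Variation (not from the answer): find the text node containing
# "out of", then the first <strong> that follows it in the document.
label = comment.find(string=re.compile(r"out of"))
by_label = label.find_next("strong").get_text()

print(by_index, by_call, by_label)  # 10,009 10,009 10,009
```

The index-based form is shortest and matches this page layout exactly; the label-based form keeps working if more <strong> tags are added before the "out of" phrase, at the cost of depending on that English wording.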
