python - Extracting the ID names of DIVs found inside a specific containing DIV using Python

Question

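I've been using lxml to extract data from pages via XPath, and so far it has worked well. But I have a new challenge:

I have to extract the IDs of all the divs inside a containing DIV and put those ID names into a list. I'm guessing I could use Beautiful Soup for this (or possibly lxml); I'm just not sure how to go about it:

For example, here I would need to extract "beacon" and "lentil":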

    <div id="live-events">

       <div class ="events" id="beacon"> 
           ....other things...
       </div>

       <div class="events" id ="lentil">
          ....other things...
       </div>

    </div>

Any suggestions?

Thanks!

score 0 · Accepted Answer

This is straightforward:

    >>> from bs4 import BeautifulSoup
    >>> # Name the parser explicitly so bs4 does not warn or pick one at random.
    >>> soup = BeautifulSoup("""
    ...     <div id="live-events">
    ... 
    ...        <div class ="events" id="beacon"> 
    ...            ....other things...
    ...        </div>
    ... 
    ...        <div class="events" id ="lentil">
    ...           ....other things...
    ...        </div>
    ... 
    ...     </div>
    ... """, "html.parser")
    >>> # Locate the container, then collect the id of every div inside it.
    >>> live_events = soup.find(id="live-events")
    >>> ids = [div["id"] for div in live_events.find_all("div")]
    >>> ids
    ['beacon', 'lentil']
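
Since the question already uses lxml with XPath, the same list can probably be produced with a single XPath expression. The following is a minimal sketch, not a definitive implementation: the helper name `child_div_ids` and the `page` variable are made up for illustration, and it assumes only the direct child divs of the container are wanted.

```python
from lxml import html

# Same markup as in the question.
page = """
<div id="live-events">
   <div class="events" id="beacon">
       ....other things...
   </div>
   <div class="events" id="lentil">
      ....other things...
   </div>
</div>
"""


def child_div_ids(markup, container_id):
    """Return the id attributes of the divs directly inside the container.

    Hypothetical helper for illustration; not part of lxml.
    """
    tree = html.document_fromstring(markup)
    # $cid is an XPath variable that lxml fills in from the keyword argument.
    # "/div" selects direct children only; "/@id" skips divs without an id.
    found = tree.xpath('//div[@id=$cid]/div/@id', cid=container_id)
    return [str(value) for value in found]


ids = child_div_ids(page, "live-events")
print(ids)
```

Changing `/div/@id` to `//div/@id` would also pick up divs nested deeper inside the container, which is closer to what `find_all("div")` does in the Beautiful Soup version. Unlike `div["id"]`, selecting `@id` does not fail when a div has no id; such divs are simply left out.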
