Suppose I have some HTML like this:

<div id="content">
  <span class="green">something</span>
  <span class="blue">something</span>
  <span class="red">something</span>
  <span class="green">something</span>
  <span class="yellow">something</span>
</div>

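What is the best way to get the second element using cssselect? I could always call cssselect('span.green') and then pick the second element from the results, but on a large page with hundreds of elements I imagine that would be much slower.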

1 Answer

This doesn't strictly answer your question, but here is how I would do it:

Use XPath instead of cssselect:

>>> from lxml.etree import tostring
>>> from lxml.html.soupparser import fromstring
>>> x = fromstring('<div id="content"><span class="green">something</span><span class="blue">something</span><span class="red">something</span><span class="green">something</span><span class="yellow">something</span></div>')
>>> x.xpath('//span[@class="green"][2]')
[<Element span at b6df71ac>]
>>> x.xpath('//span[@class="green"][2]')[0]
<Element span at b6df71ac>
>>> tostring(x.xpath('//span[@class="green"][2]')[0])
'<span class="green">something</span>'
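
One caveat about the positional predicate, sketched below under the assumption that the page may nest the spans under different parents (the nested markup here is made up for illustration and is not from the question). In XPath, `//span[@class="green"][2]` counts matches per parent element, while `(//span[@class="green"])[2]` counts across the whole document. For the flat markup in the question both forms return the same element. The sketch uses `lxml.html.fromstring` rather than the soupparser so it does not depend on BeautifulSoup.

```python
from lxml import html

# Hypothetical nested markup: green spans live under two different <p> parents.
doc = html.fromstring(
    '<div id="content">'
    '<p><span class="green">a</span></p>'
    '<p><span class="green">b</span><span class="green">c</span></p>'
    '</div>'
)

# [2] applies per parent: only the second <p> has a second green span.
per_parent = doc.xpath('//span[@class="green"][2]')

# Parentheses make [2] apply to the full document-order result list.
whole_doc = doc.xpath('(//span[@class="green"])[2]')

per_parent_text = [el.text for el in per_parent]
whole_doc_text = [el.text for el in whole_doc]
print(per_parent_text, whole_doc_text)
```

If "the second green span on the page" is what you want regardless of nesting, the parenthesised form is probably the safer choice.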

Or, if you would rather work with the list of elements in Python:

>>> x.xpath('//span[@class="green"]')
[<Element span at b6df71ac>, <Element span at b6df720c>]
>>> tostring(x.xpath('//span[@class="green"]')[1])
'<span class="green">something</span>'
answered 2012-06-13T08:03:25.287