python - Is there a way to specify a fixed (or variable) number of elements for lxml in Python

Question

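There must be an easier way to do this. I need some text from a large number of HTML documents. In my tests, the most reliable way to find it is to look for a specific word in the text_content of the div elements. If I want to inspect a specific element above the one that contains my text, I have been enumerating my list of div elements, taking the index of the element that contains my text, and then reaching the earlier element by doing arithmetic on that index. But I am sure there must be a better way. I can't seem to figure it out.

In case that is not clear: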

for pair in enumerate(list_of_elements):
    if 'the string' in pair[1].text_content():
        thelocation=pair[0]

the_other_text=list_of_elements[thelocation-9].text_content()

Or:

theitem.getprevious().getprevious().getprevious().getprevious().getprevious().getprevious().getprevious().getprevious().getprevious().text_content()

score 3 · Accepted Answer

lxml supports XPath:

from lxml import etree
root = etree.fromstring("...your xml...")

el, = root.xpath("//div[text() = 'the string']/preceding-sibling::*[9]")
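The expression above requires the div's text to equal 'the string' exactly, whereas the question looks for a substring in HTML documents. A minimal sketch of that variant, assuming a made-up sample document and a hypothetical helper named nth_preceding_sibling; the search text and the offset are passed as XPath variables so the count can vary:

```python
from lxml import html

# Made-up sample document for illustration only.
doc = html.fromstring(
    "<html><body>"
    "<div>first</div><div>second</div><div>third</div>"
    "<div>contains the string here</div>"
    "</body></html>"
)

def nth_preceding_sibling(root, needle, n):
    """Return the n-th sibling before a div whose text contains needle, or None.

    preceding-sibling is a reverse axis, so position 1 is the nearest sibling.
    """
    matches = root.xpath(
        "//div[contains(string(.), $needle)]"
        "/preceding-sibling::*[position() = $n]",
        needle=needle,
        n=n,
    )
    return matches[0] if matches else None
```

Calling nth_preceding_sibling(doc, "the string", 9) on a real document would correspond to the nine chained getprevious() calls in the question. Note that if several divs contain the text, the XPath returns candidates for all of them in document order, so a real document may need a more specific expression.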

score 1 · Accepted Answer

Would this work?

from itertools import islice
ancestor = islice(theitem.iterancestors(), 4) # To get the fourth ancestor

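Edit: I'm an idiot, that doesn't work. You need to wrap it in a helper function, like this:

from itertools import islice

def nthparent(element, n):
    # islice returns an iterator, which cannot be indexed, so take its first item with next().
    # iterancestors() starts at the direct parent, so the n-th ancestor sits at index n - 1.
    return next(islice(element.iterancestors(), n - 1, n), None)

ancestor = nthparent(theitem, 4) # to get the 4th parent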
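The question asks about earlier siblings rather than ancestors, and the same islice idea carries over. A minimal sketch, assuming a hypothetical helper named nth_previous and a made-up sample tree; it relies on lxml's itersiblings(preceding=True), which walks backwards from the element:

```python
from itertools import islice

from lxml import etree

def nth_previous(element, n):
    """Return the n-th sibling before element (n=1 matches getprevious()), or None."""
    return next(islice(element.itersiblings(preceding=True), n - 1, n), None)

# Made-up sample tree for illustration only.
root = etree.fromstring("<r><a/><b/><c/><d/></r>")
d = root[3]
```

With this helper, nth_previous(theitem, 9) would stand in for the nine chained getprevious() calls. Like getprevious(), the iterator also yields comments and processing instructions unless a tag filter is passed to itersiblings().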

score 0 · Accepted Answer

Use something like simplehtmldom, and then supply an index?

Answered 2010-03-02T21:43:47.283
