python - How do I get the exact date when parsing with lxml?

Question

While parsing an HTML document, I ran into a strange problem. The document contains a span like this:

<span class="time">Thu May 17, 2012 12:20 pm</span>

When I parse it (it is inside a td):

row.xpath('string(./td/span/text())')

I get the following:

Wed May 16, 2012 11:20 pm

What could be the problem?

score 1 · Accepted Answer

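Most likely, ./td/span matches more than one element. When you pass a set of nodes to the XPath string() function, only the first node (in document order) is converted: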

>>> from lxml import etree
>>> html = """<html>
...             <td><span class="time">Wed May 16, 2012 11:20 pm</span></td>
...             <td><span class="time">Thu May 17, 2012 12:20 pm</span></td>
...           </html>"""
>>> t = etree.fromstring(html)
>>> t.xpath('string(./td/span)')
'Wed May 16, 2012 11:20 pm'

You should either write a more specific XPath that selects the row you want, or iterate over the rows:

>>> for row in t.xpath("./td/span"):
...     print(row.xpath("string(.)"))
...     
Wed May 16, 2012 11:20 pm
Thu May 17, 2012 12:20 pm
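For the "more specific XPath" option, here is a minimal sketch using the same sample markup. It assumes the date you want sits in the second td, or in the last matching span; adjust the predicates to whatever actually identifies the right cell in your document.

```python
from lxml import etree

# Same sample markup as in the example above.
html = """<html>
            <td><span class="time">Wed May 16, 2012 11:20 pm</span></td>
            <td><span class="time">Thu May 17, 2012 12:20 pm</span></td>
          </html>"""
t = etree.fromstring(html)

# Select the span inside the second td explicitly (XPath positions are 1-based).
second = t.xpath('string(./td[2]/span[@class="time"])')

# Or select the last matching span, however many cells there are.
# The parentheses make last() apply to the whole set of matched spans.
last = t.xpath('string((./td/span[@class="time"])[last()])')

print(second)
print(last)
```

With this sample input, both expressions select the second span, so string() no longer silently falls back to the first match.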

(Note: I have removed text() because it is not needed here. text() may not do what you think it does.)
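To illustrate the point about text(), here is a small sketch. The markup is hypothetical (a nested b element that is not in the original question); it only serves to show how text() and string(.) can differ once a span contains child elements.

```python
from lxml import etree

# Hypothetical markup: the date is split by a nested <b> element.
span = etree.fromstring(
    '<span class="time">Thu <b>May 17</b>, 2012 12:20 pm</span>'
)

# text() selects only the text nodes that are direct children of the span.
text_nodes = span.xpath("text()")

# string(text()) converts just the first of those text nodes.
first_text = span.xpath("string(text())")

# string(.) concatenates the text of the span and all of its descendants.
full = span.xpath("string(.)")

print(text_nodes)
print(repr(first_text))
print(full)
```

Here string(text()) would return only the leading fragment, while string(.) returns the whole date, which is why string(.) is usually the safer choice.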
