python - How to get the required data from a string

Question

For example, I have the string

s = '\r\n<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> \r\n\r\n<p>\r\n\t\r\n\t\t<A HREF="../temp/Table 32012419252223.xls">Click to download</A>\r\n\r\n\t\r\n\t</P>'

I only need to get /temp/Table 32012419252223.xls from the string above.

Secondly, I have a link, for example

link = "www.example.com/flow/hardway/joshing/high"

Now I need to replace "joshing/high" in the link above with the result of the first step (/temp/Table 32012419252223.xls).

score 2 · Accepted Answer

If you want to parse an HTML or XML document, use a proper library. An example using lxml and XPath:

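from lxml.html.soupparser import fromstring  # soupparser needs BeautifulSoup installed
from urllib.parse import urljoin  # Python 3; in Python 2: from urlparse import urljoin

s = 'yourhtml'  # the HTML string from the question
link = "www.example.com/flow/hardway/joshing/high"  # the link from the question
h = fromstring(s)
# Take the href of the first <a> element and resolve it against link
print(urljoin(link, h.xpath('//a[1]/@href')[0]))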

This gets the first link on the page. If the HTML is more complex, you can use more elaborate XPath expressions.
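
If lxml is not available, the same idea can be sketched with only the Python 3 standard library. This swaps the answer's lxml/XPath lookup for `html.parser`; it is not the answer's method. `FirstHrefParser` is a hypothetical helper name made up for this illustration, and `s` and `link` are the values from the question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FirstHrefParser(HTMLParser):
    """Remember the first href found on an <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <A HREF=...> matches
        if tag == "a" and self.href is None:
            self.href = dict(attrs).get("href")


# The HTML string and the link from the question
s = '\r\n<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> \r\n\r\n<p>\r\n\t\r\n\t\t<A HREF="../temp/Table 32012419252223.xls">Click to download</A>\r\n\r\n\t\r\n\t</P>'
link = "www.example.com/flow/hardway/joshing/high"

parser = FirstHrefParser()
parser.feed(s)

href = parser.href              # the raw, relative href from the page
new_link = urljoin(link, href)  # resolve "../" against the original link
print(href)
print(new_link)
```

With the question's input, `href` is `../temp/Table 32012419252223.xls` and `new_link` is `www.example.com/flow/hardway/temp/Table 32012419252223.xls`. `urljoin` drops the last path segment (`high`) and then applies `..` to `joshing`, which replaces `joshing/high` as the question asks.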
