python - How can I do a case-insensitive Python XPath search using lxml?

Question

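I am trying to match Country or country using the lower-case function in XPath. translate is kind of messy, so I am using lower-case, and I believe my Python version (2.6.6) supports XPath 2.0, since lower-case is only available in XPath 2.0.

I am looking for how to use lower-case in my case. Hopefully the example is self-explanatory. I am looking for ['USA', 'UK'] as output (both countries at once, which can happen if lower-case evaluates Country and country to be the same).

HTML: doc.htm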

<html>
    <table>
        <tr>
            <td>
                Name of the Country : <span> USA </span>
            </td>
        </tr>
        <tr>
            <td>
                Name of the country : <span> UK </span>
            </td>
        </tr>
    </table>
</html>

Python:

import lxml.html as lh

doc = open('doc.htm', 'r')
out = lh.parse(doc)
doc.close()

print out.xpath('//table/tr/td[text()[contains(. , "Country")]]/span/text()')
# Prints : [' USA ']
print out.xpath('//table/tr/td[text()[contains(. , "country")]]/span/text()')
# Prints : [' UK ']

print out.xpath('//table/tr/td[lower-case(text())[contains(. , "country")]]/span/text()')
# Prints : [<Element td at 0x15db2710>]

Update:

out.xpath('//table/tr/td[text()[contains(translate(., "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz") , "country")]]/span/text()')

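The question that remains: can I store the translate part in a global variable "handlecase" and substitute that variable whenever I run an XPath query?

Something like this works: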

handlecase = """translate(., "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz")"""

out.xpath('//table/tr/td[text()[contains(%s , "country")]]/span/text()' % (handlecase))

But for the sake of simplicity and readability, I would like to run it like this:

out.xpath('//table/tr/td[text()[contains(handlecase , "country")]]/span/text()')

score 5 · Accepted Answer

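I believe the easiest way to get what you want is to write an XPath extension function.

By doing so, you can write either a lower-case() function or a case-insensitive search.

You can find the details here: http://lxml.de/extensions.html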
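As a rough illustration of that approach, the sketch below registers a Python function under the name lower-case through lxml's etree.FunctionNamespace and then calls it from the question's query. The helper name lower_case, the inline HTML string, and the string(.) wrapper around the argument are assumptions made for this example. It is written for Python 3, so the print call would need adjusting on 2.x.

```python
from lxml import etree
import lxml.html as lh

# Inline copy of the question's document (assumption: same content as doc.htm).
HTML = """<html>
    <table>
        <tr>
            <td>
                Name of the Country : <span> USA </span>
            </td>
        </tr>
        <tr>
            <td>
                Name of the country : <span> UK </span>
            </td>
        </tr>
    </table>
</html>"""


def lower_case(context, value):
    """Hypothetical helper: lowercase a string passed in from XPath."""
    # A node-set argument arrives as a list; use its first item, if any.
    # The query below passes string(.), so a plain string is expected here.
    if isinstance(value, list):
        value = value[0] if value else ""
    return str(value).lower()


# Register the function in the default (empty) function namespace so it can
# be called without a prefix. This registration is global to lxml.
ns = etree.FunctionNamespace(None)
ns["lower-case"] = lower_case

doc = lh.fromstring(HTML)
result = doc.xpath(
    '//table/tr/td[text()[contains(lower-case(string(.)), "country")]]/span/text()'
)
print([s.strip() for s in result])
```

If the registration behaves as the lxml documentation describes, both rows should match, whichever way "Country" is capitalized. Note that this only supplies a function with the same name as the XPath 2.0 one; lxml itself still evaluates XPath 1.0.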

score 3 · Accepted Answer

Use:

   //td[translate(substring(text()[1], string-length(text()[1]) - 9),
                  'COUNTRY :',
                  'country'
                  )
        =
         'country'
       ]
        /span/text()

XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  "//td[translate(substring(text()[1], string-length(text()[1]) - 9),
                  'COUNTRY :',
                  'country'
                  )
        =
         'country'
       ]
        /span/text()
       "/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<html>
        <table>
            <tr>
                <td>
                    Name of the Country : <span> USA </span>
                </td>
            </tr>
            <tr>
                <td>
                    Name of the country : <span> UK </span>
                </td>
            </tr>
        </table>
</html>

the XPath expression is evaluated and the two selected text nodes are copied to the output:

 USA  UK
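The same expression can also be tried directly from Python instead of through XSLT. The following is a minimal sketch, assuming lxml is installed and using an inline copy of the document above; the names XML, EXPR, root, and matches are illustrative only.

```python
from lxml import etree

# Inline copy of the XML document used in the XSLT verification.
XML = """<html>
        <table>
            <tr>
                <td>
                    Name of the Country : <span> USA </span>
                </td>
            </tr>
            <tr>
                <td>
                    Name of the country : <span> UK </span>
                </td>
            </tr>
        </table>
</html>"""

# The XPath 1.0 expression from this answer, unchanged.
EXPR = """//td[translate(substring(text()[1], string-length(text()[1]) - 9),
                  'COUNTRY :',
                  'country'
                  )
        =
         'country'
       ]
        /span/text()"""

root = etree.fromstring(XML)
matches = root.xpath(EXPR)

# Strip the padding spaces around each country name before printing.
print([m.strip() for m in matches])
```

Because the expression looks only at the last 10 characters of the first text node, it depends on the label ending in exactly "Country : " or "country : ", trailing space included.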

Explanation:

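1. We use a specific variant of the XPath 1.0 expression that implements the XPath 2.0 standard function ends-with($text, $s), which is:

$s = substring($text, string-length($text) - string-length($s) +1)

With a 10-character suffix, the start position becomes string-length($text) - 9, which is the offset used in the expression above.

2. The next step is to use the translate() function to convert the trailing 10-character string to lowercase, removing any spaces and any ":" characters.

3. If the result is the (all lowercase) string "country", we select the child text nodes (only one in this case) of the span child of this td.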
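Step 1 can be checked in isolation. The sketch below evaluates the ends-with emulation with lxml's compiled etree.XPath, passing $text and $s as XPath variables through keyword arguments. The helper name ends_with and the dummy context element are assumptions for this illustration, not part of the answer.

```python
from lxml import etree

# XPath 1.0 emulation of ends-with($text, $s), as given in step 1.
ENDS_WITH = etree.XPath(
    "$s = substring($text, string-length($text) - string-length($s) + 1)"
)

# Any node works as the evaluation context; the expression only uses variables.
_CONTEXT = etree.XML("<dummy/>")


def ends_with(text, s):
    """Hypothetical helper: True if text ends with s, evaluated via XPath 1.0."""
    return ENDS_WITH(_CONTEXT, text=text, s=s)


# The comparison is case-sensitive, which is why step 2 applies translate().
print(ends_with("Name of the Country : ", "Country : "))
print(ends_with("Name of the Country : ", "country : "))
```

The second call differs from the first only in the case of one letter, which shows why the suffix has to be normalized with translate() before it is compared.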
