A pure XPath 1.0 solution (no extension functions):
//a[starts-with(@href, 'http://biz.yahoo.com/ic/')
  and
   substring(@href, string-length(@href)-4) = '.html'
  and
   string-length
     (substring-before
        (substring-after(@href, 'http://biz.yahoo.com/ic/'),
         '.')
     ) = 3
  and
   translate(substring-before
               (substring-after(@href, 'http://biz.yahoo.com/ic/'),
                '.'),
             '0123456789',
             ''
            )
    = ''
   ]
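This XPath expression can be "read in English" like this:
Select any a element in the document whose href attribute has a string value that starts with the string "http://biz.yahoo.com/ic/" and ends with the string ".html", and such that the substring between the starting and ending substrings has length 3 and consists only of digits.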
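To make the four conditions easier to follow, here is a minimal Python sketch that mirrors them one by one with plain string operations. It is only an illustration, not part of the XPath solution: the names href_matches, substring_before and substring_after are hypothetical helpers written to imitate the XPath 1.0 functions of the same name.

```python
PREFIX = "http://biz.yahoo.com/ic/"
SUFFIX = ".html"
DIGITS = "0123456789"


def substring_before(s, sep):
    """Imitate XPath substring-before: text before the first sep, '' if absent."""
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s, sep):
    """Imitate XPath substring-after: text after the first sep, '' if absent."""
    _, found, tail = s.partition(sep)
    return tail if found else ""


def href_matches(href):
    """Apply the same four tests as the XPath predicate to one href value."""
    # The part between the prefix and the first following '.'
    middle = substring_before(substring_after(href, PREFIX), ".")
    return (
        href.startswith(PREFIX)                   # starts-with(...)
        and href.endswith(SUFFIX)                 # substring(@href, len-4) = '.html'
        and len(middle) == 3                      # string-length(...) = 3
        and all(ch in DIGITS for ch in middle)    # translate(..., digits, '') = ''
    )
```

Note that, like the XPath expression, this sketch only inspects the text before the first "." after the prefix, so a value such as http://biz.yahoo.com/ic/123.foo.html would also be accepted by both.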
XSLT-based verification:
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  "//a[starts-with(@href, 'http://biz.yahoo.com/ic/')
     and
      substring(@href, string-length(@href)-4) = '.html'
     and
      string-length
        (substring-before
           (substring-after(@href, 'http://biz.yahoo.com/ic/'),
            '.')
        ) = 3
     and
      translate(substring-before
                  (substring-after(@href, 'http://biz.yahoo.com/ic/'),
                   '.'),
                '0123456789',
                ''
               )
       = ''
      ]
  "/>
 </xsl:template>
</xsl:stylesheet>
When this transformation is applied to the following XML document:
<html>
 <body>
  <a href="http://biz.yahoo.com/ic/123.html">Link1</a>
  <a href="http://biz.yahoo.com/ic/1234.html">Incorrect</a>
  <a href="http://biz.yahoo.com/ic/x23.html">Incorrect</a>
  <a href="http://biz.yahoo.com/ic/621.html">Link2</a>
 </body>
</html>
the XPath expression is evaluated and the selected nodes are copied to the output:
<a href="http://biz.yahoo.com/ic/123.html">Link1</a>
<a href="http://biz.yahoo.com/ic/621.html">Link2</a>
As we can see, only the correct, wanted a elements have been selected.
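For readers who want to cross-check this result outside an XSLT processor, the following is a rough Python sketch using only the standard library. It does not evaluate the XPath expression itself: it swaps in a regular expression, which XPath 1.0 does not offer, and that pattern is slightly stricter because it requires exactly three digits directly between the prefix and ".html". The names SAMPLE, PATTERN and select_links are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The sample document from above, inlined so the sketch is self-contained.
SAMPLE = """<html>
 <body>
  <a href="http://biz.yahoo.com/ic/123.html">Link1</a>
  <a href="http://biz.yahoo.com/ic/1234.html">Incorrect</a>
  <a href="http://biz.yahoo.com/ic/x23.html">Incorrect</a>
  <a href="http://biz.yahoo.com/ic/621.html">Link2</a>
 </body>
</html>"""

# Assumption: a regex stand-in for the XPath predicate (prefix, 3 digits, .html).
PATTERN = re.compile(r"http://biz\.yahoo\.com/ic/[0-9]{3}\.html")


def select_links(xml_text):
    """Return the a elements whose href matches PATTERN, in document order."""
    root = ET.fromstring(xml_text)
    return [a for a in root.iter("a") if PATTERN.fullmatch(a.get("href", ""))]


selected = [a.text for a in select_links(SAMPLE)]
print(selected)
```

On the sample document this selects the same two links, Link1 and Link2, as the transformation.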