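I am trying to use XSLT to transform an HTML document into a plain-text document. However, I am still quite new to XSLT, and I don't understand why the output of my transformation differs from the output I want.
My input HTML document: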
<html>
<body>
<h1>Heading 1</h1>
<p class="first">First paragraph.</p>
<p class="para">Regular paragraph 1.</p>
<p class="para">Regular paragraph 2.</p>
<p class="para">Regular paragraph 3.</p>
<p class="last">Last paragraph.</p>
<h2 class="someclass">Heading 2</h2>
<p class="first">First paragraph 2.</p>
<p class="para">Regular paragraph 4.</p>
<p class="para">Regular paragraph 5.</p>
<p class="para">Regular paragraph 6.</p>
</body>
</html>
My desired output (plain text):
Heading (h1): Heading 1
Para (first): First paragraph.
Para (regular): Regular paragraph 1.
Para (regular): Regular paragraph 2.
Para (regular): Regular paragraph 3.
Para (last): Last paragraph.
Heading (someclass): Heading 2
Para (first): First paragraph 2.
Para (regular): Regular paragraph 4.
Para (regular): Regular paragraph 5.
Para (regular): Regular paragraph 6.
My XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//p[@class='first']">
Para (first): <xsl:value-of select="."/>
</xsl:for-each>
<xsl:for-each select="//p[@class='para']">
Para (regular): <xsl:value-of select="."/>
</xsl:for-each>
<xsl:for-each select="//p[@class='last']">
Para (last): <xsl:value-of select="."/>
</xsl:for-each>
<xsl:for-each select="//h1">
Heading (h1): <xsl:value-of select="."/>
</xsl:for-each>
<xsl:for-each select="//h2[@class='someclass']">
Heading (someclass): <xsl:value-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Result of applying the above XSLT to the input HTML document:
Para (first): First paragraph.
Para (first): First paragraph 2.
Para (regular): Regular paragraph 1.
Para (regular): Regular paragraph 2.
Para (regular): Regular paragraph 3.
Para (regular): Regular paragraph 4.
Para (regular): Regular paragraph 5.
Para (regular): Regular paragraph 6.
Para (last): Last paragraph.
Heading (h1): Heading 1
Heading (someclass): Heading 2
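What I want is to write the contents of the tags to plain text in the same order in which they appear in the HTML document. Instead, this transformation groups the output by XPath expression: all elements matching the first expression are written one after another, then all elements matching the second, and so on.
I suspect the solution involves the apply-templates element, but I don't understand how it works, so I have had trouble using it in the example above.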
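
For reference, here is a minimal sketch of what an apply-templates version might look like. It is not a definitive solution. It assumes the input elements are in no namespace (as in the sample above, not namespaced XHTML) and that every element to be printed is a direct child of body. The idea is that xsl:apply-templates visits the selected nodes in document order and, for each node, runs whichever template matches it, so the output order follows the input order rather than the order of the templates in the stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Emit plain text instead of XML -->
  <xsl:output method="text"/>

  <!-- Visit the children of body in document order (assumed structure) -->
  <xsl:template match="/">
    <xsl:apply-templates select="html/body/*"/>
  </xsl:template>

  <!-- One template per kind of element; each prints a label, the text, and a newline -->
  <xsl:template match="h1">
    <xsl:text>Heading (h1): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="h2[@class='someclass']">
    <xsl:text>Heading (someclass): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="p[@class='first']">
    <xsl:text>Para (first): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="p[@class='para']">
    <xsl:text>Para (regular): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="p[@class='last']">
    <xsl:text>Para (last): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Wrapping the labels in xsl:text keeps stray newlines and indentation from the stylesheet out of the output. Any child of body that none of these templates match would fall through to XSLT's built-in rules, which copy its text content without a label.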