I am in the very painful process of converting Word-based documents to XML, and I have run into the following problem:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<p>
<element>This one is taken care of.</element> Some more text. „<hi rend="italics">Is this a
quote</hi>?” (Source). </p>
<p>
<element>This one is taken care of.</element> Some more text. „<hi rend="italics">This is a
quote</hi>” (Source). </p>
<p>
<element>This one is taken care of.</element> Some more text. „<hi rend="italics">This is
definitely a quote</hi>!” (Source). </p>
<p>
<element>This one is taken care of.</element> Some more text.„<hi rend="italics">This is a
first quote</hi>” (Source). „<hi rend="italics">Sometimes there is a second quote as
well</hi>!?” (Source). </p>
</root>
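The <p> nodes have mixed content. I already took care of <element> in an earlier iteration. The problem now is with the quotes and sources, which appear partly inside <hi rend="italics"/> and partly as text nodes.
How can I use XSLT 2.0 to:
- match every <hi rend="italics"> node that is immediately preceded by a text node whose last character is "„"?
- output the content of <hi rend="italics"> as <quote>...</quote>, dropping the quotation marks ("„" and "”") but including inside <quote/> any question marks and exclamation marks that appear in the text node immediately after the <hi rend="italics">?
- convert the text between "(" and ")" that follows the <hi rend="italics"> node into <source>...</source>, without the parentheses?
- keep the final full stop?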
In other words, my output should look like this:
<root>
<p>
<element>This one is taken care of.</element> Some more text. <quote>Is this a quote?</quote> <source>Source</source>.
</p>
<p>
<element>This one is taken care of.</element> Some more text. <quote>This is a quote</quote> <source>Source</source>.
</p>
<p>
<element>This one is taken care of.</element> Some more text. <quote>This is definitely a quote!</quote> <source>Source</source>.
</p>
<p>
<element>This one is taken care of.</element> Some more text. <quote>This is a first quote</quote> <source>Source</source>. <quote>Sometimes there is a second quote as well!?</quote> <source>Source</source>.
</p>
</root>
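I have never dealt with mixed content and string manipulation like this before, and the whole thing is really throwing me off. I would be very grateful for any hints.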
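For reference, the four requirements could perhaps be sketched along the following lines. This is only a rough sketch, not a finished solution, and it rests on several assumptions that go beyond the sample: every quote is followed directly by its closing "”" and a parenthesised source, <hi rend="italics"> contains plain text only, and the quotation marks are exactly U+201E and U+201D. The division into one template for <hi> and one for the neighbouring text nodes is a design choice made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity transform: copy everything that is not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- An italic run whose nearest preceding sibling is a text node
       ending in U+201E becomes <quote>. -->
  <xsl:template match="hi[@rend = 'italics']
      [ends-with(preceding-sibling::node()[1][self::text()], '&#x201E;')]">
    <!-- The text node directly after the italic run, if there is one. -->
    <xsl:variable name="next"
        select="following-sibling::node()[1][self::text()]"/>
    <quote>
      <!-- Assumption: plain text only; line breaks collapse to spaces. -->
      <xsl:value-of select="normalize-space(.)"/>
      <!-- Pull in any ?/! that sit before the closing U+201D. -->
      <xsl:value-of select="if (matches($next, '^[?!]+&#x201D;'))
                            then replace($next, '^([?!]+)&#x201D;.*$', '$1', 's')
                            else ''"/>
    </quote>
  </xsl:template>

  <!-- Text nodes directly before or after an italic run. -->
  <xsl:template match="p/text()
      [preceding-sibling::node()[1][self::hi[@rend = 'italics']]
       or following-sibling::node()[1][self::hi[@rend = 'italics']]]">
    <!-- Leading ?/!, closing quote and "(Source)" become <source>. -->
    <xsl:analyze-string select="."
        regex="^[?!]*&#x201D;\s*\(([^)]*)\)">
      <xsl:matching-substring>
        <xsl:text> </xsl:text>
        <source><xsl:value-of select="regex-group(1)"/></source>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Keep the rest (including the full stop), minus a trailing U+201E. -->
        <xsl:value-of select="replace(., '&#x201E;$', '')"/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Note that a closing "”" that is not followed by a parenthesised source would be left untouched by this sketch, and whitespace in the result may differ slightly from the desired output shown above.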