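I have a regular expression that captures nested groups, and I want to output nested XML that mirrors those groups, the way fn:analyze-string does. Here is a simple example: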

Regex

((Luckenbach|Houston|Little Rock),\s(TX|AK))

Input

Let's go to Luckenbach, TX with Waylon and Willie and the boys.

Desired output

<s:analyze-string-result xmlns:s="http://www.w3.org/2009/xpath-functions/analyze-string">
    <s:non-match>Let's go to </s:non-match>
    <s:match>
        <s:group nr="1">
            <s:group nr="2">Luckenbach</s:group>, <s:group nr="3">TX</s:group>
        </s:group>
    </s:match>
    <s:non-match> with Waylon and Willie and the boys.</s:non-match>
</s:analyze-string-result>

The problem is that there seems to be no way to recursively process the regex-group() values inside xsl:analyze-string/xsl:matching-substring (or to access them as XML, as XQuery's fn:analyze-string() allows).

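The solution needs to be generic enough to work with different regular expressions, many of which have several levels of nested capture groups.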

1 Answer

When the context node contains the sample text, the following produces the desired output:

    <snip>
        <xsl:analyze-string 
                select="." 
                regex="((Luckenbach|Houston|Little Rock),\s(TX|AK))">
            <xsl:matching-substring>
                <location>
                    <city><xsl:value-of select="regex-group(2)"/></city>
                    <xsl:text>, </xsl:text>
                    <state><xsl:value-of select="regex-group(3)"/></state>
                </location>                       
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>    
    </snip>

If you only want to generate <snip> when the regex matches, you can slightly adjust the regex and the handling of the groups:

        <xsl:analyze-string 
                select="." 
                regex="((.*)((Luckenbach|Houston|Little Rock),\s(TX|AK))(.*))">
            <xsl:matching-substring>
                <snip>
                    <xsl:value-of select="regex-group(2)"/>
                    <location>
                        <city><xsl:value-of select="regex-group(4)"/></city>
                        <xsl:text>, </xsl:text>
                        <state><xsl:value-of select="regex-group(5)"/></state>
                    </location>
                    <xsl:value-of select="regex-group(6)"/>
                </snip>   
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string> 
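
Since the question asks for something that works with different regular expressions, it may help to note that the regex attribute of xsl:analyze-string is an attribute value template, so the pattern does not have to be hard-coded. The following is only a minimal sketch under that assumption: the parameter name `pattern` is hypothetical and not part of the original answer, and the group numbers in the body still have to agree with whatever pattern is passed in.

```xml
<xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Hypothetical parameter; its default is the pattern from the question. -->
    <xsl:param name="pattern"
            select="'((Luckenbach|Houston|Little Rock),\s(TX|AK))'"/>

    <xsl:template match="text()">
        <!-- {$pattern} is evaluated at run time because regex is an AVT. -->
        <xsl:analyze-string select="." regex="{$pattern}">
            <xsl:matching-substring>
                <!-- Groups 2 and 3 are specific to the default pattern. -->
                <location>
                    <city><xsl:value-of select="regex-group(2)"/></city>
                    <xsl:text>, </xsl:text>
                    <state><xsl:value-of select="regex-group(3)"/></state>
                </location>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>

</xsl:stylesheet>
```

One caveat: when a pattern is written literally in the regex attribute rather than passed through a variable, any curly braces in it (for example in a quantifier such as `{2}`) must be doubled, because the attribute is parsed as an attribute value template.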

If you want to reproduce the behavior of the XQuery function analyze-string(), you can define your own custom function:

<xsl:function name="my:analyze-string" as="item()*" xmlns:my="http://stackoverflow.com/questions/13187307/output-nested-regex-groups-as-nested-xml-using-xslanalyze-string">
    <xsl:param name="val" />

    <analyze-string-result xmlns="http://www.w3.org/2005/xpath-functions">   
        <xsl:analyze-string select="$val" regex="((.*)((Luckenbach|Houston|Little Rock),\s(TX|AK))(.*))">
            <xsl:matching-substring>
                <xsl:for-each select="1 to 6">
                    <xsl:if test="regex-group(.)">
                        <match>
                            <group  nr="{.}">
                                <xsl:value-of select="regex-group(.)"/>
                            </group>
                        </match>
                    </xsl:if>
                </xsl:for-each>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <non-match>
                    <xsl:value-of select="."/>
                </non-match> 
            </xsl:non-matching-substring>
        </xsl:analyze-string>    
    </analyze-string-result>   
</xsl:function>

When called like this:

 <xsl:variable name="value" 
      select='"Let&apos;s go to Luckenbach, TX with Waylon and Willie and the boys."'/>
 <xsl:copy-of select="my:analyze-string($value)"
    xmlns:my="http://stackoverflow.com/questions/13187307/output-nested-regex-groups-as-nested-xml-using-xslanalyze-string"/>  

It produces the following output:

<analyze-string-result xmlns="http://www.w3.org/2005/xpath-functions"
                       xmlns:my="http://stackoverflow.com/questions/13187307/output-nested-regex-groups-as-nested-xml-using-xslanalyze-string">
   <match>
      <group nr="1">Let's go to Luckenbach, TX with Waylon and Willie and the boys.</group>
   </match>
   <match>
      <group nr="2">Let's go to </group>
   </match>
   <match>
      <group nr="3">Luckenbach, TX</group>
   </match>
   <match>
      <group nr="4">Luckenbach</group>
   </match>
   <match>
      <group nr="5">TX</group>
   </match>
   <match>
      <group nr="6"> with Waylon and Willie and the boys.</group>
   </match>
</analyze-string-result>
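
The output above lists the groups one after another rather than nesting them. If an XSLT 3.0 processor is available, the analyze-string() function can be called directly from XPath, and its result nests the group elements according to the nesting of the capture groups. The following is a minimal sketch under that assumption; the parameter name `pattern` is hypothetical.

```xml
<xsl:stylesheet version="3.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Hypothetical parameter; its default is the pattern from the question. -->
    <xsl:param name="pattern"
            select="'((Luckenbach|Houston|Little Rock),\s(TX|AK))'"/>

    <xsl:template match="text()">
        <!-- analyze-string() returns an analyze-string-result element whose
             match, non-match and group children are in the
             http://www.w3.org/2005/xpath-functions namespace. -->
        <xsl:copy-of select="analyze-string(., $pattern)"/>
    </xsl:template>

</xsl:stylesheet>
```

Because the result is an ordinary element tree, it can be stored in a variable and processed further with xsl:apply-templates, which gives the recursive access to groups that xsl:matching-substring does not offer.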
answered 2012-11-02T00:50:59.940