
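I am trying to write an XQuery function that tokenizes a string on a delimiter while ignoring any delimiters that appear inside nested bracketed expressions, for example: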

tokenizeOutsideBrackets("1,(2,3)" , ",")         => ( "1" , "(2,3)" ) 
tokenizeOutsideBrackets("1,(2,(3,4))" , ",")     => ( "1" , "(2,(3,4))" )
tokenizeOutsideBrackets("1,(2,(3,(4,5)))" , ",") => ( "1" , "(2,(3,(4,5)))" )
tokenizeOutsideBrackets("1,(2,(3,4),5),6" , ",") => ( "1" , "(2,(3,4),5)" , "6" )

If I had recursive regular expressions or an imperative language, this would be fairly trivial, but I am struggling to find a simple way to do it in XQuery.

Thanks!


3 Answers


This XQuery expression:

tokenize(replace('1,(2,(3,4),5),6','([0123456789]+|\(.*\))(,)?','$1;'),';')

Output:

1 (2,(3,4),5) 6

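Update: if strings such as '1,(2,3),(4,5),6' can occur, the expression above is not enough, because the greedy \(.*\) matches from the first "(" to the last ")" and would return (2,3),(4,5) as a single token. In that case you will need a parser for this grammar: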

exp ::= term ( ',' term ) *

term ::= num | '(' exp ')'

num ::= ( '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ) +
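
For splitting purposes, a full parser for this grammar may not be necessary. The following is a minimal sketch, not part of the original answer, that only tracks bracket depth while scanning the string character by character. The helper name local:scan and its parameters are hypothetical, and the sketch assumes a single-character delimiter that is compared literally. It does not check that each num consists of digits or that the brackets are balanced.

```xquery
xquery version "1.0";

(: Hypothetical helper: consumes one character per call and tracks
   how deeply nested the current position is. :)
declare function local:scan(
  $chars as xs:string*,     (: remaining characters :)
  $delimiter as xs:string,  (: single character, compared literally :)
  $depth as xs:integer,     (: current bracket nesting depth :)
  $current as xs:string,    (: token being built :)
  $done as xs:string*       (: tokens completed so far :)
) as xs:string*
{
  if (empty($chars)) then
    ($done, $current)
  else
    let $c := $chars[1]
    let $rest := subsequence($chars, 2)
    return
      if ($c = $delimiter and $depth = 0) then
        (: a delimiter outside all brackets closes the current token :)
        local:scan($rest, $delimiter, 0, "", ($done, $current))
      else
        let $newDepth :=
          if ($c = "(") then $depth + 1
          else if ($c = ")") then $depth - 1
          else $depth
        return
          local:scan($rest, $delimiter, $newDepth, concat($current, $c), $done)
};

declare function local:tokenizeOutsideBrackets(
  $string as xs:string,
  $delimiter as xs:string
) as xs:string*
{
  (: split the string into a sequence of one-character strings :)
  let $chars :=
    for $cp in string-to-codepoints($string)
    return codepoints-to-string($cp)
  return local:scan($chars, $delimiter, 0, "", ())
};

(: yields "1", "(2,3)", "(4,5)", "6" :)
local:tokenizeOutsideBrackets("1,(2,3),(4,5),6", ",")
```

Because the recursion consumes one character per call, very long strings rely on the processor optimizing tail calls.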
answered 2011-04-05T15:36:12.670

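One approach is to split first and then join every token that has unbalanced brackets to its right-hand neighbour.

The code below gives you the desired result. It uses fn:tokenize to split, then processes the resulting tokens with (tail-) recursion, concatenating whenever the preceding token has mismatched counts of "(" and ")". The approach has some flaws: it only compares bracket counts rather than properly pairing opening and closing brackets, and it treats $delimiter both as a pattern (in fn:tokenize) and as a literal (in concat). Handling these correctly takes more code, but you probably get the idea.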

declare function local:tokenizeOutsideBrackets($string, $delimiter)
{
  local:joinBrackets(tokenize($string, $delimiter), $delimiter, ())
};

declare function local:joinBrackets($tokens, $delimiter, $result)
{
  if (empty($tokens)) then
    $result
  else 
    let $last := $result[last()]
    let $new-result :=
      if (string-length(translate($last, "(", "")) 
        = string-length(translate($last, ")", ""))) then
       ($result, $tokens[1])
      else
       ($result[position() < last()], concat($last, $delimiter, $tokens[1]))
    return local:joinBrackets($tokens[position() > 1], $delimiter, $new-result)
};
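
The pattern-versus-literal flaw could be addressed by escaping the delimiter before it is passed to fn:tokenize. The following is a minimal sketch under that assumption; the helper name local:escapeForRegex is hypothetical and not part of the answer, and its pattern follows the same idea as functx:escape-for-regex from the FunctX library.

```xquery
xquery version "1.0";

(: Hypothetical helper: backslash-escapes regex metacharacters so that
   fn:tokenize treats the delimiter as literal text. :)
declare function local:escapeForRegex($literal as xs:string) as xs:string
{
  replace($literal,
          '(\.|\[|\]|\\|\||\-|\^|\$|\?|\*|\+|\{|\}|\(|\))',
          '\\$1')
};

(: With the delimiter escaped, "." splits only on literal dots.
   Unescaped, "." would match every character.
   yields "1", "(2", "3)", "4" :)
tokenize("1.(2.3).4", local:escapeForRegex("."))
```

In local:tokenizeOutsideBrackets, the call would then become tokenize($string, local:escapeForRegex($delimiter)), while local:joinBrackets keeps receiving the unescaped $delimiter for concat.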
answered 2011-04-05T11:41:24.393

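I have been playing around with this, and the function below seems to work, although I can't help thinking there is a simpler way.

This code uses the functx:index-of-string function to find the positions of all delimiters. It then looks for the first delimiter for which everything to its left contains equal numbers of opening and closing brackets. Once that delimiter is found, the same process is repeated on everything to its right.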

(:requires the FunctX library to be imported and bound to the functx prefix:)
declare function local:tokenizeOutsideBrackets(
  $arg as xs:string?,
  $delimiter as xs:string) as xs:string*
{
  if (contains($arg, $delimiter))
  then
    (:find positions of all the delimiters:)
    let $delimiterPositions := (
      functx:index-of-string($arg,$delimiter),
      string-length($arg)+1 (:Add in end of string too:)
    )

    (:keep every prefix that ends just before a delimiter
      and has matching bracket counts:)
    let $fragments :=
      for $endPos in $delimiterPositions
      let $candidateString := substring($arg,1,$endPos - 1)
      return
        if (local:hasMatchedBrackets($candidateString))
        then $candidateString
        else ()
    let $firstFragment := $fragments[1]
    let $endPos := string-length($firstFragment)

    (:recursively return the first matching fragment,
      plus the fragments in the remaining string:)
    return
    (
      $firstFragment,
      local:tokenizeOutsideBrackets(
        substring(
          $arg,
          $endPos+string-length($delimiter)+1,
          string-length($arg) - $endPos -(string-length($delimiter))
        ),
        $delimiter
      )
    )
  else if ($arg='') then () else ($arg)
};

declare function local:hasMatchedBrackets($arg as xs:string) as xs:boolean 
{
  count(tokenize($arg,'\(')) = count(tokenize($arg,'\)'))
};
answered 2011-04-05T16:27:02.787