5

Consider the following toy example. I want to match in Go a name with a regexp where the name is sequences of letters a separated by single #, so a#a#aaa is valid, but a# or a##a are not. I can code the regexp in the following two ways:

r1 := regexp.MustCompile(`^a+(#a+)*$`)
r2 := regexp.MustCompile(`^(a+#)*a+$`)

Both of these work. Now consider more complex task of matching a sequence of names separated by single slash. As in above, I can code that in two ways:

^N+(/N+)*$
^(N+/)*N+$

where N is a regexp for the name with ^ and $ stripped. As I have two cases for N, so now I can have 4 regexps:

    ^a+(#a+)*(/a+(#a+)*)*$
    ^(a+#)*a+(/a+(#a+)*)*$
    ^((a+#)*a+/)*a+(#a+)*$
    ^((a+#)*a+/)*(a+#)*a+$

The question is why when matching against the string "aa#a#a/a#a/a" the first one fails while the rest 3 cases work as expected? I.e. what causes the first regexp to mismatch? The full code is:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    str := "aa#a#a/a#a/a"
    regs := []string {
        `^a+(#a+)*(/a+(#a+)*)*$`,
        `^(a+#)*a+(/a+(#a+)*)*$`,
        `^((a+#)*a+/)*a+(#a+)*$`,
        `^((a+#)*a+/)*(a+#)*a+$`,
    }    
    for _, r := range(regs) {
        fmt.Println(regexp.MustCompile(r).MatchString(str))
    } 
}

Surprisingly it prints false true true true

4

2 回答 2

0

您可以尝试减轻量词的原子子嵌套。
如果你没有锚,
如果使用像这样的嵌套可选量词,表达式可能会在回溯中爆炸,
当它找不到直接解决方案时。

Go 可能对此犹豫不决,迫使它彻底失败,而不是
大规模回溯,但不确定。

试试这样的东西,看看它是否有效。

 # ^a+(#?a)*(/a+(#?a)*)*$

 ^ 
 a+
 (                    # (1 start)
      \#?
      a 
 )*                   # (1 end)
 (                    # (2 start)
      /
      a+
      (                    # (3 start)
           \#?
           a 
      )*                   # (3 end)
 )*                   # (2 end)
 $

编辑:(转自评论)如果复杂度太高,有些引擎甚至不会编译它,有些会在运行时默默地失败,有些会锁定在回溯循环中。

这个细微的差别就是问题所在

坏:太复杂(#?a+)*
好:没有嵌套的、开放式的量词(#?a)*

如果您遇到此问题,请去掉嵌套,通常可以解决
回溯问题。


eit2:如果您需要分隔符并希望确保分隔符仅位于中间并由a's 包围,您可以试试这个

https://play.golang.org/p/oM6B6H3Kdx

 #  ^a+[#/](a[#/]?)*a+$

 ^ 
 a+
 [\#/] 
 (                             # (1 start)
      a
      [\#/]? 

 )*                            # (1 end)
 a+
 $ 

或这个

https://play.golang.org/p/WihqSjH_dI

 # ^a+(?:[#/]a+)+$

 ^ 
 a+ 
 (?:
      [\#/] 
      a+
 )+
 $
于 2015-07-27T21:56:35.097 回答
0

这一定是 golang 正则表达式引擎中的错误。一个更简单的测试用例是在工作时^a(/a+(#a+)*)*$无法匹配,请参阅http://play.golang.org/p/CDKxVeXW98a/a#a^(a)(/a+(#a+)*)*$

我提交了https://github.com/golang/go/issues/11905

于 2015-07-28T07:39:20.250 回答