1

我正在编写一个递归算法,通过解析正则表达式来构建一个有限状态自动机。自动机遍历表达式,将字符推入堆栈,将操作符推入“操作符堆栈”。当我遇到“(”(表示分组操作)时,我将一个“子自动机”压入堆栈,并将模式的其余部分传递给子自动机进行解析。当该自动机遇到“)”时,它会传递其余的到父自动机完成解析的字符串。这是代码:

var NFA = function(par) {
  this.stack = [];
  this.op_stack = [];
  this.parent = par;

};

NFA.prototype.parse = function(pattern) {
  var done = false;
  for(var i in pattern) {
    if (done === true) {
      break;
    }
    switch(pattern.charAt(i)) {
      case "(":
        var sub_nfa = new NFA(this);
        this.stack.push(sub_nfa);
        sub_nfa.parse(pattern.substring(i+1, pattern.length));
        done = true;
        break;
      case ")":
        if (this.parent !== null) {
          var len = pattern.length;
          /*TROUBLE SPOT*/
          this.parent.parse(pattern.substring(i, pattern.length));
          done = true;
          break;
        }
      case "*":
        this.op_stack.push(operator.KLEENE);
        break;
      case "|":
        this.op_stack.push(operator.UNION);
        break;
      default:
        if(this.stack.length > 0) {
          //only push concat after we see at least one symbol
          this.op_stack.push(operator.CONCAT);        
        }
        this.stack.push(pattern.charAt(i));
    }
  }
};

注意标有“故障点”的区域。给定正则表达式“(a|b)a”,调用 this.parent.parse 只被调用一次:当子自动机遇到“)”时。此时,pattern.substring(i, pattern.length) = ")a"。这“有效”,但它不正确,因为我需要在将字符串传递给父自动机之前使用“)”输入。但是,如果我将调用更改为 this.parent.parse(pattern.substring(i+1, pattern.length),则解析会得到空字符串!我已尝试单步执行代码,但无法解释此行为。什么是我失踪了?

在 Juan 的建议下,我做了一个快速的 jsfiddle 来显示在尝试使用该算法解析“(a|b)a”时出现的问题。在 ")" 的情况下,它使用 i 索引处的子字符串和 i+1 索引处的子字符串填充一个空 div。它表明,虽然 i 处的子字符串中有 2 个字符,但 i+1 处的子字符串是空字符串!这是链接:http: //jsfiddle.net/XC6QM/1/

编辑:我编辑了这个问题以反映使用 charAt(i) 不会改变算法行为的事实。

4

2 回答 2

1

I think the previous answer was on the right track. But there also looks to me to be an off-by-one error. Shouldn't you be increasing the index for your substring? You don't want to include the ")" in the parent parse, right?

this.parent.parse(pattern.substring(i + 1, pattern.length));

But this will still fail because of the error Juan mentioned. A quick temporary fix to test this would be to convert the i to a number:

this.parent.parse(pattern.substring(+i + 1, pattern.length));

This might do it for you. But you should probably go back and switch away from the for-in loop on the string. I think that's causing your issue. Turn it into an array with str.split('') and then use an integer to loop. That will prevent further such issues.

于 2013-02-19T19:15:54.040 回答
0

真正的问题是您使用 afor in来遍历字符串的字符。使用for in 循环,您i将成为一个字符串,因此,当您尝试执行时i+1,它会进行字符串连接。

如果您将迭代更改为

for(var i=0; i < pattern.length; i++) {

然后一切正常http://jsfiddle.net/XC6QM/2/

斯科特的回答正确地确定了问题,但我认为他的解决方案(将索引转换为数字)并不理想。你最好从一个数字索引开始循环

此外,您不应使用括号来访问字符串中的字符,这在 IE 7 中不起作用

 switch(pattern[i]) {

应该

 switch(pattern.charAt(i)) { 
于 2013-02-19T18:46:59.917 回答