javascript - JavaScript RegExp duplicate line matching not working

Question

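I'm writing some JavaScript code to parse grammar files. It's quite a lot of code, so I'll post only the relevant parts here. I'm using a JavaScript RegExp to match duplicate lines contained in a string. The string contains, for example (assume the string is named lines):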

    if
    else
    ;
    print
    {
    }
    test1
    test1
    =
    +
    -
    *
    /
    (
    )
    num
    string
    comment
    id
    test2
    test2

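What should happen is that matches are found on "test1" and "test2". The duplicates should then be removed, leaving one instance each of test1 and test2. What actually happens is that there is no match at all. I'm fairly confident in my regular expression, but JavaScript may be doing something I don't expect. Here is the code that processes the string given above: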

var rex = new RegExp("(.*)(\r?\n\1)+","g");
var re = '/(.*)(\r?\n\1)+/g';

rex.lastIndex = 0;


var m = rex.exec(lines);
    if (m) {
        alert("Found Duplicate");
        var linenum = lines.search(re);            //Get line number of error
        alert("Error: Symbol Defined twice\n");
        alert("Error occured on line: " + linenum);
        lines = lines.replace(rex,"");         //Gets rid of the duplicate
    }

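It never enters the if (m) block, so no match is found. I tested the regular expression at http://regexpal.com/ using the regex from my code and the sample text above. It matches fine there, so I'm at a bit of a loss. If anyone can help, that would be great.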

Thank you.

Edit: I forgot to add that I'm testing this in Firefox, and it only needs to work in Firefox. Not sure if that matters.

score 0 · Accepted Answer

var str = 'if\nelse\n;\nprint\n{\n}\ntest1\ntest1\n=\n+\n-\n*\n/\n(\n)\nnum\nstring\ncomment\nid\ntest2\ntest2\ntest2\ntest2\ntest2';
console.log(str);
str = str.replace(/\r\n?/g,'\n');
// I prefer replacing all the newline characters with \n's here
str = str.replace(/(^|\n)([^\n]*)(\n\2)+/g,function(m0,m1,m2,m3,ind) {
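    // 1-based line number of the first occurrence; m1 is '' at the start of the string, '\n' otherwise
    var line = str.slice(0, ind + m1.length).split(/\n/).length;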
    var msg = '[Found duplicate]';
    msg += '\nFollowing symbol defined more than once';
    msg += '\n\tsymbol: ' + m2;
    msg += '\n\ton line ' + line;
    console.log(msg);
    return m1 + m2;
});
console.log(str);

Otherwise, you can skip that first replace call and change the pattern to

/(^|\r\n?|\n)([^\r\n]*)((?:\r\n?|\n)\2)+/g

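Note that [^\n]* will also catch multiple empty lines. If you want to make sure it matches (and replaces) only non-empty lines, you may want to use [^\n]+ instead.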
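To make the difference concrete, here is a small sketch comparing the two quantifiers on a string that contains blank lines. The helper name collapse and the sample text are made up for illustration. As far as I can trace it, the * version swallows the run of blank lines entirely, while the + version leaves them alone.

```javascript
// Hypothetical helper: apply the answer's replace callback with a given pattern.
function collapse(text, pattern) {
  return text.replace(pattern, (m0, m1, m2) => m1 + m2);
}

const text = "a\n\n\nb"; // lines: "a", "", "", "b"

// [^\n]* may capture an empty string, so the run of newlines matches.
const withStar = collapse(text, /(^|\n)([^\n]*)(\n\2)+/g);

// [^\n]+ requires at least one character, so blank lines never match.
const withPlus = collapse(text, /(^|\n)([^\n]+)(\n\2)+/g);

console.log(JSON.stringify(withStar)); // "a\nb"
console.log(JSON.stringify(withPlus)); // "a\n\n\nb"
```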

[Edit]

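For the record, each m stands for one entry of the arguments object: m0 is the whole match, m1 is the first subgroup ( (^|\n) ), m2 is the second subgroup ( ([^\n]*) ), and m3 is the last subgroup ( (\n\2) ). I could have used arguments[n], but these are shorter.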

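As for the return value: because the regex flavor JavaScript used at the time had no lookbehind (it was added later, in ES2018), this pattern also captures the possible preceding newline (unless the match is on the first line). The callback therefore has to return the matched line together with that preceding newline, if there is one. That is why it should not return m2 alone.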
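A quick sketch of what goes wrong when only m2 is returned, using a made-up four-line string. The last variant is my own addition, not part of the answer: it assumes an engine with ES2018 lookbehind, where the preceding newline can be asserted instead of captured.

```javascript
const text = "a\nx\nx\nb";
const pattern = /(^|\n)([^\n]+)(\n\2)+/g;

// Returning only m2 drops the newline that m1 consumed, gluing two lines together.
const onlyM2 = text.replace(pattern, (m0, m1, m2) => m2);

// Returning m1 + m2 puts the consumed newline back.
const m1AndM2 = text.replace(pattern, (m0, m1, m2) => m1 + m2);

// Assumption: lookbehind is available, so the newline is never part of the match.
const withLookbehind = text.replace(/(?<=^|\n)([^\n]+)(?:\n\1)+/g, "$1");

console.log(JSON.stringify(onlyM2));         // "ax\nb"
console.log(JSON.stringify(m1AndM2));        // "a\nx\nb"
console.log(JSON.stringify(withLookbehind)); // "a\nx\nb"
```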

score 0 · Accepted Answer

First error: \ in a JS string is also an escape character.

var rex = new RegExp("(.*)(\r?\n\1)+","g");

should be written

var rex = new RegExp("(.*)(\\r?\\n\\1)+","g");
// or, shorter:
var rex = /(.*)(\r?\n\1)+/g;

if you want to make it work. In the case of the RegExp constructor, you’re passing the pattern as a string to the constructor function. This means you need to escape each \ backslash that occurs in the pattern. If you use a regexp literal, you don’t need to escape them, since they’re not in a string, but retain their ‘normal’ properties in the regexp pattern.
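Here is a minimal sketch of the three spellings side by side, on a made-up two-line sample. One assumption to flag: the unescaped string form is written with \u0001 instead of \1, because "\1" in a string is a legacy octal escape (the character U+0001 in sloppy mode, a SyntaxError in strict mode), and spelling it out keeps the snippet runnable everywhere.

```javascript
const sample = "test1\ntest1";

// Regex literal: \r, \n and \1 reach the regex engine untouched.
const literal = /(.*)(\r?\n\1)+/;

// Constructor with doubled backslashes: the string holds the same pattern text.
const escaped = new RegExp("(.*)(\\r?\\n\\1)+");

// Constructor with single backslashes: the string parser consumes the escapes
// first, so the engine sees a raw CR, a raw LF and the control character U+0001
// instead of a backreference.
const unescaped = new RegExp("(.*)(\r?\n\u0001)+");

console.log(literal.source === escaped.source); // true
console.log(literal.test(sample));              // true
console.log(escaped.test(sample));              // true
console.log(unescaped.test(sample));            // false
```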

Second error, your expression

var re = '/(.*)(\r?\n\1)+/g';

is wrong. What you’re doing here is assigning a string literal to a variable. I’m assuming you meant to assign a regular expression literal, which should be written like this:

var re = /(.*)(\r?\n\1)+/g;
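Two related points, sketched below with made-up sample data. First, search() compiles a string argument with new RegExp(string), so the slashes and the trailing g are treated as literal characters. Second, search() returns a character offset, not a line number, so the offset has to be converted. The helper lineNumberAt and the line-anchored pattern are my own assumptions for illustration; the pattern is anchored because, as far as I can tell, the unanchored (.*) version can also match at a bare newline with an empty capture.

```javascript
// A string argument is compiled as a pattern, delimiters and flag included.
const asString = "test".search("/test/g"); // -1: looks for the literal text "/test/g"
const asLiteral = "test".search(/test/);   // 0

// Hypothetical helper: convert a character offset into a 1-based line number.
function lineNumberAt(text, offset) {
  return text.slice(0, offset).split(/\r?\n/).length;
}

const lines = "if\nelse\ntest1\ntest1\n=";

// Assumption: line-anchored variant (m flag, .+ instead of .*) so that only
// whole, non-empty lines count as duplicates.
const dupRe = /^(.+)(?:\r?\n\1)+$/m;

const offset = lines.search(dupRe);
const lineNo = offset === -1 ? -1 : lineNumberAt(lines, offset);

console.log(offset); // 8
console.log(lineNo); // 3
```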

Third error: the last line

lines = lines.replace(rex,"");         //Gets rid of the duplicate

removes both instances of all duplicate lines! If you want to keep the first instance of each duplicate, you should use

lines = lines.replace(rex, "$1");
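A short sketch of the difference between the two replacement strings, on a made-up input. One caveat I would hedge on: with the unanchored (.*) pattern, an empty capture appears to match at any bare newline, so replacing with "$1" can also glue unrelated lines together (last example). The line-anchored variant used for the first two calls is my assumption, not the pattern from the question.

```javascript
const lines = "if\ntest1\ntest1\nelse";

// Assumption: anchored variant so that only whole, non-empty lines match.
const rex = /^(.+)(?:\r?\n\1)+$/gm;

// Empty replacement: every copy of the duplicated line disappears.
const dropped = lines.replace(rex, "");

// "$1" replacement: one copy of the duplicated line survives.
const kept = lines.replace(rex, "$1");

// Caveat with the unanchored pattern: the newline itself is matched and removed.
const unanchored = "a\nb".replace(/(.*)(\r?\n\1)+/g, "$1");

console.log(JSON.stringify(dropped));    // "if\n\nelse"
console.log(JSON.stringify(kept));       // "if\ntest1\nelse"
console.log(JSON.stringify(unanchored)); // "ab"
```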

And finally, this method only detects consecutive identical lines. Is that what you want, or do you need to detect any duplicates, wherever they may be?
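If duplicates anywhere in the input matter, a regex is probably not the easiest tool. A minimal sketch without one, assuming each line is a symbol and the first occurrence should win (the function name dedupeLines and the returned shape are made up for illustration):

```javascript
// Hypothetical helper: keep the first occurrence of every line and report
// where later repeats were found (1-based line numbers).
function dedupeLines(text) {
  const seen = new Set();
  const kept = [];
  const duplicates = [];
  text.split(/\r?\n/).forEach((line, i) => {
    if (seen.has(line)) {
      duplicates.push({ symbol: line, lineNumber: i + 1 });
    } else {
      seen.add(line);
      kept.push(line);
    }
  });
  return { text: kept.join("\n"), duplicates };
}

const result = dedupeLines("if\ntest1\nelse\ntest1\ntest2\ntest2");
console.log(JSON.stringify(result.text)); // "if\ntest1\nelse\ntest2"
console.log(result.duplicates.map(d => d.symbol + "@" + d.lineNumber).join(", ")); // test1@4, test2@6
```

Note that this treats blank lines like any other line, so repeated blank lines are also reported and removed after the first one.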
