
I'm new to regular expressions in JavaScript. I have a string like the one below:

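"and something here ( something else here and something else or something else) asdf (asdfas) and something here or something here ( something else here and something else or something else)"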

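From the string above, I am trying to capture groups of text according to the following rules:

  • Capture each line that begins with "and" or "or" and ends at the next "and" or "or".
  • A captured line may contain any number of parentheses.
  • If an "and"/"or" operator is inside parentheses, ignore it.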

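From the string above, I expect the following set of results:

  • and something here ( something else here and something else or something else) asdf (asdfas)
  • and something here
  • or something here ( something else here and something else or something else)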

I have tried many regular expressions; the one that came closest to what I want is:

(and|or)\s.((?!(and|or)).)*
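A quick check of that pattern shows where it falls short. This is only a sketch: the short input `attemptInput` is made up for illustration, and the `g` flag is added so that `match` returns every hit. The negative lookahead stops at every "and"/"or", including the one inside the parentheses.

```javascript
// The attempted pattern, with the global flag added for this demonstration.
var attempt = /(and|or)\s.((?!(and|or)).)*/g;

// Hypothetical short input with one operator inside parentheses.
var attemptInput = "and a (b and c) and d";

var found = attemptInput.match(attempt);
console.log(found); // [ 'and a (b ', 'and c) ', 'and d' ]
```

The first piece is cut off at the "and" inside the parentheses, which is exactly what the third rule is meant to prevent.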

I am also open to non-regex solutions.


2 Answers


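Here is a working fiddle: http://jsfiddle.net/e8tMb/

(If you are interested in an example that supports nested parentheses, I have added one at the bottom of this answer.)

This implementation is not pure regex, but I find it easy to follow. It loops over the string and does exactly what you specified, in a very simple way.

Suppose we have our string: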

var str="and something here ( something else here and something else or something else) and something here or something here ( something else here and something else or something else)";

We can tokenize it on the relevant punctuation:

var tokens = str.split(/( |\(|\))/g);

This results in:

["and", " ", "something", " ", "here", " ", "", "(", "", " ", "something", " ", "else", " ", "here", " ", "and", " ", "something", " ", "else", " ", "or", " ", "something", " ", "else", ")", "", " ", "and", " ", "something", " ", "here", " ", "or", " ", "something", " ", "here", " ", "", "(", "", " ", "something", " ", "else", " ", "here", " ", "and", " ", "something", " ", "else", " ", "or", " ", "something", " ", "else", ")", ""]
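The separators show up in the result because the pattern wraps them in a capturing group: when the pattern passed to `split` contains capturing parentheses, the captured text is spliced into the output array. A small sketch of the difference, using a made-up input `shortInput` to keep the output readable:

```javascript
// Hypothetical short input, chosen only for illustration.
var shortInput = "a (b and c) or d";

// With a capturing group, the separators are kept as tokens.
var withGroup = shortInput.split(/( |\(|\))/);

// Without a capturing group, the separators are dropped.
var withoutGroup = shortInput.split(/ |\(|\)/);

console.log(JSON.stringify(withGroup));
console.log(JSON.stringify(withoutGroup));

// Because nothing is lost, joining the tokens restores the input.
console.log(withGroup.join("") === shortInput); // true
```

Keeping the separators is what lets the loop below rebuild each sentence with `join("")`.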

Now we can iterate over the tokens and simply check for sentences:

var str="and something here ( something else here and something else or something else) and something here or something here ( something else here and something else or something else)";

var tokens = str.split(/( |\(|\))/g);

var inParans = false;
var sentences = [];
var lastIndex = 0;
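for (var i = 0; i < tokens.length; i++) {
    if (tokens[i] === "(") {
        inParans = true;  // entering a parenthesized group
    } else if (tokens[i] === ")") {
        inParans = false; // leaving it (one level only)
    } else if ((tokens[i] === "and" || tokens[i] === "or") && !inParans) {
        // A top-level operator ends the current sentence and starts the next one.
        if (i > lastIndex) { // avoid an empty sentence when the string starts with an operator
            sentences.push(tokens.slice(lastIndex, i).join(""));
        }
        lastIndex = i;
    }
}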
sentences.push(tokens.slice(lastIndex).join(""));

document.body.innerHTML = (sentences.join("<br />"));
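To run the same idea outside the browser, the loop can be wrapped in a function that returns the sentences instead of writing them to the page. This is a minimal sketch rather than the fiddle's code: the name `splitOnOperators` and the short input are made up, and it trims each piece and skips the empty piece in front of a leading operator, which is an assumption about the desired output.

```javascript
// Hypothetical helper: splits `str` at every "and"/"or" outside parentheses.
// It tracks a single level of parentheses only.
function splitOnOperators(str) {
    var tokens = str.split(/( |\(|\))/);
    var inParens = false;
    var sentences = [];
    var lastIndex = 0;
    for (var i = 0; i < tokens.length; i++) {
        if (tokens[i] === "(") {
            inParens = true;
        } else if (tokens[i] === ")") {
            inParens = false;
        } else if ((tokens[i] === "and" || tokens[i] === "or") && !inParens) {
            if (i > lastIndex) { // nothing to emit before a leading operator
                sentences.push(tokens.slice(lastIndex, i).join("").trim());
            }
            lastIndex = i;
        }
    }
    sentences.push(tokens.slice(lastIndex).join("").trim());
    return sentences;
}

var flatResult = splitOnOperators("and a (b and c) asdf (d) and e or f (g or h)");
console.log(flatResult);
// [ 'and a (b and c) asdf (d)', 'and e', 'or f (g or h)' ]
```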

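If you want to match nested parentheses

Fiddle: http://jsfiddle.net/UbeS8/

In the CS-theory sense, regular expressions cannot correctly match nested data: by the pumping lemma, a regular language cannot describe arbitrarily deep balanced parentheses (a finite automaton has no memory to count with). But because we use a tokenizer and never limited ourselves to RegExp in the first place, adding this is easy: we just count the parentheses, keeping track of the depth in a variable. Here is the code: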

var tokens = str.split(/( |\(|\))/g);

var inParans = 0;
var sentences = [];
var lastIndex = 0;
for (var i = 0; i < tokens.length; i++) {
    if (tokens[i] === "(") {
        inParans++; // one level deeper
    } else if (tokens[i] === ")") {
        inParans--; // one level back out
        if (inParans < 0) { // a ")" without a matching "("
            throw new Error("Invalid syntax");
        }
        // If you don't want this to be an error, you can do what Scott suggested
        // and replace the decrement with:
        //            inParans = Math.max(inParans - 1, 0);
    } else if ((tokens[i] === "and" || tokens[i] === "or") && inParans === 0) {
        // Only split on operators that are outside all parentheses.
        if (i > lastIndex) { // avoid an empty sentence when the string starts with an operator
            sentences.push(tokens.slice(lastIndex, i).join(""));
        }
        lastIndex = i;
    }
}
sentences.push(tokens.slice(lastIndex).join(""));

document.body.innerHTML = (sentences.join("<br />"));
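The effect of counting is easiest to see on an input with two levels of parentheses. The sketch below wraps the counting loop in a function; the name `splitTopLevel` and the short input are made up for illustration, and trimming each piece is an assumption about the desired output.

```javascript
// Hypothetical helper: splits `str` at "and"/"or" tokens that sit outside
// all parentheses, using a depth counter instead of a boolean flag.
function splitTopLevel(str) {
    var tokens = str.split(/( |\(|\))/);
    var depth = 0; // current parenthesis nesting depth
    var sentences = [];
    var lastIndex = 0;
    for (var i = 0; i < tokens.length; i++) {
        if (tokens[i] === "(") {
            depth++;
        } else if (tokens[i] === ")") {
            depth--;
            if (depth < 0) { // a ")" without a matching "("
                throw new Error("Invalid syntax");
            }
        } else if ((tokens[i] === "and" || tokens[i] === "or") && depth === 0) {
            if (i > lastIndex) { // nothing to emit before a leading operator
                sentences.push(tokens.slice(lastIndex, i).join("").trim());
            }
            lastIndex = i;
        }
    }
    sentences.push(tokens.slice(lastIndex).join("").trim());
    return sentences;
}

var nestedResult = splitTopLevel("and a (b (c and d) or e) or f");
console.log(nestedResult); // [ 'and a (b (c and d) or e)', 'or f' ]
```

With a boolean flag, the inner ")" would reset the state and the "or" before "e" would be treated as a top-level operator; with the counter, the depth is still 1 at that point, so it is ignored.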

answered 2013-05-03T12:54:59.330

This should fit your needs:

\b(?:and|or)\b((?:[(][^)]+[)]|.)+?)(?=\b(?:and|or)\b|$)

The text between the and/or operators is captured in the first group.
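As a rough sketch of how the pattern might be applied in JavaScript, the helper below runs it with `exec` in a loop. The name `matchPieces`, the added `g` flag, the trimming, and both inputs are assumptions made for illustration. The second input shows a limitation: `[^)]+` stops at the first ")", so the pattern handles only one level of parentheses.

```javascript
// Hypothetical helper: collects every match of the pattern above.
function matchPieces(input) {
    // Same pattern as above, plus the "g" flag so exec() can walk the string.
    var pattern = /\b(?:and|or)\b((?:[(][^)]+[)]|.)+?)(?=\b(?:and|or)\b|$)/g;
    var pieces = [];
    var match;
    while ((match = pattern.exec(input)) !== null) {
        // match[0] includes the leading operator; match[1] is the text after it.
        pieces.push({ whole: match[0].trim(), group1: match[1].trim() });
    }
    return pieces;
}

var flat = matchPieces("and a (b and c) asdf (d) and e or f (g or h)");
console.log(flat.map(function (p) { return p.whole; }));
// [ 'and a (b and c) asdf (d)', 'and e', 'or f (g or h)' ]

var nested = matchPieces("and a (b (c and d) or e) or f");
console.log(nested.map(function (p) { return p.whole; }));
// [ 'and a (b (c and d)', 'or e)', 'or f' ]
```

For nested input, a token loop with a depth counter is the safer choice.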

answered 2013-05-03T12:53:01.710