javascript - RegEx to extract all matches from a string using RegExp.exec

Question

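I'm trying to parse the following kind of string:

[key:"val" key2:"val2"]

where there are arbitrary key:"val" pairs inside. I want to grab the key name and the value. For those curious, I'm trying to parse the database format of Taskwarrior.

Here is my test string:

[description:"aoeu" uuid:"123sth"]

This is meant to highlight that anything other than a space can appear in a key or value, that there are no spaces around the colons, and that values are always in double quotes.

In node, this is my output: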

[deuteronomy][gatlin][~]$ node
> var re = /^\[(?:(.+?):"(.+?)"\s*)+\]$/g
> re.exec('[description:"aoeu" uuid:"123sth"]');
[ '[description:"aoeu" uuid:"123sth"]',
  'uuid',
  '123sth',
  index: 0,
  input: '[description:"aoeu" uuid:"123sth"]' ]

But description:"aoeu" also matches this pattern. How can I get all matches back?

score 270 · Accepted Answer

Continue calling re.exec(s) in a loop to obtain all the matches:

var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';
var m;

do {
    m = re.exec(s);
    if (m) {
        console.log(m[1], m[2]);
    }
} while (m);

Try it with this JSFiddle: https://jsfiddle.net/7yS2V/

score 205 · Accepted Answer

str.match(pattern) returns all the matches as an array if pattern has the global flag g.

For example:

const str = 'All of us except @Emran, @Raju and @Noman were there';
console.log(
  str.match(/@\w*/g)
);
// Will log ["@Emran", "@Raju", "@Noman"]

score 97 · Accepted Answer

To loop through all matches, you can use the replace function:

var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';

s.replace(re, function(match, g1, g2) { console.log(g1, g2); });
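As a rough sketch of how the replace callback can be put to work, the variation below collects the pairs into an object instead of logging them. The `pairs` accumulator and the `key`/`value` parameter names are illustrative choices, not part of the answer above.

```javascript
const re = /\s*([^[:]+):"([^"]+)"/g;
const s = '[description:"aoeu" uuid:"123sth"]';

// `pairs` is a hypothetical accumulator used only for this illustration.
const pairs = {};

// The callback receives the full match first, then one argument per capture group.
// The string returned by replace() is discarded; replace() is used only to iterate.
s.replace(re, (match, key, value) => {
    pairs[key] = value;
    return match; // leave the text unchanged
});

console.log(pairs); // { description: 'aoeu', uuid: '123sth' }
```

Because replace() builds a new string it never uses, this is mostly a compact trick; a plain exec loop expresses the same intent more directly.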

score 61 · Accepted Answer

Here is a solution:

var s = '[description:"aoeu" uuid:"123sth"]';

var re = /\s*([^[:]+):\"([^"]+)"/g;
var m;
while (m = re.exec(s)) {
  console.log(m[1], m[2]);
}

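This is based on lawnsea's answer, but shorter.

Note that the g flag must be set for the internal pointer (lastIndex) to move forward across invocations.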
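To make the role of the g flag concrete, here is a small sketch that records the regex's lastIndex after each exec call, then shows what happens without the flag. The variable names (`positions`, `noGlobal`) are assumptions made for this example.

```javascript
const re = /\s*([^[:]+):"([^"]+)"/g;
const s = '[description:"aoeu" uuid:"123sth"]';

// Record where the regex will resume after each successful exec() call.
const positions = [];
let m;
while ((m = re.exec(s)) !== null) {
    positions.push(re.lastIndex);
}

console.log(positions);    // [ 19, 33 ]
console.log(re.lastIndex); // 0, reset by the final failed exec()

// Without the g flag, lastIndex is never advanced, so exec() keeps returning
// the first match and a `while (m = re.exec(s))` loop would never terminate.
const noGlobal = /\s*([^[:]+):"([^"]+)"/;
const first = noGlobal.exec(s);
const again = noGlobal.exec(s);
console.log(first[1], again[1]); // description description
```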

score 26 · Accepted Answer

str.match(/regex/g)

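returns all matches as an array.

If, for some mysterious reason, you need the additional information that comes with exec, then as an alternative to the previous answers you can do it with a recursive function instead of a loop, as follows (which also looks cooler :).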

function findMatches(regex, str, matches = []) {
   const res = regex.exec(str)
   res && matches.push(res) && findMatches(regex, str, matches)
   return matches
}

// Usage
const matches = findMatches(/regex/g, str)

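As stated in earlier comments, it is important to have g at the end of the regex definition so that the pointer moves forward on each execution.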

score 19 · Accepted Answer

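We're finally beginning to see a built-in matchAll function; see its documentation for a description and compatibility table. It looks like as of May 2020, Chrome, Edge, Firefox, and Node.js (12+) are supported, but not IE, Safari, or Opera. It seems it was drafted in December 2018, so give it some time to reach all browsers, but I trust it will get there.

The built-in matchAll function is nice because it returns an iterable. It also returns the capturing groups for every match! So you can do things like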

// get the letters before and after "o"
const str = "stackoverflow";
const re = /(\w)o(\w)/g;

for (const match of str.matchAll(re)) {
    console.log("letter before:" + match[1]);
    console.log("letter after:" + match[2]);
}

// the iterator returned by matchAll can only be consumed once,
// so call matchAll again to turn the matches into an array
const arrayOfAllMatches = [...str.matchAll(re)];

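It also seems that every match object uses the same format as match(). So each object is an array of the full match and the capturing groups, along with three additional properties: index, input, and groups. So it looks like:

[<match>, <group1>, <group2>, ..., index: <match offset>, input: <original string>, groups: <named capture groups>]

For more information about matchAll there is also a Google developers page. There are also polyfills/shims available.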
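As an illustration of that match-object shape, the sketch below runs matchAll over the test string from the question and reads index and groups from each match. The named groups `key` and `value` and the slightly stricter pattern are assumptions made for this example, not taken from the answer.

```javascript
const s = '[description:"aoeu" uuid:"123sth"]';

// `key` and `value` are illustrative group names.
const re = /(?<key>[^\s[:]+):"(?<value>[^"]+)"/g;

const summary = [];
for (const match of s.matchAll(re)) {
    summary.push({
        full: match[0],          // the whole matched text
        index: match.index,      // offset of the match in the input
        key: match.groups.key,   // named capture groups live on .groups
        value: match.groups.value,
    });
}

console.log(summary);
// [ { full: 'description:"aoeu"', index: 1, key: 'description', value: 'aoeu' },
//   { full: 'uuid:"123sth"', index: 20, key: 'uuid', value: '123sth' } ]
```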

score 13 · Accepted Answer

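If you have ES2020

(Meaning your runtime, e.g. Chrome, Node.js, or Firefox, supports ECMAScript 2020 or later, which is the edition that standardized matchAll.)

Use the new yourString.matchAll(/your-regex/g) (the g flag is required).

If you don't have ES2020

If you have an older system, here's a function for easy copy and pasting: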
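One detail worth checking when calling matchAll with a regex literal: the pattern has to carry the g flag, otherwise the call throws. The following sketch, using a simplified `\w+` pattern chosen only for illustration, shows both cases.

```javascript
const s = '[description:"aoeu" uuid:"123sth"]';

let errorName = null;
try {
    // No g flag: matchAll() rejects the pattern rather than returning a single match.
    s.matchAll(/(\w+):"(\w+)"/);
} catch (err) {
    errorName = err.name;
}
console.log(errorName); // TypeError

// With the g flag the iterator can be spread into an array of match objects.
const keys = [...s.matchAll(/(\w+):"(\w+)"/g)].map(m => m[1]);
console.log(keys); // [ 'description', 'uuid' ]
```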

function findAll(regexPattern, sourceString) {
    let output = []
    let match
    // make sure the pattern has the global flag
    let regexPatternWithGlobal = RegExp(regexPattern,[...new Set("g"+regexPattern.flags)].join(""))
    while (match = regexPatternWithGlobal.exec(sourceString)) {
        // get rid of the string copy
        delete match.input
        // store the match data
        output.push(match)
    } 
    return output
}

Example usage:

console.log(   findAll(/blah/g,'blah1 blah2')   )

Output:

[ [ 'blah', index: 0 ], [ 'blah', index: 6 ] ]

score 11 · Accepted Answer

Based on Agus's function, but I prefer to return just the match values:

var bob = "&gt; bob &lt;";
function matchAll(str, regex) {
    var res = [];
    var m;
    if (regex.global) {
        while (m = regex.exec(str)) {
            res.push(m[1]);
        }
    } else {
        if (m = regex.exec(str)) {
            res.push(m[1]);
        }
    }
    return res;
}
var Amatch = matchAll(bob, /(&.*?;)/g);
console.log(Amatch);  // yields: [&gt;, &lt;]

score 8 · Accepted Answer

Iterables are nicer:

const matches = (text, pattern) => ({
  [Symbol.iterator]: function * () {
    const clone = new RegExp(pattern.source, pattern.flags);
    let match = null;
    do {
      match = clone.exec(text);
      if (match) {
        yield match;
      }
    } while (match);
  }
});

Usage in a loop:

for (const match of matches('abcdefabcdef', /ab/g)) {
  console.log(match);
}

Or if you want an array:

[ ...matches('abcdefabcdef', /ab/g) ]
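The generator above clones the pattern before calling exec. One plausible reason, offered here as an assumption rather than the author's stated intent, is that a global regex carries mutable lastIndex state, so a shared object may resume from a stale position. A small sketch:

```javascript
const text = 'abcdefabcdef';
const pattern = /ab/g;

// Pretend an earlier caller left the shared regex part-way through a scan.
pattern.lastIndex = 6;

// Using the shared object directly resumes from the stale position
// and skips the first occurrence.
const direct = pattern.exec(text);
console.log(direct.index); // 6

// A clone starts with lastIndex 0 and does not touch the original's state.
const clone = new RegExp(pattern.source, pattern.flags);
const fresh = clone.exec(text);
console.log(fresh.index);       // 0
console.log(pattern.lastIndex); // 8, unchanged by the clone's exec()
```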

score 6 · Accepted Answer

Here is my function to get the matches:

function getAllMatches(regex, text) {
    if (regex.constructor !== RegExp) {
        throw new Error('not RegExp');
    }

    var res = [];
    var match = null;

    if (regex.global) {
        while (match = regex.exec(text)) {
            res.push(match);
        }
    }
    else {
        if (match = regex.exec(text)) {
            res.push(match);
        }
    }

    return res;
}

// Example:

var regex = /abc|def|ghi/g;
var res = getAllMatches(regex, 'abcdefghi');

res.forEach(function (item) {
    console.log(item[0]);
});

score 3 · Accepted Answer

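If you're able to use matchAll, here's a trick:

Array.from has a "selector" parameter (a mapping function), so instead of ending up with an array of awkward "match" results, you can project each match to what you really need:

Array.from(str.matchAll(regexp), m => m[0]);

If you have named groups, e.g. (/(?<firstname>[a-z][A-Z]+)/g), you could do this (the property name must match the group name exactly, including case):

Array.from(str.matchAll(regexp), m => m.groups.firstname);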
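Applied to the string from the question, the same projection idea could look like the sketch below. The group names `key` and `value` and the use of Object.fromEntries are choices made for this illustration only.

```javascript
const s = '[description:"aoeu" uuid:"123sth"]';

// `key` and `value` are illustrative named groups.
const re = /(?<key>[^\s[:]+):"(?<value>[^"]+)"/g;

// The second argument of Array.from maps each match object to a [key, value] pair.
const entries = Array.from(s.matchAll(re), m => [m.groups.key, m.groups.value]);

// Object.fromEntries turns the list of pairs into a plain object.
const parsed = Object.fromEntries(entries);
console.log(parsed); // { description: 'aoeu', uuid: '123sth' }
```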

score 2 · Accepted Answer

As of ES2020, there's now a simpler, better way of getting all the matches, along with information about the capture groups and their index:

const string = 'Mice like to dice rice';
const regex = /.ice/gu;
for(const match of string.matchAll(regex)) {
    console.log(match);
}

// ["Mice", index: 0, input: "Mice like to dice rice", groups: undefined]

// ["dice", index: 13, input: "Mice like to dice rice", groups: undefined]

// ["rice", index: 18, input: "Mice like to dice rice", groups: undefined]

It is currently supported in Chrome, Firefox, and Opera. Depending on when you read this, check this link to see its current support.

score 1 · Accepted Answer

Use this...

var all_matches = your_string.match(re);
console.log(all_matches)

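It will return an array of all matches, provided re has the g flag. That works just fine, but remember that it won't take groups into account; it will just return the full matches.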
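To illustrate the difference, here is a minimal sketch comparing match with a global regex against an exec loop on the question's test string. The pattern is a simplified variant chosen for this example.

```javascript
const s = '[description:"aoeu" uuid:"123sth"]';
const re = /([^\s[:]+):"([^"]+)"/g;

// With the g flag, match() returns only the full matched substrings.
const fullOnly = s.match(re);
console.log(fullOnly); // [ 'description:"aoeu"', 'uuid:"123sth"' ]

// An exec() loop over the same regex still exposes the capture groups.
const withGroups = [];
let m;
while ((m = re.exec(s)) !== null) {
    withGroups.push([m[1], m[2]]);
}
console.log(withGroups); // [ [ 'description', 'aoeu' ], [ 'uuid', '123sth' ] ]
```

Note also that match returns null rather than an empty array when nothing matches, so callers usually need a fallback such as `s.match(re) || []`.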

score 0 · Accepted Answer

I would definitely recommend using the String.match() function and creating a relevant RegEx for it. My example uses a list of strings, which is often necessary when scanning user input for keywords and phrases.

    // 1) Define keywords
    var keywords = ['apple', 'orange', 'banana'];

    // 2) Create regex, pass "i" for case-insensitive and "g" for global search
    var regex = new RegExp("(" + keywords.join('|') + ")", "ig");
    // => /(apple|orange|banana)/gi

    // 3) Match it against any string to get all matches
    "Test string for ORANGE's or apples were mentioned".match(regex);
    // => ["ORANGE", "apple"]

Hope this helps!

score 0 · Accepted Answer

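This isn't really going to help with your more complex issue, but I'm posting it anyway because it is a simple solution for people who aren't doing a global search like you are.

I've simplified the regex in the answer to make it clearer (this is not a solution to your exact problem).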

var re = /^(.+?):"(.+)"$/
var regExResult = re.exec('description:"aoeu"');
var purifiedResult = purify_regex(regExResult);

// We only want the group matches in the array
function purify_regex(reResult){

  // Removes the Regex specific values and clones the array to prevent mutation
  let purifiedArray = [...reResult];

  // Removes the full match value at position 0
  purifiedArray.shift();

  // Returns a pure array without mutating the original regex result
  return purifiedArray;
}

// purifiedResult= ["description", "aoeu"]

That looks more verbose than it is because of the comments; this is what it looks like without them:

var re = /^(.+?):"(.+)"$/
var regExResult = re.exec('description:"aoeu"');
var purifiedResult = purify_regex(regExResult);

function purify_regex(reResult){
  let purifiedArray = [...reResult];
  purifiedArray.shift();
  return purifiedArray;
}

Note that any groups that do not match will be listed in the array as undefined values.
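A short sketch of that behavior, using a variation of the pattern in which the quoted value is optional (the optional group is an assumption introduced only for this illustration):

```javascript
// The value part is wrapped in an optional non-capturing group for this example.
const re = /^(.+?)(?::"(.+)")?$/;

// Input without a value: the second capture group does not participate in the match.
const withoutValue = re.exec('description').slice(1);
console.log(withoutValue); // [ 'description', undefined ]

// Input with a value: both groups are filled in.
const withValue = re.exec('description:"aoeu"').slice(1);
console.log(withValue); // [ 'description', 'aoeu' ]
```

Here slice(1) plays the same role as the spread-and-shift in purify_regex: it copies the result array without the full match at position 0.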

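This solution uses the ES6 spread operator to strip the regex-specific values from the array. If you want IE11 support, you will need to run your code through Babel.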

score 0 · Accepted Answer

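Here's a one-line solution without a while loop.

The order is preserved in the resulting list.

The potential downsides are:

It clones the regex for every match.
The result is in a different form than the expected solutions, so you'll need to process it one more time.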

let re = /\s*([^[:]+):\"([^"]+)"/g
let str = '[description:"aoeu" uuid:"123sth"]'

(str.match(re) || []).map(e => RegExp(re.source, re.flags).exec(e))


[ [ 'description:"aoeu"',
    'description',
    'aoeu',
    index: 0,
    input: 'description:"aoeu"',
    groups: undefined ],
  [ ' uuid:"123sth"',
    'uuid',
    '123sth',
    index: 0,
    input: ' uuid:"123sth"',
    groups: undefined ] ]

score 0 · Accepted Answer

My guess is that if there are edge cases such as extra or missing spaces, this expression with fewer boundaries might also be an option:

^\s*\[\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*\]\s*$

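If you wish to explore, simplify, or modify the expression, it is explained in the top right panel of regex101.com. If you'd like, you can also watch in this link how it matches against some sample inputs.

Test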

const regex = /^\s*\[\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*\]\s*$/gm;
const str = `[description:"aoeu" uuid:"123sth"]
[description : "aoeu" uuid: "123sth"]
[ description : "aoeu" uuid: "123sth" ]
 [ description : "aoeu"   uuid : "123sth" ]
 [ description : "aoeu"uuid  : "123sth" ] `;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
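The lastIndex guard in the loop above only matters for patterns that can match the empty string. As a sketch of why it is needed, the pattern `\d*` below is an illustrative zero-width case, not part of the answer.

```javascript
// `\d*` can match zero characters, so exec() may succeed without consuming input.
const re = /\d*/g;
const s = 'a1b';
const found = [];
let m;

while ((m = re.exec(s)) !== null) {
    // An empty match leaves lastIndex where it was; without this bump the next
    // exec() would return the same empty match forever.
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    found.push(m[0]);
}

console.log(found); // [ '', '1', '', '' ]
```

The loop ends once lastIndex moves past the end of the string, at which point exec returns null.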

RegEx Circuit

jex.im visualizes regular expressions.

score -5 · Accepted Answer

Here is my answer:

var str = '[me nombre es] : My name is. [Yo puedo] is the right word'; 

var reg = /\[(.*?)\]/g;

var a = str.match(reg);

a = a.toString().replace(/[\[\]]/g, "").split(',');
