javascript - Trying to find the indices of all regex matches, but some are missed

Question

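I want to find the index of every vowel that comes after the first "e" in a string.

Since you can't get the index of a capture group directly from RegExp.exec(sInput), but you can get the length of a capture group that contains everything preceding the group you actually care about, the regex I'm using for this is /(.*?e.*?)(a|e|i|o|u)(.*)/.

So the setup is basically this: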

let re = /(.*?e.*?)(a|e|i|o|u)(.*)/g;
let sInput = "lorem ipsum";

let tMatches = [];
let tMatchIndices = [];
let iPrevIndex = 0;

let result;
while ((result = re.exec(sInput)) !== null) {
    /*  result[0]: full match
        result[1]: match for 1st capture group (.*?e.*?)
        result[2]: match for 2nd capture group (a|e|i|o|u)
        result[3]: match for 3rd capture group (.*)
    */
    let index = result[1].length + iPrevIndex;
    let sMatch = result[2];
    tMatchIndices.push(index);
    tMatches[index] = sMatch;
    iPrevIndex = index + sMatch.length;
    re.lastIndex = iPrevIndex;
}

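// Loop over the collected indices; tMatches is sparse, so its length is not the match count.
for (let i = 0; i < tMatchIndices.length; i++) {
    let index = tMatchIndices[i];
    console.log(tMatches[index] + " at index " + index);
}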

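The problem is that with the input string "lorem ipsum", I need the indices of both "i" and "u"... and it only gives me the index of "i".

I know why it does this: advancing the search index past the first match cuts off the "e" that should trigger the next match. What I'm stuck on is how to fix it. I can't simply not advance the search index, or it would never get past the first match.

I considered simply removing each match from the search string as I go, but that shifts every character after it to the left, so the indices I collect wouldn't even be accurate for the original, untruncated string.

What do I do?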

score 0 · Accepted Answer

You can do this with a positive lookbehind:

'lorem ipsum'.replace(/(?<=e.*)[aiueo]/g, function(m, offset) {
  console.log(m + ' ==> ' + offset)
});

Output:

i ==> 6
u ==> 9

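Explanation:

(?<=e.*) - positive lookbehind: an e, followed by any characters, must appear before the match
[aiueo] - matches a vowel
the g flag repeats the match across the whole string
in the replacement function, the offset parameter gives the index of the match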
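If you want the indices in an array rather than logged as a side effect of replace, the same lookbehind can be used with String.prototype.matchAll. This is only a minimal sketch: the function name vowelIndicesAfterFirstE is made up for illustration, and it assumes an engine that supports lookbehind and matchAll. Note that . does not match line breaks, so multi-line input would probably need the s flag as well.

```javascript
// Hypothetical helper (not part of the answer above): returns the index of
// every vowel that has an "e" somewhere before it in the string.
function vowelIndicesAfterFirstE(input) {
  // (?<=e.*) succeeds only where the text before the match contains an "e".
  const re = /(?<=e.*)[aeiou]/g;
  // Each match object from matchAll carries its own .index.
  return Array.from(input.matchAll(re), (m) => m.index);
}

console.log(vowelIndicesAfterFirstE("lorem ipsum")); // [ 6, 9 ]
```

Because the lookbehind consumes no characters, each match is just the vowel itself, so the search position never skips past the "e" that qualifies the later vowels.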

score 0 · Accepted Answer

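One way to keep things simple is to strip off the leading substring up to and including the first e. Then iterate over the remaining string one character at a time, checking for vowels along the way.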

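var sInput = "lorem ipsum";
// Strip everything up to and including the first e (or the whole string if there is no e).
var nInput = sInput.replace(/^.*?(?:e|$)/, "");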
var index = sInput.length - nInput.length;
var indices = [];
var counter = 0;
for (var i=0; i < nInput.length; i++) {
    if (/[aeiou]/.test(nInput.charAt(i))) {
        indices[counter++] = i + index;
    }
}

console.log(indices);

About the output:

01234567890
lorem ipsum
      ^  ^  [6, 9]
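A variant of the same idea that avoids building a truncated string: find the first e with indexOf, then start a global vowel regex just past it by setting lastIndex. The exec result exposes the position of the whole match as result.index, so no offset arithmetic is needed. This is a rough sketch, and the name vowelsAfterFirstE is hypothetical.

```javascript
// Hypothetical helper: collects each vowel after the first "e" together with
// its index in the original, unmodified string.
function vowelsAfterFirstE(input) {
  const found = [];
  const firstE = input.indexOf("e");
  if (firstE === -1) return found; // no "e" means nothing qualifies

  const re = /[aeiou]/g;
  re.lastIndex = firstE + 1; // start searching right after the first "e"

  let result;
  while ((result = re.exec(input)) !== null) {
    // result.index is the position of the match in the full input string.
    found.push({ vowel: result[0], index: result.index });
  }
  return found;
}

console.log(vowelsAfterFirstE("lorem ipsum"));
// [ { vowel: 'i', index: 6 }, { vowel: 'u', index: 9 } ]
```

Since the regex has the g flag, exec advances lastIndex on its own after each match, and the "e" only has to be located once.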
