ruby - Find and edit multiple regex matches on the same line

Question

I want to add markdown to key phrases in a (gollum) wiki page so that they link to the relevant wiki page, in the form:

This is the key phrase.

becomes

This is the [[key phrase|Glossary#key phrase]].

I have a list of key phrases, for example:

keywords = ["golden retriever", "pomeranian", "cat"]

And a document:

Sue has 1 golden retriever. John has two cats.
Jennifer has one pomeranian. Joe has three pomeranians.

I want to iterate through each line and find every match for each keyword that is not already a link. My current attempt looks like this:

File.foreach(target_file) do |line|
    glosses.each do |gloss|
        len = gloss.length
        # Create the regex. Avoid anything that starts with [
        # or (, ends with ] or ), and ignore case.
        re = /(?<![\[\(])#{gloss}(?![\]\)])/i
        # Find every instance of this gloss on this line.
        positions = line.enum_for(:scan, re).map {Regexp.last_match.begin(0) }
        positions.each do |pos|
            line.insert(pos, "[[")
            # +2 because we just inserted 2 ahead.
            line.insert(pos+len+2, "|#{page}\##{gloss}]]")
        end
    end
    puts line
end

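However, this runs into a problem if there are two matches for the same keyword on the same line. Because I insert text into the line, the positions I found for each match are no longer accurate after the first insertion. I know I could adjust for the size of my insertion each time, but because my insertions are a different size for each gloss, that seems like the most brute-force, hacky solution.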

Is there a solution that lets me make multiple insertions on the same line at once, without making several arbitrary adjustments each time?
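To make the drift concrete, here is a minimal sketch of the same insert-by-position approach on a hypothetical one-line input. The names `drifted` and the sample string are assumptions for illustration, and the link text is shortened to plain `[[...]]` brackets.

```ruby
# Hypothetical input: the same gloss appears twice on one line.
line = "cat and cat"
gloss = "cat"

# Record every match offset up front, as in the attempt above.
positions = line.enum_for(:scan, /#{gloss}/).map { Regexp.last_match.begin(0) }

# Insert at the recorded offsets, front to back.
drifted = line.dup
positions.each do |pos|
  drifted.insert(pos, "[[")
  # +2 accounts for the "[[" just inserted, but not for earlier insertions.
  drifted.insert(pos + gloss.length + 2, "]]")
end

puts drifted
```

The recorded offsets are 0 and 8, but after the first pair of insertions the second "cat" has moved to offset 12, so the brackets land around "and" and the line comes out as `[[cat]] [[and]] cat`.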

score 2 · Accepted Answer

After looking at @BryceDrew's python version, I realized that ruby probably also has a way to fill in matches. I now have a much more concise and faster solution.

First, I needed to turn each of my glosses into a regex:

glosses.push(/(?<![\[\(])#{gloss}(?![\]\)])/i)

Note: most of that regex is lookahead and lookbehind assertions, which prevent it from capturing a phrase that is already part of a link.
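As a rough illustration of what those assertions do, the sketch below builds the pattern for a single hypothetical gloss and runs it against a plain sentence and an already-linked one. The `Regexp.escape` call is an addition that is not in the original pattern; it is an assumption that glosses should be matched literally, even if one contains a regex metacharacter.

```ruby
gloss = "cat" # hypothetical gloss

# Negative lookbehind: not directly preceded by "[" or "(".
# Negative lookahead: not directly followed by "]" or ")".
re = /(?<![\[\(])#{Regexp.escape(gloss)}(?![\]\)])/i

plain  = "The Cat sat."
linked = "The [[cat]] sat."

plain_hits  = plain.scan(re)
linked_hits = linked.scan(re)

puts plain_hits.inspect
puts linked_hits.inspect
```

The assertions only inspect the single character on each side of the match, so a gloss sitting in the middle of a longer link label (for example the "cat" in a link labelled "big cat") would still be matched.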

Then, I needed to make a union of all of them:

re = Regexp.union(glosses)
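For reference, a small sketch of how the union behaves, using the keyword list from the question. `Regexp.union` joins the patterns with `|` and keeps each pattern's own flags, so the `i` option still applies inside the combined regex. The sample sentence and the `Regexp.escape` call are assumptions for illustration.

```ruby
keywords = ["golden retriever", "pomeranian", "cat"]

# One guarded, case-insensitive pattern per keyword.
glosses = keywords.map do |keyword|
  /(?<![\[\(])#{Regexp.escape(keyword)}(?![\]\)])/i
end

# A single regex that matches any of the glosses.
re = Regexp.union(glosses)

text = "Sue has 1 Golden Retriever. John has two cats."
found = text.scan(re)

puts found.inspect
```

Alternatives are tried in list order at each position, so if one gloss is a prefix of another (say "cat" and "catfish"), putting the longer gloss earlier in the list is probably the safer ordering.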

After that, it was as simple as running gsub on each line and filling in my matches:

File.foreach(target_file) do |line|
  line = line.gsub(re) {|match| "[[#{match}|Glossary##{match.downcase}]]"}
  puts line
end
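Putting the pieces together, here is a self-contained sketch that runs the same gsub over the sample document from the question. It reads from an in-memory string rather than `target_file` so it can run without a file on disk; the variable names `document` and `linked` and the `Regexp.escape` call are assumptions.

```ruby
keywords = ["golden retriever", "pomeranian", "cat"]

re = Regexp.union(
  keywords.map { |k| /(?<![\[\(])#{Regexp.escape(k)}(?![\]\)])/i }
)

document = <<~DOC
  Sue has 1 golden retriever. John has two cats.
  Jennifer has one pomeranian. Joe has three pomeranians.
DOC

# gsub scans the original line and builds a new string, so earlier
# replacements cannot shift the positions of later matches.
linked = document.each_line.map do |line|
  line.gsub(re) { |match| "[[#{match}|Glossary##{match.downcase}]]" }
end

puts linked
```

The second line contains the same gloss twice, which is the case that broke the position-based attempt; here both occurrences are wrapped, and the trailing "s" of "pomeranians" stays outside the link.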
