regex - How to replace an item between two delimiters in TextWrangler

Question

I want to replace one phonetic symbol between phonetic slashes, like this:

/anycharacter*ou*anycharacter/

to

/anycharacter*au*anycharacter/

That is, in every case I want to replace "ou" with "au" between any two phonetic slashes. For example:

<font size=+2 color=#E66C2C> jocose /dʒə'kous/</font>
    =  suj vour ver / suwj dduaf

into

<font size=+2 color=#E66C2C> jocose /dʒə'kaus/</font>
    =  suj vour ver / suwj dduaf

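The text file contains HTML code and some forward slashes in ordinary text (such as A/B in place of "A or B").
The string "anycharacter" can be any characters: one, several, or none. For example: /folou/, /houl/, /sou/, /dʒə'kousnis/...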

So far, I have been using:

Find: \/(.*?)\bou*\b(.*?)\/\s
Replace: /\1au\2\3\4/

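But it finds every string between any /.../, including ordinary forward slashes and HTML slashes, and when replacing it skips items such as /gou/ and /tou/. For the example above, the output is: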

<font size=+2 color=#E66C2C> jocose /dʒə'kaus/</font>
    =  suj vaur ver / suwj dduaf

Note: replacing "vour" with "vaur" before the ordinary slash is not what I intend.

Could you show me how to solve this problem? Thank you very much.

score 7 · Accepted Answer

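Probably the simplest match expression that meets your needs (close to POSIX ERE, although the reluctant *? quantifier and the \t escape are extensions beyond it) is: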

(/[^ \t/<>]*?)ou([^ \t/<>]*?/)

Broken down, this means:

(             # Capture the following into back-reference #1
  /           #   match a literal '/'
  [^ \t/<>]   #   match any character that is not a space, tab, slash, or angle bracket...
    *?        #     ...any number of times (even zero times), being reluctant
)             # end capture
ou            # match the letters 'ou'
(             # Capture the following into back-reference #2
  [^ \t/<>]   #   match any character that is not a space, tab, slash, or angle bracket...
    *?        #     ...any number of times (even zero times), being reluctant
  /           #   match a literal '/'
)             # end capture

Then use the replacement expression \1au\2

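This ignores the text between two / characters if it contains a space, a tab, an angle bracket (< or >), or another forward slash (/). If you know of other characters that will never appear inside one of these phonetic expressions, add them to the character class (the [] group).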

In my simulator, it turned this text:

<font size=+2 color=#E66C2C> jocose /dʒə'kous/</font>
    =  suj vour ver / suwj dduaf. 
Either A/B or B/C might happen, but <b>at any time</b> C/D might also occur

...into this text:

<font size=+2 color=#E66C2C> jocose /dʒə'kaus/</font>
    =  suj vour ver / suwj dduaf. 
Either A/B or B/C might happen, but <b>at any time</b> C/D might also occur
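
For anyone who wants to check the behaviour outside the editor, here is a minimal sketch in Python. It assumes Python's re module as a stand-in for TextWrangler's engine, which may differ in details; the names PATTERN, sample, and result are only illustrative.

```python
import re

# Same expression as above; Python's re module accepts the reluctant *? quantifier.
PATTERN = re.compile(r"(/[^ \t/<>]*?)ou([^ \t/<>]*?/)")

sample = (
    "<font size=+2 color=#E66C2C> jocose /dʒə'kous/</font>\n"
    "    =  suj vour ver / suwj dduaf. \n"
    "Either A/B or B/C might happen, but <b>at any time</b> C/D might also occur"
)

# \1 and \2 put back the text captured before and after the 'ou'.
result = PATTERN.sub(r"\1au\2", sample)
print(result)
```

Only the 'ou' inside /dʒə'kous/ changes; "vour", the lone slash, A/B, and the HTML closing tags are left alone because each of them is interrupted by a space or an angle bracket.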

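Feel free to ask about anything that is unclear! If you like, I can also explain some of the problems with the expression you tried earlier.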

Edit:

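The expression above matches the whole phonetic group and replaces all of it, reusing some parts of the match and substituting others. The next match attempt then starts after the end of the current match.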

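For that reason, if ou might occur more than once inside a /-delimited phonetic expression, the regex above will need to be run several times. To do it in a single pass, a language or tool needs to support both variable-length look-ahead and look-behind (collectively called look-around).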
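
The "run it several times" idea can be sketched as a loop that repeats the substitution until a pass changes nothing. This is only an illustration in Python, assuming its re module; replace_all_ou is a hypothetical helper name, not something TextWrangler provides.

```python
import re

PATTERN = re.compile(r"(/[^ \t/<>]*?)ou([^ \t/<>]*?/)")

def replace_all_ou(text: str) -> str:
    """Repeat the substitution until a pass makes no replacement."""
    while True:
        # subn returns the new text and the number of replacements made.
        text, count = PATTERN.subn(r"\1au\2", text)
        if count == 0:
            return text

print(replace_all_ou("/soufolou/ and vour ver / x"))
# prints "/saufolau/ and vour ver / x"
```

Each pass fixes at most one ou per phonetic group, so a group with two occurrences needs two passes plus a final pass that finds nothing.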

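As far as I know, that means only Microsoft's .Net Regex and JGSoft's regex "flavor" (used in tools such as EditPad Pro and RegexBuddy). POSIX (which UNIX grep requires) does not support look-around of any kind, and Python (which I think TextWrangler uses) does not support variable-length look-behind. I believe a single pass is not possible without variable-length look-around.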

An expression that relies on variable-length look-around and does what you need might look like this:

(?<=/[^ \t/<>]*?)ou(?=[^ \t/<>]*?/)

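...and the replacement expression needs to change as well, because you are now matching (and therefore replacing) only the characters you want replaced: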

au

It works much the same except that it only matches the ou, then runs a check (called a zero-width assertion) to make sure that it is immediately preceded by a / and any number of certain characters, and immediately followed by any number of certain characters then a /.
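
If a script is an option instead of the editor's find/replace dialog, the look-behind limit can be sidestepped with a different technique: match each whole group and rewrite it in a callback. This is not the look-around expression above, just a sketch in Python under the same character-class assumptions; GROUP, fix_group, and replace_in_groups are hypothetical names.

```python
import re

# An opening '/', then characters from the same class as above, with the
# closing '/' only checked by look-ahead so it can also open the next group.
GROUP = re.compile(r"/[^ \t/<>]*(?=/)")

def fix_group(match: re.Match) -> str:
    """Replace every 'ou' inside one matched phonetic group."""
    return match.group(0).replace("ou", "au")

def replace_in_groups(text: str) -> str:
    return GROUP.sub(fix_group, text)

print(replace_in_groups("/soufolou/ vour ver / x"))
# prints "/saufolau/ vour ver / x"
```

The look-ahead here is fine in Python because only look-behind has to be fixed-width there; every ou in a group is handled in one pass.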
