regex - Problem with a regex capture group of alternative strings in Notepad2

Question

I have a regex search of the following form. It works fine in a decent text editor (e.g. VS Code) but not in Notepad2, which is all my client has access to:

http(s)?://www\.(somedomain\.com|otherdomain\.co\.uk|andanotherdomain\.net)

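I broke it down and got the first capture group working by using [square brackets] instead:

http[s]?

That works fine, though I have no idea why!

...and this also works, even though the second group keeps ordinary parentheses: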

http[s]?://www\.(somedomain\.com)

...but as soon as I introduce pipe characters for the alternative strings, Notepad2 falls over.

Can anyone help, and perhaps explain why Notepad2 needs something different?

Note that I am not too worried about the replacement at this point. It is the search pattern that raises the error in Notepad2.

score 2 · Accepted Answer

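It seems that Notepad2's regex search is based on POSIX BRE, which does not support alternation, with a few modifications. Another major drawback is that a match cannot span line breaks.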
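If alternation really is unavailable, one possible workaround is to run one search per alternative. The sketch below is only an illustration in Python: the helper `expand_alternation` and the sample text are hypothetical, and Python's `re` module is used solely to check that the split patterns find the same URLs as the combined one. It says nothing about how Notepad2's own engine behaves.

```python
import re


def expand_alternation(prefix: str, alternatives: list[str]) -> list[str]:
    """Build one alternation-free pattern per alternative (hypothetical helper)."""
    return [prefix + "(" + alt + ")" for alt in alternatives]


# The bracket form of the optional "s" comes from the question.
PREFIX = r"http[s]?://www\."
ALTERNATIVES = [
    r"somedomain\.com",
    r"otherdomain\.co\.uk",
    r"andanotherdomain\.net",
]

# The combined pattern from the question, which needs alternation support.
COMBINED = (
    r"http(s)?://www\."
    r"(somedomain\.com|otherdomain\.co\.uk|andanotherdomain\.net)"
)

patterns = expand_alternation(PREFIX, ALTERNATIVES)

# Made-up sample text, used only to compare the two approaches.
sample = "see https://www.otherdomain.co.uk and http://www.somedomain.com"

combined_hits = {m.group(0) for m in re.finditer(COMBINED, sample)}
split_hits = {m.group(0) for p in patterns for m in re.finditer(p, sample)}

for p in patterns:
    print(p)  # each line is a pattern to search for separately
print(combined_hits == split_hits)
```

Each printed pattern has the same shape as the `http[s]?://www\.(somedomain\.com)` search that the question reports as working, so the three searches would have to be run one after another.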

All the regex constructs that Notepad2 supports are listed in the Notepad2 4.2.25 documentation:

Regular Expression Syntax

  Note: the Scintilla source code editing component supports only a
  basic subset of regular expression syntax, and searches are limited
  to single lines.

  .      Matches any character.

  (...)  This marks a region for tagging a match.

  \n     Where n is 1 through 9 refers to the first through ninth
         tagged region when replacing. For example, if the search
         string was Fred([1-9])XXX and the replace string was Sam\1YYY,
         when applied to Fred2XXX this would generate Sam2YYY.

  \<     This matches the start of a word.

  \>     This matches the end of a word.

  \x     This allows you to use a character x that would otherwise
         have a special meaning. For example, \[ would be interpreted
         as [ and not as the start of a character set.

  [...]  This indicates a set of characters, for example, [abc] means
         any of the characters a, b or c. You can also use ranges, for
         example [a-z] for any lower case character.

  [^...] The complement of the characters in the set. For example,
         [^A-Za-z] means any character except an alphabetic character.

  ^      This matches the start of a line (unless used inside a set,
         see above).

  $      This matches the end of a line.

  ?      This matches 0 or 1 times. For example, a?b matches ab and b.

  *      This matches 0 or more times. For example, Sa*m matches Sm,
         Sam, Saam, Saaam and so on.

  +      This matches 1 or more times. For example, Sa+m matches Sam,
         Saam, Saaam and so on.

  *?     Causes * and + to behave non-greedy. For example, <.+> matches
  +?     all HTML tags on a line, whereas <.+?> matches only one tag.

  \d     Any decimal digit.
  \D     Any character that is not a decimal digit.

  \s     Any whitespace character.
  \S     Any character that is not a whitespace character.

  \w     Any "word" character.
  \W     Any "non-word" character.

  \xHH   Character with hex code HH.

  -----> Examples (don't use quotes)
         - Quote lines: find "^" replace with "> "
         - Unquote lines: find "^> " replace with ""
         - Remove line numbers: find "^[0-9]+" replace with ""
         - Convert tabs to double spaces: find "\t" replace with "  "
         - Remove NULL bytes: find "\x00" replace with ""
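Two of the documented constructs, tagged regions with `\1` and the non-greedy `+?`, can be tried outside the editor. The following is a rough sketch in Python's `re` module, which happens to share this syntax for these particular constructs. It is an analogy, not the Scintilla engine, and the sample line `<b>bold</b>` is made up for illustration.

```python
import re

# Tagged region plus back-reference, mirroring the Fred/Sam example
# from the documentation above.
replaced = re.sub(r"Fred([1-9])XXX", r"Sam\1YYY", "Fred2XXX")

# Greedy versus non-greedy matching on a single line (sample text is made up).
line = "<b>bold</b>"
greedy = re.findall(r"<.+>", line)   # one match spanning both tags
lazy = re.findall(r"<.+?>", line)    # one match per tag

print(replaced)
print(greedy)
print(lazy)
```

Note that the list above contains no `|` entry, which is consistent with alternation being the construct that fails in the question.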
