ruby - Is there any way to shorten this regular expression?

Question

I want to match strings in the format of A0123456, E0123456, or IN:A0123456Q, etc. I originally made this regex

^(IN:)?[AE][0-9]{7}Q?$

but it also matched IN:E0123456, without the Q at the end. So I created this regex:

(^IN:[AE][0-9]{7}Q$)|(^[AE][0-9]{7}$)

Is there any way to shorten this regex so that it requires both IN: and Q when either one is present, but still accepts the string when neither is present?
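A quick Ruby check of the two patterns above shows why the first attempt is too loose: IN: and Q are each optional on their own, so it accepts either one without the other. This is a sketch; the sample strings are illustrative.

```ruby
# The question's first attempt: IN: and Q are each optional independently.
FIRST_ATTEMPT = /^(IN:)?[AE][0-9]{7}Q?$/
# The question's longer version: two fully anchored alternatives.
TWO_ALTERNATIVES = /(^IN:[AE][0-9]{7}Q$)|(^[AE][0-9]{7}$)/

QUESTION_SAMPLES = %w[A0123456 E0123456 IN:A0123456Q IN:E0123456 A0123456Q].freeze

QUESTION_SAMPLES.each do |s|
  first = !!(FIRST_ATTEMPT =~ s)    # true if the first attempt matches
  two   = !!(TWO_ALTERNATIVES =~ s) # true if the two-alternative version matches
  puts format("%-13s first attempt: %-5s two alternatives: %s", s, first, two)
end
```

The first attempt accepts both IN:E0123456 and A0123456Q; the two-alternative version rejects both.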

Edit: The regex will be used in Ruby.

Edit 2: I changed the second regex above because the previous version matched the wrong strings; it would still match IN:A0123456.

Edit 3: Both answers below are valid. Since I am using Ruby 2.0, and I would rather have a regex that does not depend on Ruby's flavor of subexpression calls in case I change my application, I chose to accept matt's answer.

score 5 · Answer

There is a problem with your second regex:

^(IN:[AE][0-9]{7}Q)|([AE][0-9]{7})$

The | operator has lower precedence than concatenation, so the regex is parsed as:

^(IN:[AE][0-9]{7}Q)        # Starts with (IN:[AE][0-9]{7}Q)
|                          # OR
([AE][0-9]{7})$            # Ends with ([AE][0-9]{7})
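To see the precedence problem concretely, here is a minimal Ruby sketch that runs the ungrouped regex against a few made-up inputs. Each one matches, even though none of them is a valid ID as a whole.

```ruby
# The second regex as quoted in this answer, with no outer group around the
# alternation: ^ only anchors the first alternative, $ only the second.
UNGROUPED = /^(IN:[AE][0-9]{7}Q)|([AE][0-9]{7})$/

INPUTS = [
  "IN:A0123456",      # IN: without Q: the second alternative matches at the end
  "IN:A0123456Qjunk", # trailing garbage: the first alternative matches at the start
  "junkE0123456"      # leading garbage: the second alternative matches at the end
].freeze

INPUTS.each do |s|
  m = UNGROUPED.match(s)
  puts "#{s.inspect} => #{m ? m[0].inspect : 'no match'}"
end
```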

To fix this, simply use a non-capturing group:

^(?:(IN:[AE][0-9]{7}Q)|([AE][0-9]{7}))$

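This ensures that the whole input string matches one of the two formats, rather than merely starting with one or ending with the other (which is clearly incorrect).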
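A small Ruby sketch checking the grouped version against valid and invalid inputs (the sample strings are chosen for illustration):

```ruby
# The fix: a non-capturing group makes ^ and $ apply to both alternatives.
FIXED = /^(?:(IN:[AE][0-9]{7}Q)|([AE][0-9]{7}))$/

VALID   = %w[A0123456 E0123456 IN:A0123456Q IN:E0123456Q].freeze
INVALID = %w[IN:A0123456 A0123456Q IN:A0123456Qjunk junkE0123456].freeze

VALID.each   { |s| puts "#{s}: #{FIXED =~ s ? 'ok' : 'REJECTED'}" }
INVALID.each { |s| puts "#{s}: #{FIXED =~ s ? 'ACCEPTED' : 'rejected'}" }
```

One Ruby-specific caveat: ^ and $ always match at line boundaries in Ruby, so a string such as "A0123456\nextra" still passes. If the input can contain newlines, \A and \z anchor to the whole string instead.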

As for shortening the regex, you can replace [0-9] with \d if you like, but it is fine as it is.
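If you want to confirm that the \d version behaves the same, a quick Ruby comparison like the one below works (a sketch with illustrative samples). In Ruby, \d matches only the ASCII digits 0 through 9, so it is a drop-in replacement for [0-9] here.

```ruby
# The grouped regex with [0-9], and the same pattern written with \d.
WITH_BRACKETS = /^(?:(IN:[AE][0-9]{7}Q)|([AE][0-9]{7}))$/
WITH_D        = /^(?:(IN:[AE]\d{7}Q)|([AE]\d{7}))$/

DIGIT_SAMPLES = %w[A0123456 IN:E0123456Q IN:A0123456 A0123456Q E01234567].freeze

DIGIT_SAMPLES.each do |s|
  # Print whether each version matches; the two columns should agree.
  puts "#{s}: #{!!(WITH_BRACKETS =~ s)} / #{!!(WITH_D =~ s)}"
end
```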

~~I don't think there is any other way to shorten the regex within what Ruby supports by default.~~

Subroutine calls

For reference, in Perl/PCRE you can shorten it with a subroutine call:

^(?:([AE][0-9]{7})|(IN:(?1)Q))$

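(?1) refers to the pattern defined by the first capturing group, i.e. [AE][0-9]{7}. The regex is effectively the same; it just looks shorter. A demo with the input IN:E0123463Q shows the whole text captured by group 2 (and no text captured for group 1).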

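Ruby has a similar concept, called subexpression calls, with slightly different syntax. Ruby uses \g<name> or \g<number> to refer to the capturing group whose pattern we want to reuse: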

^(?:([AE][0-9]{7})|(IN:\g<1>Q))$

A test case on Rubular under Ruby 1.9.7, with the input IN:E0123463Q, returns E0123463 as the match for group 1 and IN:E0123463Q as the match for group 2.

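Ruby's implementation (1.9.7) appears to record the captured text for group 1 even though group 1 does not directly take part in the match. In PCRE, subroutine calls do not capture text.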
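A minimal Ruby sketch of the subexpression-call version. It prints the capture groups so you can check what your own Ruby version records for group 1 (the answer observed E0123463 on Rubular); no particular output is claimed here for group 1.

```ruby
# \g<1> reuses the pattern of capturing group 1, i.e. [AE][0-9]{7},
# instead of writing it out a second time.
SUBEXP = /^(?:([AE][0-9]{7})|(IN:\g<1>Q))$/

m = SUBEXP.match("IN:E0123463Q")
p m[0] # the whole match
p m[1] # group 1, possibly set by the \g<1> call (see the answer above)
p m[2] # group 2, the IN:...Q alternative

%w[E0123463 IN:E0123463Q IN:E0123463 E0123463Q].each do |s|
  puts "#{s}: #{SUBEXP =~ s ? 'match' : 'no match'}"
end
```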

Conditional regexes

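There are also conditional regexes, which let you check whether a given capturing group matched something. See matt's answer for more information.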

score 3 · Accepted Answer

If you are using Ruby 2.0, you can use an if-then-else conditional match (it is undocumented in the Ruby docs, but it does exist):

/^(IN:)?[AE][0-9]{7}(?(1)Q|)$/

The conditional part (?(1)Q|) means: if group 1 matched, match Q; otherwise match nothing. Since group 1 is (IN:), this does exactly what you want.
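A minimal sketch of the conditional version, assuming Ruby 2.0 or later (Ruby 2.0 switched its regex engine to Onigmo, which is what accepts the (?(1)...) syntax). The sample strings and expected results are illustrative.

```ruby
# Q is required only if group 1, (IN:), participated in the match.
CONDITIONAL = /^(IN:)?[AE][0-9]{7}(?(1)Q|)$/

CASES = {
  "A0123456"     => true,
  "IN:A0123456Q" => true,
  "IN:A0123456"  => false, # IN: present, Q missing
  "A0123456Q"    => false  # Q present, IN: missing
}.freeze

CASES.each do |s, expected|
  actual = !!(CONDITIONAL =~ s)
  puts "#{s}: #{actual} (expected #{expected})"
end
```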
