c# - Regex: determining whether a string is a number or a variable

Question

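I'm trying to combine two regular expression patterns to determine whether a string is a double value or a variable. My constraints are as follows:

A variable may only start with _ or a letter (A-Z, ignoring case), and may be followed by zero or more _s, letters, or digits.

This is what I have so far, but I can't get it to work properly.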

String varPattern = @"[a-zA-Z_](?: [a-zA-Z_]|\d)*";
String doublePattern = @"(?: \d+\.\d* | \d*\.\d+ | \d+ ) (?: [eE][\+-]?\d+)?";

String pattern = String.Format("({0}) | ({1})",
                             varPattern, doublePattern);
Regex.IsMatch(word, varPattern, RegexOptions.IgnoreCase)

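It seems to be capturing both regex patterns, but I need it to be either/or.

For example, _A2 2 is valid using the code above, but _A2 is not.

Some examples of valid variables: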

_X6 , _ , A , Z_2_A

Some examples of invalid variables:

2_X6 , $2 , T_2$

I think I just need some clarification on the pattern format for regular expressions. The format is unclear to me.

score 2 · Accepted Answer

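As already noted, literal whitespace that you put in your regular expression is part of the regular expression. You won't get a match unless the text being scanned contains exactly the same whitespace. If you want to use whitespace to lay out your regex, you need to specify RegexOptions.IgnorePatternWhitespace; after that, if you want to match any whitespace, you have to do so explicitly, by specifying \s, \x20, etc.

It should be noted that if you specify RegexOptions.IgnorePatternWhitespace, you can use Perl-style comments (# to end of line) to document your regular expression (as I've done below). For a complex regex, someone 5 years from now (who might be you!) will thank you for the kindness.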
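To make the difference concrete, here is a minimal sketch that runs the same laid-out pattern with and without RegexOptions.IgnorePatternWhitespace. The class and member names (WhitespaceDemo, SpacedIdentifier, and so on) are made up for illustration, and the ^ and $ anchors are an assumption added so that the whole word must match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Identifier pattern written with layout whitespace and # comments.
    // It only behaves as intended with RegexOptions.IgnorePatternWhitespace.
    public const string SpacedIdentifier = @"
        ^                # start of input
        [a-zA-Z_]        # a letter or an underscore, followed by
        [a-zA-Z0-9_]*    # zero or more letters, digits or underscores
        $                # end of input
    ";

    // Without the option, every space, newline and '#' is matched literally.
    public static bool MatchLiteral(string word) =>
        Regex.IsMatch(word, SpacedIdentifier);

    // With the option, layout whitespace and comments are ignored.
    public static bool MatchIgnoringWhitespace(string word) =>
        Regex.IsMatch(word, SpacedIdentifier, RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        Console.WriteLine(MatchLiteral("_A2"));             // False
        Console.WriteLine(MatchIgnoringWhitespace("_A2"));  // True
        Console.WriteLine(MatchIgnoringWhitespace("2_X6")); // False
    }
}
```

The first call fails only because the spaces, newlines and comment text in the pattern are treated as characters that must appear in the input.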

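Your [presumably intended] patterns are also, I think, more complex than they need to be. A regular expression that matches the identifier rule you specified is: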

[a-zA-Z_][a-zA-Z0-9_]*

Broken down into its constituent parts:

[a-zA-Z_]     # match an upper- or lower-case letter or an underscore, followed by
[a-zA-Z0-9_]* # zero or more occurrences of an upper- or lower-case letter, decimal digit or underscore

A regular expression matching the conventional style of a numeric/floating-point literal is:

([+-]?[0-9]+)(\.[0-9]+)?([Ee][+-]?[0-9]+)?

Broken down into its constituent parts:

(        # a mandatory group that is the integer portion of the value, consisting of
  [+-]?  # - an optional plus- or minus-sign, followed by
  [0-9]+ # - one or more decimal digits
)        # followed by
(        # an optional group that is the fractional portion of the value, consisting of
  \.     # - a decimal point, followed by
  [0-9]+ # - one or more decimal digits
)?       # followed by,
(        # an optional group, that is the exponent portion of the value, consisting of
  [Ee]   # - The upper- or lower-case letter 'E' indicating the start of the exponent, followed by
  [+-]?  # - an optional plus- or minus-sign, followed by
  [0-9]+ # - one or more decimal digits.
)?       # Easy!

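Note: Grammars differ on whether the sign of a value is a unary operator or part of the value, and on whether a leading + sign is allowed. Grammars also differ on whether something like 123245. is valid (i.e., is a decimal point with no fractional digits valid?).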
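As a rough illustration of that last point, the sketch below compares the strict number pattern above with a looser variant that accepts a bare trailing or leading decimal point. The looser pattern is an assumption modeled on the doublePattern in the question, the ^ and $ anchors are added here, and the names NumberVariants, Strict and Loose are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NumberVariants
{
    // Strict form from the answer: digits are required on both sides of the point.
    public static readonly Regex Strict = new Regex(
        @"^[+-]?[0-9]+(\.[0-9]+)?([Ee][+-]?[0-9]+)?$");

    // Looser form (assumption): also accepts "123." and ".5".
    public static readonly Regex Loose = new Regex(
        @"^[+-]?(?:[0-9]+\.[0-9]*|[0-9]*\.[0-9]+|[0-9]+)(?:[Ee][+-]?[0-9]+)?$");

    public static void Main()
    {
        foreach (string s in new[] { "123245.", ".5", "1.5e3" })
        {
            // Prints the input followed by the result of each pattern.
            Console.WriteLine($"{s}: strict={Strict.IsMatch(s)}, loose={Loose.IsMatch(s)}");
        }
    }
}
```

Which form is right depends on the grammar you are implementing; neither is more correct in general.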

To combine the two regular expressions,

First, group them with parentheses (you may want to name the containing groups, as I have):
```
(?<identifier>[a-zA-Z_][a-zA-Z0-9_]*)
(?<number>[+-]?[0-9]+(\.[0-9]+)?([Ee][+-]?[0-9]+)?)
```

Next, combine them with the alternation operator, |:

(?<identifier>[a-zA-Z_][a-zA-Z0-9_]*)|(?<number>[+-]?[0-9]+(\.[0-9]+)?([Ee][+-]?[0-9]+)?)

Finally, wrap the whole shebang in an @"..." literal and you should be good to go.

That's all there is to it.
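Put together in C#, the combined pattern might be used roughly like this. It is a sketch under a few assumptions that are not in the answer itself: the pattern is anchored with ^ and $ so the whole word must match, the number group covers the fraction and exponent, the inner groups are made non-capturing, and the TokenClassifier class with its "identifier" / "number" / "invalid" labels is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenClassifier
{
    // Combined identifier-or-number pattern, anchored to the whole input.
    private static readonly Regex Token = new Regex(
        @"^(?:(?<identifier>[a-zA-Z_][a-zA-Z0-9_]*)" +
        @"|(?<number>[+-]?[0-9]+(?:\.[0-9]+)?(?:[Ee][+-]?[0-9]+)?))$");

    // Returns "identifier", "number", or "invalid" (labels chosen for this example).
    public static string Classify(string word)
    {
        Match m = Token.Match(word);
        if (!m.Success) return "invalid";
        // Exactly one of the two named groups participates in a successful match.
        return m.Groups["identifier"].Success ? "identifier" : "number";
    }

    public static void Main()
    {
        foreach (string w in new[] { "_X6", "_", "Z_2_A", "2_X6", "$2", "T_2$", "12.5e-3", "_A2 2" })
        {
            Console.WriteLine($"{w} -> {Classify(w)}");
        }
    }
}
```

Checking which named group succeeded is what gives the either/or behavior the question asks for, without running two separate matches.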

score 1 · Accepted Answer

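By default, whitespace in a regular expression is not ignored, so for every space in your current expression the engine looks for a space in the input string. Either add the RegexOptions.IgnorePatternWhitespace flag or remove the spaces from the expression.

You also need to add start-of-string and end-of-string anchors (^ and $ respectively) so that you don't match just part of the string.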
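The effect of the anchors can be seen with a small sketch like the one below. The identifier pattern is the one implied by the question's rules; the AnchorDemo class and its method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    private const string Identifier = @"[a-zA-Z_][a-zA-Z0-9_]*";

    // Unanchored: true if an identifier occurs anywhere inside the word.
    public static bool ContainsIdentifier(string word) =>
        Regex.IsMatch(word, Identifier);

    // Anchored: true only if the entire word is an identifier.
    public static bool IsIdentifier(string word) =>
        Regex.IsMatch(word, "^" + Identifier + "$");

    public static void Main()
    {
        Console.WriteLine(ContainsIdentifier("2_X6")); // True, the "_X6" part matches
        Console.WriteLine(IsIdentifier("2_X6"));       // False
        Console.WriteLine(IsIdentifier("Z_2_A"));      // True
    }
}
```

Note that when a pattern contains a top-level alternation, the anchors need to wrap the whole alternation in a group, for example ^(?:a|b)$, or they will bind to only one branch.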

score 1 · Accepted Answer

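Unless you explicitly set RegexOptions.IgnorePatternWhitespace, you should avoid putting spaces in a regular expression. To make sure only complete words are matched, you should include the start-of-line (^) and end-of-line ($) characters. I'd also suggest building the whole pattern as a single expression rather than using String.Format("({0}) | ({1})", ...) as you do here.

Given your examples, the following should work:

string pattern = @"(?:^[a-zA-Z_][a-zA-Z_\d]*$)|(?:^\d+(?:\.\d+){0,1}(?:[Ee][\+-]?\d+){0,1}$)";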
