Given your original regex:
^((([^<>()[\]\\.,;:\s@\""]+(\.[^<>()[\]\\.,;:\s@\""]+)*)|(\"".+\""))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,})))$
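I have deciphered it and written it out in free-spacing mode with comments. I present it here in Python raw-string syntax so that you can see the native regex as it is presented to the regex engine (that is, after string interpretation):
Original expression, commented, as a native regex: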
re_commented = r'''
    # Match an email address.
    ^                                 # Anchor to start of string.
    (                                 # ($1:) Entire string.
      (                               # $2: FIRST PART (before @).
        ( [^<>()[\]\\.,;:\s@\""]+     # ($3:) Either one
          (                           # ($4:) or more
            \.                        # dot separated
            [^<>()[\]\\.,;:\s@\""]+   # parts.
          )*                          # ($4:)
        )                             # ($3:)
      | (                             # ($5:) Or FIRST PART is
          \"".+\""                    # a doubly, double quoted string.
        )                             # ($5:)
      )                               # $2: FIRST PART (before @).
      @                               # Required @ separates parts.
      (                               # $6: LAST PART (after @).
        ( \[                          # ($7:) LAST PART is Either
          [0-9]{1,3}\.                # an IPv4 domain address
          [0-9]{1,3}\.                # (e.g. 10.0.0.255)
          [0-9]{1,3}\.                # between
          [0-9]{1,3}                  # square
          \]                          # brackets.
        )                             # ($7:)
      | (                             # ($8:) Or LAST PART is
          ([a-zA-Z\-0-9]+\.)+         # a DNS style dot separated
          [a-zA-Z]{2,}                # named domain.
        )                             # ($8:)
      )                               # $6: LAST PART (after @).
    )                                 # ($1:) Entire string.
    $                                 # Anchor to end of string.
    '''
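As a quick check of how free-spacing mode is read by the engine, here is a minimal Python sketch that compiles only the bracketed IPv4 alternative ($7) with re.VERBOSE. The names ipv4_literal and is_ipv4_literal are made up for this illustration, and the extra anchors are added only so the fragment can be tried on its own.

```python
import re

# Hypothetical fragment: only the $7 alternative of the commented regex,
# with its own anchors so it can be tested in isolation.
ipv4_literal = re.compile(r'''
    ^ \[                # opening square bracket
      [0-9]{1,3} \.     # first octet and dot
      [0-9]{1,3} \.     # second octet and dot
      [0-9]{1,3} \.     # third octet and dot
      [0-9]{1,3}        # fourth octet
    \] $                # closing square bracket
    ''', re.VERBOSE)    # VERBOSE: whitespace and # comments are ignored


def is_ipv4_literal(text):
    """Return True if text has the [d.d.d.d] shape this fragment accepts."""
    return ipv4_literal.match(text) is not None


for candidate in ("[10.0.0.255]", "10.0.0.255", "[10.0.0]"):
    print(candidate, is_ipv4_literal(candidate))
```

Without the re.VERBOSE flag the same string would be treated as a literal pattern, whitespace and comments included, and would not match these inputs.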
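You can now clearly see that this regex is trying to validate an email address. It appears that someone went in, edited the file and corrupted the double quotes: every instance of \"" should be a plain " as far as the regex engine is concerned. Note also that the \"" sequence does no harm inside a character class, because there it is equivalent to a single double quote. It does cause mischief, however, where it appears in the second alternative for the first part of the email address, i.e. \"".+\"", because there it demands two double quotes on each side. Here is a corrected version which fixes the double-quote problem. I present it fully commented in free-spacing mode as a Java snippet, showing the correct escaping of all the quotes and backslashes.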
Fixed expression, commented, in a Java regex string:
Pattern re_valid = Pattern.compile(
    "# Match an email address. (Rev:20121105_1100 fixed quotes.)\n" +
    "^                                     # Anchor to start of string.\n" +
    "(                                     # ($1:) Entire string.\n" +
    "  (                                   # $2: FIRST PART (before @).\n" +
    "    ( [^<>()\\[\\]\\\\.,;:\\s@\"]+    # ($3:) Either one\n" +
    "      (                               # ($4:) or more\n" +
    "        \\.                           # dot separated\n" +
    "        [^<>()\\[\\]\\\\.,;:\\s@\"]+  # parts.\n" +
    "      )*                              # ($4:)\n" +
    "    )                                 # ($3:)\n" +
    "  | (                                 # ($5:) Or FIRST PART is\n" +
    "      \".+\"                          # a double quoted string.\n" +
    "    )                                 # ($5:)\n" +
    "  )                                   # $2: FIRST PART (before @).\n" +
    "  @                                   # Required @ separates parts.\n" +
    "  (                                   # $6: LAST PART (after @).\n" +
    "    ( \\[                             # ($7:) LAST PART is Either\n" +
    "      [0-9]{1,3}\\.                   # an IPv4 domain address\n" +
    "      [0-9]{1,3}\\.                   # (i.e. 10.0.0.255)\n" +
    "      [0-9]{1,3}\\.                   # between\n" +
    "      [0-9]{1,3}                      # square\n" +
    "      \\]                             # brackets.\n" +
    "    )                                 # ($7:)\n" +
    "  | (                                 # ($8:) Or LAST PART is\n" +
    "      ([a-zA-Z\\-0-9]+\\.)+           # a DNS style dot separated\n" +
    "      [a-zA-Z]{2,}                    # named domain.\n" +
    "    )                                 # ($8:)\n" +
    "  )                                   # $6: LAST PART (after @).\n" +
    ")                                     # ($1:) Entire string.\n" +
    "$                                     # Anchor to end of string.",
    Pattern.COMMENTS);
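The effect of the corrupted quotes can be reproduced in a few lines. The following is a minimal Python sketch, not part of the original code: it isolates the quoted-string alternative ($5) and a deliberately shortened character class, each in its corrupted and corrected spelling. All four pattern names are hypothetical.

```python
import re

# Only the quoted-string alternative ($5) plus the @, as the engine sees it.
corrupted_quoted = re.compile(r'^\"".+\""@')  # \" then ": two quotes per side
fixed_quoted = re.compile(r'^".+"@')          # one quote per side

# A shortened stand-in for the real character class, in both spellings.
corrupted_class = re.compile(r'^[^\s@\""]+$')  # the quote is merely listed twice
fixed_class = re.compile(r'^[^\s@"]+$')

address = '"john doe"@example.com'
print("corrupted accepts quoted local part:", bool(corrupted_quoted.match(address)))
print("fixed accepts quoted local part:    ", bool(fixed_quoted.match(address)))

# Inside a character class both spellings describe the same set of characters.
for text in ('john.doe', 'jo"hn'):
    print(text, bool(corrupted_class.match(text)), bool(fixed_class.match(text)))
```

The corrupted alternative only accepts a local part wrapped in doubled quotes, while the two character classes agree on every input.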
Note that this regex has other, mostly minor, problems (Google "email validation" for more). Also, many of the grouping parentheses are unnecessary.
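To make both points concrete, here is a rough Python sketch. The pattern named full is the corrected regex transcribed into a raw string. The pattern named slim is my own assumed equivalent with the redundant groups removed and the rest made non-capturing. The sample addresses are invented, and agreement on a handful of samples is a sanity check, not a proof of equivalence.

```python
import re

# The corrected regex, transcribed from the Java string into Python raw strings.
full = re.compile(
    r'^((([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))'
    r'@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])'
    r'|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,})))$')

# Assumed equivalent: fewer parentheses, non-capturing groups only.
slim = re.compile(
    r'^(?:[^<>()\[\]\\.,;:\s@"]+(?:\.[^<>()\[\]\\.,;:\s@"]+)*|".+")'
    r'@(?:\[[0-9]{1,3}(?:\.[0-9]{1,3}){3}\]|(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,})$')

samples = [
    'john.doe@example.com',
    '"john doe"@example.com',
    'user@[10.0.0.255]',
    'user@[999.999.999.999]',   # octets are not range-checked
    'user@-example-.com',       # labels may start or end with a hyphen
    'user@localhost',           # no dot in the domain
    'john..doe@example.com',    # consecutive dots in the local part
]

for s in samples:
    print(s, bool(full.match(s)), bool(slim.match(s)))
```

The two samples with comments about range checks and hyphens are examples of inputs this regex lets through even though they are not sensible addresses.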
One final comment: Java is terrible for writing and commenting regexes!