regex - Regular expression: check that a repeated group contains at least one letter

Question

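I am learning regular expressions, and I have an exercise to write an expression that validates URLs (I have a specific list of URLs that must validate and URLs that must fail). This is what I have so far: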

^((https?:\/\/)(?=.*[A-Za-z]+.*)(([A-Za-z0-9]+\-*[A-Za-z0-9]+|[A-Za-z0-9])\.){1,}([A-Za-z]+)\/?$)

Among all the other URLs, these must validate:

http://1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa
http://0test.com/

However, these must fail:

http://1234567890123456789012345678901234567890123456789012345678901234.com
http://0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.com

They must fail because their domain names contain no letters (only the top-level domain does), but I can't work out how to exclude them.
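Running the question's pattern against these four URLs shows the problem. This is a minimal sketch using Python's re module (the question does not name a language, so that is an assumption); the names QUESTION_RE, MUST_PASS and MUST_FAIL are illustrative only, and the long all-zero URL is rebuilt with an arbitrary repeat count rather than copied character for character.

```python
import re

# The pattern from the question, unchanged.
QUESTION_RE = re.compile(
    r"^((https?:\/\/)(?=.*[A-Za-z]+.*)"
    r"(([A-Za-z0-9]+\-*[A-Za-z0-9]+|[A-Za-z0-9])\.){1,}([A-Za-z]+)\/?$)"
)

MUST_PASS = [
    "http://1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa",
    "http://0test.com/",
]
MUST_FAIL = [
    "http://1234567890123456789012345678901234567890123456789012345678901234.com",
    # Digit-only labels followed by a TLD; the repeat count is arbitrary.
    "http://" + "0." * 100 + "com",
]

for url in MUST_PASS + MUST_FAIL:
    # The lookahead is satisfied by the letters of the TLD alone,
    # so all four URLs match, including the two that should fail.
    print(QUESTION_RE.match(url) is not None, url[:50])
```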

I added a positive lookahead:

(?=.*[A-Za-z]+.*)

I hoped it would check only the following repeated group:

(([A-Za-z0-9]+\-*[A-Za-z0-9]+|[A-Za-z0-9])\.){1,}

But it checks the whole expression through to the end, i.e. it also checks the top-level domain. How can I fix this?
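The behaviour described above can be reproduced by testing the lookahead on its own. A lookahead does not attach to the group written after it; it tests the rest of the string from its current position. A minimal sketch, assuming Python's re module; LOOKAHEAD and the sample hosts are illustrative.

```python
import re

# The question's lookahead on its own, applied to the host part of a URL.
LOOKAHEAD = re.compile(r"(?=.*[A-Za-z]+.*)")

# `.*` is free to run to the end of the string, so the lookahead is not
# limited to the repeated group that follows it in the full pattern.
print(LOOKAHEAD.match("1234.com") is not None)   # True: "com" supplies the letters
print(LOOKAHEAD.match("1234.5678") is not None)  # False: no letters at all
```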

score 1 · Accepted Answer

You have the right idea, but, as you said, you don't want the lookahead to count the top-level domain. So include a copy of that match in your lookahead:

(?=.*[A-Za-z]+.*\.([A-Za-z]+)\/?$)
                ^------------ will match the top-level domain
                                ^ will ensure it's the last part of the domain
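A minimal sketch, assuming Python's re module, that swaps this lookahead into the question's pattern. The optional \/? is placed before the $ anchor so that a trailing slash, as in http://0test.com/, is accepted. The helper is_valid and the shortened test hosts are hypothetical.

```python
import re

# The question's pattern with this answer's lookahead swapped in.
# The optional slash is placed before `$` so "http://0test.com/" still matches.
ANSWER1_RE = re.compile(
    r"^((https?:\/\/)(?=.*[A-Za-z]+.*\.([A-Za-z]+)\/?$)"
    r"(([A-Za-z0-9]+\-*[A-Za-z0-9]+|[A-Za-z0-9])\.){1,}([A-Za-z]+)\/?$)"
)


def is_valid(url: str) -> bool:
    """Return True if the whole URL matches ANSWER1_RE."""
    return ANSWER1_RE.match(url) is not None


for url in [
    "http://0test.com/",
    "http://1.0.ip6.arpa",
    "http://1234.com",
    "http://0.0.0.com",
]:
    # A letter must now appear before the last "." for the lookahead to pass.
    print(is_valid(url), url)
```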

I also changed your A-z to A-Za-z (I wasn't sure whether it was a typo, but remember that A-z matches more than just letters: it also matches [ \ ] ^ _ and `, which lie between Z and a in ASCII).
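A short check of the A-z pitfall, sketched with Python's re module (an assumption; every mainstream engine uses the same code-point ranges for [A-z]). The variable name between is illustrative.

```python
import re

# Characters that sit between "Z" (0x5A) and "a" (0x61) in ASCII.
between = "".join(chr(c) for c in range(ord("Z") + 1, ord("a")))
print(between)  # [\]^_`

# `[A-z]` accepts all of them, `[A-Za-z]` accepts none.
print(re.findall(r"[A-z]", between))
print(re.findall(r"[A-Za-z]", between))  # []
```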

EDIT: A lookbehind doesn't work here because it doesn't allow variable-length matching. Added \/? to allow a possible trailing /.
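The lookbehind limitation is easy to observe in Python's built-in re module, used here as an example engine (an assumption; the thread does not name one). The flag name variable_lookbehind_supported is illustrative.

```python
import re

# Python's built-in `re` only accepts fixed-width lookbehinds, so a
# variable-length one such as (?<=[A-Za-z]+) fails at compile time.
try:
    re.compile(r"(?<=[A-Za-z]+)\.")
    variable_lookbehind_supported = True
except re.error as exc:
    variable_lookbehind_supported = False
    print("re.error:", exc)
```

Engines differ on this point (.NET, for example, does accept variable-length lookbehind), so whether the restriction applies depends on the regex flavour in use.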

score 1 · Accepted Answer

I think you'd be better off using this assertion:
(?=.*[A-Za-z]+.*\.[A-Za-z]+/?$)
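Tested on its own against the part of the URL after the scheme, the assertion accepts only hosts with a letter somewhere before the last dot. A minimal sketch assuming Python's re module; the name ASSERTION and the shortened sample hosts are hypothetical.

```python
import re

# The assertion from this answer, tested on the part after "http://".
# It needs a letter somewhere before the last ".", followed by an
# all-letter TLD, an optional "/" and the end of the string.
ASSERTION = re.compile(r"(?=.*[A-Za-z]+.*\.[A-Za-z]+/?$)")

for host in ["0test.com/", "1.0.ip6.arpa", "1234.com", "0.0.0.com"]:
    print(host, ASSERTION.match(host) is not None)
```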

With that assertion and some refactoring, this raw regex accepts and rejects the correct items in your sample:

^(https?://)(?=.*[A-Za-z]+.*\.[A-Za-z]+/?$)((?:[A-Za-z0-9]+(?:-+[A-Za-z0-9]+)?\.)+)([A-Za-z]+)/?$
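A sketch that runs this pattern against the question's sample URLs, assuming Python's re module. SAMPLES and results are illustrative names, and the long all-zero URL is rebuilt with an arbitrary repeat count.

```python
import re

# The refactored pattern from this answer, copied as-is.
ANSWER2_RE = re.compile(
    r"^(https?://)(?=.*[A-Za-z]+.*\.[A-Za-z]+/?$)"
    r"((?:[A-Za-z0-9]+(?:-+[A-Za-z0-9]+)?\.)+)([A-Za-z]+)/?$"
)

# URL -> whether it is expected to match.
SAMPLES = {
    "http://1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa": True,
    "http://0test.com/": True,
    "http://1234567890123456789012345678901234567890123456789012345678901234.com": False,
    # Digit-only labels; the repeat count is arbitrary.
    "http://" + "0." * 100 + "com": False,
}

results = {url: ANSWER2_RE.match(url) is not None for url in SAMPLES}
for url, ok in results.items():
    print(ok, url[:50])

# Group 3 holds the TLD, e.g. "com" for "http://0test.com/".
print(ANSWER2_RE.match("http://0test.com/").group(3))
```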

Formatted and tested:

 ^ 
 ( https?:// )                 # (1)
 (?= .* [A-Za-z]+ .* \. [A-Za-z]+ /? $ )
 (                             # (2 start)
      (?:
           [A-Za-z0-9]+ 
           (?:
                -+
                [A-Za-z0-9]+ 
           )?
           \.
      )+
 )                             # (2 end)
 ( [A-Za-z]+ )                 # (3)
 /?
 $
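The formatted version can be compiled directly in an engine that supports free-spacing (extended) mode. In Python, assumed here, that is the re.VERBOSE flag: unescaped whitespace is ignored and # starts a comment. The name ANSWER2_VERBOSE and the shortened test hosts are hypothetical.

```python
import re

# The formatted pattern, compiled in free-spacing mode:
# unescaped whitespace is ignored and "#" starts a comment.
ANSWER2_VERBOSE = re.compile(
    r"""
    ^
    ( https?:// )                 # (1)
    (?= .* [A-Za-z]+ .* \. [A-Za-z]+ /? $ )
    (                             # (2 start)
         (?:
              [A-Za-z0-9]+
              (?:
                   -+
                   [A-Za-z0-9]+
              )?
              \.
         )+
    )                             # (2 end)
    ( [A-Za-z]+ )                 # (3)
    /?
    $
    """,
    re.VERBOSE,
)

for url in ["http://0test.com/", "http://1.0.ip6.arpa", "http://1234.com"]:
    print(ANSWER2_VERBOSE.match(url) is not None, url)
```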
