regex - I need a regular expression to match general URLs

Question

I need to test for general URLs using any protocol (http, https, shttp, ftp, svn, mysql, and others I don't know about).

My first pass looks like this:

\w+://(\w+\.)+[\w+](/[\w]+)(\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?

(PCRE and .NET, so nothing too fancy)

score 3 · Accepted Answer

Per RFC 2396:

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
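A minimal Python sketch of how that regex is typically used, assuming Python's re module (split_uri is a name of my own). The group numbers for scheme, authority, path, query and fragment are the ones RFC 2396 Appendix B assigns. Every part of the pattern is optional, so it matches any string at all: it splits a URI into components rather than validating one.

```python
import re

# The RFC 2396 (Appendix B) regex, unchanged. Because every part is optional,
# match() always succeeds (at worst with an empty path).
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri):
    """Return (scheme, authority, path, query, fragment); absent parts are None."""
    m = URI_RE.match(uri)
    # Per the RFC: scheme = group 2, authority = group 4, path = group 5,
    # query = group 7, fragment = group 9.
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


print(split_uri("svn://svn.example.org/repo/trunk?rev=42#top"))
# -> ('svn', 'svn.example.org', '/repo/trunk', 'rev=42', 'top')
print(split_uri("just some text"))
# -> (None, None, 'just some text', None, None)
```

The second call shows why this pattern alone cannot tell a URL from ordinary text.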

answered 2008-11-20T22:48:46.157

score 2 · Accepted Answer

Adding that regex as a wiki answer:

[\w+-]+://([a-zA-Z0-9]+\.)+[a-zA-Z0-9]+(/[%\w]+)(\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?
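A rough Python check of the wiki pattern (first_url is a hypothetical helper). The last host label is written as [a-zA-Z0-9]+; written as [[a-zA-Z0-9]+] it would be parsed as a character class followed by a literal ']', which normal URLs do not contain. Compiling with re.IGNORECASE is an assumption on my part: the query class lists only A-Z, so a case-sensitive match stops right after the '?' of a lowercase query. The pattern also requires a path segment after the host, so a bare host does not match.

```python
import re

# Wiki-answer pattern with the final host label as [a-zA-Z0-9]+.
# Assumption: IGNORECASE, because the query class only lists A-Z.
WIKI_URL_RE = re.compile(
    r"[\w+-]+://([a-zA-Z0-9]+\.)+[a-zA-Z0-9]+(/[%\w]+)(\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?",
    re.IGNORECASE,
)


def first_url(text):
    """Hypothetical helper: return the first URL-like substring, or None."""
    m = WIKI_URL_RE.search(text)
    return m.group(0) if m else None


print(first_url("checkout svn://svn.example.org/trunk now"))  # -> svn://svn.example.org/trunk
print(first_url("see http://example.com/search?q=1 for details"))  # -> http://example.com/search?q=1
print(first_url("ftp://example.com"))  # -> None (no path segment after the host)
```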

Option 2 (re: CMS's answer):

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

But that is far too lax for any sane use. To make it more restrictive and distinguish URLs from other text:

proto      ://  name      : pass      @  server    :port      /path     ? args
^([^:/?#]+)://(([^/?#@:]+(:[^/?#@:]+)?@)?[^/?#@:]+(:[0-9]+)?)(/[^?#]*)(\?([^#]*))?
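The same pattern restated with named groups, as a Python sketch. The group names follow the label line above (proto, name, password, server, port, path, args) and are otherwise my own choice; the extra groups are non-capturing so each named part comes out clean. The matching behaviour is meant to be unchanged, including the mandatory /path group, so a URL with nothing after the host returns None.

```python
import re

# "proto://name:pass@server:port/path?args", one named group per label.
URL_PARTS_RE = re.compile(
    r"^(?P<proto>[^:/?#]+)://"
    r"(?:(?P<name>[^/?#@:]+)(?::(?P<password>[^/?#@:]+))?@)?"  # optional name[:pass]@
    r"(?P<server>[^/?#@:]+)"
    r"(?::(?P<port>[0-9]+))?"                                  # optional :port
    r"(?P<path>/[^?#]*)"                                       # path is required
    r"(?:\?(?P<args>[^#]*))?"                                  # optional ?args
)


def url_parts(url):
    """Return a dict of the named parts, or None if url does not match."""
    m = URL_PARTS_RE.match(url)
    return m.groupdict() if m else None


print(url_parts("mysql://admin:secret@db.example.com:3306/shop?charset=utf8"))
print(url_parts("https://example.com/"))
print(url_parts("http://example.com"))  # -> None (no path)
```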

score 1 · Accepted Answer

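I came at this from a slightly different direction. I wanted to emulate gchat's ability to match something.co.uk and linkify it. So I used a regex that looks for a '.' that is not followed by another period and has no whitespace on either side, then grabs everything around it until it hits whitespace. It does match a period at the end of a URI, but I strip that off later. So if you would rather have false positives than miss some potential URLs, this may be an option.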

import re

url_re = re.compile(r"""
           [^\s]             # first char: not whitespace
           [a-zA-Z0-9:/\-]+  # the protocol and domain name
           \.(?!\.)          # a literal '.' not followed by another '.'
           [\w\-\./\?=&%~#]+ # rest of the host (TLDs) and path/query components
           [^\s]             # last char: not whitespace""", re.VERBOSE)

url_re.findall('http://thereisnothing.com/a/path adn some text www.google.com/?=query#%20 https://somewhere.com other-countries.co.nz. ellipsis... is also a great place to buy. But try text-hello.com ftp://something.com')

['http://thereisnothing.com/a/path',
 'www.google.com/?=query#%20',
 'https://somewhere.com',
 'other-countries.co.nz.',
 'text-hello.com',
 'ftp://something.com']
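The trailing-period cleanup mentioned above could look like the following Python sketch. linkify_candidates is a hypothetical helper, and treating scheme-less matches such as www.google.com as http:// links is an assumption, not something the answer specifies.

```python
import re

# The answer's url_re pattern, written on one line instead of with re.VERBOSE.
URL_RE = re.compile(r"[^\s][a-zA-Z0-9:/\-]+\.(?!\.)[\w\-\./\?=&%~#]+[^\s]")


def linkify_candidates(text):
    """Hypothetical helper: return (display_text, href) pairs for each match."""
    links = []
    for candidate in URL_RE.findall(text):
        url = candidate.rstrip(".")  # drop the sentence-ending '.' the regex keeps
        # Assumption: matches without a scheme are linked as http://.
        href = url if "://" in url else "http://" + url
        links.append((url, href))
    return links


print(linkify_candidates("Try other-countries.co.nz. or ftp://something.com"))
# -> [('other-countries.co.nz', 'http://other-countries.co.nz'), ('ftp://something.com', 'ftp://something.com')]
```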
