我发现的最好和最权威的指南在这里:http: //jmrware.com/articles/2009/uri_regexp/URI_regex.html(回答您的问题,请参阅URI表条目)
来自 RFC3986 的所有这些规则都在表 2 中重现,以及每个规则的正则表达式实现。
此处提供了一个 javascript 实现:https ://github.com/jhermsmeier/uri.regex
作为参考,URI 正则表达式在下面重复:
# RFC-3986 URI component: URI
[A-Za-z][A-Za-z0-9+\-.]* : # scheme ":"
(?: // # hier-part
(?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)?
(?:
\[
(?:
(?:
(?: (?:[0-9A-Fa-f]{1,4}:) {6}
| :: (?:[0-9A-Fa-f]{1,4}:) {5}
| (?: [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:) {4}
| (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:) {3}
| (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:) {2}
| (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4}:
| (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? ::
) (?:
[0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4}
| (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
)
| (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4}
| (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? ::
)
| [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+
)
\]
| (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
| (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*
)
(?: : [0-9]* )?
(?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
| /
(?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
(?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
)?
| (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
(?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
|
)
(?:\? (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )? # [ "?" query ]
(?:\# (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )? # [ "#" fragment ]