ruby - Simple URL sanitization

Question

I'm trying to do some basic URL sanitization so that

www.google.com
www.google.com/
http://google.com
http://google.com/
https://google.com
https://google.com/

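are replaced with http://www.google.com (or with https://www.google.com when the input begins with https://).

Basically, I want a single regex that checks whether there is an http/https at the beginning and a / at the end.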

I'm trying something like this:

"https://google.com".match(/^(http:\/\/|https:\/\/)(.*)(\/)*$/)

In this case I get:

=> #<MatchData "https://google.com" 1:"https://" 2:"google.com" 3:nil>

which is fine.

Unfortunately, with:

"https://google.com/".match(/^(http:\/\/|https:\/\/)(.*)(\/)*$/)

I get:

=> #<MatchData "https://google.com/" 1:"https://" 2:"google.com/" 3:nil>

and I would like to have 2:"google.com" 3:"/" instead.

Any idea how to do that?

score 6 · Accepted Answer

The mistake is obvious once you spot it ;)

You were trying:

^(http:\/\/|https:\/\/)(.*)(\/)*$

The answer is to use:

^(http:\/\/|https:\/\/)(.*?)(\/)*$

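The added ? makes the .* operator "non-greedy", so the trailing forward slash is no longer swallowed by the "." operator and is left for the final group to capture.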
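To make the difference concrete, here is a minimal sketch that runs both patterns against the URL from the question. The constant names GREEDY and LAZY are only labels chosen for this illustration.

```ruby
# The two patterns from the question and the answer; only the (.*) group differs.
GREEDY = /^(http:\/\/|https:\/\/)(.*)(\/)*$/
LAZY   = /^(http:\/\/|https:\/\/)(.*?)(\/)*$/

url = "https://google.com/"

# Greedy: (.*) consumes the trailing slash, so the third group stays empty.
greedy_captures = url.match(GREEDY).captures
# Lazy: (.*?) stops as early as it can, leaving the slash for the third group.
lazy_captures = url.match(LAZY).captures

p greedy_captures  # ["https://", "google.com/", nil]
p lazy_captures    # ["https://", "google.com", "/"]
```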

Edit:

In fact, you really should use:

^(http:\/\/|https:\/\/)?(www\.)?(.*?)(\/)*$

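This way you also match the first two examples, which have no "http(s)://". It also splits out the "www." part into its own group, so you can tell whether it was present. In action: http://www.rubular.com/r/VUoIUqCzzX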
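As a quick check, the sketch below applies that pattern to the six inputs from the question and collects the capture groups. The names PATTERN, inputs and results are hypothetical and exist only for this example.

```ruby
# Pattern with optional scheme and optional "www." groups, as given above.
PATTERN = /^(http:\/\/|https:\/\/)?(www\.)?(.*?)(\/)*$/

inputs = %w[
  www.google.com
  www.google.com/
  http://google.com
  http://google.com/
  https://google.com
  https://google.com/
]

# Captures are [scheme, www, host, trailing slash]; a missing part is nil.
results = inputs.map { |url| url.match(PATTERN).captures }

inputs.zip(results).each { |url, captures| puts "#{url} -> #{captures.inspect}" }
```

For example, "www.google.com" yields [nil, "www.", "google.com", nil], while "https://google.com/" yields ["https://", nil, "google.com", "/"].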

Edit 2:

I was bored and wanted to polish this :P

Here you go:

^(https?:\/\/)?(?:www\.)?(.*?)\/?$

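Now all you need to do is rebuild your URL from the first capture group (or "http://" if it is nil), followed by "www.", followed by the second capture group.

In action: http://www.rubular.com/r/YLeO5cXcck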
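The replacement step described above could be sketched roughly as follows. The method name normalize_url and the constant URL_PATTERN are assumptions made for this example; they are not part of the original answer.

```ruby
# Final pattern from the answer: optional scheme (captured), optional "www."
# (not captured), the rest of the URL (captured), optional trailing slash.
URL_PATTERN = /^(https?:\/\/)?(?:www\.)?(.*?)\/?$/

# Hypothetical helper: rebuilds the URL as scheme + "www." + remainder.
def normalize_url(url)
  match = URL_PATTERN.match(url)
  return url if match.nil?          # defensive guard, not expected to trigger

  scheme = match[1] || "http://"    # default to http:// when no scheme was given
  "#{scheme}www.#{match[2]}"
end

puts normalize_url("www.google.com")      # http://www.google.com
puts normalize_url("http://google.com/")  # http://www.google.com
puts normalize_url("https://google.com/") # https://www.google.com
```

Note that this always inserts "www.", which matches what the question asks for but may not be appropriate for hosts that do not serve a www subdomain.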

Edit (18 months later):

Check out my awesome ruby gem, which can help with problems like this one!

https://github.com/tom-lord/regexp-examples

/(https?:\/\/)?(?:www\.)?google\.com\/?/.examples # => 
  ["google.com",
   "google.com/",
   "www.google.com",
   "www.google.com/",
   "http://google.com",
   "http://google.com/",
   "http://www.google.com",
   "http://www.google.com/",
   "https://google.com",
   "https://google.com/",
   "https://www.google.com",
   "https://www.google.com/"]

/(https?:\/\/)?(?:www\.)?google\.com\/?/.examples.map(&:subgroups) # =>
  [[],
   [],
   [],
   [],
   ["http://"],
   ["http://"],
   ["http://"],
   ["http://"],
   ["https://"],
   ["https://"],
   ["https://"],
   ["https://"]]
