ruby - Remove everything before the first slash in a URL?

Question

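Using a regular expression, how can I remove everything before the first path / in a URL?

Example URL: https://www.example.com/some/page?user=1&email=joe@schmoe.org

From this, I only want /some/page?user=1&email=joe@schmoe.org

If it is just the root domain (i.e. https://www.example.com/), then I just want / returned.

The domain may or may not have a subdomain, and it may or may not use a secure protocol (https). Really, I just want to strip off everything before the first path slash.

I'm running Ruby 1.9.3, if it matters.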

score 13 · Accepted Answer

Don't use a regular expression for this. Use the URI class. You can write:

require 'uri'

u = URI.parse('https://www.example.com/some/page?user=1&email=joe@schmoe.org')
u.path #=> "/some/page"
u.query #=> "user=1&email=joe@schmoe.org"

# Path and query together; if there is no query string (no ?), only the path is returned
u.request_uri #=> "/some/page?user=1&email=joe@schmoe.org"
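The question also asks for / when the URL is only the root domain. As a minimal sketch (the helper name path_and_query is hypothetical), request_uri should cover that case too, because it falls back to "/" when the path is empty. This assumes http or https URLs: request_uri is defined on URI::HTTP (which URI::HTTPS inherits), not on every URI class.

```ruby
require 'uri'

# Hypothetical helper: everything from the first path slash onward.
# URI::HTTP#request_uri joins path and query, and returns "/" when the
# URL has no path at all (e.g. "http://www.example.com").
def path_and_query(url)
  URI.parse(url).request_uri
end

p path_and_query('https://www.example.com/some/page?user=1&email=joe@schmoe.org')
# => "/some/page?user=1&email=joe@schmoe.org"
p path_and_query('https://www.example.com/')  # => "/"
p path_and_query('http://www.example.com')    # => "/"
```

Calling it on a non-HTTP URL such as ftp://... would raise NoMethodError, since URI::FTP has no request_uri.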

score 5

require 'uri'

uri = URI.parse("https://www.example.com/some/page?user=1&email=joe@schmoe.org")

# uri.query is nil when the URL has no query string, so guard against it
uri.query ? uri.path + '?' + uri.query : uri.path
#=> "/some/page?user=1&email=joe@schmoe.org"

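As Gavin also mentioned, using a RegExp for this is not a good idea, however tempting it is. You may get URLs with special characters, or even Unicode characters, that you did not anticipate when writing the RegExp. This is especially likely in the query string. Using the URI library is the safer approach.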

score 0

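The same thing can be done with String#index:

index(substring [, offset]) returns the position of the first occurrence of substring, starting the search at offset, or nil if it is not found.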

str = "https://www.example.com/some/page?user=1&email=joe@schmoe.org"
offset = str.index("//") # => 6
str[str.index('/',offset + 2)..-1]
# => "/some/page?user=1&email=joe@schmoe.org"
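One caveat with the snippet above: when there is no slash after the host (e.g. "http://test.com"), the second str.index call returns nil and the slice does not give you "/". A minimal sketch that handles this, assuming the URL contains a scheme followed by "//" (the helper name after_host is hypothetical):

```ruby
# Hypothetical helper built on String#index, like the answer above.
def after_host(str)
  scheme_end = str.index('//')              # position of "//" after the scheme
  start = scheme_end ? scheme_end + 2 : 0   # rough fallback: no scheme, search from the start
  slash = str.index('/', start)             # first path slash, or nil
  slash ? str[slash..-1] : '/'              # no path slash means root, so return "/"
end

p after_host('https://www.example.com/some/page?user=1&email=joe@schmoe.org')
# => "/some/page?user=1&email=joe@schmoe.org"
p after_host('http://test.com/')  # => "/"
p after_host('http://test.com')   # => "/"
```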

score 0

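I strongly agree with the suggestion to use the URI module in this case, and I don't consider myself particularly good with regular expressions. Still, it seems worthwhile to show one possible way to do what you asked.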

test_url1 = 'https://www.example.com/some/page?user=1&email=joe@schmoe.org'
test_url2 = 'http://test.com/'
test_url3 = 'http://test.com'

regex = /^https?:\/\/[^\/]+(.*)/

regex.match(test_url1)[1]
# => "/some/page?user=1&email=joe@schmoe.org"

regex.match(test_url2)[1]
# => "/"

regex.match(test_url3)[1]
# => ""

Note that in the last case the URL has no trailing '/', so the result is an empty string.

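The regular expression (/^https?:\/\/[^\/]+(.*)/) says: the string starts (^) with http (http), optionally followed by s (s?), followed by :// (:\/\/), followed by at least one non-slash character ([^\/]+), followed by zero or more characters, which we capture ((.*)).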
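A small variation on the same regex, sketched under the assumption that you also want "/" back when the URL has no trailing slash (the names REGEX and strip_host are hypothetical). It uses %r{...} so the slashes need no escaping, and \A instead of ^, because in Ruby ^ matches at the start of any line rather than only at the start of the string.

```ruby
# Same pattern as above, written with %r{} and anchored with \A.
REGEX = %r{\Ahttps?://[^/]+(.*)}

def strip_host(url)
  m = REGEX.match(url)
  return nil unless m        # not an http(s) URL
  m[1].empty? ? '/' : m[1]   # empty capture (no trailing slash) becomes "/"
end

p strip_host('https://www.example.com/some/page?user=1&email=joe@schmoe.org')
# => "/some/page?user=1&email=joe@schmoe.org"
p strip_host('http://test.com/')  # => "/"
p strip_host('http://test.com')   # => "/"
```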

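I hope you find this example and explanation instructive, but again I recommend against actually using a regular expression here. The URI module is simpler to use and more robust.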
