ruby - Regex to match everything before a trailing slash or the first question mark?

Question

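I'm trying to come up with a regex that will elegantly match everything in a URL after the domain name and before the first ?, the last slash, or the end of the URL if neither of those exists.

This is what I came up with, but it seems to fail in some cases: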

regex = /[http|https]:\/\/.+?\/(.+)[?|\/|]$/

In summary:

http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price/ should return 2013/07/31/a-new-health-care-approach-dont-hide-the-price

http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price?id=2 should return 2013/07/31/a-new-health-care-approach-dont-hide-the-price

http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price should return 2013/07/31/a-new-health-care-approach-dont-hide-the-price

score 8 · Accepted Answer

Please don't use a regex for this. Use the URI library:

require 'uri'
str_you_want = URI("http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price").path
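Note that `URI#path` keeps the leading slash (and any trailing slash), while the question asks for the path without them. A minimal sketch of one way to trim both is below; `path_without_slashes` is a hypothetical helper name introduced here for illustration, not part of the URI library.

```ruby
require 'uri'

# Hypothetical helper (not part of the URI library): returns the URL's path
# with leading and trailing slashes stripped. The query string is already
# excluded by URI#path.
def path_without_slashes(url)
  URI(url).path.gsub(%r{\A/+|/+\z}, '')
end

urls = [
  'http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price/',
  'http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price?id=2',
  'http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price'
]

# Each URL prints: 2013/07/31/a-new-health-care-approach-dont-hide-the-price
urls.each { |url| puts path_without_slashes(url) }
```

The regex here only trims slashes from an already-parsed path, so the URL parsing itself is still left to the library.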

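Why?

See everything on this famous question for a good discussion of why this sort of thing is a bad idea.

Also, this XKCD really illustrates why.

In short, regexes are an incredibly powerful tool, but when you're dealing with something defined by hundreds of pages of convoluted standards, and there is already a library that can do it faster, more easily, and more correctly, why reinvent the wheel?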

score 4 · Accepted Answer

If lookaheads are allowed:

((2[0-9][0-9][0-9].*)(?=\?\w+)|(2[0-9][0-9][0-9].*)(?=/\s+)|(2[0-9][0-9][0-9].*).*\w)

Copy and paste it into http://regexpal.com/

See it here in a Ruby regex tester: http://rubular.com/r/uoLLvTwkaz

The image uses a JavaScript regex tester, but the result is the same.


(?=) is just a lookahead.

I basically set up three alternatives, each matching from 2XXX up to one of the following (tried in this order):

(?=\?\w+)  # lookahead for a question mark followed by one or more word characters
(?=/\s+)   # lookahead for a slash         followed by one or more whitespace characters
.*\w       # match up to the last word character
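As a quick check, the pattern above can be run against the three URLs from the question. This is only a sketch: `year_path` is a hypothetical helper name, and the pattern is written with `%r{}` so the slash does not need escaping in Ruby.

```ruby
# Hypothetical helper: applies the answer's pattern and returns the matched
# text (or nil if nothing matches). It relies on the path starting with a
# four-digit year beginning with 2.
def year_path(url)
  pattern = %r{((2[0-9][0-9][0-9].*)(?=\?\w+)|(2[0-9][0-9][0-9].*)(?=/\s+)|(2[0-9][0-9][0-9].*).*\w)}
  url[pattern]
end

urls = [
  'http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price/',
  'http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price?id=2',
  'http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price'
]

# Each URL prints: 2013/07/31/a-new-health-care-approach-dont-hide-the-price
urls.each { |url| puts year_path(url) }
```

For a single-line string ending in a slash, nothing follows the slash, so the second alternative does not apply and the third one (match up to the last word character) is what drops the trailing slash.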

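I'm pretty sure some of the parentheses aren't needed, but I just copy-pasted.

You can probably fix the prefix; I just assumed you wanted 2XXX, where X is a digit to match.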
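One possible way to generalize the prefix, so the match starts right after the domain instead of at a year, is sketched below. This is an illustration rather than the pattern from this answer: `path_after_domain` is a hypothetical name, and the pattern assumes an http or https URL with no fragment (`#...`).

```ruby
# Hypothetical helper: captures everything after "scheme://host/" up to an
# optional trailing slash, stopping at the first "?" or at the end of string.
# Returns nil when the URL has no path segment after the host.
def path_after_domain(url)
  match = url.match(%r{\Ahttps?://[^/?]+/([^?]*?)/?(?:\?|\z)})
  match && match[1]
end

urls = [
  'http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price/',
  'http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price?id=2',
  'http://nytimes.com/2013/07/31/a-new-health-care-approach-dont-hide-the-price'
]

# Each URL prints: 2013/07/31/a-new-health-care-approach-dont-hide-the-price
urls.each { |url| puts path_after_domain(url) }
```

The capture group is lazy (`*?`) so that the optional `/?` after it can absorb a trailing slash before the `?` or the end of the string.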

Also, no pitchforks please, everyone: a regex isn't always the best tool, but it's there for you when you need it.

Also, there's an xkcd for everything (https://xkcd.com/208/).
