regex - Trimming URLs with a regular expression, but not the root

Question

I have a large list of URLs, about 100K of them.

They look like this:

blog.example.com/ilovecats/2011/02/10/the-bling-ring/
blog.example.com/fas24
blog.example.com/morg
blog.example.com/whistlermoar/
blog.example.com/punny/
blog.example.com/punny/2012/10/
blog.example.com/punny/2012/10/01/my-mom-is-alien/
blog.example.com/anniesblog/2012/10/12/i-lost-my-iphone
blog.example.com/anniesblog/2012/10/page/3/
blog.example.com/anniesblog/2012/10/page/4
blog.example.com/anniesblog/2012/10/page/5
blog.example.com/alfva/
blog.example.com/dudewheresmycar/
blog.example.com/mynameisbilly/
blog.example.com/mynameisbilly/page/23/
blog.example.com/anotherflower/category/axel/
blog.example.com/naxramas/
blog.example.com/angeleoooo/
blog.example.com/angeleoooo/2011/01/01/
blog.example.com/angeleoooo/2011/01/01/happynew-years/

I want to remove everything after example.com/username/, so that the remaining list looks like this:

blog.example.com/ilovecats/
blog.example.com/fas24
blog.example.com/morg
blog.example.com/whistlermoar/
blog.example.com/punny/
blog.example.com/anniesblog/
blog.example.com/alfva/
blog.example.com/dudewheresmycar/
blog.example.com/mynameisbilly/
blog.example.com/anotherflower/
blog.example.com/naxramas/
blog.example.com/angeleoooo/

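I have heard that regex is one way to do this, so I have been googling for a few hours, but I am running out of time.

Can anyone help me?

(I have Notepad++ installed.)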

score 2 · Accepted Answer

You can use:

(blog.example.com/\w+\/?).*

Put that in the "Find what" field and make sure "Regular expression" is selected under Search Mode.

In "Replace with", enter:

\1

Then click Replace All.
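For anyone who would rather script it than use the editor, here is a minimal Python sketch of the same find-and-replace. The names `PATTERN` and `trim_url` are made up for illustration, and the dots in the host name are escaped here, which is a small adjustment to the pattern above rather than part of the original answer.

```python
import re

# Same idea as the Notepad++ pattern above. The dots are escaped here
# (an adjustment made for this sketch) so they only match literal dots.
PATTERN = re.compile(r"(blog\.example\.com/\w+/?).*")

def trim_url(url: str) -> str:
    """Keep the host plus the first path segment; drop whatever follows."""
    # \1 puts back only what the parentheses captured.
    return PATTERN.sub(r"\1", url)

samples = [
    "blog.example.com/ilovecats/2011/02/10/the-bling-ring/",
    "blog.example.com/fas24",
    "blog.example.com/anniesblog/2012/10/page/3/",
]

trimmed = [trim_url(u) for u in samples]
for line in trimmed:
    print(line)
```

Note that `\w` only covers letters, digits, and underscores, so a username containing a hyphen would probably need a wider character class such as `[\w-]+`.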

score 0 · Accepted Answer

Here is the regular expression to search for.

^([.\w]+\/\w+\/?).*

And here is the replacement.

\1

Let's break it down. Unless you take them apart carefully, regular expressions look like you are whistling into a modem.

^        only match strings starting at the beginning of a line.
(        begin gathering a bunch of stuff so we can replace it with \1
   [.\w]+   accept a sequence of either dots or characters that appear in words
   \/       accept a / 
   \w+      accept a sequence of characters that can appear in words
   \/?      accept a /, optionally (hence the ?)
)        the end of the parenthesis started above
.*       accept the rest of the string.

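Note that I used the + character for repetition because it matches one or more occurrences of the preceding item. I could have used * instead, and I did so in the last item of the regex; * matches zero or more repetitions.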
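The anchored pattern can also be tried on a whole block of text at once. The sketch below is only an illustration in Python, not Notepad++ itself: `LINE_PATTERN`, `trim_lines`, and `unique_in_order` are hypothetical names, and `re.MULTILINE` is used on the assumption that each URL sits on its own line. The de-duplication step is an extra, added because the desired list in the question appears to show each blog only once.

```python
import re

# Anchored pattern from this answer. re.MULTILINE lets ^ match at the
# start of every line, so each URL is handled on its own.
LINE_PATTERN = re.compile(r"^([.\w]+/\w+/?).*", re.MULTILINE)

def trim_lines(text: str) -> str:
    """Trim every line of text to host + first path segment."""
    return LINE_PATTERN.sub(r"\1", text)

def unique_in_order(lines):
    """Drop repeated lines, keeping first-seen order (optional extra step)."""
    return list(dict.fromkeys(lines))

text = """blog.example.com/punny/
blog.example.com/punny/2012/10/
blog.example.com/punny/2012/10/01/my-mom-is-alien/
blog.example.com/morg
blog.example.com/
"""

trimmed = trim_lines(text).splitlines()
blogs = unique_in_order(trimmed)
print(blogs)
```

The last sample line has no username, so `\w+` (one or more) finds nothing to match and the line is left untouched; with `\w*` it would have matched.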
