python - Regex substitution with groups

Question

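Given the string '\url{www.mywebsite.com/home/us/index.html}', I want to replace the part of the URL up to and including the second-to-last forward slash with www.example.com/, so that it becomes:

\url{www.example.com/us/index.html}

I assume the URL contains at least one forward slash. Here is what I have tried so far.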

>>> pattern = r'(\url{).*([^/]*/[^/]*})'
>>> prefix = r'\1www.example.com/\2'
>>> re.sub(pattern, prefix, '\url{www.mywebsite.com/home/us/index.html}')
'\\url{www.example.com//index.html}'

I'm not sure why the us part is missing from the result, even though I explicitly included [^/]* in the regex.

score 1 · Accepted Answer

You can also use lookahead/lookbehind:

import re
# Match everything after a preceding '{' up to the last two slashes:
pattern = r'(?<={).*(?=(?:[^/]*/){2})'
prefix = r'www.example.com'
# The raw string keeps the backslash in '\url' literal on Python 3.
print(re.sub(pattern, prefix, r'\url{www.mywebsite.com/home/us/index.html}'))

Output

\url{www.example.com/us/index.html}

Or without using a regex at all:

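# Keep only the last two slash-separated parts: ['us', 'index.html}']
parts = r'\url{www.mywebsite.com/home/us/index.html}'.split("/")[-2:]
# Put the new prefix in front and join everything back together.
parts = [r'\url{www.example.com', parts[0], parts[1]]
print("/".join(parts))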
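The same split-based idea can be wrapped in a small function. This is only a sketch: the name swap_prefix and its default argument are hypothetical, and it assumes the input contains at least two slashes.

```python
def swap_prefix(text, new_prefix=r'\url{www.example.com'):
    """Replace everything before the second-to-last slash (hypothetical helper)."""
    # rsplit with maxsplit=2 splits only at the last two slashes,
    # so the discarded head may itself contain any number of slashes.
    _head, second_last, last = text.rsplit('/', 2)
    return '/'.join([new_prefix, second_last, last])


print(swap_prefix(r'\url{www.mywebsite.com/home/us/index.html}'))
```

With fewer than two slashes the tuple unpacking raises ValueError, so the caller would need to handle that case separately.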

score 1 · Accepted Answer

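The greedy .* matches everything up to the last slash. Your group then matches /index.html}, with the first [^/]* matching the empty string (because * is allowed to match nothing).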
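To see this concretely, the following sketch inspects what each part of the pattern matched. It targets Python 3, so the backslash and braces in the pattern are escaped and the input is a raw string; those adjustments are assumptions and not part of the original question.

```python
import re

# The question's pattern, adapted so it compiles on Python 3
# (escaped backslash and braces; not the asker's exact spelling).
pattern = r'(\\url\{).*([^/]*/[^/]*\})'
text = r'\url{www.mywebsite.com/home/us/index.html}'

match = re.search(pattern, text)
# Whatever lies between the two groups is what the greedy .* consumed.
greedy_part = text[match.end(1):match.start(2)]

print(match.group(1))  # the literal prefix
print(greedy_part)     # includes "us", which is why it gets dropped
print(match.group(2))  # starts at the last slash
```

The greedy part ends with us, so that segment is replaced along with the rest of the old host and path.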

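Add a slash after your .* to force the .* to stop before the second-to-last slash. That prevents it from consuming the us that you want to leave for the group to capture: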

>>> pattern = r'(\url{).*/([^/]*/[^/]*})'
>>> re.sub(pattern, prefix, '\url{www.mywebsite.com/home/us/index.html}')
'\\url{www.example.com/us/index.html}'
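The session above is written for Python 2. On Python 3, an unescaped \u is rejected both in a non-raw string literal and in a regex pattern, so a version for Python 3 might look like the sketch below. The function name replace_host is hypothetical, and escaping the backslash and braces is an adaptation of the pattern shown above.

```python
import re

def replace_host(text):
    """Swap everything before the second-to-last slash (hypothetical helper)."""
    # The extra '/' after '.*' makes the greedy match stop at the
    # second-to-last slash, leaving "us/index.html}" for group 2.
    pattern = r'(\\url\{).*/([^/]*/[^/]*\})'
    prefix = r'\1www.example.com/\2'
    return re.sub(pattern, prefix, text)


print(replace_host(r'\url{www.mywebsite.com/home/us/index.html}'))
```

Because the pattern needs two slashes, a URL with only one slash does not match and is returned unchanged.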
