python - Python regex pattern to extract the value between two characters

Question

I am trying to extract an ID number from URLs of the following form:

http://www.domain.com/some-slug-here/person/237570
http://www.domain.com/person/237570

Either of these URLs can also have parameters:

http://www.domain.com/some-slug-here/person/237570?q=some+search+string
http://www.domain.com/person/237570?q=some+search+string

I tried the following expressions to capture the ID value '237570' from the URLs above. Each one partly works, but none works in all four URL scenarios.

(?<=person\/)(.*)(?=\?)
(?<=person\/)(.*)(?=\?|\z)
(?<=person\/)(.*)(?=\??*)

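What I'm seeing is that it captures 237570 but also includes the ? and the characters that follow it in the URL. How can I say: stop capturing when you hit a ?, a /, or the end of the string?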

score 2 · Accepted Answer

String:

http://www.domain.com/some-slug-here/person/1234?q=some+search+string
http://www.domain.com/person/3456?q=some+search+string
http://www.domain.com/some-slug-here/person/5678
http://www.domain.com/person/7890

Regex:

person\/(\d{1,})

Output:

>>> regex.findall(string)
[u'1234', u'3456', u'5678', u'7890']
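
For reference, here is a minimal sketch of how that output could be reproduced. The answer does not show how regex and string were defined, so the re.compile call and the multi-line string below are assumptions. It is written for Python 3, where the results are plain str values without the u prefix.

```python
import re

# Assumed setup: the answer does not show how `string` and `regex` were created.
string = """http://www.domain.com/some-slug-here/person/1234?q=some+search+string
http://www.domain.com/person/3456?q=some+search+string
http://www.domain.com/some-slug-here/person/5678
http://www.domain.com/person/7890"""

# \d{1,} means "one or more digits", the same as \d+.
# Escaping the slash is optional in Python, but harmless.
regex = re.compile(r"person\/(\d{1,})")

# With a single capture group, findall returns the captured text of each match.
ids = regex.findall(string)
print(ids)
```

Because the group only accepts digits, the match stops on its own at ?, / or the end of the string, so no lookahead is needed.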

score 1 · Accepted Answer

Don't use .* to match the ID. The . matches any character (except a newline, unless you use the DOTALL option). Just match a run of digits: (.*) --> (\d+)
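
A short sketch of that substitution, applied to the four URLs from the question. It keeps the question's lookbehind and swaps only the capture group. The helper name extract_id and the greedy pattern used for comparison are illustrative assumptions, not part of the original answer.

```python
import re

urls = [
    "http://www.domain.com/some-slug-here/person/237570",
    "http://www.domain.com/person/237570",
    "http://www.domain.com/some-slug-here/person/237570?q=some+search+string",
    "http://www.domain.com/person/237570?q=some+search+string",
]

# Greedy .* runs to the end of the line, so it swallows the query string too.
greedy = re.compile(r"(?<=person/)(.*)")

# \d+ can only consume digits, so it stops at ?, / or the end of the string.
digits = re.compile(r"(?<=person/)(\d+)")


def extract_id(url):
    """Return the numeric ID following 'person/', or None if there is none."""
    match = digits.search(url)
    return match.group(1) if match else None


for url in urls:
    # Print the digit-only capture next to the greedy capture for comparison.
    print(extract_id(url), "|", greedy.search(url).group(1))
```

On the two URLs with parameters, the greedy pattern captures 237570?q=some+search+string, while the digit pattern captures 237570 in all four cases.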
