import regex
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
x = regex.findall(r"/((http[s]?:\/\/)?(www\.)?(gamivo\.com\S*){1})", frase) 
print(x)

Result:

[('www.gamivo.com/product/sea-of-thieves-pc-xbox-one', '', 'www.', 'gamivo.com/product/sea-of-thieves-pc-xbox-one'), ('www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr', '', 'www.', 'gamivo.com/product/fifa-21-origin-eng-pl-cz-tr')]

I want something like this:

[('https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://gamivo.com/product/fifa-21-origin-eng-pl-cz-tr')]

How can I do that?


2 Answers


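You need to

  1. Remove the leading / character from the pattern. It prevents https:// or http:// from matching, because in the text / comes after http, not before it
  2. Remove the unnecessary capturing group and the {1} quantifier
  3. Convert the optional capturing groups into non-capturing ones.

See this Python demo: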

import re
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
print( re.findall(r"(?:https?://)?(?:www\.)?gamivo\.com\S*", frase) )
# => ['https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr']

See also the regex demo. Also see the related post "re.findall behaves weird".
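
As a minimal sketch of why the groups matter (the sample string and variable names below are illustrative assumptions, not taken from the question): re.findall returns a tuple of groups per match when the pattern contains several capturing groups, and the whole matched string when it contains none.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "see https://www.gamivo.com/product/a and http://gamivo.com/product/b"

# Several capturing groups: findall returns one tuple of groups per match,
# with '' for an optional group that did not take part in the match.
with_groups = re.findall(r"(https?://)?(www\.)?(gamivo\.com\S*)", text)

# Only non-capturing groups: findall returns each whole match as a string.
without_groups = re.findall(r"(?:https?://)?(?:www\.)?gamivo\.com\S*", text)

print(with_groups)
print(without_groups)
```

This is why converting the optional groups to (?:...) is enough to get a flat list of full URLs.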

Answered 2021-07-23T09:23:10.053

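Try this. It matches everything from http:// or https:// up to the next whitespace character, such as a space or newline. Note that it matches any URL, not only those on gamivo.com.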

import re
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
# Raw string avoids the invalid "\s" escape warning; no capturing group is needed.
x = re.findall(r'https?://\S*', frase)
print(x)
# ['https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr']
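
Because this pattern accepts any http or https URL, one possible follow-up step is to keep only the gamivo.com links. The sketch below assumes that filtering on the parsed hostname is acceptable; the sample text and the allowed_hosts name are hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical sample text containing one unrelated URL.
text = "a https://www.gamivo.com/product/x b https://example.com/page c"

# Step 1: grab every http(s) URL up to the next whitespace.
urls = re.findall(r"https?://\S+", text)

# Step 2: keep only URLs whose hostname is one we care about (assumed set).
allowed_hosts = {"gamivo.com", "www.gamivo.com"}
gamivo_urls = [u for u in urls if urlparse(u).hostname in allowed_hosts]

print(gamivo_urls)
```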
Answered 2021-07-23T12:43:26.393