python - Regex to return all characters up to a "/", searching backwards

Question

I'm having trouble with this regular expression, though I think I'm almost there.

m = re.findall(r'[a-z]{6}\.[a-z]{3}\.[a-z]{2} (?=\" target)', 'http://domain.com.uy " target')

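This gives me the "exact" output I want, namely domain.com.uy, but obviously it only works for this example: [a-z]{6} matches exactly six characters, which is not what I want.

I want it to return domain.com.uy, so basically the instruction should match any characters until it hits a "/" (searching backwards).

Edit: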

m = re.findall(r'\w+\.[a-z]{3}\.[a-z]{2} (?=\" target)', 'http://domain.com.uy " target')

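This is very close to what I want, but it won't match "_" or "-".

For completeness: I don't need the http:// part.

I hope the question is clear enough. If I've left anything open to interpretation, please ask for whatever clarification you need!

Thanks in advance!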

score 1 · Accepted Answer

Another option is to use a positive lookbehind, such as (?<=//):

>>> re.search(r'(?<=//).+(?= \" target)', 
...           'http://domain.com.uy " target').group(0)
'domain.com.uy'

Note that this will also match slashes inside the URL itself, if that is what you want:

>>> re.search(r'(?<=//).+(?= \" target)',
...           'http://example.com/path/to/whatever " target').group(0)
'example.com/path/to/whatever'

If you only want the bare domain, without any path or query parameters, you can use r'(?<=//)([^/]+)(/.*)?(?= \" target)' and take capture group 1:

>>> re.search(r'(?<=//)([^/]+)(/.*)?(?= \" target)',
...           'http://example.com/path/to/whatever " target').groups()
('example.com', '/path/to/whatever')
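
As a rough sketch, the group 1 approach above could be wrapped in a small helper. The names `DOMAIN_RE` and `bare_domain`, and the second test hostname, are made up for illustration; the pattern is the one from this answer, with the unnecessary backslash before the quote dropped.

```python
import re

# Lookbehind for "//", host in group 1, optional path in group 2,
# lookahead for the trailing ' " target' marker.
DOMAIN_RE = re.compile(r'(?<=//)([^/]+)(/.*)?(?= " target)')

def bare_domain(text):
    """Return the host part, or None when the pattern does not match."""
    match = DOMAIN_RE.search(text)
    return match.group(1) if match else None

print(bare_domain('http://domain.com.uy " target'))               # domain.com.uy
print(bare_domain('http://my-site_1.example.com/path " target'))  # my-site_1.example.com
print(bare_domain('no url here'))                                 # None
```

Because `[^/]+` accepts anything except a slash, hosts containing "_" or "-" are matched as well.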

score 1 · Accepted Answer

If a regular expression is not a requirement and you just want to extract the FQDN from a URL in Python, use urlparse and str.split():

>>> from urllib.parse import urlparse  # Python 3; on Python 2: from urlparse import urlparse
>>> url = 'http://domain.com.uy " target'
>>> urlparse(url)
ParseResult(scheme='http', netloc='domain.com.uy " target', path='', params='', query='', fragment='')

This breaks the URL down into its components. We want netloc:

>>> urlparse(url).netloc
'domain.com.uy " target'

Split on whitespace:

>>> urlparse(url).netloc.split()
['domain.com.uy', '"', 'target']

Take just the first part:

>>> urlparse(url).netloc.split()[0]
'domain.com.uy'
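
The steps above could be folded into one function, sketched here for Python 3. This is a variant rather than the exact sequence shown: it splits on whitespace before parsing and reads `hostname` instead of `netloc`, which also drops any port and lowercases the result. The function name `fqdn_from_text` is hypothetical.

```python
from urllib.parse import urlparse

def fqdn_from_text(text):
    """Return the host of the first whitespace-delimited token, or None."""
    parts = text.split()
    if not parts:
        return None
    # hostname is None when the token has no "scheme://" prefix
    return urlparse(parts[0]).hostname

print(fqdn_from_text('http://domain.com.uy " target'))          # domain.com.uy
print(fqdn_from_text('http://example.com:8080/path " target'))  # example.com
```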

score 0 · Accepted Answer

Try this (you may need to escape the / in Python):

/([^/]*)$

answered 2011-06-22T19:35:59.567
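
A quick sketch of how this pattern behaves in Python, where "/" needs no escaping in a regex. Since `$` anchors the match to the end of the string, the pattern only isolates the domain when nothing follows it. The helper name `after_last_slash` is made up for illustration.

```python
import re

TAIL_RE = re.compile(r'/([^/]*)$')  # everything after the last "/"

def after_last_slash(text):
    """Return the text following the final slash, or None if there is no slash."""
    match = TAIL_RE.search(text)
    return match.group(1) if match else None

print(after_last_slash('http://domain.com.uy'))           # domain.com.uy
print(after_last_slash('http://domain.com.uy " target'))  # domain.com.uy " target
print(after_last_slash('http://example.com/path'))        # path
```

For the input in the question, the trailing ` " target` therefore has to be stripped first, or the `$` replaced with a lookahead as in the other answers.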

score 0 · Accepted Answer

It's as simple as this:

[^/]+(?= " target)

Note, however, that for http://domain.com/folder/site.php this will not return the domain. Also remember to escape the regex properly inside the string.
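
A minimal illustration of both points, assuming the input always ends with ` " target`. A raw string takes care of the escaping, and the second call shows the caveat: with a path present, the match is the last path segment rather than the domain. The helper name `last_segment` is hypothetical.

```python
import re

# Raw string, so the pattern reaches the regex engine unchanged.
PATTERN = re.compile(r'[^/]+(?= " target)')

def last_segment(text):
    """Return the run of non-slash characters just before ' " target', or None."""
    match = PATTERN.search(text)
    return match.group(0) if match else None

print(last_segment('http://domain.com.uy " target'))               # domain.com.uy
print(last_segment('http://domain.com/folder/site.php " target'))  # site.php
```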
