python - Regex to match part of a file path if a keyword is not present

Question

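I'm trying to match part of a file path with a regular expression in Python, but only if the path does not contain a certain keyword. For example, applying the regex to "/exclude/this/test/other" should not match, while "/this/test/other" should return the file path without "other", i.e. "/this/test", where "other" can be any directory. So far I'm using this: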

In [153]: re.findall("^(((?!exclude).)*(?=test).*)?", "/exclude/this/test/other")
Out[153]: [('', '')]

In [152]: re.findall("^(((?!exclude).)*(?=test).*)?", "/this/test/other")
Out[152]: [('/this/test/other', '/')]

But I can't get it to stop matching after "test", and I also get some empty matches. Any ideas?

score 2 · Accepted Answer

If you only need to check whether the keyword is present, just use the in operator:

In [33]: s1="/exclude/this/test"

In [34]: s2="this/test"

In [35]: 'exclude' in s1
Out[35]: True

In [36]: 'exclude' in s2
Out[36]: False

Edit: or, if you only want the path up to and including test:

if 'exclude' not in s:
    re.findall(r'(.+test)',s)
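
A minimal sketch of that two-step check wrapped in a function. The name path_up_to_test is hypothetical (not from the original), and re.search is used instead of re.findall so that a single string comes back rather than a list. Both checks are plain substring matches, so a path such as "/this/testing" would also be accepted.

```python
import re

def path_up_to_test(path):
    """Return the path up to and including the last 'test', or None.

    Hypothetical helper: rejects any path that contains 'exclude'.
    """
    if 'exclude' in path:
        return None
    # Greedy .+ backtracks to the last occurrence of 'test'.
    match = re.search(r'.+test', path)
    return match.group(0) if match else None

print(path_up_to_test("/exclude/this/test/other"))  # None
print(path_up_to_test("/this/test/other"))          # /this/test
```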

score 2 · Accepted Answer

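You're getting the extra results because (1) you're using findall() instead of search(), and (2) you're using capturing groups instead of non-capturing ones.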

>>> import re
>>> re.search(r'^(?:(?:(?!exclude).)*(?=test)*)$', "/this/test").group(0)
'/this/test'

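This works with findall() too, but that doesn't make much sense when you're matching the whole string. More importantly, the inclusion part of your regex doesn't work. Check this: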

>>> re.search(r'^(?:(?:(?!exclude).)*(?=test)*)$', "/this/foo").group(0)
'/this/foo'

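That's because the * in (?=test)* makes the lookahead optional, which renders it pointless. But getting rid of the * isn't really a solution either, because exclude and test might be part of longer words, such as excludexx or yyytest. Here's a better regex: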

r'^(?=.*/test\b)(?!.*/exclude\b)(?:/\w+)+$'

Testing:

>>> re.search(r'^(?=.*/test\b)(?!.*/exclude\b)(?:/\w+)+$', '/this/test').group()
'/this/test'
>>> re.search(r'^(?=.*/test\b)(?!.*/exclude\b)(?:/\w+)+$', '/this/foo').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
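
To see how the \b word boundaries treat longer names such as excludexx and yyytest, here is a small sketch that runs the same pattern over a few paths. The helper name check and the extra sample paths are illustrative assumptions. It also tests for None before calling group(), which avoids the AttributeError shown above.

```python
import re

# Same pattern as in the answer above.
PATTERN = re.compile(r'^(?=.*/test\b)(?!.*/exclude\b)(?:/\w+)+$')

def check(path):
    """Return the matched text, or None when the path is rejected."""
    match = PATTERN.search(path)
    return match.group() if match else None

for path in ['/this/test', '/exclude/this/test',
             '/excludexx/this/test', '/this/yyytest']:
    print(path, '->', check(path))

# /this/test -> /this/test
# /exclude/this/test -> None
# /excludexx/this/test -> /excludexx/this/test
# /this/yyytest -> None
```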

Edit: I see you fixed the "optional lookahead" problem, but now the whole regex is optional!

Edit: if you want it to stop matching after /test, try this:

r'^(?:/(?!test\b|exclude\b)\w+)*/test\b'

(?:/(?!test\b|exclude\b)\w+)* matches zero or more path components, as long as they aren't /test or /exclude.
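
As a quick check, this sketch applies that pattern to the two paths from the question. The helper name prefix_through_test and the third sample path are assumptions added for illustration. Note that the pattern only inspects components up to the first /test, so an exclude directory appearing after /test would not prevent a match.

```python
import re

# Same pattern as in the answer above.
STOP_AT_TEST = re.compile(r'^(?:/(?!test\b|exclude\b)\w+)*/test\b')

def prefix_through_test(path):
    """Return the leading part of the path through /test, or None."""
    match = STOP_AT_TEST.search(path)
    return match.group() if match else None

print(prefix_through_test('/this/test/other'))          # /this/test
print(prefix_through_test('/exclude/this/test/other'))  # None
print(prefix_through_test('/excludexx/test/other'))     # /excludexx/test
```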

score 1 · Accepted Answer

If your matching is more complex than a simple keyword check with in, it may be clearer to use two regexes:

import re

s1 = "/exclude/this/test"
s2 = "this/test"

for s in (s1, s2):
    # Skip any path that contains the excluded keyword.
    if re.search(r'exclude', s):
        print('excluding:', s)
        continue
    print(s, re.findall(r'test', s))

Prints:

excluding: /exclude/this/test
this/test ['test']

If that is your goal, you can make the two regexes more compact:

print([(s, re.findall(r'test', s)) for s in (s1, s2) if not re.search(r'exclude', s)])

Edit

If I understand your edit, this works:

s1="/exclude/this/test/other"
s2="/this/test/other"

print([(s, re.search(r'(.*?)/[^/]+$', s).group(1)) for s in (s1, s2) if not re.search(r'exclude', s)])

Prints:

[('/this/test/other', '/this/test')]
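
The same two-regex idea can be sketched as a function that guards against a failed match. The name parent_unless_excluded and the third sample path are assumptions added for illustration. As the last call shows, this pattern simply drops the final path component; it does not require a test directory to be present.

```python
import re

def parent_unless_excluded(path):
    """Drop the last path component, unless the path contains 'exclude'.

    Hypothetical helper; returns None for excluded or slash-free paths.
    """
    if re.search(r'exclude', path):
        return None
    match = re.search(r'(.*?)/[^/]+$', path)
    return match.group(1) if match else None

print(parent_unless_excluded('/exclude/this/test/other'))  # None
print(parent_unless_excluded('/this/test/other'))          # /this/test
print(parent_unless_excluded('/this/foo/other'))           # /this/foo
```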
