python - Can you search backwards from an offset using a Python regular expression?

Question

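Given a string and a character offset within that string, can I search backwards using a Python regular expression?

The actual problem I'm trying to solve is getting the matching phrase at a particular offset in a string, but I have to match the first instance before that offset.

In the case where the regex is one symbol long (for example, a word boundary), I'm using a solution that reverses the string.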

import re

my_string = "Thanks for looking at my question, StackOverflow."
offset = 30  # index of the 'i' in "question"
boundary = re.compile(r'\b')
end = boundary.search(my_string, offset)  # first word boundary at or after offset
end_boundary = end.start()
end_boundary

Output: 33

# Search the reversed string from the mirrored offset, then map the index back
end = boundary.search(my_string[::-1], len(my_string) - offset - 1)
start_boundary = len(my_string) - end.start()
start_boundary

Output: 25

my_string[start_boundary:end_boundary]

Output: "question"

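However, this "reverse" technique won't work if I have a more complex regex that may involve multiple characters. For example, if I want to match the first instance of "ing" that appears before a specified offset: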

my_new_string = "Looking feeding dancing prancing"
offset = 16 # on the word dancing
m = re.match(r'(.*?ing)', my_new_string) # Except looking backwards

Ideal output: feeding

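I could probably use other approaches (splitting the file into lines and iterating backwards over them), but using a regular expression backwards seems like a conceptually simpler solution.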

score 7 · Accepted Answer

Use a positive lookbehind to make sure the matched word ends at least 30 characters into the string:

import re

# Expands to r'.*?(\w+)(?<=.{30})'; my_string and offset come from the question.
m = re.match(r'.*?(\w+)(?<=.{%d})' % offset, my_string)
print(m.group(1) if m else "no match")
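
The question also needed the start and end indices, not only the word. A minimal sketch of the same positive-lookbehind idea wrapped in a function follows. The name `word_span_at` is hypothetical, and `[\s\S]` replaces `.` on the assumption that newlines should count as characters when measuring the offset:

```python
import re

def word_span_at(text, offset):
    """Return (start, end) of the first word ending at or after `offset`, else None."""
    # The lazy prefix grows one character at a time; the lookbehind rejects
    # any word whose end lies fewer than `offset` characters into the text.
    pattern = re.compile(r'[\s\S]*?(\w+)(?<=[\s\S]{%d})' % offset)
    m = pattern.match(text)
    return m.span(1) if m else None

my_string = "Thanks for looking at my question, StackOverflow."
print(word_span_at(my_string, 30))  # (25, 33)
```

The span (25, 33) agrees with the start_boundary and end_boundary values computed in the question.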

For the other example, a negative lookbehind may help:

import re

my_new_string = "Looking feeding dancing prancing"
offset = 16
m = re.match(r'.*(\b\w+ing)(?<!.{%d})' % offset, my_new_string)
if m:
    print(m.group(1))  # feeding

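The leading .* first matches any characters greedily, then backtracks until the lookbehind can no longer match 16 characters before the end of the word ((?<!.{16})), which means the word ends before offset 16.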
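
The greedy-then-backtrack behaviour described above can be generalised to an arbitrary sub-pattern. This is only a sketch under a few assumptions: `last_match_ending_before` is a hypothetical name, `[\s\S]` is used instead of `.` so that newlines count toward the offset, and the sub-pattern is assumed not to rely on numbered backreferences (wrapping it in a group shifts the numbering):

```python
import re

def last_match_ending_before(pattern, text, offset):
    """Return the last match of `pattern` that ends before index `offset`, else None."""
    # Greedy prefix: try the rightmost candidates first. The negative
    # lookbehind rejects any candidate with `offset` or more characters
    # before its end, i.e. any candidate ending at or after `offset`.
    wrapped = re.compile(r'[\s\S]*(%s)(?<![\s\S]{%d})' % (pattern, offset))
    m = wrapped.match(text)
    return m.group(1) if m else None

text = "Looking feeding dancing prancing"
print(last_match_ending_before(r'\b\w+ing', text, 16))  # feeding
print(last_match_ending_before(r'\b\w+ing', text, 7))   # None
```

With an offset of 7 nothing is returned, because "Looking" ends exactly at index 7 rather than before it.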

score 1 · Accepted Answer

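We can make use of the Python regex engine's preference for greediness (sort of, not really) and tell it that we want as many characters as possible, but no more than 30, and then...

So a suitable regex could be r'^.{0,30}(\b)'. What we want is the start of the first capture group.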

>>> import re
>>> boundary = re.compile(r'^.{0,30}(\b)')
>>> boundary.search("hello, world; goodbye, world; I am not a pie").start(1)
30
>>> boundary.search("hello, world; goodbye, world; I am not").start(1)
30
>>> boundary.search("hello, world; goodbye, world; I am").start(1)
30
>>> boundary.search("hello, world; goodbye, pie").start(1)
26
>>> boundary.search("hello, world; pie").start(1)
17
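
The bounded-quantifier idea above is not limited to `\b`. A rough sketch, assuming a hypothetical helper name `last_start_at_or_before` and using `[\s\S]` so that newlines are counted, might look like this:

```python
import re

def last_start_at_or_before(pattern, text, offset):
    """Return the start index of the last match of `pattern` that begins
    at or before `offset`, else None."""
    # The greedy bounded prefix consumes up to `offset` characters, then
    # backs off one character at a time until `pattern` matches.
    wrapped = re.compile(r'[\s\S]{0,%d}(%s)' % (offset, pattern))
    m = wrapped.match(text)
    return m.start(1) if m else None

print(last_start_at_or_before(r'\b', "hello, world; goodbye, world; I am not a pie", 30))  # 30
print(last_start_at_or_before(r'\b\w+ing', "Looking feeding dancing prancing", 15))  # 8
```

Note that this constrains where the match starts, not where it ends: with an offset of 16 the second call would return 16, the start of "dancing", even though that word extends past the offset.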
