python - Can't find the right regex syntax to match newline or end of string

Question

This feels like a really simple question, but I can't find the answer anywhere.

(Note: I'm using Python, but this shouldn't matter.)

Say I have the following string:

s = "foo\nbar\nfood\nfoo"

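I am simply trying to find a regex that will match both instances of "foo" but not "food", based on the fact that the "foo" in "food" is not immediately followed by either a newline or the end of the string.

This is perhaps an overly complicated way to express my question, but it gives something concrete to work with.

Here are some of the things I've tried, with results (note: the result I want is ['foo\n', 'foo']):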

foo[\n\Z] => [ 'foo\n']

foo(\n|\Z) => ['\n', ''] <= This seems to match the newline and EOS, but not the foo

foo($|\n) => [ '\n', '']

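(foo)($|\n) => [('foo', '\n'), ('foo', '')] <= Almost there, and this is a usable plan B, but I would like to find the perfect solution.

The only thing I have found that does work is:

foo$|foo\n => ['foo\n', 'foo']

This is fine for such a simple example, but it's easy to see how it could become unwieldy with a much larger expression (and yes, this foo thing is a stand-in for the larger expression I'm actually using).

Interesting aside: the closest SO question I could find to my problem is this one: In regex, match either the end of the string or a specific character

Here, I can simply substitute \n for my "specific character". Now, the accepted answer uses the regex /(&|\?)list=.*?(&|$)/. I noticed that the OP was using JavaScript (the question was tagged javascript), so maybe the JavaScript regex interpreter is different, but when I use the exact strings given in that question with the above regex in Python, I get bad results: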

>>> import re
>>> re.findall(r"(&|\?)list=.*?(&|$)", "index.php?test=1&list=UL")
[('&', '')]
>>> re.findall(r"(&|\?)list=.*?(&|$)", "index.php?list=UL&more=1")
[('?', '&')]

So, I'm stumped.

score 11 · Accepted Answer

>>> import re
>>> re.findall(r'foo(?:$|\n)', "foo\nbar\nfood\nfoo")
['foo\n', 'foo']

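(?:...) makes a non-capturing group.

This works because (from the re module reference):

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.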
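
To make the group behaviour concrete, here is a minimal sketch that runs the question's string through a capturing and a non-capturing version of the same pattern. The variable names are only illustrative, and the re.finditer variant is an alternative I am adding here, not something the question used.

```python
import re

s = "foo\nbar\nfood\nfoo"

# Capturing group: findall returns only what the group captured.
captured = re.findall(r'foo($|\n)', s)

# Non-capturing group: findall returns the whole match.
whole = re.findall(r'foo(?:$|\n)', s)

# finditer yields match objects, so group(0) is the full match
# even when the pattern still contains a capturing group.
via_finditer = [m.group(0) for m in re.finditer(r'foo($|\n)', s)]

# The same fix applied to the URL pattern quoted in the question.
url_match = re.findall(r'(?:&|\?)list=.*?(?:&|$)', "index.php?test=1&list=UL")

print(captured)      # ['\n', '']
print(whole)         # ['foo\n', 'foo']
print(via_finditer)  # ['foo\n', 'foo']
print(url_match)     # ['&list=UL']
```

So the Python regex engine is matching the same text in every case; only what findall chooses to report changes.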

score 4 · Accepted Answer

You can use re.MULTILINE and include an optional newline after the $ in your pattern:

import re

s = "foo\nbar\nfood\nfoo"
pattern = re.compile(r'foo$\n?', re.MULTILINE)
print(re.findall(pattern, s))
# -> ['foo\n', 'foo']
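
As a quick check of why the flag matters, this small sketch runs the same pattern with and without re.MULTILINE on the question's string. Without the flag, $ only matches at the very end of the string (or just before a trailing newline), so the first foo is missed. The variable names are illustrative.

```python
import re

s = "foo\nbar\nfood\nfoo"

# Default mode: $ matches only at the end of the string
# (or just before a final trailing newline).
without_flag = re.findall(r'foo$\n?', s)

# MULTILINE: $ also matches just before every newline.
with_flag = re.findall(r'foo$\n?', s, re.MULTILINE)

print(without_flag)  # ['foo']
print(with_flag)     # ['foo\n', 'foo']
```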

score 1 · Accepted Answer

If you're only concerned with foo:

In [42]: import re

In [43]: strs="foo\nbar\nfood\nfoo"

In [44]: re.findall(r'\bfoo\b',strs)
Out[44]: ['foo', 'foo']

\b denotes a word boundary:

\b

Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. For example, r'\bfoo\b' matches 'foo', 'foo.', '(foo)', 'bar foo baz' but not 'foobar' or 'foo3'. Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals.

(Source)
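
Note that \b is looser than "newline or end of string": it also accepts foo followed by a space or punctuation. The sketch below uses a made-up test string (not from the question) to show the difference, so treat it as an illustration, not a drop-in replacement.

```python
import re

# Hypothetical input: "foo" followed by a space, by "d", and by ".".
text = "foo bar\nfood\nfoo."

# Word boundaries accept the space and the period after "foo".
word_boundary = re.findall(r'\bfoo\b', text)

# "Newline or end of string" accepts neither of them.
newline_or_end = re.findall(r'foo(?:$|\n)', text)

print(word_boundary)   # ['foo', 'foo']
print(newline_or_end)  # []
```

If the real expression standing in for foo can be followed by other non-word characters that should not count, the non-capturing group or re.MULTILINE approach is the safer fit.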
