python - Alternative to the `match = re.match(); if match: ...` idiom?

Question

If you want to check whether something matches a regex and, if so, print the first group, you would do..

import re
match = re.match(r"(\d+)g", "123g")
if match is not None:
    print(match.group(1))

This is completely pedantic, but the intermediate match variable is a bit annoying..

Languages like Perl do this by creating new $1..$9 variables for the match groups, like..

if($blah =~ /(\d+)g/){
    print $1
}

From this reddit comment,

with re_context.match('^blah', s) as match:
    if match:
        ...
    else:
        ...

..which I thought was an interesting idea, so I wrote a simple implementation of it:

#!/usr/bin/env python2.6
import re

class SRE_Match_Wrapper:
    def __init__(self, match):
        self.match = match

    def __exit__(self, type, value, tb):
        pass

    def __enter__(self):
        return self.match

    def __getattr__(self, name):
        if name == "__exit__":
            return self.__exit__
        elif name == "__enter__":
            return self.__enter__
        else:
            return getattr(self.match, name)

def rematch(pattern, inp):
    matcher = re.compile(pattern)
    x = SRE_Match_Wrapper(matcher.match(inp))
    return x

if __name__ == '__main__':
    # Example:
    with rematch("(\d+)g", "123g") as m:
        if m:
            print(m.group(1))

    with rematch("(\d+)g", "123") as m:
        if m:
            print(m.group(1))

(This functionality could theoretically be patched into the _sre.SRE_Match object)

It would be nice if you could skip the execution of the with statement's code block when there is no match, which would simplify this to..

with rematch("(\d+)g", "123") as m:
    print(m.group(1)) # only executed if the match occurred

..but this seems impossible, based on what I can deduce from PEP 343

Any ideas? As I said, this is a really trivial annoyance, almost to the point of being code golf.

score 12 · Accepted Answer

I don't think it's trivial. If I were writing code like that often, I wouldn't want to sprinkle redundant conditionals around it.

This is slightly odd, but you can do it with an iterator:

import re

def rematch(pattern, inp):
    matcher = re.compile(pattern)
    matches = matcher.match(inp)
    if matches:
        yield matches

if __name__ == '__main__':
    for m in rematch("(\d+)g", "123g"):
        print(m.group(1))

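What's odd is that it uses an iterator for something that isn't iterating. It's closer to a conditional, and at first glance it might look like it yields multiple results for each match.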
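A minimal sketch of that zero-or-one iteration behavior, reusing the rematch generator from above. The helper name first_group is only an illustration and is not part of the original answer:

```python
import re

def rematch(pattern, inp):
    # Yields the match object once if the pattern matches, otherwise nothing.
    m = re.match(pattern, inp)
    if m:
        yield m

def first_group(pattern, text):
    # Hypothetical helper: the loop body runs at most once.
    for m in rematch(pattern, text):
        return m.group(1)
    return None  # reached only when there was no match

print(first_group(r"(\d+)g", "123g"))
print(first_group(r"(\d+)g", "123"))
```

The second call falls through the loop without executing its body, which is the "conditional in disguise" behavior described above.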

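It does seem odd that a context manager can't cause its managed block to be skipped entirely. While that's not explicitly one of the use cases of "with", it seems like a natural extension.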

score 4 · Accepted Answer

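Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), we can capture the condition value re.match(r'(\d+)g', '123g') in a variable match, both to check that it is not None and to reuse it within the body of the condition: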

>>> if match := re.match(r'(\d+)g', '123g'):
...   print(match.group(1))
... 
123
>>> if match := re.match(r'(\d+)g', 'dddf'):
...   print(match.group(1))
...
>>>

score 3 · Accepted Answer

Another nice syntax would be something like this:

header = re.compile('(.*?) = (.*?)$')
footer = re.compile('(.*?): (.*?)$')

if header.match(line) as m:
    key, value = m.group(1,2)
elif footer.match(line) as m:
    key, value = m.group(1,2)
else:
    key, value = None, None
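
The `if ... as m` form is not valid Python. On Python 3.8 or later, roughly the same shape can be written with assignment expressions (PEP 572). A minimal sketch, where the function name parse and the sample lines are assumptions made for illustration:

```python
import re

header = re.compile(r'(.*?) = (.*?)$')
footer = re.compile(r'(.*?): (.*?)$')

def parse(line):
    # Each branch binds m and tests it in the same expression (Python 3.8+).
    if m := header.match(line):
        key, value = m.group(1, 2)
    elif m := footer.match(line):
        key, value = m.group(1, 2)
    else:
        key, value = None, None
    return key, value

print(parse("name = value"))
print(parse("name: value"))
print(parse("no separator"))
```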

score 1 · Accepted Answer

Building on Glen Maynard's solution, I have another way of doing this:

for match in [m for m in [re.match(pattern,key)] if m]:
    print("It matched: %s" % match)

Like Glen's solution, this iterates either 0 times (if there is no match) or 1 time (if there is a match).

No helper function (sub) is needed, but the result is less tidy.

score 0 · Accepted Answer

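I don't think using with is the solution in this case. You'd have to raise an exception in the BLOCK part (which is specified by the user) and have the __exit__ method return True to "swallow" the exception. So it would never look good.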
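
To illustrate why this is awkward, here is a minimal sketch of such a context manager. The class name rematch_or_skip is hypothetical. It relies on the block itself raising AttributeError when it calls a method on None, which __exit__ then suppresses:

```python
import re

class rematch_or_skip:
    """Hypothetical context manager: swallows the error caused by a failed match."""

    def __init__(self, pattern, text):
        self.match = re.match(pattern, text)

    def __enter__(self):
        return self.match

    def __exit__(self, exc_type, exc, tb):
        # Returning True suppresses the exception. Only do so when there was
        # no match and the block failed with AttributeError (e.g. None.group).
        return self.match is None and exc_type is AttributeError

out = []
with rematch_or_skip(r"(\d+)g", "123g") as m:
    out.append(m.group(1))
with rematch_or_skip(r"(\d+)g", "123") as m:
    out.append(m.group(1))  # raises AttributeError, which is swallowed

print(out)
```

The block is not skipped; it runs until the first use of m fails. Any statements before that point still execute, and a genuine AttributeError in the block would be hidden whenever there is no match.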

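I'd suggest going for something similar to the Perl syntax. Make your own extended re module (I'll call it rex) and have it set variables in its module namespace: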

if rex.match('(\d+)g', '123g'):
    print(rex._1)

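As you can see in the comments below, this method is neither scope-safe nor thread-safe. You should only use it if you are completely certain that your application will never become multi-threaded and that any functions called from the scope you use it in will follow the same approach.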
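
A rough sketch of what such a rex could look like. Everything here is hypothetical: a single object stands in for the module so the example fits in one file, and the _1, _2, ... attribute names simply mirror the snippet above. It deliberately keeps shared state, so it has exactly the scope and thread-safety problems just described:

```python
import re

class _Rex:
    """Stands in for the hypothetical `rex` module; remembers the last match's groups."""

    def match(self, pattern, text):
        m = re.match(pattern, text)
        # Drop groups left over from a previous call (_1, _2, ...).
        for name in [n for n in vars(self) if n.startswith("_") and n[1:].isdigit()]:
            delattr(self, name)
        if m:
            for i, group in enumerate(m.groups(), start=1):
                setattr(self, "_%d" % i, group)
        return m is not None

rex = _Rex()

if rex.match(r"(\d+)g", "123g"):
    print(rex._1)
```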

score 0 · Accepted Answer

Here's an alternative answer, for when you're doing a lot of these in one place:

import re
class Matcher(object):
    def __init__(self):
        self.matches = None
    def set(self, matches):
        self.matches = matches
    def __getattr__(self, name):
        return getattr(self.matches, name)

class re2(object):
    def __init__(self, expr):
        self.re = re.compile(expr)

    def match(self, matcher, s):
        matches = self.re.match(s)
        matcher.set(matches)
        return matches

pattern = re2("(\d+)g")
m = Matcher()
if pattern.match(m, "123g"):
    print(m.group(1))
if not pattern.match(m, "x123g"):
    print("no match")

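You can compile the regex once, with the same thread safety as re, create a single reusable Matcher object for the whole function, and then use it very concisely. This also has the benefit that you can invert the test in the obvious way; to do that with an iterator, you'd need to pass a flag telling it to invert its result.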

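It isn't much help if you're only doing a single match per function, though. You don't want to keep Matcher objects in a broader context than that, since it would cause the same problems as Blixt's solution.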

score 0 · Accepted Answer

This isn't really pretty, but you can take advantage of the getattr(object, name[, default]) built-in function by using it like this:

>>> getattr(re.match("(\d+)g", "123g"), 'group', lambda n:'')(1)
'123'
>>> getattr(re.match("(\d+)g", "X23g"), 'group', lambda n:'')(1)
''

To mimic the "if match, print group" flow, you can (ab)use the for statement this way:

>>> for group in filter(None, [getattr(re.match("(\d+)g", "123g"), 'group', None)]):
        print(group(1))
123
>>> for group in filter(None, [getattr(re.match("(\d+)g", "X23g"), 'group', None)]):
        print(group(1))
>>>

Of course, you can define a little function to do the dirty work:

>>> matchgroup = lambda p,s: filter(None, [getattr(re.match(p, s), 'group', None)])
>>> for group in matchgroup("(\d+)g", "123g"):
        print(group(1))
123
>>> for group in matchgroup("(\d+)g", "X23g"):
        print(group(1))
>>>

score 0 · Accepted Answer

Not a perfect solution, but it does allow you to chain several match options for the same str:

import re

class MatchWrapper(object):
  def __init__(self):
    self._matcher = None

  def wrap(self, matcher):
    self._matcher = matcher

  def __getattr__(self, attr):
    # Delegate everything else to the most recently wrapped match object
    return getattr(self._matcher, attr)

def _match(pattern, s, matcher):
  m = re.match(pattern, s)
  if m:
    matcher.wrap(m)
    return True
  else:
    return False

matcher = MatchWrapper()
s = "123g"
if _match(r"(\d+)g", s, matcher):
  print(matcher.group(1))
elif _match(r"(\w+)g", s, matcher):
  print(matcher.group(1))
else:
  print("no match")

score 0 · Accepted Answer

Here is my solution:

import re

s = 'hello world'

match = []
if match.append(re.match('w\w+', s)) or any(match):
    print('W:', match.pop().group(0))
elif match.append(re.match('h\w+', s)) or any(match):
    print('H:', match.pop().group(0))
else:
    print('No match found')

You can use as many elif clauses as you need.

Even better:

import re

s = 'hello world'

if vars().update(match=re.match('w\w+', s)) or match:
    print('W:', match.group(0))
elif vars().update(match=re.match('h\w+', s)) or match:
    print('H:', match.group(0))
else:
    print('No match found')

Both append and update return None, so in every case you have to use the or part to actually check the result of the expression.

Unfortunately, this only works as long as the code sits at the top level, i.e. not inside a function.
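
A small sketch of that limitation. The names inside_function and found are made up for illustration, and the example assumes no global called found exists. Inside a function, vars() is the same as locals(), and writing to that dictionary does not create a real local variable:

```python
def inside_function():
    # Updates the dictionary returned by locals(), not the function's variables.
    vars().update(found="value")
    try:
        return found  # compiled as a global lookup, since found is never assigned here
    except NameError:
        return "NameError"

print(inside_function())
```

At module level the trick works only because vars() there returns the module's real namespace dictionary.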

score 0 · Accepted Answer

This is what I do:

def re_match_cond (match_ref, regex, text):
    match = regex.match (text)
    del match_ref[:]
    match_ref.append (match)
    return match

if __name__ == '__main__':
    match_ref = []
    if re_match_cond (match_ref, regex_1, text):
        match = match_ref[0]
        ### ...
    elif re_match_cond (match_ref, regex_2, text):
        match = match_ref[0]
        ### ...
    elif re_match_cond (match_ref, regex_3, text):
        match = match_ref[0]
        ### ...
    else:
        ### no match
        ### ...
        pass

That is, I pass a list to the function to emulate pass-by-reference.
