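I'm working on a simple script that processes text files and logs. It has to take a list of regular-expression substitutions from the command line. For example: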

./myscript.py --replace=s/foo/bar/ --replace=s@/etc/hosts@/etc/foo@ --replace=@test\@email.com@root\@email.com@

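Is there an easy way to hand a user-specified substitution pattern to the Python re library and have it run against a string? Any elegant solution?

If possible, I'd like to avoid writing my own parser. Note that I'd like to support modifiers such as /g or /i.

Thanks!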

3 Answers

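As mentioned in the comments, you can use re.compile(), but that only takes a pattern (for matching and searching); it won't parse a sed-style substitution command for you. Assuming you only have substitutions, you could do something like this: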

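import operator
import re
from functools import reduce

modifiers_map = {
    'i': re.IGNORECASE,
    # ...
}

# `replacements` (the --replace values) and `my_text` are assumed to exist
for replace in replacements:
    # Look for a generalized separator in front of a command. A backreference
    # does not work inside [...], so "not the separator" is written (?!\2).
    m = re.match(r'(s?)(.)((?:\\.|(?!\2)[^\\])+)\2((?:\\.|(?!\2)[^\\])*)\2([ig]*)$', replace)
    if not m:
        print('Invalid command: %s' % replace)
        continue
    command, separator, query, substitution, modifiers = m.groups()
    # An escaped separator in the substitution stands for the separator itself
    substitution = substitution.replace('\\' + separator, separator)
    # Convert the modifiers to flags; 'g' is not an re flag, it maps to count=0
    flags = reduce(operator.or_, [modifiers_map[char] for char in modifiers if char != 'g'], 0)
    count = 0 if 'g' in modifiers else 1  # count=0 replaces every occurrence
    # Group references such as \1 or \2 in the substitution are expanded
    # by re.sub itself. This also assumes that you're only getting 's'
    # as a command
    my_text = re.sub(query, substitution, my_text, count=count, flags=flags)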

Admittedly this is a rough draft, but I think it gets you 90% of the way to what you want.
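
For reference, here is a minimal sketch of the same idea wrapped into functions, so the parsing can be tested separately from the substitution. The names `MODIFIER_FLAGS`, `COMMAND_RE`, `parse_command` and `apply_commands` are hypothetical, and the sketch assumes that the separator is a single punctuation character and that a command without `g` replaces only the first occurrence (as sed does).

```python
import re

# Hypothetical mapping from sed-style modifier letters to re flags.
MODIFIER_FLAGS = {'i': re.IGNORECASE, 'm': re.MULTILINE, 's': re.DOTALL}

# Optional leading "s", one punctuation separator, then two fields that may
# contain backslash-escaped characters but no bare separator, then modifiers.
COMMAND_RE = re.compile(
    r's?(?P<sep>[^\w\s\\])'
    r'(?P<query>(?:\\.|(?!(?P=sep))[^\\])+)(?P=sep)'
    r'(?P<repl>(?:\\.|(?!(?P=sep))[^\\])*)(?P=sep)'
    r'(?P<mods>[a-z]*)'
)


def parse_command(command):
    """Return (compiled_regex, replacement, count) for one command."""
    m = COMMAND_RE.fullmatch(command)
    if m is None:
        raise ValueError('Invalid command: %r' % command)
    sep = m.group('sep')
    flags, count = 0, 1  # assumption: first occurrence only unless 'g'
    for mod in m.group('mods'):
        if mod == 'g':
            count = 0  # re treats count=0 as "replace every occurrence"
        elif mod in MODIFIER_FLAGS:
            flags |= MODIFIER_FLAGS[mod]
        else:
            raise ValueError('Unknown modifier %r in %r' % (mod, command))
    # "\<sep>" is already a valid escape in the pattern, but it has to be
    # turned back into a plain separator in the replacement text.
    repl = m.group('repl').replace('\\' + sep, sep)
    return re.compile(m.group('query'), flags), repl, count


def apply_commands(commands, text):
    """Apply each command in order, feeding every result into the next."""
    for command in commands:
        regex, repl, count = parse_command(command)
        text = regex.sub(repl, text, count=count)
    return text
```

With these definitions, `apply_commands(['s/foo/bar/'], 'foo foo')` should replace only the first `foo`, while adding the `g` modifier replaces both.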

Answered 2013-03-26T19:03:39.690
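Thanks for the answers. Given the complexity of all the proposed solutions and the lack of a ready-made parser in the standard library, I went the extra mile and implemented my own parser.

It isn't much more complex than the other proposals; see below. Now I only need to write the tests.

Thanks!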

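import re


class Replacer(object):
  def __init__(self, patterns=()):
    self.patterns = []
    for pattern in patterns:
      self.AddPattern(pattern)

  def ParseFlags(self, flags):
    # 'g' is accepted but is a no-op: regex.sub() already replaces every
    # occurrence. On Python 3, 'l' (re.L) is only valid for bytes patterns.
    mapping = {
      'g': 0, 'i': re.I, 'l': re.L, 'm': re.M, 's': re.S, 'u': re.U, 'x': re.X,
      'd': re.DEBUG
    }

    result = 0
    for flag in flags:
      try:
        result |= mapping[flag]
      except KeyError:
        raise ValueError(
            "Invalid flag: %s, known flags: %s" % (flag, sorted(mapping)))
    return result

  def Apply(self, text):
    # Run every substitution in order, feeding each result into the next.
    for regex, repl in self.patterns:
      text = regex.sub(repl, text)
    return text

  def AddPattern(self, pattern):
    # Accept an optional leading 's' (as in s/foo/bar/) when it is followed
    # by a non-alphanumeric separator; otherwise the first character is
    # taken as the separator.
    if len(pattern) > 1 and pattern[0] == 's' and not pattern[1].isalnum():
      pattern = pattern[1:]
    if not pattern:
      raise ValueError("Invalid pattern: empty string.")
    separator = pattern[0]

    # Collect the match expression up to the first unescaped separator.
    match = []
    for position, char in enumerate(pattern[1:], start=1):
      if char == separator:
        if pattern[position - 1] != '\\':
          break
        match[-1] = separator  # drop the backslash, keep the separator
        continue
      match.append(char)
    else:
      raise ValueError("Invalid pattern: could not find divisor.")

    # Collect the replacement up to the next unescaped separator.
    replacement = []
    for position, char in enumerate(pattern[position + 1:], start=position + 1):
      if char == separator:
        if pattern[position - 1] != '\\':
          break
        replacement[-1] = separator
        continue
      replacement.append(char)
    else:
      raise ValueError(
          "Invalid pattern: could not find divisor '%s'." % separator)

    # Whatever follows the closing separator is the list of flags.
    flags = self.ParseFlags(pattern[position + 1:])
    match = ''.join(match)
    replacement = ''.join(replacement)
    self.patterns.append((re.compile(match, flags=flags), replacement))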
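
As a possible simplification, the two scanning loops in AddPattern could probably be replaced by a single `re.split` on separators that are not preceded by a backslash. The sketch below is only an illustration of that idea, not part of the class above: `split_command` is a hypothetical helper, and it assumes the same escaping convention (a backslash directly before the separator means a literal separator) plus an optional leading `s`.

```python
import re


def split_command(command):
    """Split '[s]<sep>match<sep>replacement<sep>flags' into three fields."""
    # Optional leading 's', only when followed by a non-alphanumeric separator.
    if len(command) > 1 and command[0] == 's' and not command[1].isalnum():
        command = command[1:]
    if len(command) < 2:
        raise ValueError('Invalid pattern: %r' % command)
    separator = command[0]
    # Split on every separator that is NOT preceded by a backslash.
    parts = re.split(r'(?<!\\)' + re.escape(separator), command[1:])
    if len(parts) != 3:
        raise ValueError(
            "Invalid pattern: expected 3 fields separated by '%s'." % separator)
    match, replacement, flags = parts
    # Turn the escaped separators back into plain separators.
    escaped = '\\' + separator
    return (match.replace(escaped, separator),
            replacement.replace(escaped, separator),
            flags)
```

The returned `flags` string can then be passed to something like ParseFlags before compiling `match`. Like the loops above, this sketch does not treat a doubled backslash before the separator specially.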

Answered 2013-03-27T16:41:07.793
You can use whitespace as the separator and let the shell's command-line parser do the work:

$ myscript --replace foo bar \
>          --replace /etc/hosts /etc/foo gi \
>          --replace test@email.com root@email.com

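The g flag is the default behaviour in Python (the sub method replaces every occurrence unless count is given), so you need to add special support for it: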

#!/usr/bin/env python
import re
from argparse import ArgumentParser
from functools import partial

all_re_flags = 'Lgimsux' # regex flags
parser = ArgumentParser(usage='%(prog)s [--replace PATTERN REPL [FLAGS]]...')
# default=[] keeps the loop below working when no --replace option is given
parser.add_argument('-e', '--replace', action='append', nargs='*', default=[])
args = parser.parse_args()
print(args.replace)

subs = [] # replacement functions: input string -> result
for arg in args.replace:
    count = 1 # replace only the first occurrence if no `g` flag
    if len(arg) == 2:
        pattern, repl = arg
    elif len(arg) == 3:
        pattern, repl, flags = arg
        # every flag must be a known one (a substring test on the sorted
        # flags would wrongly reject valid combinations such as 'is')
        if not set(flags) <= set(all_re_flags):
            parser.error('invalid flags %r for --replace option' % flags)
        if 'g' in flags: # add support for `g` flag
            flags = flags.replace('g', '')
            count = 0 # replace all occurrences
        if flags: # embed flags
            pattern = "(?%s)%s" % (flags, pattern)
    else:
        parser.error('wrong number of arguments for --replace option')
    subs.append(partial(re.compile(pattern).sub, repl, count=count))

You can use subs like this:

input_string = 'a b a b'
for replace in subs:
    print(replace(input_string))

Example:

$ ./myscript -e 'a b' 'no flag' -e 'a B' 'with flags' ig

Output:

[['a b', 'no flag'], ['a B', 'with flags', 'ig']]
no flag a b
with flags with flags
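
Note that the loop above applies each replacement to the original input independently. If the substitutions are meant to be chained, so that each one sees the result of the previous one, something along these lines might work. This is only a sketch: `apply_all` is a hypothetical helper, and the two entries in `subs` are stand-ins for the functions the script builds from the command line.

```python
import re
from functools import partial, reduce

# Stand-ins for the `subs` list built by the script (hypothetical values).
subs = [
    partial(re.compile('a b').sub, 'X', count=1),  # no `g`: first match only
    partial(re.compile('b').sub, 'B', count=0),    # `g`: every match
]


def apply_all(subs, text):
    """Feed the output of each replacement function into the next one."""
    return reduce(lambda current, replace: replace(current), subs, text)


result = apply_all(subs, 'a b a b')  # 'a b a b' -> 'X a b' -> 'X a B'
```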

Answered 2013-03-27T14:05:37.113