35

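I have strings of the form 'Version 1.4.0\n' and 'Version 1.15.6\n', and I'd like a simple way of extracting the three numbers from them. I know I can put variables into a string using the format method; I basically want to do that backwards, like this: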

# So I know I can do this:
x, y, z = 1, 4, 0
print('Version {0}.{1}.{2}\n'.format(x,y,z))
# Output is 'Version 1.4.0\n'

# But I'd like to be able to reverse it:

mystr='Version 1.15.6\n'
a, b, c = mystr.unformat('Version {0}.{1}.{2}\n')

# And have the result that a, b, c = 1, 15, 6

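I've found that someone else asked the same question, but the reply was specific to their particular case: Use Python format string in reverse for parsing

A general answer (how to do format() in reverse) would be great! An answer for my specific case would be very helpful too, though.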

4

7 Answers

10

Just to build on Uche's answer, I was looking for a way to reverse a string via a pattern with kwargs. So I put together the following function:

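import re

def string_to_dict(string, pattern):
    # turn every {name} into a named group; the '_' prefix keeps numeric
    # names such as {0} valid (literal text in pattern is not escaped)
    regex = re.sub(r'{(.+?)}', r'(?P<_\1>.+)', pattern)
    values = list(re.search(regex, string).groups())
    # the field names, in the same order as the groups
    keys = re.findall(r'{(.+?)}', pattern)
    _dict = dict(zip(keys, values))
    return _dict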

Which works as follows:

>>> p = 'hello, my name is {name} and I am a {age} year old {what}'

>>> s = p.format(name='dan', age=33, what='developer')
>>> s
'hello, my name is dan and I am a 33 year old developer'
>>> string_to_dict(s, p)
{'age': '33', 'name': 'dan', 'what': 'developer'}

>>> s = p.format(name='cody', age=18, what='quarterback')
>>> s
'hello, my name is cody and I am a 18 year old quarterback'
>>> string_to_dict(s, p)
{'age': '18', 'name': 'cody', 'what': 'quarterback'}
answered 2016-04-25T10:46:38.380
8
>>> import re
>>> re.findall(r'(\d+)\.(\d+)\.(\d+)', 'Version 1.15.6\n')
[('1', '15', '6')]
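
A minimal sketch of how the matched strings could be turned into integers, assuming every input contains one `X.Y.Z` triple. The helper name `extract_version` and the choice to return None for non-matching input are my own assumptions, not part of the answer above.

```python
import re

# Hypothetical helper: the name and the None-on-failure behaviour are assumptions.
VERSION_RE = re.compile(r'(\d+)\.(\d+)\.(\d+)')

def extract_version(text):
    """Return the first X.Y.Z triple found in text as a tuple of ints, or None."""
    match = VERSION_RE.search(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())

a, b, c = extract_version('Version 1.15.6\n')
print(a, b, c)  # 1 15 6
```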
answered 2012-08-07T11:35:22.430
5

EDIT: Also see this answer for a little more information about parse and parmatter.

The pypi package parse serves this purpose well:

pip install parse

It can be used like this:

>>> import parse
>>> result = parse.parse('Version {0}.{1}.{2}\n', 'Version 1.15.6\n')
>>> result
<Result ('1', '15', '6') {}>
>>> values=list(result)
>>> print(values)
['1', '15', '6']

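Note that the docs say the parse package does not EXACTLY emulate the format specification mini-language by default; it also uses some type indicators specified by re. Of special note is that s means "whitespace" by default, rather than str. This can easily be made consistent with the format specification by changing the default type for s to str (using extra_types):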

result = parse.parse(format_str, string, extra_types=dict(s=str))

Here is a conceptual idea for modifying the built-in string.Formatter class, using the parse package, to add an unformat capability that I have used myself:

import parse
from string import Formatter
class Unformatter(Formatter):
    '''A parsable formatter.'''
    def unformat(self, format, string, extra_types=dict(s=str), evaluate_result=True):
        return parse.parse(format, string, extra_types, evaluate_result)
    unformat.__doc__ = parse.Parser.parse.__doc__

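IMPORTANT: the method name parse is already in use by the Formatter class, so I have chosen unformat instead to avoid conflicts.

UPDATE: You might use it like this - very similar to the string.Formatter class.

Formatting (identical to '{:d} {:d}'.format(1, 2)):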

>>> formatter = Unformatter() 
>>> s = formatter.format('{:d} {:d}', 1, 2)
>>> s
'1 2' 

Unformatting:

>>> result = formatter.unformat('{:d} {:d}', s)
>>> result
<Result (1, 2) {}>
>>> tuple(result)
(1, 2)

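As shown above, this is obviously of very limited use. However, I've put up a pypi package (parmatter - a project originally for my own use, but maybe others will find it useful) that explores some ideas of how to put this idea to more useful work. The package relies heavily on the aforementioned parse package. EDIT: a few years of experience later, I realized parmatter (my first package!) was a terrible, embarrassing idea and have since deleted it.

answered 2017-05-30T15:47:14.503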
4

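Actually, the Python regular expression library already provides the general functionality you are asking for. You just have to change the syntax of the pattern slightly: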

>>> import re
>>> from operator import itemgetter
>>> mystr='Version 1.15.6\n'
>>> m = re.match(r'Version (?P<_0>.+)\.(?P<_1>.+)\.(?P<_2>.+)', mystr)
>>> list(map(itemgetter(1), sorted(m.groupdict().items())))
['1', '15', '6']

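As you can see, you have to change the (un)format string from {0} to (?P<_0>.+). You could even require digits with (?P<_0>\d+). In addition, you have to escape some of the characters to prevent them from being interpreted as regex special characters. But this in turn can be automated again, e.g. with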

>>> re.sub(r'\\{(\d+)\\}', r'(?P<_\1>.+)', re.escape('Version {0}.{1}.{2}'))
'Version\\ (?P<_0>.+)\\.(?P<_1>.+)\\.(?P<_2>.+)'
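
Putting the two steps together, here is a rough sketch of a general `unformat` for patterns that only use plain numbered fields such as `{0}`. The function name, the None return for non-matching text and the use of `re.fullmatch` are my own assumptions; format specs such as `{0:d}` and repeated indices are not handled.

```python
import re

def unformat(pattern, text):
    """Reverse str.format for patterns using only plain {0}, {1}, ... fields.

    Hypothetical helper: returns the captured strings ordered by field
    index, or None when text does not match the pattern.
    """
    # Escape the literal text, then turn each escaped \{N\} into a named group.
    regex = re.sub(r'\\{(\d+)\\}', r'(?P<_\1>.+)', re.escape(pattern))
    match = re.fullmatch(regex, text)
    if match is None:
        return None
    groups = match.groupdict()
    # Sort numerically so that {10} comes after {2}.
    return tuple(groups[key] for key in sorted(groups, key=lambda k: int(k[1:])))

a, b, c = unformat('Version {0}.{1}.{2}\n', 'Version 1.15.6\n')
print(a, b, c)  # 1 15 6
```

The values come back as strings; wrap them in int() if numbers are needed.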
answered 2012-08-07T15:38:09.457
3

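Some time ago I wrote the code below, which does the reverse of format but is limited to the cases I needed.

Also, I have never tried it, but I think this is the purpose of the parse library as well.

My code: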

import string
import re

_def_re   = '.+'
_int_re   = '[0-9]+'
_float_re = r'[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?'

_spec_char = r'[\^$.|?*+()'

def format_parse(text, pattern):
    """
    Scan `text` using the string.format-type `pattern`

    If `text` is not a string but iterable return a list of parsed elements

    Not every format-like pattern can be processed:
      - variable names cannot repeat (even unspecified ones, e.g. '{}_{0}')
      - alignment is not taken into account
      - only the following variable types are recognized:
           'd' looks for and returns an integer
           'f' looks for and returns a float

    Examples::

        res = format_parse('the depth is -42.13', 'the {name} is {value:f}')
        print(res)
        print(type(res['value']))
        # {'name': 'depth', 'value': -42.13}
        # <class 'float'>

        print('the {name} is {value:f}'.format(**res))
        # 'the depth is -42.130000'

        # Ex2: without given variable names and with an invalid item (2nd)
        versions = ['Version 1.4.0', 'Version 3,1,6', 'Version 0.1.0']
        v = format_parse(versions, 'Version {:d}.{:d}.{:d}')
        # v=[{0: 1, 1: 4, 2: 0}, None, {0: 0, 1: 1, 2: 0}]

    """
    # convert pattern to suitable regular expression & variable name
    v_int = 0   # available integer variable name for unnamed variable 
    cur_g = 0   # indices of current regexp group name 
    n_map = {}  # map variable name (keys) to regexp group name (values)
    v_cvt = {}  # (optional) type conversion function attached to variable name
    rpattern = '^'    # stores to regexp pattern related to format pattern        

    for txt, vname, spec, conv in string.Formatter().parse(pattern):
        # check for regexp special characters in txt (add '\' before)
        txt = ''.join('\\'+c if c in _spec_char else c for c in txt)
        rpattern += txt

        # trailing literal text comes without a field (vname is None)
        if vname is None:
            continue

        # process variable name
        if len(vname)==0:
            vname = v_int
            v_int += 1
        if vname not in n_map:
            gname = '_'+str(cur_g)
            n_map[vname] = gname
            cur_g += 1
        else:
            gname = n_map[vname]

        # process type of required variables
        if   'd' in spec: vtype = _int_re;   v_cvt[vname] = int
        elif 'f' in spec: vtype = _float_re; v_cvt[vname] = float
        else:             vtype = _def_re

        rpattern += '(?P<'+gname+'>' + vtype +')'

    rpattern += '$'

    # replace dictionary key from regexp group-name to the variable-name 
    def map_result(match):
        if match is None: return None
        match = match.groupdict()
        match = dict((vname, match[gname]) for vname, gname in n_map.items())
        for vname, value in match.items():
            if vname in v_cvt:
                match[vname] = v_cvt[vname](value)
        return match

    # parse pattern
    if isinstance(text, str):
        match = re.search(rpattern, text)
        match = map_result(match)
    else:
        # any other iterable: return a list with one result per element
        comp  = re.compile(rpattern)
        match = [map_result(comp.search(t)) for t in text]

    return match

For your case, here is a usage example:

versions = ['Version 1.4.0', 'Version 3.1.6', 'Version 0.1.0']
v = format_parse(versions, 'Version {:d}.{:d}.{:d}')
# v=[{0: 1, 1: 4, 2: 0}, {0: 3, 1: 1, 2: 6}, {0: 0, 1: 1, 2: 0}]

# to get the versions as a list of integer list, you can use:
v = [[vi[i] for i in range(3)] for vi in filter(None,v)]

Note that filter(None, v) removes the unparsable versions (which come back as None). It is not needed here.
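
For anyone wondering what format_parse builds its regular expression from: it walks the tuples produced by string.Formatter().parse. Below is a short sketch of what those tuples look like for a version pattern. The pattern with a trailing '\n' is my own example, chosen to show that trailing literal text arrives with a field name of None.

```python
from string import Formatter

pattern = 'Version {:d}.{:d}.{:d}\n'

# Each item is (literal_text, field_name, format_spec, conversion).
pieces = list(Formatter().parse(pattern))
for piece in pieces:
    print(piece)

# ('Version ', '', 'd', None)
# ('.', '', 'd', None)
# ('.', '', 'd', None)
# ('\n', None, None, None)
```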

answered 2013-08-30T13:42:31.450
2

a, b, c = (int(i) for i in mystr.split()[1].split('.'))

gets you ints for a, b and c:

>>> a
1
>>> b
15
>>> c
6

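Depending on how regular or irregular (i.e. consistent) your number/version formats are, you might want to consider regular expressions. If they will stay in this format, though, I'd favor the simpler solution if it works for you.

answered 2012-08-07T11:34:22.997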
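
If you stay with the split approach, a few checks make irregular input fail loudly instead of producing odd results. This is only a sketch: the helper name `parse_version`, the `prefix` parameter and the error messages are my own assumptions.

```python
def parse_version(text, prefix='Version'):
    """Hypothetical helper: split 'Version X.Y.Z' into a tuple of three ints.

    Raises ValueError when the text does not have the expected shape.
    """
    words = text.split()
    if len(words) != 2 or words[0] != prefix:
        raise ValueError('unexpected version string: {!r}'.format(text))
    parts = words[1].split('.')
    if len(parts) != 3:
        raise ValueError('expected three dot-separated fields: {!r}'.format(text))
    # int() raises ValueError itself for non-numeric fields
    return tuple(int(part) for part in parts)

a, b, c = parse_version('Version 1.15.6\n')
print(a, b, c)  # 1 15 6
```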
0

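Here's a solution in case you don't want to use the parse module. It converts format strings into regular expressions with named groups. It makes a few assumptions (described in the docstring) that were okay in my case, but may not be okay in yours.

import re

def match_format_string(format_str, s):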
    """Match s against the given format string, return dict of matches.

    We assume all of the arguments in format string are named keyword arguments (i.e. no {} or
    {:0.2f}). We also assume that all chars are allowed in each keyword argument, so separators
    need to be present which aren't present in the keyword arguments (i.e. '{one}{two}' won't work
    reliably as a format string but '{one}-{two}' will if the hyphen isn't used in {one} or {two}).

    We raise if the format string does not match s.

    Example:
    fs = '{test}-{flight}-{go}'
    s = fs.format(test='first', flight='second', go='third')
    match_format_string(fs, s) -> {'test': 'first', 'flight': 'second', 'go': 'third'}
    """

    # First split on any keyword arguments, note that the names of keyword arguments will be in the
    # 1st, 3rd, ... positions in this list
    tokens = re.split(r'\{(.*?)\}', format_str)
    keywords = tokens[1::2]

    # Now replace keyword arguments with named groups matching them. We also escape between keyword
    # arguments so we support meta-characters there. Re-join tokens to form our regexp pattern
    tokens[1::2] = map(u'(?P<{}>.*)'.format, keywords)
    tokens[0::2] = map(re.escape, tokens[0::2])
    pattern = ''.join(tokens)

    # Use our pattern to match the given string, raise if it doesn't match
    matches = re.match(pattern, s)
    if not matches:
        raise Exception("Format string did not match")

    # Return a dict with all of our keywords and their values
    return {x: matches.group(x) for x in keywords}
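
To illustrate the separator assumption from the docstring, here is a small sketch using the same kind of greedy named groups directly. The sample strings are made up for illustration.

```python
import re

# Adjacent fields: the first greedy group swallows everything.
adjacent = re.match(r'(?P<one>.*)(?P<two>.*)', 'abcd').groupdict()
print(adjacent)   # {'one': 'abcd', 'two': ''}

# A separator that does not occur in the values gives the intended split.
separated = re.match(r'(?P<one>.*)-(?P<two>.*)', 'ab-cd').groupdict()
print(separated)  # {'one': 'ab', 'two': 'cd'}

# If the separator also occurs inside a value, the split lands on the last hyphen.
ambiguous = re.match(r'(?P<one>.*)-(?P<two>.*)', 'a-b-c').groupdict()
print(ambiguous)  # {'one': 'a-b', 'two': 'c'}
```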
answered 2018-08-17T14:47:03.143