
Consider the following:

import re
sequence = 'FFFFFF{7}FFFFFF'
patterns = [ ('([0-9a-fA-F]+)', 'Sequence'),
    ('(\\([0-9a-fA-F]+\\|[0-9a-fA-F]+\\))', 'Option'),
    ('({[0-9a-fA-F]+})', 'Range'),
    ('(\\[[0-9a-fA-F]+:([0-9a-fA-F]+|\*)\\])', 'Slice'),
    ('(\\?\\?)+', 'Byte_value_Wildcard'),
    ('(\\*)+', 'Byte_length_wildcard') ]
fragment_counter = 0
fragment_dict= {}
fragments_list = []
while sequence:
    found = False
    for pattern, name in patterns:
        m = re.match (pattern,sequence)
        if m:
            fragment_counter+=1
            m = m.groups () [0]
            fragment_dict["index"]=fragment_counter
            fragment_dict["fragment_type"]=name
            fragment_dict["value"]=m
            print fragment_dict
            fragments_list.append(fragment_dict)
            sequence = sequence [len (m):]
            found = True
            break
    if not found: raise Exception ('Unrecognized sequence')

print fragments_list

Every time execution reaches the "print fragment_dict" line, I get the correct (expected) output:

{'index': 1, 'fragment_type': 'Sequence', 'value': 'FFFFFF'}
{'index': 2, 'fragment_type': 'Range', 'value': '{7}'}
{'index': 3, 'fragment_type': 'Sequence', 'value': 'FFFFFF'}

However, fragments_list ends up holding 3 copies of the final dictionary instead of one entry per line as I expected:

[{'index': 3, 'fragment_type': 'Sequence', 'value': 'FFFFFF'}, {'index': 3, 'fragment_type': 'Sequence', 'value': 'FFFFFF'}, {'index': 3, 'fragment_type': 'Sequence', 'value': 'FFFFFF'}]

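I assume this is because append stores a reference to the dictionary instance rather than a copy of the dictionary on each iteration. I looked at using the list() function, but on a dict it just gives me a list of the dict's keys.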
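That assumption is correct: list.append stores a reference to the object, so every element of the list points at the same dict, and each later assignment to a key is visible through all of them. A minimal reproduction, using the hypothetical names shared and items in place of the question's variables:

```python
# One dict object is created once, mutated on every iteration,
# and appended to the list each time.
shared = {}
items = []
for i in range(1, 4):
    shared["index"] = i
    items.append(shared)  # appends a reference, not a snapshot

print(items)  # [{'index': 3}, {'index': 3}, {'index': 3}]
print(all(entry is shared for entry in items))  # True: the same object three times
```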

What am I doing wrong?
I'm not wedded to the data type; I just need a way to store 3 data elements (maybe a 4th) for each fragment I find.


2 Answers


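You're close. Instead of the list() function, which builds a new list from any sequence (and a dict, viewed as a sequence, is the sequence of its keys), use the dict() function, which builds a new dict from any mapping.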
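A small sketch of the dict() fix. To keep it short, the three fragments from the question's output are hard-coded as sample data standing in for the regex loop (an assumption of this sketch). dict() makes a shallow copy, which is enough here because the values are ints and strings.

```python
fragment_dict = {}
fragments_list = []
# Sample data taken from the question's expected output.
samples = [("Sequence", "FFFFFF"), ("Range", "{7}"), ("Sequence", "FFFFFF")]
for i, (name, value) in enumerate(samples, start=1):
    fragment_dict["index"] = i
    fragment_dict["fragment_type"] = name
    fragment_dict["value"] = value
    # dict() builds a new dict from the mapping, so later mutations
    # of fragment_dict do not affect the copy stored in the list.
    fragments_list.append(dict(fragment_dict))

print(fragments_list)
```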

Or, perhaps more simply, just use the copy method.
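A short illustration of the copy method (the values are sample data): calling fragment_dict.copy() at append time stores a snapshot, so mutating the original afterwards leaves the stored entry unchanged. Like dict(), this is a shallow copy.

```python
fragment_dict = {"index": 1, "fragment_type": "Sequence", "value": "FFFFFF"}
fragments_list = [fragment_dict.copy()]  # snapshot of the current state

# Mutating the original afterwards...
fragment_dict["index"] = 2
fragment_dict["fragment_type"] = "Range"
fragment_dict["value"] = "{7}"

print(fragments_list)  # ...leaves the stored copy unchanged
```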

Or, even more simply, just move the fragment_dict = {} into the loop, so that you build a new dict each time instead of reusing the same one over and over.
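Applying that last suggestion to the question's loop might look like the following sketch. Beyond creating a fresh dict per fragment, it makes a few changes that are assumptions of this sketch rather than part of the answer: it targets Python 3 (print() instead of the print statement), writes the same regular expressions as raw strings, replaces the found flag with a for/else, and raises ValueError instead of a bare Exception.

```python
import re

sequence = 'FFFFFF{7}FFFFFF'
# Same patterns as in the question, written as raw strings.
patterns = [
    (r'([0-9a-fA-F]+)', 'Sequence'),
    (r'(\([0-9a-fA-F]+\|[0-9a-fA-F]+\))', 'Option'),
    (r'({[0-9a-fA-F]+})', 'Range'),
    (r'(\[[0-9a-fA-F]+:([0-9a-fA-F]+|\*)\])', 'Slice'),
    (r'(\?\?)+', 'Byte_value_Wildcard'),
    (r'(\*)+', 'Byte_length_wildcard'),
]

fragment_counter = 0
fragments_list = []
while sequence:
    for pattern, name in patterns:
        m = re.match(pattern, sequence)
        if m:
            fragment_counter += 1
            value = m.group(1)
            # A new dict per fragment, so each list entry is its own object.
            fragment_dict = {"index": fragment_counter,
                             "fragment_type": name,
                             "value": value}
            fragments_list.append(fragment_dict)
            sequence = sequence[len(value):]
            break
    else:
        # for/else: this branch runs only when no pattern matched (no break).
        raise ValueError('Unrecognized sequence')

print(fragments_list)
```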

answered 2013-02-19T01:51:24.553

Nice to see that you are actually using the code from my answer to your last question. I'm always glad when one of my answers actually helps.

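Given that in your previous question you said this parsing is just a warm-up for processing the parsed tokens afterwards, you might consider this: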

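Create a class for each token type you have. In each one, implement a process method that will eventually do the actual processing (instead of the wolfing, bearing, foxing and badgering I do in the code below).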

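Then parse the whole stream with the Stream class. You can iterate over the tokens through Stream.tokens, and you can process all the contained tokens by calling Stream.process.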

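You put these classes in one Python file and import it into your main code; then you only need to create a Stream instance to parse the input and process it.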

Something like this:

#! /usr/bin/python3.2

import re

class Sequence:
    def __init__ (self, raw): self.__raw = raw
    def __str__ (self): return 'Sequence {}'.format (self.__raw)
    def process (self): print ('Wolfing sequence {}'.format (self.__raw) )

class Option:
    def __init__ (self, raw): self.__raw = raw
    def __str__ (self): return 'Option {}'.format (self.__raw)
    def process (self): print ('Foxing option {}'.format (self.__raw) )

class Range:
    def __init__ (self, raw): self.__raw = raw
    def __str__ (self): return 'Range {}'.format (self.__raw)
    def process (self): print ('Bearing range {}'.format (self.__raw) )

class Slice:
    def __init__ (self, raw): self.__raw = raw
    def __str__ (self): return 'Slice {}'.format (self.__raw)
    def process (self): print ('Badgering slice {}'.format (self.__raw) )


class Stream:
    patterns = [ ('([0-9a-fA-F]+)', Sequence),
        ('(\\([0-9a-fA-F]+\\|[0-9a-fA-F]+\\))', Option),
        ('({[0-9a-fA-F]+})', Range),
        ('(\\[[0-9a-fA-F]+:[0-9a-fA-F]+\\])', Slice) ]

    def __init__ (self, stream):
        self.__tokens = []
        while stream:
            found = False
            for pattern, cls in self.patterns:
                m = re.match (pattern, stream)
                if m:
                    m = m.groups () [0]
                    self.__tokens.append (cls (m) )
                    stream = stream [len (m):]
                    found = True
                    break
            if not found: raise Exception ('Unrecognized sequence')

    @property
    def tokens (self): return (token for token in self.__tokens)

    def process (self):
        for token in self.__tokens: token.process ()

stream = Stream ('524946(46|58){4}434452[22:33]367672736E')
print ('These are the tokens:')
for idx, token in enumerate (stream.tokens):
    print ('{} at position {}.'.format (token, idx) )

print ('\nNow let\'s process them all:')
stream.process ()

This yields:

These are the tokens:
Sequence 524946 at position 0.
Option (46|58) at position 1.
Range {4} at position 2.
Sequence 434452 at position 3.
Slice [22:33] at position 4.
Sequence 367672736E at position 5.

Now let's process them all:
Wolfing sequence 524946
Foxing option (46|58)
Bearing range {4}
Wolfing sequence 434452
Badgering slice [22:33]
Wolfing sequence 367672736E
answered 2013-02-19T03:08:17.627