python - 从字典中解码字符串的更快方法是什么？（Python）

Question

我必须使用具有相应二进制代码的字符字典来解码一串 1 和 0。字典是这样设置的：

self.codes = {'a': '00', 'h': '111', 'e': '110', 'l': '01', 'o': '10'}

这些代码是这样的，没有二进制代码是任何其他代码的前缀。目前我有这个功能来做解码：

def decode(self, ciphertext):
        start = 0
        while start < len(ciphertext):
            for k in self.codes:
                res = ciphertext.find(self.codes[k], start)
                if res == start:
                    decoded += k
                    start += len(self.codes[k])
        return decoded

（其中 ciphertext = 要解码的文本，self.codes = 表示编码系统的字典）

不出所料，这真的很慢。谁能指出我对这种事情更有效的算法？

[编辑] 谢谢大家的好建议。我会在回家的几个小时内尝试实施它们。

score 2 · Accepted Answer

你可以这样做：

def __init__(self, ...):
    ...
    self.codes = {'a': '00', 'h': '111', 'e': '110', 'l': '01', 'o': '10'}
    self.decode_codes = {value: key for key, value in self.codes.items()}

def decode(self, ciphertext):
    result = ''
    chunk = ''

    for c in ciphertext:
        chunk += c

        if chunk in self.decode_codes:
            result += self.decode_codes[chunk]
            chunk = ''


    return result

由于您说没有两个代码包含在彼此的开头，因此这种方法将起作用。

score 1 · Accepted Answer

如果您翻转键和值，您的代码将运行得更快：

self.codes = {'00': 'a', '111': 'h', '110': 'e', '01': 'l', '10': 'o'}

现在你可以做这样的伪代码：

ciphertext = 你的密文（例如“001111100110”）

明文 = ""

当前令牌 = ""

1) 将当前位的密文添加到 currenttoken 的末尾

2) self.codes 是否包含 currenttoken？

2a) 如果是这样，将 self.codes[currenttoken] 添加到明文中，并且 currenttoken = ""

字典访问是 O(1)，O(1) 超过一个字符串的长度是 O(n)。

如果您还需要未翻转的键/值来快速加密字符串，那么我建议您准备两本字典，一本用于快速编码，另一本用于快速解码。

score 1 · Accepted Answer

这是一个使用反向字典的更高效的解码器。我试图尽可能多地预先计算东西。

codes = {'a': '00', 'h': '111', 'e': '110', 'l': '01', 'o': '10'}

# generate the reverse dictionary and min/max length from the above
decodes = dict((v, k) for (k, v) in codes.iteritems())
# can use dict comprehension for above in Python >= 2.7
lenrange = range(min(len(k) for k in decodes), max(len(k) for k in decodes) + 1)
# for Python 3, add here: lenrange = list(lenrange)    

def decode(ciphertext):
    start, plaintext = 0, []
    plaintext_append = plaintext.append  # for speed
    while True:
        for n in lenrange:
            key = ciphertext[start:start+n]
            if key in decodes:
                plaintext_append(decodes[key])
                start += n
                break
        else:     # return result when code can't be found (usually at end)
            return "".join(plaintext)

score 1 · Accepted Answer

我试图实现 Blender 的答案，不得不做一些微小的改变：

codes = {'a': '00', 'h': '111', 'e': '110', 'l': '01', 'o': '10'}
decode_codes = {value: key for key, value in codes.items()}

def decode(ciphertext):
    result = ''
    chunk = ''

    for c in ciphertext:
        chunk += c
        if chunk in decode_codes:
            result += decode_codes[chunk]
            chunk = ''

    return result

def encode(plaintext):
    return ''.join(codes[c] for c in plaintext)

print decode(encode("hello"))

python - 从字典中解码字符串的更快方法是什么？（Python）

4 回答 4

Related

Reference