python - What is an efficient way to encode a (very long) string from a dictionary? (Python)

Question

I have a dictionary of the form {'character to encode': 'corresponding binary code', ...}. I have been encoding like this:

def encode(self, text):
    def generator():
        for ch in text:
            yield self.codes[ch]  # Look up the encoded representation in the dictionary
    return ''.join(generator())

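This works for short strings, but for novel-length strings it is too slow to be usable. What is a faster way to encode a string like this? Or should I completely rethink how I store and manipulate my data?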

More code:

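I have been testing with print c.encode(f), where f is a string (I just checked) and c is the encoder object. This works for shorter files; I have tested up to 3000 characters. Thanks to thg435, my encode function is now: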

def encode(self, text):
    return ''.join(map(self.codes.get, text))

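self.codes is the mapping dictionary. For the input string 'hello', it is set to {'h': '01', 'e': '00', 'l': '10', 'o': '11'}. I feel I should post more code, but I have already tested the argument ('text') and the dictionary, so I am not sure what else is relevant, since they seem to be the only things that could affect this function's running time. The functions called before encode are fine in terms of speed. I know this because I have been using print statements to check their output, and it is always printed within a few seconds of execution starting.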
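For illustration, here is a minimal standalone sketch of that mapping applied to 'hello'. The free function `encode` and the explicit `codes` argument are stand-ins for the method and attribute of the encoder class, which is not shown in the question.

```python
# Mapping taken from the question's 'hello' example.
codes = {'h': '01', 'e': '00', 'l': '10', 'o': '11'}


def encode(text, codes):
    """Concatenate the code of every character in text.

    Stand-in for the question's method; codes is passed explicitly
    instead of being read from self.codes.
    """
    # codes.get returns None for a character that has no code,
    # which makes join raise TypeError instead of skipping it.
    return ''.join(map(codes.get, text))


encoded = encode('hello', codes)
print(encoded)  # 0100101011
```

Each character contributes its two-bit code in order: 01, 00, 10, 10, 11.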

score 3 · Accepted Answer

This appears to be the fastest:

''.join(map(codes.get, text))

Timings:

codes = {chr(n): '[%d]' % n for n in range(255)}


def encode1(text): 
    return ''.join(codes[c] for c in text)

def encode2(text): 
    import re
    return re.sub(r'.', lambda m: codes[m.group()], text)

def encode3(text): 
    return ''.join(map(codes.get, text))


import timeit

a = 'foobarbaz' * 1000

print(timeit.timeit(lambda: encode1(a), number=100))
print(timeit.timeit(lambda: encode2(a), number=100))
print(timeit.timeit(lambda: encode3(a), number=100))


# 0.113456964493
# 0.445501089096
# 0.0811159610748
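One further variant that may be worth timing on Python 3 is `str.translate` with a table built by `str.maketrans`. This is not part of the answer above and was not measured here, so treat it as a sketch to benchmark rather than a known improvement. The name `encode4` is hypothetical, and the approach assumes every key in `codes` is a single character.

```python
# Same test dictionary as in the timings above.
codes = {chr(n): '[%d]' % n for n in range(255)}

# Python 3 only: maketrans accepts a dict whose keys are
# single characters and whose values are replacement strings.
table = str.maketrans(codes)


def encode3(text):
    return ''.join(map(codes.get, text))


def encode4(text):
    # Hypothetical alternative, not from the original answer.
    # Characters missing from the table are left unchanged,
    # whereas encode3 raises TypeError for them.
    return text.translate(table)


a = 'foobarbaz' * 1000
print(encode3(a) == encode4(a))  # True
```

It can be timed the same way as the others, with `timeit.timeit(lambda: encode4(a), number=100)`.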
