我正在使用霍夫曼代码对一组项目进行编码,并且与最终代码一起,我想返回编码的中间节点,但子节点的数据连接到中间节点的数据中.
例如,如果我要对这组符号和概率进行编码:
tree = [('a',0.25),('b',0,25),('c',0.25),('d',0.125),('e',0.125)]
我希望退回以下内容:
tree = [['ab','0'],['cde','1'],['a','00'],['b','01'],['c','10'],['de','11'],['d','110'],['e','111']]
我正在使用以下代码来生成霍夫曼树:
import heapq
#symbfreq = list of symbols and associated frequencies
def encode(symbfreq):
#create a nested list as a tree
tree = [[wt, [sym, ""]] for sym, wt in symbfreq]
#turn the tree into a heap
heapq.heapify(tree)
while len(tree)>1:
#pop the lowest two nodes off the heap, sorted on the length
lo, hi = sorted([heapq.heappop(tree), heapq.heappop(tree)], key=len)
#add the next bit of the codeword
for pair in lo[1:]:
pair[1] = '0' + pair[1]
for pair in hi[1:]:
pair[1] = '1' + pair[1]
#push a new node, which is the sum of the two lowest probability nodes onto the heap
heapq.heappush(tree, [lo[0]+hi[0]] + lo[1:] + hi[1:])
return sorted(heapq.heappop(tree)[1:], key=lambda p: (len(p[-1]), p))
霍夫曼算法是:
1.Create a leaf node for each symbol and add it to the priority queue.
2.While there is more than one node in the queue:
3.Remove the node of highest priority (lowest probability) twice to get two nodes.
4.Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
5.Add the new node to the queue.
6.The remaining node is the root node and the tree is complete.
我一辈子都想不出一种方法来阻止中间节点被覆盖(即我想保留在第 4 阶段创建的中间节点)。