2

我正在使用霍夫曼代码对一组项目进行编码,并且与最终代码一起,我想返回编码的中间节点,但子节点的数据连接到中间节点的数据中.

例如,如果我要对这组符号和概率进行编码:

tree = [('a',0.25),('b',0,25),('c',0.25),('d',0.125),('e',0.125)]

我希望退回以下内容:

tree = [['ab','0'],['cde','1'],['a','00'],['b','01'],['c','10'],['de','11'],['d','110'],['e','111']]

我正在使用以下代码来生成霍夫曼树:

import heapq

#symbfreq = list of symbols and associated frequencies
def encode(symbfreq):
    #create a nested list as a tree
    tree = [[wt, [sym, ""]] for sym, wt in symbfreq]
    #turn the tree into a heap
    heapq.heapify(tree)
    while len(tree)>1:
        #pop the lowest two nodes off the heap, sorted on the length
        lo, hi = sorted([heapq.heappop(tree), heapq.heappop(tree)], key=len)
        #add the next bit of the codeword
        for pair in lo[1:]:
            pair[1] = '0' + pair[1]
        for pair in hi[1:]:
            pair[1] = '1' + pair[1]
        #push a new node, which is the sum of the two lowest probability nodes onto the heap 
        heapq.heappush(tree, [lo[0]+hi[0]] + lo[1:] + hi[1:])
    return sorted(heapq.heappop(tree)[1:], key=lambda p: (len(p[-1]), p))

霍夫曼算法是:

1.Create a leaf node for each symbol and add it to the priority queue.
2.While there is more than one node in the queue:
    3.Remove the node of highest priority (lowest probability) twice to get two nodes.
    4.Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
    5.Add the new node to the queue.
6.The remaining node is the root node and the tree is complete.

我一辈子都想不出一种方法来阻止中间节点被覆盖(即我想保留在第 4 阶段创建的中间节点)。

4

1 回答 1

2

我不知道在构建时如何做(构建永远不会弹出的输出树的一部分),但您可以很容易地检索中间节点:

huffman_tree = encode(tree)
complete_tree = huffman_tree

get_intermediate_node = lambda val, arr :   ''.join( [ char for char,binary in itertools.ifilter( lambda node : node[1].startswith( val ),arr)] ) 

for val in range( next_power_of_two( len(huffman_tree) ) ):
    bvalue = bin(val)[2:] 
    node = [ get_intermediate_node( bvalue , huffman_tree) , bvalue ] 
    if node not in complete_tree:
        complete_tree.append( node)

print sorted( complete_tree , key=lambda p: (len(p[-1]), p) )

>>> [['ab', '0'], ['cde', '1'], ['a', '00'], ['b', '01'], ['c', '10'],
    ['de', '11'], ['', '100'], ['', '101'], ['d', '110'], ['e', '111']]

(您仍然需要修剪空节点)

于 2013-11-04T13:12:13.567 回答