algorithm - 霍夫曼算法中的二进制前缀码

Question

在霍夫曼编码算法中，有一个引理说：

最优二元前缀码对应的二叉树是满的

但我不知道为什么。你如何证明这个引理？

score 2 · Accepted Answer

Any binary code for data can be represented as a binary tree. The code is represented by the path from the root to the leaf, with a left edge representing a 0 in the prefix and a right one representing 1 (or vice versa.) Keep in mind that for each symbol there is one leaf node.

To prove that an optimal code will be represented by a full binary tree, let's recall what a full binary tree is, It is a tree where each node is either a leaf or has two chilren.

Let's assume that a certain code is optimal and is represented by a non-full tree. So there is a certain vertex u with only a single child v. The edge between u and v adds the bit x to the prefix code of the symbols (at the leaves) in the subtree rooted at v. From this tree I can remove the edge x and replace u with v, thus decreasing the length of the prefix code of all symbols in the subtree rooted at v by one. This reduces the number of bits in the representation of at least one symbol (when v is a singleton node.)

This shows that the tree didnt actually represent an optimal code, and my premise was wrong. Thus proving the lemma.

score 1 · Accepted Answer

来自维基百科，

一棵完整的二叉树（有时是二叉树或严格的二叉树）是一棵树，其中除了叶子之外的每个节点都有两个孩子。

生成霍夫曼代码树的方式肯定会生成完整的二叉树。因为在算法的每一步，我们都会从队列中移除优先级最高（概率最低）的两个节点，并以这两个节点为子节点创建一个新的内部节点。

algorithm - 霍夫曼算法中的二进制前缀码

2 回答 2

Related

Reference