在霍夫曼编码算法中,有一个引理说:
最优二元前缀码对应的二叉树是满的
但我不知道为什么。你如何证明这个引理?
Any binary code for data can be represented as a binary tree. The code is represented by the path from the root to the leaf, with a left edge representing a 0 in the prefix and a right one representing 1 (or vice versa.) Keep in mind that for each symbol there is one leaf node.
To prove that an optimal code will be represented by a full binary tree, let's recall what a full binary tree is, It is a tree where each node is either a leaf or has two chilren.
Let's assume that a certain code is optimal and is represented by a non-full tree. So there is a certain vertex u with only a single child v. The edge between u and v adds the bit x to the prefix code of the symbols (at the leaves) in the subtree rooted at v. From this tree I can remove the edge x and replace u with v, thus decreasing the length of the prefix code of all symbols in the subtree rooted at v by one. This reduces the number of bits in the representation of at least one symbol (when v is a singleton node.)
This shows that the tree didnt actually represent an optimal code, and my premise was wrong. Thus proving the lemma.
来自维基百科,
一棵完整的二叉树(有时是二叉树或严格的二叉树)是一棵树,其中除了叶子之外的每个节点都有两个孩子。
生成霍夫曼代码树的方式肯定会生成完整的二叉树。因为在算法的每一步,我们都会从队列中移除优先级最高(概率最低)的两个节点,并以这两个节点为子节点创建一个新的内部节点。