Kraft–McMillan inequality
In coding theory, the Kraft–McMillan inequality gives a necessary and sufficient condition for the existence of a prefix code (in Leon G. Kraft's version) or a uniquely decodable code (in Brockway McMillan's version) for a given set of codeword lengths. Its applications to prefix codes and trees often find use in computer science and information theory. The prefix code can contain either finitely many or infinitely many codewords. Kraft's inequality was published by Kraft in 1949. However, Kraft's paper discusses only prefix codes, and attributes the analysis leading to the inequality to Raymond Redheffer. The result was independently discovered by McMillan in 1956. McMillan proves the result for the general case of uniquely decodable codes, and attributes the version for prefix codes to a spoken observation in 1955 by Joseph Leo Doob.
Applications and intuitions
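Kraft's inequality limits the lengths of codewords in a prefix code: if one takes an exponential of the negative of the length of each valid codeword (that is, r^{-\ell} for a codeword of length \ell over an alphabet of size r), the resulting set of values must look like a probability mass function, that is, it must have total measure less than or equal to one. Kraft's inequality can be thought of in terms of a constrained budget to be spent on codewords, with shorter codewords being more expensive. Among the useful properties following from the inequality are the following statements: if Kraft's inequality holds with strict inequality, the code has some redundancy; if it holds with equality, the code is a complete code; if it does not hold, the code is not uniquely decodable; and for every uniquely decodable code, there exists a prefix code with the same codeword lengths.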
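The budget picture can be made concrete with a few lines of Python. This is a minimal sketch rather than anything from the source: the names kraft_sum and classify are made up for illustration, and exact fractions are used so that the comparison with 1 is free of rounding error.

```python
from fractions import Fraction


def kraft_sum(lengths, r=2):
    """Return the exact sum of r**(-length) over all codeword lengths."""
    return sum((Fraction(1, r ** length) for length in lengths), Fraction(0))


def classify(lengths, r=2):
    """Describe how much of the unit budget the lengths spend (illustrative helper)."""
    total = kraft_sum(lengths, r)
    if total > 1:
        return "over budget: no uniquely decodable code has these lengths"
    if total == 1:
        return "budget spent exactly: a prefix code with these lengths is complete"
    return "budget left over: a prefix code with these lengths exists"


# A codeword of length 1 costs 1/2 of the binary budget, one of length 3 only 1/8.
for lengths in ([1, 2, 3, 3], [2, 2, 3], [1, 1, 2]):
    print(lengths, kraft_sum(lengths), classify(lengths))
```

The three sample length lists sum to 1, 5/8 and 5/4 respectively, so they land in the three different branches.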
Formal statement
Let each source symbol from the alphabet S = \{s_1, s_2, \ldots, s_n\} be encoded into a uniquely decodable code over an alphabet of size r with codeword lengths \ell_1, \ell_2, \ldots, \ell_n. Then
\sum_{i=1}^{n} r^{-\ell_i} \leq 1.
Conversely, for a given set of natural numbers \ell_1, \ell_2, \ldots, \ell_n satisfying the above inequality, there exists a uniquely decodable code over an alphabet of size r with those codeword lengths.
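As a worked instance (the lengths and the code below are chosen purely for illustration and do not come from the source), take r = 2 and the four lengths 1, 2, 3, 3:

```latex
\sum_{i=1}^{4} 2^{-\ell_i}
  = 2^{-1} + 2^{-2} + 2^{-3} + 2^{-3}
  = \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \tfrac{1}{8}
  = 1 \leq 1 .
```

The inequality holds, so a uniquely decodable binary code with these lengths exists; the prefix code {0, 10, 110, 111} is one. By contrast, the lengths 1, 1, 2 give 1/2 + 1/2 + 1/4 = 5/4 > 1, so no uniquely decodable binary code has them.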
Example: binary trees
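Any binary tree can be viewed as defining a prefix code for the leaves of the tree. Kraft's inequality states that
\sum_{\ell \in \text{leaves}} 2^{-\text{depth}(\ell)} \leq 1.
Here the sum is taken over the leaves of the tree, i.e. the nodes without any children. The depth is the distance to the root node. For instance, in a tree with one leaf at depth 2 and four leaves at depth 3, this sum is
\frac{1}{4} + 4 \left( \frac{1}{8} \right) = \frac{3}{4} \leq 1.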
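The leaf sum is easy to compute for a concrete tree. The sketch below assumes a representation that is not in the source: a node is a tuple of its child subtrees, and the empty tuple is a leaf. The example tree is hypothetical, with one leaf at depth 2 and four leaves at depth 3.

```python
from fractions import Fraction


def leaf_depths(tree, depth=0):
    """Yield the depth of every leaf; a node is a tuple of its child subtrees."""
    if not tree:  # empty tuple: no children, so this node is a leaf
        yield depth
        return
    for child in tree:
        yield from leaf_depths(child, depth + 1)


def kraft_sum_of_tree(tree):
    """Sum 2**(-depth) over the leaves of a binary tree."""
    return sum((Fraction(1, 2 ** d) for d in leaf_depths(tree)), Fraction(0))


LEAF = ()
# Hypothetical tree: one leaf at depth 2 and four leaves at depth 3.
example = (
    (LEAF, (LEAF, LEAF)),  # left child: a leaf at depth 2 and two leaves at depth 3
    ((LEAF, LEAF),),       # right child: a single child carrying two leaves at depth 3
)

print(sorted(leaf_depths(example)))  # [2, 3, 3, 3, 3]
print(kraft_sum_of_tree(example))    # 3/4
```

A tree in which every internal node has exactly two children gives a sum of exactly 1; the example falls short because one internal node has only one child.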
Proof
Proof for prefix codes
First, let us show that the Kraft inequality holds whenever the code for S is a prefix code. Suppose that \ell_1 \leq \ell_2 \leq \cdots \leq \ell_n. Let A be the full r-ary tree of depth \ell_n (thus, every node of A at level < \ell_n has r children, while the nodes at level \ell_n are leaves). Every word of length \ell \leq \ell_n over an r-ary alphabet corresponds to a node in this tree at depth \ell. The ith word in the prefix code corresponds to a node v_i; let A_i be the set of all leaf nodes (i.e. of nodes at depth \ell_n) in the subtree of A rooted at v_i. That subtree being of height \ell_n - \ell_i, we have
|A_i| = r^{\ell_n - \ell_i}.
Since the code is a prefix code, those subtrees cannot share any leaves, which means that
A_i \cap A_j = \varnothing for i \neq j.
Thus, given that the total number of nodes at depth \ell_n is r^{\ell_n}, we have
\left| \bigcup_{i=1}^{n} A_i \right| = \sum_{i=1}^{n} |A_i| = \sum_{i=1}^{n} r^{\ell_n - \ell_i} \leq r^{\ell_n},
from which the result follows on dividing both sides by r^{\ell_n}.
Conversely, given any ordered sequence of n natural numbers \ell_1 \leq \ell_2 \leq \cdots \leq \ell_n satisfying the Kraft inequality, one can construct a prefix code with codeword lengths equal to each \ell_i by choosing a word of length \ell_i arbitrarily, then ruling out all words of greater length that have it as a prefix. Here again, we shall interpret this in terms of leaf nodes of an r-ary tree of depth \ell_n. First choose any node from the full tree at depth \ell_1; it corresponds to the first word of our new code. Since we are building a prefix code, all the descendants of this node (i.e., all words that have this first word as a prefix) become unsuitable for inclusion in the code. We consider the descendants at depth \ell_n (i.e., the leaf nodes among the descendants); there are r^{\ell_n - \ell_1} such descendant nodes that are removed from consideration. The next iteration picks a (surviving) node at depth \ell_2 and removes r^{\ell_n - \ell_2} further leaf nodes, and so on. After n iterations, we have removed a total of
\sum_{i=1}^{n} r^{\ell_n - \ell_i}
nodes. The question is whether we need to remove more leaf nodes than we actually have available (r^{\ell_n} in all) in the process of building the code. Since the Kraft inequality holds, we have indeed
\sum_{i=1}^{n} r^{\ell_n - \ell_i} \leq r^{\ell_n}.
In particular, before the kth iteration strictly fewer than r^{\ell_n} leaves have been removed, so some leaf survives, and its ancestor at depth \ell_k has no previously chosen word as a prefix. A surviving node therefore exists at every step, and thus a prefix code can be built.
Note that as the choice of nodes at each step is largely arbitrary, many different suitable prefix codes can be built, in general.
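The converse construction can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the source's own: the function name build_prefix_code is made up, the "arbitrary" choice at each step is fixed to the first surviving word in the alphabet's order, and words are found by brute-force enumeration, which is only sensible for short lengths.

```python
from fractions import Fraction
from itertools import product


def build_prefix_code(lengths, alphabet="01"):
    """Greedily pick one codeword per length, shortest first.

    Returns the codewords ordered by length (not in input order) and
    raises ValueError if the lengths violate the Kraft inequality.
    """
    r = len(alphabet)
    if sum(Fraction(1, r ** n) for n in lengths) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    code = []
    for length in sorted(lengths):
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            # A word survives if no previously chosen codeword is a prefix of it.
            if not any(word.startswith(chosen) for chosen in code):
                code.append(word)
                break
        else:
            # Unreachable when the Kraft inequality holds, by the argument above.
            raise AssertionError("no surviving word of length %d" % length)
    return code


print(build_prefix_code([3, 1, 3, 2]))           # ['0', '10', '110', '111']
print(build_prefix_code([1, 1, 2, 2, 2], "abc"))  # ['a', 'b', 'ca', 'cb', 'cc']
```

Picking a different surviving word at any step would give a different but equally valid prefix code.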
Proof of the general case
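Now we will prove that the Kraft inequality holds whenever S is a uniquely decodable code. (The converse need not be proven, since we have already proven it for prefix codes, which is a stronger claim.) The proof is by Jack I. Karush. We need only prove it when there are finitely many codewords. If there are infinitely many codewords, then any finite subset of the code is also uniquely decodable, so it satisfies the Kraft–McMillan inequality. Taking the limit, we have the inequality for the full code. Denote
C = \sum_{i=1}^{n} r^{-\ell_i}.
The idea of the proof is to get an upper bound on C^m for m \in \mathbb{N} and show that it can only hold for all m if C \leq 1. Rewrite C^m as
C^m = \left( \sum_{i=1}^{n} r^{-\ell_i} \right)^m = \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \cdots \sum_{i_m=1}^{n} r^{-(\ell_{i_1} + \ell_{i_2} + \cdots + \ell_{i_m})}.
Consider all m-powers S^m, in the form of words s_{i_1} s_{i_2} \dots s_{i_m}, where i_1, i_2, \dots, i_m are indices between 1 and n. Note that, since S was assumed to be uniquely decodable, s_{i_1} s_{i_2} \dots s_{i_m} = s_{j_1} s_{j_2} \dots s_{j_m} implies i_1 = j_1, i_2 = j_2, \dots, i_m = j_m. This means that each summand corresponds to exactly one word in S^m. This allows us to rewrite the equation as
C^m = \sum_{\ell=1}^{m \cdot \ell_{max}} q_\ell \, r^{-\ell},
where q_\ell is the number of codewords in S^m of length \ell and \ell_{max} is the length of the longest codeword in S. For an r-letter alphabet there are only r^\ell possible words of length \ell, so q_\ell \leq r^\ell. Using this, we upper bound C^m:
C^m = \sum_{\ell=1}^{m \cdot \ell_{max}} q_\ell \, r^{-\ell} \leq \sum_{\ell=1}^{m \cdot \ell_{max}} r^\ell \, r^{-\ell} = m \cdot \ell_{max}.
Taking the m-th root, we get
C = \sum_{i=1}^{n} r^{-\ell_i} \leq (m \cdot \ell_{max})^{1/m}.
This bound holds for any m \in \mathbb{N}. The right side tends to 1 as m grows, so \sum_{i=1}^{n} r^{-\ell_i} \leq 1 must hold (otherwise the inequality would be broken for a large enough m).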
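The counting steps of this argument can be checked numerically on a small code. The sketch below is illustrative only: the function name check_counting_argument is made up, and the sample code {0, 01, 11} is an assumed example (it is suffix-free, hence uniquely decodable, although it is not a prefix code).

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def check_counting_argument(code, m, r=2):
    """Verify each step of the counting argument for the words of code**m."""
    words = ["".join(parts) for parts in product(code, repeat=m)]
    # Unique decodability: distinct index tuples give distinct concatenations.
    assert len(set(words)) == len(words)
    q = Counter(len(w) for w in words)       # q[l] = number of words of length l
    assert all(q[l] <= r ** l for l in q)    # at most r**l strings of length l
    c = sum(Fraction(1, r ** len(w)) for w in code)
    c_to_m = sum(count * Fraction(1, r ** l) for l, count in q.items())
    assert c_to_m == c ** m                  # the expanded power, regrouped by length
    l_max = max(len(w) for w in code)
    assert c_to_m <= m * l_max               # the upper bound on C**m
    return c, (m * l_max) ** (1 / m)         # C and the m-th root bound (a float)


code = ["0", "01", "11"]  # assumed example: uniquely decodable, not prefix-free
for m in (1, 2, 4, 8):
    c, bound = check_counting_argument(code, m)
    print(m, c, round(bound, 3))
```

The printed bound shrinks toward 1 as m grows while C stays fixed, which is exactly why C cannot exceed 1.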
Alternative construction for the converse
Given a sequence of n natural numbers \ell_1 \leq \ell_2 \leq \cdots \leq \ell_n satisfying the Kraft inequality, we can construct a prefix code as follows. Define the ith codeword, C_i, to be the first \ell_i digits after the radix point (e.g. decimal point) in the base r representation of
\sum_{j=1}^{i-1} r^{-\ell_j}.
Note that by Kraft's inequality, this sum is strictly less than 1. Moreover, since the lengths are sorted, every term of the sum is a multiple of r^{-\ell_i}, so its base r representation ends within \ell_i digits. Hence the codewords capture the entire value of the sum. Therefore, for j > i, the first \ell_i digits of C_j form a larger number than C_i, so the code is prefix free.
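This construction translates almost directly into code. The following is a minimal sketch, assuming r is at most 10 so that each digit is a single character; the names to_base and radix_prefix_code are made up for illustration. Exact fractions avoid any floating-point error in the partial sums.

```python
from fractions import Fraction


def to_base(value, r, width):
    """Write a non-negative integer in base r, zero-padded to `width` digits."""
    digits = []
    for _ in range(width):
        value, d = divmod(value, r)
        digits.append(str(d))
    return "".join(reversed(digits))


def radix_prefix_code(lengths, r=2):
    """Codeword i is the first l_i base-r digits of the sum of r**(-l_j) over j < i."""
    assert r <= 10, "single-character digits only in this sketch"
    lengths = sorted(lengths)
    assert sum(Fraction(1, r ** n) for n in lengths) <= 1, "Kraft inequality violated"
    code, partial = [], Fraction(0)
    for length in lengths:
        # partial is a multiple of r**(-length), so scaling it gives an integer
        # whose base-r digits are exactly the digits after the radix point.
        scaled = partial * r ** length
        assert scaled.denominator == 1
        code.append(to_base(scaled.numerator, r, length))
        partial += Fraction(1, r ** length)
    return code


print(radix_prefix_code([3, 1, 3, 2]))       # ['0', '10', '110', '111']
print(radix_prefix_code([1, 2, 2, 2], r=3))  # ['0', '10', '11', '12']
```

Unlike the tree-based construction, nothing is chosen arbitrarily here: the sorted lengths determine the code completely.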
Generalizations
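The inequality generalizes as follows. If C and D are uniquely decodable codes over the same r-letter alphabet, and every codeword in C is a concatenation of codewords in D, then
\sum_{c \in C} r^{-|c|} \leq \sum_{d \in D} r^{-|d|}.
The previous theorem is the special case when D consists of the r single-letter words, for which the right-hand side equals 1. There is also a generalization to quantum codes.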
This article is derived from Wikipedia and licensed under CC BY-SA 4.0.