Statistical Compressors
An Example

Applying the Shannon-Fano algorithm to the file with the variable symbol frequencies cited earlier yields the result below. The first dividing line is placed between B and C, assigning a count of 21 to the upper group and 14 to the lower group, which is the closest to an even split. This means that A and B will have binary codes starting with 0, while C, D, and E will have binary codes starting with 1, as shown. Applying the recursive definition of the algorithm, the next division occurs between A and B, putting A on a leaf with code 00 and B on another leaf with code 01. After four divisions, the three most frequent symbols have 2-bit codes while the remaining, rarer symbols have 3-bit codes.
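The recursive division described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name is ours, and when two split points are equally close to half we break the tie toward the later split, which reproduces the 21/14 first division of the worked example.

```python
def shannon_fano(symbols):
    """Assign Shannon-Fano codes to (symbol, count) pairs that are
    already sorted by descending count. Illustrative sketch only."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(count for _, count in symbols)
    # Find the split whose upper-group total is closest to half of the
    # whole; ties go to the later split (so 21/14 is chosen over 14/21).
    running, split, best = 0, 1, float("inf")
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(total - 2 * running)
        if diff <= best:
            best, split = diff, i
    # Recurse: the upper group gets a leading 0, the lower a leading 1.
    codes = {s: "0" + c for s, c in shannon_fano(symbols[:split]).items()}
    codes.update({s: "1" + c for s, c in shannon_fano(symbols[split:]).items()})
    return codes

freqs = [("A", 14), ("B", 7), ("C", 5), ("D", 5), ("E", 4)]
print(shannon_fano(freqs))
# → {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
```

The output matches the divisions worked through above: A, B, and C receive 2-bit codes, while D and E receive 3-bit codes.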
[Figure: Shannon-Fano divisions of the symbol set, listed in descending order of frequency: A (14), B (7), C (5), D (5), E (4). The first division separates {A, B} from {C, D, E}; the second separates A from B, the third separates C from D, and the fourth separates D from E, yielding the codes 00, 01, 10, 110, and 111.]
The binary tree resulting from applying the Shannon-Fano algorithm to the aforementioned text file is given below; it differs slightly from the one obtained by applying Huffman coding to the exact same character set. Note that, as with the Huffman tree, since all the characters are situated at the leaves of the tree, no code can ever be a prefix of another code. The table below shows the resulting bit code for each character:
Symbol | Bit Code
A      | 00
B      | 01
C      | 10
D      | 110
E      | 111
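Both claims about this code, that it is prefix-free and that frequent symbols get shorter codes, can be checked with a short script. This is our own sketch (the variable names are not from the text); the average length works out to 79/35 ≈ 2.26 bits per symbol for the 35-symbol file.

```python
# Codes and counts from the table and example above.
codes = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}
counts = {"A": 14, "B": 7, "C": 5, "D": 5, "E": 4}

# Prefix-free check: no code may begin another, distinct code.
prefix_free = all(
    a == b or not b.startswith(a)
    for a in codes.values() for b in codes.values()
)

# Average code length in bits per symbol, weighted by frequency.
total = sum(counts.values())  # 35 symbols in the file
avg = sum(len(codes[s]) * counts[s] for s in codes) / total

print(prefix_free, round(avg, 3))
# → True 2.257
```

Because the code is prefix-free, a compressed bit stream can be decoded left to right without any separators between symbols.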