Dictionary-based
Compressors
Concept
Algorithm
Example
Variations |
The Algorithm This is precisely the idea behind the LZW compression algorithm.
It starts with a 4K "dictionary" of all the single character with indexes from 0
to 255 (which the standard ASCII set). It then starts to expand the dictionary as it
processes the text, using the entries 256 to 4095 to refer to substrings. As a new string
is parsed, a new code is generated. Strings for parsing are formed by appending the
current character K to an existing string w. The algorithm for LZW compression is shown
below:
set w = NIL
loop
read a character K
if wK exists in the dictionary
w = wK
else
output the code for w
add wK to the string table
w = K
endloop
As the dictionary grows, redundant strings will be
coded as a single 2-byte number, resulting in a compressed file.
|