Lossless Data Compression: LZ77

Dictionary-based Compressors
Concept
Algorithm
Example
Shortcomings
Variations

Shortcomings of LZ77

One of the main limitations of the LZ77 algorithm is that it uses only a small window into previously seen text, which means it continuously throws away valuable dictionary entries because they slide out of the dictionary. The longest match possible is roughly the size of the look-ahead buffer; many of the matches actually existing in the file may actually be much longer.

The sliding window makes the algorithm biased toward exploiting recency in the text, and this is not necessarily a loss. This can be a significant advantage, e.g. in a phonebook streams of data tend "look" more like what has been seen recently that what was seen long ago." (Nelson, 277) However, the street address would only show compression faintly, when two listings with the same last name lived at the same address.

One obvious way to get around this problem is to increase the size of the search and look-ahead buffers. However, changing these parameters will drastically increase the CPU time needed for compression. Since string comparisons between the search buffer phrases and the look-ahead buffer proceed sequentially, the runtime here will increase in direct proportion to the length of the look-ahead buffer. Also, changing the size of the search buffer will only marginally affect the efficiency because we are still using an algorithm that relies on recency to perform compression.