Splay Trees for Data Compression
Abstract
We present applications of splay trees to two topics in data
compression. First is a variant of the move-to-front (mtf) data
compression (of Bentley,Sleator Tarjan and Wei) algorithm, where we introduce
secondary
list(s). This seems to capture higher-order correlations.
An implementation of this algorithm with
Sleator-Tarjan splay trees runs in time (provably) proportional to the
entropy of the input sequence. When tested on some telephony
data, compression ratio and run time showed significant improvements
over original mtf-algorithm, making it competitive or better than
popular programs. For stationary ergodic sources, we analyse the
compression and output distribution of the original mtf-algorithm,
which suggests why the secondary list is appropriate to introduce. We
also derive analytical upper bounds on the average codeword length in
terms of stochastic parameters of the source.
Secondly, we consider the compression (or coding) of source sequences
where the codewords are required to preserve the alphabetic order of
the source symbols. We describe the use of the semisplay tree, the
regular splay tree, and splay trees with depth three and four for this
application, and derive upper bounds on their compression efficiency.
For example, the average codeword length of the semisplay tree is is
no more than twice the source entropy plus some minor terms.