The family of Dense Codes (as text compressors).

This web is dedicated to the family of dense compressors. A new family of compression techniques well-suited for text compression that was born in 2003. Some of these techniques are semistatic(2) and others permit us to perform dynamic compression (4). All of them are statistical codes that use a word-based strategy and generate byte-oriented codewords. We also include here a dynamic word-based byte-oriented Huffman-based compressor (DPH), that was developed to obtain a base-line to compare our dynamic codes with.

The 2 semistatic techniques, which were developed in 2003, are called: End-Tagged Dense Code (ETDC) and (s,c)-Dense Code (SCDC).

The 4 dynamic dense codes are: Dynamic End-Tagged Dense Code (DETDC), Dynamic (s,c)-Dense Code (SCDC) , Dynamic Lightweight End-Tagged Dense Code (DLETDC), and Dynamic Lightweight(s,c)-Dense Code (DLSCDC) .

As a brief description, all these compressors obtain compression ratios around 30-35%, while being very fast at compression and decompression (some of them overcome gzip). As other techniques such as Plain Huffman or Tagged Huffman, our semistatic compressors (ETDC and SCDC) permit to search compressed text much faster than searching the uncompressed text. This is also applicable to our Lightweight compressors (DLETDC) and (DLSCDC), which represent an interesting contribution, since they become the first dynamic compression techniques that permit searching the compressed text faster than the uncompressed text. For a more detailled description, please refer to section publications.

We aim here at giving a small description of all our compressors and also at providing our source code under GNU General Public License. Moreover, we also include a small framework that allows to perform searches about our codes (except for DPH).

Other applications.

Our dense codes have been successfully used in different developments:

About us...

We are 3 researchers from the University of A Coruña (Spain) and 1 from the University of Chile (Chile).

About our developments

Please note that the compressor/decompressor/search programs developed here are the prototypes used to validate our techniques empirically (as a part of our research interest), and not a final product.

Source code is available in Section downloads .

Supported in part by Ministerio de Ciencia e Innovación (PGE and FEDER) [TIN2006-15071-C03-03, TIN2009-14560-C03-02, TIN2010-21246-C02-01, and CDTI CEN-20091048]; Xunta de Galicia (Feder) [2010/17] and (Agrupación Estratéxica) [CN 2012/211]; and Fondecyt [1-110066]