Antonio Fariña's PhD abstract.

Text databases have grown in recent years due to the widespread use of digital libraries and document databases, and mainly because of the continuous growth of the Web. Compression is a natural solution because it reduces both storage requirements and input/output operations. It is therefore also useful when transmitting data over a network.

Even though compression appeared in the first half of the 20th century, new Huffman-based compression techniques have appeared in the last decade. These techniques use words as the symbols to be compressed. They not only improve the compression ratio obtained by other well-known methods (e.g. Ziv-Lempel), but also allow searches to be performed efficiently inside the compressed text, avoiding the need for decompression before the search. As a result, those searches are much faster than searches inside plain text.

Following the idea of word-based compression, in this thesis we developed four new compression techniques that make up a new family of compressors. They are based on the use of dense codes. Of these four techniques, the first two are semi-static and the other two are dynamic. They are called End-Tagged Dense Code, (s,c)-Dense Code, Dynamic End-Tagged Dense Code, and Dynamic (s,c)-Dense Code.

Moreover, in this thesis we have implemented a first prototype of a word-based byte-oriented dynamic Huffman compressor. It was developed to provide a competitive technique against which to compare our two dynamic methods.

Our results, obtained from a systematic empirical validation of our compressors on real corpora, show that our techniques constitute a fundamental contribution to the area of compression. Since these techniques compress more, and more efficiently, than other widely used compressors (e.g. gzip and compress), they can be applied both to Text Retrieval systems and to data-oriented systems.

Supported in part by MCyT (PGE and FEDER) grant (TIC2003-06593), Xunta de Galicia grant (PGIDIT05SIN10502PR) and CYTED VII.19 RIBIDI Project.