# Incremental encoding

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Incremental_encoding
> Markdown URL: https://mediated.wiki/source/Incremental_encoding.md
> Source: https://en.wikipedia.org/wiki/Incremental_encoding
> Source revision: 1339882670
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

'''Incremental encoding''', also known as '''front compression''', '''back compression''', or '''front coding''', is a type of [delta encoding](/source/delta_encoding) [compression algorithm](/source/compression_algorithm) whereby common [prefix](/source/Prefix_(linguistics))es or [suffixes](/source/Affix) and their lengths are recorded so that they do not need to be duplicated. This algorithm is particularly well-suited for compressing [sorted](/source/sorting) [data](/source/data), e.g., a list of [word](/source/word)s from a [dictionary](/source/dictionary).

For example:
{| class="wikitable"
|-
! Input
! Common prefix
! Compressed output
|-
|
<pre>
myxa
myxophyta
myxopod
nab
nabbed
nabbing
nabit
nabk
nabob
nacarat
nacelle
</pre>
|
<pre>
no preceding word
'myx'
'myxop'
no common prefix
'nab'
'nabb'
'nab'
'nab'
'nab'
'na'
'nac'
</pre>
|
<pre>
0 myxa
3 ophyta
5 od
0 nab
3 bed
4 ing
3 it
3 k
3 ob
2 carat
3 elle
</pre>
|-
| align="right" | 64 bytes
|
| align="left" | 46 bytes
|-
|}

The encoding used to store the common prefix length itself varies from application to application. Typical techniques are storing the value as a single byte; [delta encoding](/source/delta_encoding), which stores only the change in the common prefix length; and various [universal codes](/source/Universal_code_(data_compression)). It may be combined with other general [lossless data compression](/source/lossless_data_compression) techniques such as [entropy encoding](/source/entropy_encoding) and [dictionary coder](/source/dictionary_coder)s to compress the remaining suffixes.

== Applications ==

Incremental encoding is widely used in information retrieval to compress the lexicons used in [search indexes](/source/Index_(search_engine)); these list all the words found in all the documents and a pointer for each one to a list of locations. Typically, it compresses these indexes by about 40%.<ref>Ian H. Witten, Alistair Moffat, Timothy C. Bell. Managing Gigabytes. Second edition. Academic Press. {{ISBN|1-55860-570-3}}. Section 4.1: Accessing the lexicon, subsection Front coding, pp.159&ndash;161.</ref>

As one example, incremental encoding is used as a starting point by the  [GNU locate](/source/GNU_locate) utility, in an index of filenames and directories. The [GNU locate](/source/GNU_locate) utility further uses [bigram](/source/bigram) encoding to further shorten popular filepath prefixes.

== References ==

<references />

{{Compression methods}}

Category:Lossless compression algorithms
Category:Database index techniques
Category:Data compression

{{storage-software-stub}}

---
Adapted from the Wikipedia article [Incremental encoding](https://en.wikipedia.org/wiki/Incremental_encoding) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Incremental_encoding?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.
