# Prefix code

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Prefix_code
> Markdown URL: https://mediated.wiki/source/Prefix_code.md
> Source: https://en.wikipedia.org/wiki/Prefix_code
> Source revision: 1347243775
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

Set of codewords, none a prefix of another

A **prefix code** is a type of [code](/source/Code) system distinguished by its possession of the **prefix property**, which requires that there is no whole [code word](/source/Code_word_(communication)) in the system that is a [prefix](/source/Prefix_(computer_science)) (initial segment) of any other code word in the system. It is trivially true for fixed-length codes, so only a point of consideration for [variable-length codes](/source/Variable-length_code).

For example, a code with code { a , bb } {\displaystyle \{{\texttt {a}},{\texttt {bb}}\}} has the prefix property; a code consisting of { a , b , ab , aa } {\displaystyle \{{\texttt {a}},{\texttt {b}},{\texttt {ab}},{\texttt {aa}}\}} does not, because a is a prefix of ab and also of aa. A prefix code is a [uniquely decodable code](/source/Uniquely_decodable_code): given a complete and accurate sequence, a receiver can identify each word without requiring a special marker between words. However, there are uniquely decodable codes that are not prefix codes; for instance, the reverse of a prefix code is still uniquely decodable (it is a suffix code), but it is not necessarily a prefix code.

Prefix codes are also known as **prefix-free codes**, **prefix condition codes** and **instantaneous codes**. Although [Huffman coding](/source/Huffman_coding) is just one of many algorithms for deriving prefix codes, prefix codes are also widely referred to as "Huffman codes", even when the code was not produced by a Huffman algorithm. The term **comma-free code** is sometimes also applied as a synonym for prefix-free codes[1][2] but in most mathematical books and articles (e.g.[3][4]) a comma-free code is used to mean a [self-synchronizing code](/source/Self-synchronizing_code), a subclass of prefix codes.

Using prefix codes, a message can be transmitted as a sequence of concatenated code words, without any [out-of-band](/source/Out-of-band_data) markers or (alternatively) special markers between words to [frame](/source/Framing_(telecommunication)) the words in the message. The recipient can decode the message unambiguously, by repeatedly finding and removing sequences that form valid code words. This is not generally possible with codes that lack the prefix property, for example { 0 , 1 , 10 , 11 } {\displaystyle \{{\texttt {0}},{\texttt {1}},{\texttt {10}},{\texttt {11}}\}} : a receiver reading a 1 at the start of a code word would not know whether that was the complete code word 1, or merely the prefix of the code word 10 or 11; so the string 10 could be interpreted either as a single codeword or as the concatenation of the words 1 then 0.

The variable-length [Huffman codes](/source/Huffman_coding), [telephone country codes](/source/Telephone_country_code), the country and publisher parts of [ISBNs](/source/ISBN), the Secondary Synchronization Codes used in the [UMTS](/source/UMTS) [W-CDMA](/source/W-CDMA) 3G Wireless Standard, and the [instruction sets](/source/Instruction_set) (machine language) of most computer microarchitectures are prefix codes.

Prefix codes are not [error-correcting codes](/source/Error-correcting_codes). In practice, a message might first be compressed with a prefix code, and then encoded again with [channel coding](/source/Channel_coding) (including error correction) before transmission.

For every [uniquely decodable](/source/Variable-length_code#Uniquely_decodable_codes) code there is a prefix code that has the same code word lengths.[5] [Kraft's inequality](/source/Kraft's_inequality) characterizes the sets of code word lengths that are possible in a uniquely decodable code.[6]

## Techniques

If every word in the code has the same length, the code is called a **fixed-length code**, or a **block code** (though the term [block code](/source/Block_code) is also used for fixed-size [error-correcting codes](/source/Error-correcting_code) in [channel coding](/source/Channel_coding)). For example, [ISO 8859-15](/source/ISO_8859-15) letters are always 8 bits long. [UTF-32/UCS-4](/source/UTF-32%2FUCS-4) letters are always 32 bits long. [ATM cells](/source/Asynchronous_Transfer_Mode) are always 424 bits (53 bytes) long. A fixed-length code of fixed length k {\displaystyle k} bits can encode up to 2 k {\displaystyle 2^{k}} source symbols.

A fixed-length code is necessarily a prefix code. It is possible to turn any code into a fixed-length code by padding fixed symbols to the shorter prefixes in order to meet the length of the longest prefixes. Alternately, such padding codes may be employed to introduce redundancy that allows autocorrection and/or synchronisation. However, fixed length encodings are inefficient in situations where some words are much more likely to be transmitted than others.

[Truncated binary encoding](/source/Truncated_binary_encoding) is a straightforward generalization of fixed-length codes to deal with cases where the number of symbols *n* is not a power of two. Source symbols are assigned codewords of length k {\displaystyle k} and k + 1 {\displaystyle k+1} , where k {\displaystyle k} is chosen so that 2 k < n ≤ 2 k + 1 {\displaystyle 2^{k}<n\leq 2^{k+1}} .

[Huffman coding](/source/Huffman_coding) is a more sophisticated technique for constructing variable-length prefix codes. The Huffman coding algorithm takes as input the frequencies that the code words should have, and constructs a prefix code that minimizes the weighted average of the code word lengths. (This is closely related to minimizing the entropy.) This is a form of [lossless data compression](/source/Lossless_data_compression) based on [entropy encoding](/source/Entropy_encoding).

Some codes mark the end of a code word with a special "comma" symbol (also called a [Sentinel value](/source/Sentinel_value)), different from normal data.[7] This is somewhat analogous to the spaces between words in a sentence; they mark where one word ends and another begins. If every code word ends in a comma, and the comma does not appear elsewhere in a code word, the code is automatically prefix-free. However, reserving an entire symbol only for use as a comma can be inefficient, especially for languages with a small number of symbols. [Morse code](/source/Morse_code) is an everyday example of a variable-length code with a comma. The long pauses between letters, and the even longer pauses between words, help people recognize where one letter (or word) ends, and the next begins. Similarly, [Fibonacci coding](/source/Fibonacci_coding) uses a 11 to mark the end of every code word.

[Self-synchronizing codes](/source/Self-synchronizing_code) are prefix codes that allow [frame synchronization](/source/Frame_synchronization).

## Related concepts

A **suffix code** is a set of words none of which is a suffix of any other; equivalently, a set of words which are the reverse of a prefix code. As with a prefix code, the representation of a string as a concatenation of such words is unique. A **bifix code** is a set of words which is both a prefix and a suffix code.[8] An **optimal prefix code** is a prefix code with minimal average length. That is, assume an alphabet of n symbols with probabilities p ( A i ) {\displaystyle p(A_{i})} for a prefix code C. If C' is another prefix code and λ i ′ {\displaystyle \lambda '_{i}} are the lengths of the codewords of C', then ∑ i = 1 n λ i p ( A i ) ≤ ∑ i = 1 n λ i ′ p ( A i ) {\displaystyle \sum _{i=1}^{n}{\lambda _{i}p(A_{i})}\leq \sum _{i=1}^{n}{\lambda '_{i}p(A_{i})}\!} .[9]

## Prefix codes in use today

Examples of prefix codes include:

- variable-length [Huffman codes](/source/Huffman_coding)

- [telephone country codes](/source/Telephone_country_code)

- [Chen–Ho encoding](/source/Chen%E2%80%93Ho_encoding)

- the country and publisher parts of [ISBNs](/source/ISBN)

- the Secondary Synchronization Codes used in the [UMTS](/source/UMTS) [W-CDMA](/source/W-CDMA) 3G Wireless Standard

- [VCR Plus+ codes](/source/VCR_Plus)

- [Unicode Transformation Format](/source/Unicode_Transformation_Format), in particular the [UTF-8](/source/UTF-8) system for encoding [Unicode](/source/Unicode) characters, which is both a prefix-free code and a [self-synchronizing code](/source/Self-synchronizing_code)[10]

- [opcode prefixes](/source/Opcode_prefix) used in computer instruction sets

- [variable-length quantity](/source/Variable-length_quantity)

### Techniques

Commonly used techniques for constructing prefix codes include [Huffman codes](/source/Huffman_coding) and the earlier [Shannon–Fano codes](/source/Shannon%E2%80%93Fano_coding), and [universal codes](/source/Universal_code_(data_compression)) such as:

- [Elias delta coding](/source/Elias_delta_coding)

- [Elias gamma coding](/source/Elias_gamma_coding)

- [Elias omega coding](/source/Elias_omega_coding)

- [Fibonacci coding](/source/Fibonacci_coding)

- [Levenshtein coding](/source/Levenshtein_coding)

- [Unary coding](/source/Unary_coding)

- [Golomb Rice code](/source/Golomb_Rice_code)

- [Straddling checkerboard](/source/Straddling_checkerboard) (simple cryptography technique which produces prefix codes)

- binary coding[11]

## Notes

1. **[^](#cite_ref-1)** US [Federal Standard 1037C](/source/Federal_Standard_1037C)

1. **[^](#cite_ref-2)** [*ATIS Telecom Glossary 2007*](https://web.archive.org/web/20100708083829/http://www.atis.org/glossary/definition.aspx?id=6416), archived from [the original](http://www.atis.org/glossary/definition.aspx?id=6416) on July 8, 2010, retrieved December 4, 2010

1. **[^](#cite_ref-3)** Berstel, Jean; Perrin, Dominique (1985), *Theory of Codes*, Academic Press

1. **[^](#cite_ref-4)** [Golomb, S. W.](/source/Solomon_W._Golomb); [Gordon, Basil](/source/Basil_Gordon); Welch, L. R. (1958), ["Comma-Free Codes"](https://books.google.com/books?id=oRgtS14oa-sC&pg=PA202), *Canadian Journal of Mathematics*, **10** (2): 202–209, [doi](/source/Doi_(identifier)):[10.4153/CJM-1958-023-9](https://doi.org/10.4153%2FCJM-1958-023-9), [S2CID](/source/S2CID_(identifier)) [124092269](https://api.semanticscholar.org/CorpusID:124092269)

1. **[^](#cite_ref-LTU2015_5-0)** Le Boudec, Jean-Yves, Patrick Thiran, and Rüdiger Urbanke. Introduction aux sciences de l'information: entropie, compression, chiffrement et correction d'erreurs. PPUR Presses polytechniques, 2015.

1. **[^](#cite_ref-BRS75_6-0)** Berstel et al (2010) p.75

1. **[^](#cite_ref-7)** A. Jones, J. ["Development of Trigger and Control Systems for CMS"](https://web.archive.org/web/20110613183447/http://www.imperial.ac.uk/research/hep/group/theses/JJones.pdf) (PDF). High Energy Physics, Blackett Laboratory, Imperial College, London. p. 70. Archived from [the original](http://www.imperial.ac.uk/research/hep/group/theses/JJones.pdf) (PDF) on Jun 13, 2011.

1. **[^](#cite_ref-BPR58_8-0)** Berstel et al (2010) p.58

1. **[^](#cite_ref-9)** [McGill COMP 423 Lecture notes](https://www.cim.mcgill.ca/~langer/423/lecture2.pdf)

1. **[^](#cite_ref-10)** Pike, Rob (2003-04-03). ["UTF-8 history"](http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt).

1. **[^](#cite_ref-11)** [Shevchuk, Y. V.](https://en.wikipedia.org/w/index.php?title=Yury_V._Shevchuk&action=edit&redlink=1) (2018), ["Vbinary: variable length integer coding revisited"](http://psta.psiras.ru//read/psta2018_4_239-252.pdf) (PDF), *Program Systems: Theory and Applications*, **9** (4): 239–252, [doi](/source/Doi_(identifier)):[10.25209/2079-3316-2018-9-4-239-252](https://doi.org/10.25209%2F2079-3316-2018-9-4-239-252)

## References

- Berstel, Jean; Perrin, Dominique; Reutenauer, Christophe (2010). [*Codes and automata*](http://www-igm.univ-mlv.fr/~berstel/LivreCodes/Codes.html). Encyclopedia of Mathematics and its Applications. Vol. 129. Cambridge: [Cambridge University Press](/source/Cambridge_University_Press). [ISBN](/source/ISBN_(identifier)) [978-0-521-88831-8](https://en.wikipedia.org/wiki/Special:BookSources/978-0-521-88831-8). [Zbl](/source/Zbl_(identifier)) [1187.94001](https://zbmath.org/?format=complete&q=an:1187.94001).

- [Elias, Peter](/source/Peter_Elias) (1975). "Universal codeword sets and representations of the integers". *IEEE Trans. Inf. Theory*. **21** (2): 194–203. [doi](/source/Doi_(identifier)):[10.1109/tit.1975.1055349](https://doi.org/10.1109%2Ftit.1975.1055349). [ISSN](/source/ISSN_(identifier)) [0018-9448](https://search.worldcat.org/issn/0018-9448). [Zbl](/source/Zbl_(identifier)) [0298.94011](https://zbmath.org/?format=complete&q=an:0298.94011).

- D.A. Huffman, "A method for the construction of minimum-redundancy codes", Proceedings of the I.R.E., Sept. 1952, pp. 1098–1102 (Huffman's original article)

- [Profile: David A. Huffman](https://web.archive.org/web/20070220234037/http://www.huffmancoding.com/david/scientific.html), [Scientific American](/source/Scientific_American), Sept. 1991, pp. 54–58 (Background story)

- [Thomas H. Cormen](/source/Thomas_H._Cormen), [Charles E. Leiserson](/source/Charles_E._Leiserson), [Ronald L. Rivest](/source/Ronald_L._Rivest), and [Clifford Stein](/source/Clifford_Stein). *[Introduction to Algorithms](/source/Introduction_to_Algorithms)*, Second Edition. MIT Press and McGraw-Hill, 2001. [ISBN](/source/ISBN_(identifier)) [0-262-03293-7](https://en.wikipedia.org/wiki/Special:BookSources/0-262-03293-7). Section 16.3, pp. 385–392.

- This article incorporates [public domain material](/source/Copyright_status_of_works_by_the_federal_government_of_the_United_States) from [*Federal Standard 1037C*](https://web.archive.org/web/20220122224547/https://www.its.bldrdoc.gov/fs-1037/fs-1037c.htm). [General Services Administration](/source/General_Services_Administration). Archived from [the original](https://www.its.bldrdoc.gov/fs-1037/fs-1037c.htm) on 2022-01-22.

## External links

- [Codes, trees and the prefix property](http://plus.maths.org/issue10/features/infotheory/index.html) by Kona Macphee

v t e Data compression methods Lossless type Entropy Adaptive coding Arithmetic Asymmetric numeral systems Golomb Huffman Adaptive Canonical Modified Range Shannon Shannon–Fano Shannon–Fano–Elias Tunstall Unary Universal Exp-Golomb Fibonacci Gamma Levenshtein Dictionary Byte-pair encoding Lempel–Ziv 842 LZ4 LZJB LZO LZRW LZSS LZW LZWL Snappy Other BWT CTW CM Delta Incremental DMC DPCM Grammar Re-Pair Sequitur LDCT MTF PAQ PPM RLE Hybrid LZ77 + Huffman Deflate LZX LZS LZ77 + ANS LZFSE LZ77 + Huffman + ANS Zstandard LZ77 + Huffman + context Brotli LZSS + Huffman LHA/LZH LZ77 + Range LZMA LZHAM RLE + BWT + MTF + Huffman bzip2 Lossy type Transform Discrete cosine transform DCT MDCT DST FFT Wavelet Daubechies DWT SPIHT Predictive DPCM ADPCM LPC ACELP CELP LAR LSP WLPC Motion Compensation Estimation Vector Psychoacoustic Audio Concepts Bit rate ABR CBR VBR Companding Convolution Dynamic range Latency Nyquist–Shannon theorem Sampling Silence compression Sound quality Speech coding Sub-band coding Codec parts A-law μ-law DPCM ADPCM DM FT FFT LPC ACELP CELP LAR LSP WLPC MDCT Psychoacoustic model Image Concepts Chroma subsampling Coding tree unit Color space Compression artifact Image resolution Macroblock Pixel PSNR Quantization Standard test image Texture compression Methods Chain code DCT Deflate Fractal KLT LP RLE Wavelet Daubechies DWT EZW SPIHT Video Concepts Bit rate ABR CBR VBR Display resolution Frame Frame rate Frame types Interlace Video characteristics Video quality Codec parts DCT DPCM Deblocking filter Lapped transform Motion Compensation Estimation Vector Wavelet Daubechies DWT Theory Compressed data structures Compressed suffix array FM-index Entropy Information theory Timeline Kolmogorov complexity Prefix code Quantization Rate–distortion Redundancy Symmetry Smallest grammar problem Community Hutter Prize People Mark Adler David A. Huffman Phil Katz

---
Adapted from the Wikipedia article [Prefix code](https://en.wikipedia.org/wiki/Prefix_code) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Prefix_code?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.
