# Truncated binary encoding > Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Truncated_binary_encoding > Markdown URL: https://mediated.wiki/source/Truncated_binary_encoding.md > Source: https://en.wikipedia.org/wiki/Truncated_binary_encoding > Source revision: 1282068246 > License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/) {{More references|date=December 2009}} '''Truncated binary encoding''' is an [entropy encoding](/source/entropy_encoding) typically used for uniform [probability distribution](/source/probability_distribution)s with a finite alphabet. It is parameterized by an alphabet with total size of number ''n''. It is a slightly more general form of [binary encoding](/source/Binary_numeral_system) when ''n'' is not a [power of two](/source/power_of_two). If ''n'' is a power of two, then the coded value for 0 ≤ ''x'' < ''n'' is the simple binary code for ''x'' of length log₂(''n''). Otherwise let ''k'' = floor(log₂(''n'')), such that 2^''k'' < ''n'' < 2^''k''+1and let ''u'' = 2^''k''+1 − ''n''. Truncated binary encoding assigns the first ''u'' symbols codewords of length ''k'' and then assigns the remaining ''n'' − ''u'' symbols the last ''n'' − ''u'' codewords of length ''k'' + 1. Because all the codewords of length ''k'' + 1 consist of an unassigned codeword of length ''k'' with a "0" or "1" appended, the resulting code is a [prefix code](/source/prefix_code). == History == Used since at least 1984, '''phase-in codes''', also known as '''economy codes''',Eastman, Willard L, ''et al.'' (Aug. 1984) [https://patentimages.storage.googleapis.com/86/7d/56/dd49314023387d/US4464650.pdf Apparatus and Method for Compressing Data Signals and Restoring the Compressed Data Signals], US Patent 4,464,650.Acharya, Tinku et Já Já, Joseph F. (oct. 1996), [https://www.sciencedirect.com/science/article/abs/pii/0020025596000898 An on-line variable-length binary encoding of text], Information Sciences, vol 94 no 1-4, p. 1-22.Job van der Zwan. [https://observablehq.com/@jobleonard/phase-in-codes "Phase-in Codes"]. are also known as truncated binary encoding. ==Example with ''n'' = 5== For example, for the alphabet {0, 1, 2, 3, 4}, ''n'' = 5 and 2² ≤ ''n'' < 2³, hence ''k'' = 2 and ''u'' = 2³ − 5 = 3. Truncated binary encoding assigns the first ''u'' symbols the codewords 00, 01, and 10, all of length 2, then assigns the last ''n'' − ''u'' symbols the codewords 110 and 111, the last two codewords of length 3. For example, if ''n'' is 5, plain binary encoding and truncated binary encoding allocates the following [codewords](/source/Code_word_(communication)). Digits shown ~~struck~~ are not transmitted in truncated binary. {| class="wikitable" style="text-align:center" |- ! Truncated
binary !!colspan=3| Encoding !! Standard
binary |- |align=right| 0 ||bgcolor=silver| 0 || 0 || 0 ||align=left| 0 |- |align=right| 1 ||bgcolor=silver| 0 || 0 || 1 ||align=left| 1 |- |align=right| 2 ||bgcolor=silver| 0 || 1 || 0 ||align=left| 2 |- |align=right| UNUSED ||bgcolor=silver| 0 ||bgcolor=silver| 1 ||bgcolor=silver| 1 ||align=left| 3 |- |align=right| UNUSED ||bgcolor=silver| 1 ||bgcolor=silver| 0 ||bgcolor=silver| 0 ||align=left| 4 |- |align=right| UNUSED ||bgcolor=silver| 1 ||bgcolor=silver| 0 ||bgcolor=silver| 1 ||align=left| 5/UNUSED |- |align=right| 3 || 1 || 1 || 0 ||align=left| 6/UNUSED |- |align=right| 4 || 1 || 1 || 1 ||align=left| 7/UNUSED |} It takes 3 bits to encode ''n'' using straightforward binary encoding, hence 2³ − ''n'' = 8 − 5 = 3 are unused. In numerical terms, to send a value ''x'', where 0 ≤ ''x'' < ''n'', and where there are 2^''k'' ≤ ''n'' < 2^''k''+1 symbols, there are ''u'' = 2^''k''+1 − ''n'' unused entries when the alphabet size is rounded up to the nearest power of two. The process to encode the number ''x'' in truncated binary is: if ''x'' is less than ''u'', encode it in ''k'' binary bits; if ''x'' is greater than or equal to ''u'', encode the value ''x'' + ''u'' in ''k'' + 1 binary bits. ==Example with ''n'' = 10== Another example, encoding an alphabet of size 10 (between 0 and 9) requires 4 bits, but there are 2⁴ − 10 = 6 unused codes, so input values less than 6 have the first bit discarded, while input values greater than or equal to 6 are offset by 6 to the end of the binary space. (Unused patterns are not shown in this table.) {| class="wikitable" style="text-align:center" |- ! Input
value !! Offset !! Offset
value !! Standard
binary || Truncated
binary |- | 0 || 0 || 0 || 0000 || 000 |- | 1 || 0 || 1 || 0001 || 001 |- | 2 || 0 || 2 || 0010 || 010 |- | 3 || 0 || 3 || 0011 || 011 |- | 4 || 0 || 4 || 0100 || 100 |- | 5 || 0 || 5 || 0101 || 101 |- |colspan=5| |- | 6 || 6 || 12 || 0110 || 1100 |- | 7 || 6 || 13 || 0111 || 1101 |- | 8 || 6 || 14 || 1000 || 1110 |- | 9 || 6 || 15 || 1001 || 1111 |} To decode, read the first ''k'' bits. If they encode a value less than ''u'', decoding is complete. Otherwise, read an additional bit and subtract ''u'' from the result. ==Example with ''n'' = 7== Here is a more extreme case: with ''n'' = 7 the next power of 2 is 8, so ''k'' = 2 and ''u'' = 2³ − 7 = 1: {| class="wikitable" style="text-align:center" |- ! Input
value !! Offset !! Offset
value !! Standard
binary || Truncated
binary |- | 0 || 0 || 0 || 000 || 00 |- |colspan=5| |- | 1 || 1 || 2 || 001 || 010 |- | 2 || 1 || 3 || 010 || 011 |- | 3 || 1 || 4 || 011 || 100 |- | 4 || 1 || 5 || 100 || 101 |- | 5 || 1 || 6 || 101 || 110 |- | 6 || 1 || 7 || 110 || 111 |} This last example demonstrates that a leading zero bit does not always indicate a short code; if ''u'' < 2^''k'', some long codes will begin with a zero bit. == Simple algorithm == Generate the truncated binary encoding for a value ''x'', 0 ≤ ''x'' < ''n'', where ''n'' > 0 is the size of the alphabet containing ''x''. ''n'' need not be a power of two. string TruncatedBinary (int x, int n) { // Set k = floor(log2(n)), i.e., k such that 2^k <= n < 2^(k+1). int k = 0, t = n; while (t > 1) { k++; t >>= 1; } // Set u to the number of unused codewords = 2^(k+1) - n. int u = (1 << k + 1) - n; if (x < u) return Binary(x, k); else return Binary(x + u, k + 1)); } The routine Binary is expository; usually just the rightmost len bits of the variable ''x'' are desired. Here we simply output the binary code for ''x'' using len bits, padding with high-order 0s if necessary. string Binary (int x, int len) { string s = ""; while (x != 0) { if (even(x)) s = '0' + s; else s = '1' + s; x >>= 1; } while (s.Length < len) s = '0' + s; return s; } == On efficiency == If ''n'' is not a power of two, and ''k''-bit symbols are observed with probability ''p'', then (''k'' + 1)-bit symbols are observed with probability 1 − ''p''. We can calculate the expected number of bits per symbol

b_e

as :

b_e = p k + (1 - p) (k + 1).

Raw encoding of the symbol has

b_u = k + 1

bits. Then relative space saving ''s'' (see [Data compression ratio](/source/Data_compression_ratio)) of the encoding can be defined as :

s = 1 - \frac{b_e}{b_u} = 1 - \frac{p k + (1 - p) (k + 1)}{k + 1}.

When simplified, this expression leads to :

s = \frac{p}{k + 1} = \frac{p}{b_u}.

This indicates that relative efficiency of truncated binary encoding increases as probability ''p'' of ''k''-bit symbols increases, and the raw-encoding symbol bit-length

b_u

decreases. ==See also== * [Benford's law](/source/Benford's_law) * [Golomb coding](/source/Golomb_coding) ==References== {{Reflist}} {{DEFAULTSORT:Truncated Binary Encoding}} Category:Entropy coding Category:Lossless compression algorithms --- Adapted from the Wikipedia article [Truncated binary encoding](https://en.wikipedia.org/wiki/Truncated_binary_encoding) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Truncated_binary_encoding?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.