Entropy coding

{{Short description|Lossless data compression scheme}} In information theory, an '''entropy coding''' (or '''entropy encoding''') is any lossless data compression method that attempts to approach the lower bound declared by Shannon's source coding theorem, which states that any lossless data compression method must have an expected code length greater than or equal to the entropy of the source.<ref name="Duda2015">{{Cite book |last1=Duda |first1=Jarek |last2=Tahboub |first2=Khalid |last3=Gadgil |first3=Neeraj J. |last4=Delp |first4=Edward J. |title=2015 Picture Coding Symposium (PCS) |chapter=The use of asymmetric numeral systems as an accurate replacement for Huffman coding |date=May 2015 |chapter-url=https://ieeexplore.ieee.org/document/7170048/ |pages=65–69 |doi=10.1109/PCS.2015.7170048|isbn=978-1-4799-7783-3 |s2cid=20260346 |url=http://ruj.uj.edu.pl/xmlui/handle/item/24054 }}</ref><ref name="Shannon1948">{{Cite journal |last=Shannon |first=Claude E. |title=A Mathematical Theory of Communication |journal=Bell System Technical Journal |volume=27 |issue=3 |pages=379–423 |year=1948 |doi=10.1002/j.1538-7305.1948.tb01338.x}}</ref>

More precisely, the source coding theorem states that for any source distribution, the expected code length satisfies <math>\operatorname E_{x\sim P}[\ell(d(x))] \geq \operatorname E_{x\sim P}[-\log_b(P(x))]</math>, where <math>\ell</math> is the function specifying the number of symbols in a code word, <math>d</math> is the coding function, <math>b</math> is the number of symbols used to make output codes and <math>P</math> is the probability of the source symbol. An entropy coding attempts to approach this lower bound.<ref name="Shannon1948" /><ref name="CoverThomas">{{Cite book |last1=Cover |first1=Thomas M. |last2=Thomas |first2=Joy A. |title=Elements of Information Theory |publisher=John Wiley & Sons |edition=2nd |year=2006 |isbn=978-0-471-24195-9}}</ref>

Two of the most common entropy coding techniques are Huffman coding and arithmetic coding.<ref name="Huffman1952">{{Cite journal |last=Huffman |first=David |title=A Method for the Construction of Minimum-Redundancy Codes |journal=Proceedings of the IRE |publisher=Institute of Electrical and Electronics Engineers (IEEE) |volume=40 |issue=9 |year=1952 |issn=0096-8390 |doi=10.1109/jrproc.1952.273898 |pages=1098–1101}}</ref><ref name="Sayood2017">{{Cite book |last=Sayood |first=Khalid |title=Introduction to Data Compression |publisher=Morgan Kaufmann |edition=5th |year=2017 |isbn=978-0-12-809474-7}}</ref> If the approximate entropy characteristics of a data stream are known in advance (especially for signal compression), a simpler static code may be useful. These static codes include universal codes (such as Elias gamma coding or Fibonacci coding) and Golomb codes (such as unary coding or Rice coding).<ref name="Sayood2017" />

Since 2014, data compressors have started using the asymmetric numeral systems (ANS) family of entropy coding techniques, which allows combination of the compression ratio of arithmetic coding with a processing cost similar to Huffman coding.<ref name="Duda2013">{{Cite journal |last=Duda |first=Jarek |title=Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding |journal=arXiv preprint |year=2013 |arxiv=1311.2540 |bibcode=2013arXiv1311.2540D}}</ref><ref name="Duda2015" /> ANS has been adopted by compressors developed by Facebook (Zstandard), Apple (LZFSE), and Google (Draco), among others.<ref name="Duda2013" />

== Intuitive explanation ==

Entropy coding exploits the fact that some symbols occur more frequently than others. When symbol probabilities are unequal, some outcomes are more predictable, and this predictability can be used to represent the data in fewer bits. Conversely, when all symbols are equally likely, each symbol carries the maximum possible amount of information and no compression is possible.<ref name="CoverThomas" /><ref name="Shannon1948" />

'''When compression is not possible:''' A stream of independent fair coin flips, where heads and tails each occur with probability 0.5, has an entropy of 1 bit per symbol, exactly the cost of storing one binary digit. Since every symbol already takes the minimum possible space, there is no redundancy to exploit, and no entropy coding method can make the data any smaller on average. The same principle applies to larger alphabets: independent ternary symbols (0, 1, 2) each with probability 1/3 have an entropy of about 1.585 bits per symbol, the maximum for a three-symbol alphabet, and are likewise incompressible.<ref name="CoverThomas" /><ref name="Shannon1948" />

'''When compression is possible:''' If the same binary source instead produces 1s with probability 0.9 and 0s with probability 0.1, the entropy drops to about 0.469 bits per symbol. This is well below the 1-bit storage cost, because the predominance of 1s makes each symbol partially predictable. An entropy coder such as arithmetic coding can exploit this predictability to achieve a compression ratio of roughly 2.1:1 by assigning shorter codes to the more common symbol.<ref name="CoverThomas" /><ref name="Sayood2017" />

'''Practical example:''' English text has an alphabet of roughly 27 characters (26 letters plus a space). If all characters occurred equally often, each would require about 4.75 bits. However, because letter frequencies are highly unequal ('e' occurs far more often than 'z') and letters are not independent ('u' almost always follows 'q'), the true entropy of English has been estimated at roughly 1.0 to 1.5 bits per character. This large gap is what makes English text highly compressible.<ref name="Shannon1951">{{Cite journal |last=Shannon |first=Claude E. |title=Prediction and Entropy of Printed English |journal=Bell System Technical Journal |volume=30 |issue=1 |pages=50–64 |year=1951 |doi=10.1002/j.1538-7305.1951.tb01366.x}}</ref><ref name="CoverThomas" />

== Entropy as a measure of similarity ==

Besides using entropy coding as a way to compress digital data, an entropy encoder can also be used to measure the amount of similarity between streams of data and already existing classes of data. This is done by generating an entropy coder/compressor for each class of data; unknown data is then classified by feeding the uncompressed data to each compressor and seeing which compressor yields the highest compression. The coder with the best compression is probably the coder trained on the data that was most similar to the unknown data.<ref name="Cilibrasi2005">{{Cite journal |last1=Cilibrasi |first1=Rudi |last2=Vitányi |first2=Paul M. B. |title=Clustering by Compression |journal=IEEE Transactions on Information Theory |volume=51 |issue=4 |pages=1523–1545 |year=2005 |doi=10.1109/TIT.2005.844059 |s2cid=911}}</ref> This approach is grounded in the concept of normalized compression distance, a parameter-free, universal similarity metric based on compression that approximates the uncomputable normalized information distance.<ref name="Cilibrasi2005" /><ref name="Vitanyi2009">{{Cite book |last1=Vitányi |first1=Paul M. B. |last2=Balbach |first2=Frank J. |last3=Cilibrasi |first3=Rudi L. |last4=Li |first4=Ming |chapter=Normalized Information Distance |title=Information Theory and Statistical Learning |publisher=Springer |year=2009 |doi=10.1007/978-0-387-84816-7_3 |isbn=978-0-387-84816-7}}</ref>

==References== {{reflist}}

== External links == * ''[http://www.inference.phy.cam.ac.uk/mackay/itila/book.html Information Theory, Inference, and Learning Algorithms]'', by David MacKay (2003), gives an introduction to Shannon theory and data compression, including the Huffman coding and arithmetic coding. * ''[http://iphome.hhi.de/wiegand/assets/pdfs/VBpart1.pdf Source Coding],'' by T. Wiegand and H. Schwarz (2011).

Category:Entropy coding Category:Entropy and information Category:Data compression