Comma code

{{one source |date=April 2024}} A '''comma code''' is a type of prefix-free code in which a '''comma''', a particular symbol or sequence of symbols, occurs at the end of a code word and never occurs otherwise.<ref name="Wade1994">{{cite book |last=Wade |first=Graham |title=Signal Coding and Processing |url=https://books.google.com/books?id=CJswCy7_W8YC&pg=PA56 |date=8 September 1994 |publisher=Cambridge University Press |isbn=978-0-521-42336-6 |page=56}}</ref> This is an intuitive way to express arrays.

For example, Fibonacci coding is a comma code in which the comma is <code>11</code>. <code>11</code> and <code>1011</code> are valid Fibonacci code words, but <code>101</code>, <code>0111</code>, and <code>11011</code> are not.

== Examples == * Unary coding, in which the comma is <code>0</code>. This allows NULL values (when the code and comma is a single <code>0</code>, the value can be taken as a NULL or a 0). * Fibonacci coding, in which the comma is <code>11</code>. The comma of 11 implies that the two codes that are being used to depict data are 0,10. Which can translate to the exact bits 0,1 when representing arbitrary bit strings or numbers. If representing arbitrary bit strings or numbers using this method, one would simply write a '0' for a 0 and a '10' for a 1, and a '11' for the comma / separator and repeat the separator/comma for a NULL. That constructs a Fibonacci lookalike code which looks like a Fibonacci code but translates directly to the bit string rather than the number represented by the Fibonacci series. With standard Fibonacci coding, one is representing every integer as a Fibonnaci code, and the mapping of integer->code->integer encoding and decoding require Fibonacci analysis. With the Fibonacci lookalike codes, one takes the bitstring or a bitwise number and writes it as a series of 0s and 10s and ends the string/number with a '11'. This allows one to express arrays. {| class="wikitable" !Symbol !Reverse Binary representation !Fibonacci code word !Fibonacci lookalike code !Punctured Elias Code |- |1 |1 |11 |11 |1 1 |- |2 |01 |011 |0 11 |1 01 |- |3 |11 |0011 |10 11 |01 11 |- |4 |001 |1011 |0 0 11 |1 001 |- |5 |101 |00011 |10 0 11 |01 101 |- |6 |011 |10011 |0 10 11 |01 011 |- |7 |111 |01011 |10 10 11 |001 111 |- |8 |0001 |000011 |0 0 0 11 |1 0001 |- |9 |1001 |100011 |10 0 0 11 |01 1001 |- |10 |0101 |010011 |0 10 0 11 |01 0101 |- |11 |1101 |001011 |10 10 0 11 |001 1101 |- |12 |0011 |101011 |0 0 10 11 |01 0011 |- |13 |1011 |0000011 |10 0 10 11 |001 1011 |- |14 |0111 |1000011 |0 10 10 11 |001 0111 |} The Fibonacci code can be deconstructed into the data part and the pre-comma ( not the 11 but the count of 1s in the data ). This is a [https://www.firstpr.com.au/audiocomp/lossless/TechRep137.pdf punctured Elias code], which simply writes the count of 1s in the digits to follow. Or one can construct the Fibonacci code from the punctured Elias code by writing a 10 for every 1 in the data and a 11 for the last 1 in the data string. If the data is a random bitstring, then one can simply write 0 for 0 in the bitstring and 10 for 1 in the bitstring and a 11 as the comma/separator. This allows a NULL value which is simply 11.

This method allows a bitstring or number of length n to be expressed in '''1.5n+2''' bits assuming 0s and 1s are present in equal measure in the data.

* Loaded comma in Fibonacci lookalike codes - If there is guaranteed to be one bit in the bitstring, then that can be placed verbatim after the comma '11' and the rest of the bits translated 0 -> 0 and 1 -> 10.

{| class="wikitable" style="text-align: center;" !Symbol !Code !Loaded Comma |- |0 |0 | |- |1 |10 | |- |Last 0 | |110 |- |Last 1 | |111 |} This method allows a non null bitstring to be expressed in '''1.5n+1.5 bits''' assuming 0s and 1s are present in equal measure in the data. * All Huffman codes can be converted to comma codes by prepending a <code>1</code> to the entire code and using a single <code>0</code> as a code and the comma.

{| class="wikitable" style="text-align: center;" !Symbol !Code !Comma Code |- |Comma | - (N/A) |0 |- |0 |00 |100 |- |1 |01 |101 |- |2 |10 |110 |- |3 |11 |111 |}

The definition of word being a number of symbols ending in a comma, the equivalent of a space character. * 50% commas in all data axiom – All implied data specifically variable length bijective data can be shown to be consisting of exactly 50% of commas.

All scrambled data or suitably curated same-length data exhibits so called implied probability.

Such data that can be termed 'generic data' can be analysed using any interleaving unary code as headers where additional bijective bits (equal to the length of the unary code just read) are read as data while the unary code serves as an introduction or header for the data. This header serves as a comma. The data can be read in an interleaving fashion between each bit of the header or in post read fashion when the data is only read after the entire unary header code is read like Chen-Ho encoding.

It can be seen by random walk techniques and by statistical summation that all generic data has a header or comma of an average of 2 bits and data of an additional 2 bits (minimum 1).

This also allows for an inexpensive base increase algorithm before transmission in non binary communication channels, like base-3 or base-5 communication channels. {| class="wikitable" style="text-align: center;" !n !RL code !Next code !Bijective data (non-NULL) !Commas |- |1 |1<code>''?''</code> |0<code>''?''</code> |? (1=1,2=2) |, |- |2 |1<code>''?''</code>1<code>''?''</code> |0<code>''?''</code>0<code>''?''</code> |?? (3,4,5,6=11,12,21,22) |,, |- |3 |1<code>''?''</code>1<code>''?''</code>1<code>''?''</code> |0<code>''?''</code>0<code>''?''</code>0<code>''?''</code> |??? |,,, |- |4 |1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code> |0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code> |???? |,,,, |- |5 |1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code> |0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code> |????? |,,,,, |- |6 |1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code> |0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code> |?????? |,,,,,, |- |7 |1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code> |0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code> |??????? |,,,,,,, |- |8 |1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code> |0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code> |???????? |,,,,,,,, |- |9 |1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code> |0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code> |????????? |,,,,,,,,, |- |10 |1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code> |0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code> |?????????? |,,,,,,,,,, |- | colspan="5" |... |}

Where '?' is '1' or '2' for the value of the bijective digit that requires no further processing.

Of course we use a single comma to separate each field of data, therefore showing that all the data consists of 50% of commas. This is quite visible from an implied probability of 50% for the <code>0</code> code in Huffman base 3 codes: <code>0</code>,<code>10</code>,<code>11</code> (net 2/3 or 66.66% commas) or the base-5 comma code shown above. The cost-per-character quotient of higher base communication has to maintain near logarithmic values <math display="inline">\frac{log(base)}{log(2)}</math>for the data and less than 2-bits for the comma character to maintain cost effectiveness.

This method has an assurance of a '1' or '2' after every '0' (comma) and this property can be useful when designing around timing concerns in transmission. It can be somewhat expensive to convert a known binary value to ternary unless ternary bit costs are reduced to similar to binary bit costs, so this bit can be multiplexed in a separate binary channel if costs agree (this may require a read of an additional 'tail'/trailing portion of 2-bits pure data for the binary channel (from after the first bit of the first change as this is not an instantly-decodable code, simply read if using an instantly decodable unary code) to be similar to the 2 average ternary bits remaining on the primary channel equivalent to <math display="inline">2*\frac{log(3)}{log(2)}=3.17</math> bits before cost comparisons are factored in).

Not considering multiplexing, this method has a read efficiency of 3 ternary digits for a read of 4 binary bits or 1.33 bits. Or <math display="inline">\frac{4/3}\frac{log(3)}{log(2)}=84.12%</math>

This method allows a bitstring or number of length n to be expressed in '''2n''' bits assuming 0s and 1s are present in equal measure in the data.

* 66.66% (2/3) commas in all data axiom - All implied data specifically variable length data can be shown to be consisting of exactly 66.66% (2/3) of commas.

{| class="wikitable" style="text-align: center;" !n !RL code !Next code !Bijective data (has NULL) !Commas |- |1 |1 |0 |NULL (or 0) |, |- |2 |1<code>''?''</code>1 |0<code>''?''</code>0 |? (1=1,2=2) |,, |- |3 |1<code>''?''</code>1<code>''?''</code>1 |0<code>''?''</code>0<code>''?''</code>0 |?? (3,4,5,6=11,12,21,22) |,,, |- |4 |1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1 |0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0 |??? |,,,, |- |5 |1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1 |0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0 |???? |,,,,, |- |6 |1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1 |0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0 |????? |,,,,,, |- |7 |1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1 |0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0 |?????? |,,,,,,, |- |8 |1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1 |0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0 |??????? |,,,,,,,, |- |9 |1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1 |0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0 |???????? |,,,,,,,,, |- |10 |1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1 |0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0 |????????? |,,,,,,,,,, |- | colspan="5" |... |}

Where '?' is '1' or '2' for the value of the bijective digit that requires no further processing. This method results in statistical similarity to a simple 'implied read' of Huffman base 3 codes: <code>0</code>,<code>10</code>,<code>11</code> (net 2/3 or 66.66% commas).

It can be seen by random walk techniques and by statistical summation that all generic data has a header or comma of an average of 2 bits and data of an additional 1 bit (minimum 0).

This has no assurance of a '1' or '2' after every '0' (comma) a property that can be useful when designing around timing concerns in transmission.

This method has a read efficiency of 2 ternary digits for a read of 3 binary bits or 1.5 binary bits/ternary digit. Or <math display="inline">\frac{3/2}\frac{log(3)}{log(2)}=94.64%</math>

This method allows a bitstring or number of length n to be expressed in '''2n+1''' bits assuming 0s and 1s are present in equal measure in the data. One can assume that the value 0 is 0 bits or an empty string "" followed by the 1.

* 34.375% | 31.25% (~ 1/3) write commas for efficiency gains using number partitioning – Implied reads and writes using number partitioning techniques ('m' numbers divided into 'n' partitions result in n^m permutations) similar to Chen-Ho and Hertz encoding show greater efficiency of both reads and writes similar to nearly random distribution. Thus the use of codes makes less sense and the use of higher bases becomes more important. Similarly, a 'write' comma becomes any number in the base, a 'read' comma is the header shown below, Huffman base 4 codes: <code>0</code>,<code>10</code>,<code>110</code>,<code>111</code>. The main advantage to this technique apart from higher efficiency is that there is no base conversion required which would require the entire stream to be read first and then converted. The disadvantage is that the average number length becomes higher and similar to random number generation and timing concerns that govern ternary transmission come to the fore. With m=2 and n=2, we get, not forgetting that a value of '(2)' is essentially 0-bits:

{| class="wikitable" border="1" style="text-align:center" ! colspan="6" scope="col" |Binary encoding | rowspan="6" | ! colspan="3" scope="col" |Ternary digits |- ! scope="col" |READS - Code space (128 states) ! scope="col" |b3 ! scope="col" |b2 ! scope="col" |b1 ! scope="col" |b0 ! rowspan="5" scope="col" | !Values encoded ! scope="col" |Description ! scope="col" |WRITES - Occurrences (100 states) |- |50% (64 states) |'''0''' | bgcolor="#cef2e0" |a | bgcolor="#cedff2" |b | |(0–1) (0–1) |Two lower digits |44.44% (45 states) |- bgcolor="#f2f2f2" |25% (32 states) |'''1''' |'''0''' | bgcolor="#cef2e0" |a | |(2) (0–1) | rowspan="2" |One lower digit, one higher digit |22.22% (22 states) |- |12.5% (16 states) |'''1''' |'''1''' |'''0''' | bgcolor="#cedff2" |b |(0–1) (2) |22.22% (22 states) |- bgcolor="#f2f2f2" |12.5% (16 states) |'''1''' |'''1''' |'''1''' | |(2) (2) |Two higher digits |11.11% (11 states) |}

This method therefore has a read efficiency of 2 ternary digits for a read of <math display="inline">50*3+25*3+12.5*4+12.5*3=3.125</math> binary bits or 1.5625 binary bits/ternary digit. Or <math display="inline">\frac{3.125*\frac{1}2}\frac{log(3)}{log(2)}=98.58%</math>.

A write efficiency of 2 ternary digits for a write of <math display="inline">\frac{4}9*3+\frac{2}9*3+\frac{2}9*4+\frac{1}9*3=3.22</math> bits or 1.61 binary bits/ternary digit, or <math display="inline">\frac{\frac{log(3)}{log(2)}}{\frac{29}{9}*\frac{1}2}=98.38%</math>

* Cardinal numbers for efficient base conversion – Since it has been ascertained that comma codes are very similar to base conversion, the only concern being efficiency and timing, the direct conversion/mapping of 19 binary bits <math display="inline">2^{19} = 524288</math> numbers to 12 ternary trits <math display="inline">3^{12} = 531441</math> numbers allow for an efficiency of <math display="inline">\frac{2^{19}}{3^{12}}=98.65%</math> or <math display="inline">\frac{log_3{2^{19}}}{12}=99.9%</math> efficiency depending upon the method of calculation. This works because <math display="inline">2^{19} < 3^{12}</math> and <math display="inline">2^{19}</math>≃ <math display="inline">3^{12}</math>. This of course is more of a theoretical construct and has no mention about timing when trying to apply this to ternary transmission methods. It does however leave <math display="inline">531441 - 524288 = 7153</math> codes to design around timing concerns.

== See also == * Self-synchronizing code

== References == <references/>

Category:Coding theory