# Delta encoding

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Delta_encoding
> Markdown URL: https://mediated.wiki/source/Delta_encoding.md
> Source: https://en.wikipedia.org/wiki/Delta_encoding
> Source revision: 1356931719
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

{{Short description|Type of data transmission method}}
{{Distinguish|Elias delta coding|delta modulation}}

{{Merging from|Data differencing|afd=Data differencing|date=April 2026}}
'''Delta encoding''' is a way of storing or transmitting [data](/source/data) in the form of ''[differences](/source/Data_differencing)'' (deltas) between sequential data rather than complete files; more generally this is known as [data differencing](/source/data_differencing). Delta encoding is sometimes called '''delta compression''', particularly where archival histories of changes are required (e.g., in [revision control software](/source/revision_control_software)).

The differences are recorded in discrete files called "deltas" or "diffs". In situations where differences are small – for example, the change of a few words in a large document or the change of a few records in a large table – delta encoding greatly reduces data redundancy. Collections of unique deltas are substantially more space-efficient than their non-encoded equivalents.

From a logical point of view, the difference between two data values is the information required to obtain one value from the other – see [relative entropy](/source/relative_entropy). The difference between identical values (under some [equivalence](/source/equivalence_relation)) is often called <math>0</math> or the neutral element.

== Numeric example ==
Perhaps the simplest example is storing byte values as differences (deltas) between sequential values, rather than the values themselves. So, instead of <math>(2, 4, 6, 9, 7)</math>, we would store <math>(2, 2, 2, 3, -2)</math>. When neighboring samples are correlated, this reduces the [variance](/source/variance) (range) of the values, so the same data can be stored in fewer bits. The [IFF](/source/Interchange_File_Format) [8SVX](/source/8SVX) sound format applies this encoding to raw sound data before compressing it. Not all 8-bit sound [samples](/source/sampling_(signal_processing)) compress better when delta encoded, and delta encoding is even less useful for samples of 16 bits or more. Compression algorithms therefore often delta encode only when doing so improves compression. In [video compression](/source/video_compression), however, delta frames can considerably reduce frame size and are used in virtually every video compression [codec](/source/codec). These frames require a different type of ''delta'' calculation.

==Definition==

=== Numeric ===
The delta of numeric values is straightforwardly defined by subtraction, as shown in the example above. Because subtraction can be undone by addition and vice versa, the numeric delta is ''symmetric'': if the latter value <math>b</math> is known, it is possible to apply the delta <math>b-a</math> in reverse to obtain the previous value <math>a</math>.

It is also possible to produce a numeric "delta of deltas". This can be useful if the underlying data is generated by a process resembling <math>f(x) = ax + b</math>, as a second-order delta would flatten the data, after the first two values, to a string of zeros. [Timestamp](/source/Timestamp)s often behave this way.<ref name=tigerdata/> Even higher-order deltas may be useful for real-life data: for example, the measured distance ([pseudorange](/source/pseudorange)) to a navigation satellite over time is best compressed using a third-order delta, a "delta of deltas of deltas".<ref name="Hatanaka2008">{{cite journal|last1=Hatanaka|first1=Yuki|title=A Compression Format and Tools for GNSS Observation Data| journal = Bulletin of the [Geographical Survey Institute](/source/Geographical_Survey_Institute) | volume = 55 | pages = 21–30| year = 2008|url=https://www.gsi.go.jp/common/000045517.pdf|accessdate=2020-09-25}}</ref>
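As an illustration, a second-order delta can be sketched as two passes of a first-order delta. The function names below are hypothetical, and the sketch assumes signed 64-bit values whose differences do not overflow:

```c
#include <stddef.h>
#include <stdint.h>

/* First-order delta, in place; the first value is stored relative to 0.
   Assumption: differences fit in int64_t (no overflow handling). */
static void delta_encode(int64_t *values, size_t length) {
    int64_t last = 0;
    for (size_t i = 0; i < length; ++i) {
        int64_t current = values[i];
        values[i] = current - last;
        last = current;
    }
}

/* Inverse of delta_encode(): a running sum over the deltas. */
static void delta_decode(int64_t *values, size_t length) {
    int64_t last = 0;
    for (size_t i = 0; i < length; ++i) {
        values[i] += last;
        last = values[i];
    }
}

/* Second-order delta ("delta of deltas"): apply the first-order pass twice. */
void delta2_encode(int64_t *values, size_t length) {
    delta_encode(values, length);
    delta_encode(values, length);
}

void delta2_decode(int64_t *values, size_t length) {
    delta_decode(values, length);
    delta_decode(values, length);
}
```

For evenly spaced timestamps such as 1000, 1010, 1020, 1030, 1040, the first pass gives 1000, 10, 10, 10, 10 and the second gives 1000, -990, 0, 0, 0.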

In addition to subtraction, the [bitwise](/source/bitwise) [exclusive or](/source/exclusive_or) (XOR) also produces a symmetric delta. [Time series database](/source/Time_series_database)s often use the XOR operation as a delta between floating-point numbers as it produces easily compressible differences consisting of mostly zero bits.<ref name=tigerdata>{{cite web |title=Time-series compression algorithms, explained |url=https://www.tigerdata.com/blog/time-series-compression-algorithms-explained |website=Tiger Data Blog |language=en |date=22 April 2020}}</ref>
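A minimal sketch of an XOR delta over floating-point values follows. It assumes that <code>double</code> is a 64-bit type, as on common platforms, and the function names are hypothetical. Real time series databases additionally pack the resulting zero bits compactly, which is not shown here:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the bit pattern of a double into a 64-bit integer.
   Assumption: sizeof(double) == sizeof(uint64_t). */
static uint64_t double_bits(double value) {
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

static double bits_double(uint64_t bits) {
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* out[i] holds the XOR of value i with value i-1 (value -1 is all zero bits). */
void xor_encode(const double *in, uint64_t *out, size_t length) {
    uint64_t last = 0;
    for (size_t i = 0; i < length; ++i) {
        uint64_t current = double_bits(in[i]);
        out[i] = current ^ last;
        last = current;
    }
}

/* XOR is its own inverse, so decoding repeats the same operation. */
void xor_decode(const uint64_t *in, double *out, size_t length) {
    uint64_t last = 0;
    for (size_t i = 0; i < length; ++i) {
        last ^= in[i];
        out[i] = bits_double(last);
    }
}
```

Repeated values encode to zero, and values that are close to each other tend to share their sign, exponent, and leading mantissa bits, so the XOR result has long runs of zero bits.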

Another variation is to change the distance between elements used for the delta operation: instead of calculating <math>b-a</math> in the sequence <math>(a, b, c, d)</math>, a "distance-2" delta would calculate <math>c-a</math>. This is seen in the "delta" filter of [xz](/source/XZ_Utils).<ref>{{man|1|xz}} "--delta[=options] ... Supported options: dist=distance Specify the distance of the delta calculation in bytes. distance must be 1–256."</ref>
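The idea of a configurable distance can be sketched as follows. This is an illustration only, not the code used by xz; the function names are hypothetical and <code>distance</code> is assumed to be at least 1:

```c
#include <stddef.h>
#include <stdint.h>

/* Encode in place: each byte becomes its difference from the byte
   `distance` positions earlier. The first `distance` bytes are kept as-is.
   The loop runs backwards so that the earlier byte is still unencoded.
   Assumption: distance >= 1. */
void delta_dist_encode(uint8_t *buffer, size_t length, size_t distance) {
    for (size_t i = length; i-- > distance; ) {
        buffer[i] = (uint8_t)(buffer[i] - buffer[i - distance]);
    }
}

/* Decode in place: runs forwards so that the earlier byte is already decoded. */
void delta_dist_decode(uint8_t *buffer, size_t length, size_t distance) {
    for (size_t i = distance; i < length; ++i) {
        buffer[i] = (uint8_t)(buffer[i] + buffer[i - distance]);
    }
}
```

A distance greater than 1 suits interleaved data. With two interleaved channels such as 10, 200, 12, 202, 14, 204, a distance of 2 compares each byte with the previous byte of the same channel and yields 10, 200, 2, 2, 2, 2.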

====Sample C code====
The following [C](/source/C_(programming_language)) code performs a simple form of delta encoding and decoding on a sequence of characters:<!-- Yes, it is possible to write smaller/faster functions; but these are given because they are easily understandable -->

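<syntaxhighlight lang="c">
#include <stddef.h>
#include <stdint.h>

/* Replace each byte with its difference from the previous byte, in place.
   The first byte is stored relative to an implicit 0. */
void encode(uint8_t buffer[], size_t length) {
    uint8_t last = 0;
    for (size_t i = 0; i < length; ++i) {
        uint8_t current = buffer[i];
        buffer[i] = (uint8_t)(current - last);  /* wraps modulo 256 */
        last = current;
    }
}

/* Undo encode(): rebuild each byte by adding its delta to the previous byte. */
void decode(uint8_t buffer[], size_t length) {
    uint8_t last = 0;
    for (size_t i = 0; i < length; ++i) {
        uint8_t delta = buffer[i];
        buffer[i] = (uint8_t)(delta + last);  /* wraps modulo 256 */
        last = buffer[i];
    }
}
</syntaxhighlight>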

===Collections ===
For the difference between sequences or sets of values (e.g. character-strings and images), a delta can be defined in two ways: as a ''symmetric delta'' or as a ''directed delta''.
A ''symmetric delta'' between two sets includes enough information to make it reversible:
: <math>\Delta(\mathit{version}_1, \mathit{version}_2) = (\mathit{version}_1 \setminus \mathit{version}_2) \cup (\mathit{version}_2 \setminus \mathit{version}_1).</math>
The difference between two sequences is defined analogously, except that information for locating the additions and removals is added.
In computer implementations, it is exemplified by the output of [diff](/source/diff) (in the default, unified, and context modes): instructions in the form of "at position <math>x</math>, replace (old content) with (new content)".
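For sets, the symmetric delta defined above can be sketched as follows. The sketch assumes that each set is represented as a sorted array of integers without duplicates; the function name is hypothetical:

```c
#include <stddef.h>

/* Write the symmetric difference of two sorted, duplicate-free arrays
   into out and return the number of elements written.
   Assumption: out has room for a_len + b_len elements. */
size_t symmetric_delta(const int *a, size_t a_len,
                       const int *b, size_t b_len, int *out) {
    size_t i = 0, j = 0, n = 0;
    while (i < a_len && j < b_len) {
        if (a[i] < b[j]) {
            out[n++] = a[i++];      /* only in a */
        } else if (b[j] < a[i]) {
            out[n++] = b[j++];      /* only in b */
        } else {
            ++i;                    /* in both: not part of the delta */
            ++j;
        }
    }
    while (i < a_len) out[n++] = a[i++];
    while (j < b_len) out[n++] = b[j++];
    return n;
}
```

Because the operation is its own inverse, applying the delta to either version with the same function yields the other version, which is what makes it reversible.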

A ''directed delta'', also called a change, is a sequence of (elementary) change operations which, when applied to <math>\mathit{version}_1</math>, yields another, <math>\mathit{version}_2</math> (note the correspondence to [transaction log](/source/transaction_log)s in databases). In computer implementations, they typically take the form of a language with two commands: ''copy data from <math>\mathit{version}_1</math>'' and ''write literal data''. An example is the output of diff in the ''edit script'' mode.
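A minimal sketch of applying such a two-command delta is given below. The types and names (<code>delta_op</code>, <code>delta_apply</code>) are hypothetical and do not correspond to any particular format:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { OP_COPY, OP_LITERAL } delta_op_kind;

typedef struct {
    delta_op_kind kind;
    size_t offset;        /* OP_COPY: start offset in the source version */
    size_t length;        /* number of bytes to copy or to write */
    const uint8_t *data;  /* OP_LITERAL: the bytes to write */
} delta_op;

/* Apply the operations to `source`, writing the new version into `out`.
   Returns the number of bytes written, or (size_t)-1 if an operation
   reads outside the source or does not fit in the output buffer. */
size_t delta_apply(const uint8_t *source, size_t source_len,
                   const delta_op *ops, size_t op_count,
                   uint8_t *out, size_t out_cap) {
    size_t written = 0;
    for (size_t i = 0; i < op_count; ++i) {
        const delta_op *op = &ops[i];
        if (op->length > out_cap - written) {
            return (size_t)-1;
        }
        if (op->kind == OP_COPY) {
            if (op->offset > source_len ||
                op->length > source_len - op->offset) {
                return (size_t)-1;
            }
            memcpy(out + written, source + op->offset, op->length);
        } else {
            memcpy(out + written, op->data, op->length);
        }
        written += op->length;
    }
    return written;
}
```

For example, "the slow brown fox" can be produced from "the quick brown fox" with three operations: copy 4 bytes from offset 0, write the literal "slow", then copy 10 bytes from offset 9. The delta cannot be run backwards, because it does not record the replaced text "quick".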

====Variants====
A variation of delta encoding which encodes differences between the [prefixes](/source/Prefix_(computer_science)) or [suffixes](/source/Suffix_(computer_science)) of [strings](/source/string_(computer_science)) is called [incremental encoding](/source/incremental_encoding). It is particularly effective for sorted lists with small differences between strings, such as a list of [word](/source/word)s from a [dictionary](/source/dictionary).
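The prefix-based form can be sketched as follows: each word is stored as the number of leading characters it shares with the previous word, plus the remaining suffix. The function names are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Number of leading characters shared by two NUL-terminated strings. */
size_t common_prefix_len(const char *a, const char *b) {
    size_t n = 0;
    while (a[n] != '\0' && a[n] == b[n]) {
        ++n;
    }
    return n;
}

/* Rebuild a word from the previous word, the shared prefix length, and
   the stored suffix. Returns 0 on success, -1 if the arguments are
   inconsistent or the result does not fit in out. */
int front_decode(const char *previous, size_t shared,
                 const char *suffix, char *out, size_t out_cap) {
    size_t suffix_len = strlen(suffix);
    if (shared > strlen(previous) || shared + suffix_len + 1 > out_cap) {
        return -1;
    }
    memcpy(out, previous, shared);
    memcpy(out + shared, suffix, suffix_len + 1);  /* includes the NUL */
    return 0;
}
```

To encode a word <code>w</code> that follows <code>prev</code>, one would store <code>common_prefix_len(prev, w)</code> together with the string starting at <code>w + common_prefix_len(prev, w)</code>. For "myxa" followed by "myxophyta", this is the pair (3, "ophyta").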

==Implementation issues==
The nature of the data to be encoded influences the effectiveness of a particular compression algorithm.

Delta encoding performs best when data has small or constant variation; for an unsorted data set, there may be little to no compression possible with this method.

In delta-encoded transmission over a network, where only a single copy of the file is available at each end of the communication channel, special [error control codes](/source/error-correction) are used to detect which parts of the file have changed since its previous version.
For example, [rsync](/source/rsync) uses a rolling [checksum](/source/checksum) algorithm based on Mark Adler's [adler-32](/source/adler-32) checksum.
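The property that makes such a checksum "rolling" is that the value for a window shifted by one byte can be derived from the previous value in constant time. The following is a simplified sketch in the style of Adler-32, with both sums kept modulo 2<sup>16</sup>. It is not rsync's actual code, and the names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t a;  /* sum of the bytes in the window */
    uint16_t b;  /* sum of the bytes weighted by distance from the window end */
} rolling_sum;

/* Compute the checksum of a window from scratch. */
rolling_sum rolling_init(const uint8_t *window, size_t length) {
    rolling_sum s = {0, 0};
    for (size_t i = 0; i < length; ++i) {
        s.a = (uint16_t)(s.a + window[i]);
        s.b = (uint16_t)(s.b + s.a);
    }
    return s;
}

/* Slide the window one byte to the right: `out` is the byte leaving on
   the left, `in` is the byte entering on the right. */
rolling_sum rolling_roll(rolling_sum s, size_t length,
                         uint8_t out, uint8_t in) {
    s.a = (uint16_t)(s.a - out + in);
    s.b = (uint16_t)(s.b - length * out + s.a);
    return s;
}
```

Rolling the checksum of bytes 0 to 3 of a buffer forward by one position gives the same result as computing the checksum of bytes 1 to 4 from scratch, so every offset of a file can be checked against known block checksums without rereading each window.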

==Examples==
===Binary delta compression===
{{excerpt|Binary delta compression}}

===Delta encoding in HTTP===
Another use of delta encoding is [https://datatracker.ietf.org/doc/html/rfc3229 RFC 3229], "Delta encoding in HTTP", which proposes that [HTTP](/source/HTTP) servers should be able to send updated Web pages in the form of differences between versions (deltas). This should decrease Internet traffic, as most pages change slowly over time rather than being completely rewritten repeatedly:

{{blockquote|This document describes how delta encoding can be supported as a compatible extension to HTTP/1.1.

Many HTTP (Hypertext Transport Protocol) requests cause the retrieval of slightly modified instances of resources for which the client already has a cache entry. Research has shown that such modifying updates are frequent, and that the modifications are typically much smaller than the actual entity. In such cases, HTTP would make more efficient use of network bandwidth if it could transfer a minimal description of the changes, rather than the entire new instance of the resource.

[...] We believe that it might be possible to support rsync using the "instance manipulation" framework described later in this document, but this has not been worked out in any detail.
}}

The suggested rsync-based framework was implemented in the ''rproxy'' system as a pair of HTTP proxies.<ref>{{cite web |title=rproxy: introduction |url=https://rproxy.samba.org/index.html |website=rproxy.samba.org}}</ref> Like the basic vcdiff-based implementation, both systems are rarely used.

=== Delta copying ===
''Delta copying'' is a fast way of copying a file that is partially changed, when a previous version is present on the destination location. With delta copying, only the changed part of a file is copied. It is usually used in [backup](/source/backup) or [file copying](/source/file_copying) software, often to save [bandwidth](/source/Bandwidth_(computing)) when copying between computers over a private network or the internet. One notable open-source example is [rsync](/source/rsync).<ref>{{Cite web |url=http://www.2brightsparks.com/bb/viewtopic.php?t=4473 |title=Feature request: Delta copying - 2BrightSparks |access-date=2016-04-29 |archive-date=2016-03-13 |archive-url=https://web.archive.org/web/20160313021725/http://www.2brightsparks.com/bb/viewtopic.php?t=4473 |url-status=dead }}</ref><ref>{{Cite web|url=https://www.bvckup2.com/support/forum/topic/739|title = Bvckup 2 &#124; Forum &#124; How delta copying works}}</ref><ref>[http://www.eggheadcafe.com/software/aspnet/33678264/delta-copying.aspx]{{Dead link|date=July 2019 |bot=InternetArchiveBot |fix-attempted=yes }}</ref>

==== Online backup ====
{{main|Online backup services}}
Many [online backup services](/source/online_backup_services) adopt this methodology, often known simply as ''deltas'', to give their users access to previous versions of a file from earlier backups. This reduces costs in two ways. Less data has to be stored for the differing versions, even though every changed version of a file must remain available to users. Less data also has to be uploaded (and sometimes downloaded) for each updated file, because only the smaller delta is transferred rather than the whole file.

==== Delta updates ====
{{main|delta update}}
For large software packages, there is usually little data changed between versions. Many vendors choose to use delta transfers to save time and bandwidth.

===Diff===
{{main|Diff}}
Diff is a file comparison program, which is mainly used for text files. By default, it generates symmetric deltas that are reversible. Two formats used for software [patches](/source/patch_(Unix)), ''context'' and ''unified'', provide additional context lines that allow a patch to tolerate shifts in line numbers.

===Git===
{{main|Git (software)}}

The Git source code control system employs delta compression in an auxiliary "[http://git-scm.com/docs/git-repack git repack]" operation. Objects in the repository that have not yet been delta-compressed ("loose objects") are compared against a heuristically chosen subset of all other objects, and the common data and differences are concatenated into a "pack file", which is then compressed using conventional methods. In common use cases, where source or data files are changed incrementally between commits, this can result in significant space savings. The repack operation is typically performed as part of the "git gc"<ref>{{Cite web|url=https://git-scm.com/docs/git-gc|title=Git - git-gc Documentation|website=git-scm.com|accessdate=November 9, 2024}}</ref> process, which is triggered automatically when the number of loose objects or pack files exceeds configured thresholds.

The format is documented in the pack-format page of the Git documentation. It implements a directed delta.<ref>{{cite web |title=Git - pack-format Documentation |url=https://git-scm.com/docs/pack-format |website=Git documentation |access-date=13 January 2020}}</ref>

===VCDIFF===
{{main|VCDIFF}}
One general format for directed delta encoding is VCDIFF, described in [https://datatracker.ietf.org/doc/html/rfc3284 RFC 3284]. [Free software](/source/Free_software) implementations include [Xdelta](/source/Xdelta) and open-vcdiff.

===GDIFF===
Generic Diff Format (GDIFF) is another directed delta encoding format. It was submitted to [W3C](/source/W3C) in 1997.<ref>{{Cite web|url=https://www.w3.org/TR/NOTE-gdiff-19970901.html|title=Generic Diff Format Specification|website=www.w3.org}}</ref> In many cases, VCDIFF achieves a better compression ratio than GDIFF.

===bsdiff===
Bsdiff is a binary diff program using [suffix sorting](/source/Suffix_array). For executables that contain many changes in pointer addresses, it performs better than VCDIFF-type "copy and literal" encodings. The intent is to find a way to generate a small diff without needing to parse assembly code (as in Google's Courgette). Bsdiff achieves this by allowing "copy" matches with errors, which are then corrected using an extra "add" array of bytewise differences. Since this array is mostly either zero or repeated values for offset changes, it takes up little space after compression.<ref>{{Cite web|url=http://www.daemonology.net/bsdiff/|title=Binary diff|website=www.daemonology.net|accessdate=November 9, 2024}}</ref>
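The role of the "add" array can be sketched for a single approximately matching region of equal length in the old and new files. This illustrates the idea only and is not bsdiff's code or file format; the function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Record, for each byte of an approximate match, how far the new byte
   is from the old one (modulo 256). Exact matches produce zeros. */
void make_add_array(const uint8_t *old_region, const uint8_t *new_region,
                    uint8_t *add, size_t length) {
    for (size_t i = 0; i < length; ++i) {
        add[i] = (uint8_t)(new_region[i] - old_region[i]);
    }
}

/* Rebuild the new region by copying the old bytes and adding the
   recorded corrections. */
void apply_add_array(const uint8_t *old_region, const uint8_t *add,
                     uint8_t *out, size_t length) {
    for (size_t i = 0; i < length; ++i) {
        out[i] = (uint8_t)(old_region[i] + add[i]);
    }
}
```

If only one byte in a region differs, the array is zero everywhere except at that position, which is why it compresses well.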

Bsdiff is useful for delta updates. Google uses bsdiff in Chromium and Android. The ''deltarpm'' feature of the [RPM Package Manager](/source/RPM_Package_Manager) is based on a heavily modified bsdiff that can use a hash table for matching.<ref>{{cite web |title=rpmdelta/delta.c |url=https://github.com/rpm-software-management/deltarpm/blob/c5e0ca7482e2cfea5e4d902ffe488e0a71ed3e67/delta.c |publisher=rpm-software-management |access-date=13 January 2020 |date=3 July 2019}}</ref> [FreeBSD](/source/FreeBSD) also uses bsdiff for updates.<ref>{{cite web|author=Anonymous|website=GitHub Gist|url=https://gist.github.com/anonymous/e48209b03f1dd9625a992717e7b89c4f|title=NON-CRYPTANALYTIC ATTACKS AGAINST FREEBSD UPDATE COMPONENTS|date=May 2016}}</ref>

Since the 4.3 release of bsdiff in 2005, various improvements or fixes have been produced for it. Google maintains multiple versions of the code for each of its products.<ref>{{cite web |title=xtraeme/bsdiff-chromium: README.chromium |url=https://github.com/xtraeme/bsdiff-chromium/blob/master/README.chromium |website=GitHub |language=en |date=2012}}; {{cite web |title=courgette/third_party/bsdiff/README.chromium - chromium/src |url=https://chromium.googlesource.com/chromium/src/+/master/courgette/third_party/bsdiff/README.chromium |website=Git at Google}}; {{cite web |title= android/platform/external/bsdiff/|url=https://android.googlesource.com/platform/external/bsdiff/+/refs/heads/master|website=Git at Google}}</ref> FreeBSD takes many of Google's compatible changes, mainly a vulnerability fix and a switch to the faster {{code|divsufsort}} suffix-sorting routine.<ref>{{cite web|url=https://github.com/freebsd/freebsd/commits/master/usr.bin/bsdiff|website=GitHub|title=History for freebsd/usr.bin/bsdiff}}</ref> [Debian](/source/Debian) has a series of performance tweaks to the program.<ref>{{cite web|url=https://sources.debian.org/patches/bsdiff/ |website=Debian Patch Tracker |title=Package: bsdiff}}</ref>

''ddelta'' is a rewrite of bsdiff proposed for use in Debian's delta updates. Among other efficiency improvements, it uses a sliding window to reduce memory and CPU cost.<ref>{{cite web |last1=Klode |first1=Julian |title=julian-klode/ddelta |url=https://github.com/julian-klode/ddelta |website=GitHub |access-date=13 January 2020}}</ref>

==See also==

* {{Annotated link |Data differencing}}
* {{Annotated link |Interleaved deltas}}
* {{Annotated link |Source Code Control System}}
* {{Annotated link |String-to-string correction problem}}
* [Xdelta](/source/Xdelta): open-source delta encoder

==References==
{{reflist}}

==External links==
* {{IETF RFC|3229|link=no}} – Delta Encoding in HTTP

{{Compression methods}}
{{Version control software}}

{{DEFAULTSORT:Delta Encoding}}
Category:Lossless compression algorithms
Category:Data differencing
Category:Articles with example C code
Category:Data compression

---
Adapted from the Wikipedia article [Delta encoding](https://en.wikipedia.org/wiki/Delta_encoding) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Delta_encoding?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.
