# File comparison

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/File_comparison
> Markdown URL: https://mediated.wiki/source/File_comparison.md
> Source: https://en.wikipedia.org/wiki/File_comparison
> Source revision: 1350408287
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

Diff and merge files on computers

This article is about data object, text, and file comparisons in computing. For other uses, see [Comparison](/source/Comparison_(disambiguation)).

The [KDE](/source/KDE) [diff](/source/Diff) tool *Kompare*

In [computing](/source/Computing), **file comparison** is the calculation and display of the differences and similarities between data objects, typically [text files](/source/Text_file) such as [source code](/source/Source_code).

The methods, implementations, and results are typically called a **diff**,[1] after the [Unix](/source/Unix) [diff utility](/source/Diff_utility). The output may be presented in a [graphical user interface](/source/Graphical_user_interface) or used as part of larger tasks in [networks](/source/Computer_network), [file systems](/source/File_system), or [revision control](/source/Revision_control).

Some widely used file comparison programs are [diff](/source/Diff), [cmp](/source/Cmp_(Unix)), [FileMerge](/source/Apple_Developer_Tools#FileMerge), [WinMerge](/source/WinMerge), [Beyond Compare](/source/Beyond_Compare), and [File Compare](/source/File_Compare).

Many [text editors](/source/Text_editor) and [word processors](/source/Word_processor) perform file comparison to highlight the changes to a file or document.

## Method types

Most file comparison tools find the [longest common subsequence](/source/Longest_common_subsequence_problem) between two files. Any data not in the longest common subsequence is presented as an insertion, a deletion, or a change.
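As an illustration of the longest-common-subsequence approach, the following is a minimal Python sketch, not the implementation of any particular tool. The function name `lcs_diff` and the `' '`/`'-'`/`'+'` tags are assumptions made for this example. It builds the classic dynamic-programming table and then walks it to label each line as common, deleted, or inserted.

```python
def lcs_diff(old, new):
    """Return (tag, line) pairs: ' ' = common, '-' = only in old, '+' = only in new."""
    m, n = len(old), len(new)
    # table[i][j] holds the LCS length of old[i:] and new[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if old[i] == new[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    result = []
    i = j = 0
    while i < m and j < n:
        if old[i] == new[j]:
            # The line belongs to the common subsequence.
            result.append((" ", old[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping the old line keeps the LCS at least as long.
            result.append(("-", old[i]))
            i += 1
        else:
            result.append(("+", new[j]))
            j += 1
    # Whatever remains on either side lies outside the common subsequence.
    result.extend(("-", line) for line in old[i:])
    result.extend(("+", line) for line in new[j:])
    return result


if __name__ == "__main__":
    for tag, line in lcs_diff(["a", "b", "c"], ["a", "c", "d"]):
        print(tag, line)
```

The table needs time and memory proportional to the product of the two file lengths, so production tools generally use more economical algorithms; the sketch only shows how the common subsequence separates unchanged lines from insertions and deletions.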

In 1978, Paul Heckel published an algorithm that identifies most moved blocks of text.[2] This is used in the [IBM History Flow tool](/source/IBM_History_Flow_tool).[3] Some other file comparison programs also detect block moves.
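The idea of detecting moved blocks can be sketched as follows. This is a simplified Python illustration loosely based on the passes described in Heckel's paper, not a faithful reproduction of it: lines that occur exactly once in each file serve as anchors, and matches are then extended to identical neighbouring lines. The names `heckel_match` and `matched_blocks` are hypothetical.

```python
from collections import Counter


def heckel_match(old, new):
    """Map each index in `new` to a matched index in `old`, or None."""
    old_count, new_count = Counter(old), Counter(new)
    old_pos = {line: i for i, line in enumerate(old)}  # only consulted for unique lines
    new_to_old = [None] * len(new)
    old_to_new = [None] * len(old)

    # Anchor pass: a line occurring exactly once in each file is assumed to be the same line.
    for j, line in enumerate(new):
        if new_count[line] == 1 and old_count[line] == 1:
            i = old_pos[line]
            new_to_old[j], old_to_new[i] = i, j

    # Forward pass: extend each match to the next line if both sides are identical and unmatched.
    for j in range(len(new) - 1):
        i = new_to_old[j]
        if (i is not None and i + 1 < len(old)
                and new_to_old[j + 1] is None and old_to_new[i + 1] is None
                and new[j + 1] == old[i + 1]):
            new_to_old[j + 1], old_to_new[i + 1] = i + 1, j + 1

    # Backward pass: the same extension towards the preceding line.
    for j in range(len(new) - 1, 0, -1):
        i = new_to_old[j]
        if (i is not None and i > 0
                and new_to_old[j - 1] is None and old_to_new[i - 1] is None
                and new[j - 1] == old[i - 1]):
            new_to_old[j - 1], old_to_new[i - 1] = i - 1, j - 1

    return new_to_old


def matched_blocks(new_to_old):
    """Group consecutive matches into (new_start, old_start, length) blocks."""
    blocks = []
    for j, i in enumerate(new_to_old):
        if i is None:
            continue
        if blocks:
            nj, oi, length = blocks[-1]
            if nj + length == j and oi + length == i:
                blocks[-1] = (nj, oi, length + 1)
                continue
        blocks.append((j, i, 1))
    return blocks
```

A block whose old and new starting positions are out of order relative to its neighbours can then be reported as moved rather than as a deletion plus an insertion.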

Some specialized file comparison tools find the [longest increasing subsequence](/source/Longest_increasing_subsequence) between two files.[4] The [rsync](/source/Rsync) protocol uses a [rolling hash](/source/Rolling_hash) function to compare two files on two distant computers with low communication overhead.
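The rolling-hash idea can be illustrated with a short Python sketch. It is an assumption-laden simplification: rsync uses its own weak rolling checksum together with a strong hash to confirm candidate matches, whereas this example substitutes a polynomial (Rabin–Karp style) hash and skips the confirmation step. The constants `BASE` and `MOD` and all function names are hypothetical.

```python
BASE = 257
MOD = (1 << 61) - 1


def block_hash(data):
    """Polynomial hash of a bytes object, computed from scratch."""
    h = 0
    for byte in data:
        h = (h * BASE + byte) % MOD
    return h


def rolling_hashes(data, size):
    """Yield (offset, hash) for every window of `size` bytes, updating in O(1) per step."""
    if size <= 0 or len(data) < size:
        return
    top = pow(BASE, size - 1, MOD)  # weight of the byte that leaves the window
    h = block_hash(data[:size])
    yield 0, h
    for start in range(1, len(data) - size + 1):
        outgoing = data[start - 1]
        incoming = data[start + size - 1]
        h = ((h - outgoing * top) * BASE + incoming) % MOD
        yield start, h


def find_known_blocks(local, remote_block_hashes, size):
    """Return {remote_block_index: offset_in_local} for blocks already present locally."""
    wanted = {h: idx for idx, h in enumerate(remote_block_hashes)}
    found = {}
    for offset, h in rolling_hashes(local, size):
        idx = wanted.get(h)
        if idx is not None and idx not in found:
            found[idx] = offset
    return found


if __name__ == "__main__":
    remote = b"abcdefgh"
    hashes = [block_hash(remote[i:i + 4]) for i in range(0, len(remote), 4)]
    print(find_known_blocks(b"XXabcdYYefgh", hashes, 4))
```

Only the short list of block hashes has to cross the network; the side holding the other copy slides the window over its own data to discover which blocks it already has, at any byte offset.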

File comparison in word processors is typically at the word level, while comparison in most programming tools is at the line level. Byte-level or character-level comparison is useful in some specialized applications.
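The effect of comparison granularity can be shown with Python's standard `difflib` module. This is a small sketch; the helper name `changed_units` and the sample text are made up for the example. The same matcher is applied once to a list of lines and once to a list of words.

```python
import difflib


def changed_units(old_units, new_units):
    """Return (deleted, inserted) units; a unit may be a line, a word, or a character."""
    matcher = difflib.SequenceMatcher(a=old_units, b=new_units, autojunk=False)
    deleted, inserted = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            deleted.extend(old_units[i1:i2])
            inserted.extend(new_units[j1:j2])
    return deleted, inserted


old_text = "the quick brown fox\njumps over the dog\n"
new_text = "the quick red fox\njumps over the dog\n"

# Line level: the whole first line is reported as changed.
line_level = changed_units(old_text.splitlines(), new_text.splitlines())

# Word level: only the single replaced word is reported.
word_level = changed_units(old_text.split(), new_text.split())
```

Splitting on whitespace discards line breaks, which tends to suit prose, whereas line-level units match how source code is usually edited and reviewed.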

## Display

File comparisons are displayed in two main ways: showing the two files side by side, or showing a single file with markup indicating the changes from one file to the other. In either case, and particularly in side-by-side views, [code folding](/source/Code_folding) or [text folding](/source/Folding_editor) may be used to hide the unchanged portions of the file and show only the changed portions.
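Both display styles can be sketched with Python's standard `difflib` module. The helper names and the column width below are assumptions for illustration, and real tools add colour, folding, and intra-line highlighting on top of this. `difflib.unified_diff` produces a single-file view with `-`/`+` markup, while the side-by-side view here is assembled by hand from `SequenceMatcher` opcodes.

```python
import difflib
from itertools import zip_longest


def single_file_view(old, new):
    """One merged listing in unified format: '-' marks removed lines, '+' marks added lines."""
    return list(difflib.unified_diff(old, new, "old", "new", lineterm=""))


def side_by_side_view(old, new, width=20):
    """Two columns; '|' in the gutter marks rows that differ."""
    rows = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        marker = " " if tag == "equal" else "|"
        # Pad the shorter side with blanks so insertions and deletions still line up.
        for left, right in zip_longest(old[i1:i2], new[j1:j2], fillvalue=""):
            rows.append(f"{left:<{width}} {marker} {right}")
    return rows


if __name__ == "__main__":
    old_lines = ["a", "b", "c"]
    new_lines = ["a", "B", "c"]
    print("\n".join(single_file_view(old_lines, new_lines)))
    print("\n".join(side_by_side_view(old_lines, new_lines)))
```

The unified format also illustrates folding in a limited sense: by default it prints only a few lines of context around each change instead of the whole file.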

## Reasoning

Comparison tools are used for various reasons. For binary files, a byte-level comparison is probably best. For [text files](/source/Text_file) or [computer programs](/source/Computer_program), a side-by-side visual comparison is usually best.[5] This lets the user decide whether to retain one preferred file, to merge the files into one containing all the differences,[6] or to keep both as-is for later reference through some form of "versioning" control.
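A byte-level comparison can be as simple as finding the first offset at which two byte strings disagree, which is similar in spirit to what the Unix `cmp` utility reports. The following Python sketch makes that concrete; the function name and the zero-based offset convention are choices made for this example.

```python
def first_difference(a, b):
    """Return the zero-based offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a strict prefix of the other; they diverge where the shorter one ends.
        return min(len(a), len(b))
    return None
```

For large files the inputs would normally be read in chunks rather than loaded whole, but the comparison logic stays the same.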

File comparison is an important, and most likely integral, part of [file synchronization](/source/File_synchronization) and [backup](/source/Backup). In backup methodologies, [data corruption](/source/Data_corruption) is an important concern. Corruption usually occurs without warning and goes unnoticed until it is too late to recover the missing parts. Often, a corrupted file is discovered only when it is next used or opened. Short of that, a comparison tool is needed to at least recognize that a difference has occurred. File synchronization and backup programs therefore need to include file comparison if they are to be useful and trusted.[7]
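As a rough illustration of comparison-based backup verification, the sketch below walks a source directory and compares each file with its counterpart in a backup directory using Python's standard `filecmp.cmp` with `shallow=False`, which compares file contents rather than relying on metadata alone. The function name `verify_backup` and the directory layout are assumptions, not part of any particular backup program.

```python
import filecmp
from pathlib import Path


def verify_backup(source_dir, backup_dir):
    """Return relative paths whose backup copy is missing or differs from the source."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = backup_dir / rel
        # shallow=False makes filecmp compare contents, not just size and modification time.
        if not dst.is_file() or not filecmp.cmp(src, dst, shallow=False):
            problems.append(rel.as_posix())
    return problems
```

A reported difference only shows that the two copies disagree; deciding which copy is the corrupted one requires additional information, such as a checksum recorded when the backup was made.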

## Historical uses

Before software file comparison, machines existed to compare magnetic tapes or punched cards. The [IBM 519 Card Reproducer](/source/IBM_519) could determine whether decks of [punched cards](/source/Punched_card) were equivalent. In 1957, John Van Gardner developed a system to compare the [checksums](/source/Check_sum) of loaded sections of [Fortran](/source/Fortran) programs to [debug](/source/Debugging) compilation problems on the [IBM 704](/source/IBM_704).[8]

## See also

- [Comparison of file comparison tools](/source/Comparison_of_file_comparison_tools)

- [Computer-assisted reviewing](/source/Computer-assisted_reviewing) – Text-comparison software

- [Data differencing](/source/Data_differencing) – Method for compressing changes over time

- [Delta encoding](/source/Delta_encoding) – Type of data transmission method

- [Document comparison](/source/Document_comparison) – Computer document process

- [Edit distance](/source/Edit_distance) – Computer science metric of string similarity

- [String metric](/source/String_metric) – Metric that measures the distance between two strings of text

- [Ramseyer Rule](/source/Ramseyer_Rule) – Standard format for amending legal text

## References

1. ["diff", The Jargon File](http://catb.org/jargon/html/D/diff.html).

2. Heckel, Paul (1978), ["A Technique for Isolating Differences Between Files"](https://documents.scribd.com/docs/10ro9oowpo1h81pgh1as.pdf) (PDF), *Communications of the ACM*, **21** (4): 264–268, [doi](/source/Doi_(identifier)):[10.1145/359460.359467](https://doi.org/10.1145%2F359460.359467), [S2CID](/source/S2CID_(identifier)) [207683976](https://api.semanticscholar.org/CorpusID:207683976), retrieved 2011-12-04

3. Viégas, Fernanda B.; Wattenberg, Martin; Dave, Kushal (2004), [*Studying Cooperation and Conflict between Authors with history flow Visualizations*](http://domino.watson.ibm.com/cambridge/research.nsf/58bac2a2a6b05a1285256b30005b3953/53240210b04ea0eb85256f7300567f7e/$FILE/TR2004-19.pdf) (PDF), vol. 6, Vienna: CHI, pp. 575–582, retrieved 2011-12-01

4. Liwei Ren; Jinsheng Gu; Luosheng Peng (18 April 2006). ["Algorithms for block-level code alignment of software binary files"](https://patents.google.com/patent/US7031972). *Google Patents*. USPTO. Retrieved 10 May 2019.

5. MacKenzie, David; Eggert, Paul; Stallman, Richard (2003). [*Comparing and Merging Files with Gnu Diff and Patch*](https://books.google.com/books?id=oIINAAAACAAJ). Network Theory. [ISBN](/source/ISBN_(identifier)) [978-0-9541617-5-0](https://en.wikipedia.org/wiki/Special:BookSources/978-0-9541617-5-0).

6. ["File comparison software: vc-dwim and vc-chlog"](http://www.gnu.org/software/vc-dwim/vc-dwim.html). *www.gnu.org*. Retrieved 2023-04-16.

7. ["SystemRescue - System Rescue Homepage"](https://www.system-rescue.org/). *www.system-rescue.org*. Retrieved 2023-04-16.

8. John Van Gardner. ["Fortran And The Genesis Of Project Intercept"](http://www.softwarepreservation.org/projects/FORTRAN/paper/John%20Van%20Gardner%20-%20Fortran%20And%20The%20Genesis%20Of%20Project%20Intercept.pdf) (PDF). Retrieved 2011-12-06.

## External links

Wikimedia Commons has media related to [File comparison](https://commons.wikimedia.org/wiki/Category:File_comparison).


---
Adapted from the Wikipedia article [File comparison](https://en.wikipedia.org/wiki/File_comparison) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/File_comparison?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.
