Reference genome

{{Short description|Digital nucleic acid sequence database}}
[[File:Wellcome genome bookcase.png|thumb|right|250px|The first printout of the human reference genome presented as a series of books, displayed at the [[Wellcome Collection]], London]]
A '''reference genome''' is a [[genome assembly]] that represents the [[genome|complete genetic sequence]] of an organism as a continuous string of [[nucleotide]]s (A, T, C, and G). To serve as a reference genome, an assembly is typically accompanied by annotations, produced through a process known as DNA annotation or [[DNA annotation|genome annotation]]. The annotations specify the genomic coordinates (start and end positions) of [[gene]]s, [[exon]]s, [[intron]]s, and [[Messenger RNA|mRNA]]s, and are often paired with the corresponding transcript (mRNA) and [[protein]] sequences, which may be predicted by algorithms or validated experimentally.<ref>{{Cite journal |last1=Ejigu |first1=Girum Fitihamlak |last2=Jung |first2=Jaehee |date=2020-09-18 |title=Review on the Computational Genome Annotation of Sequences Obtained by Next-Generation Sequencing |journal=Biology |language=en |volume=9 |issue=9 |pages=295 |doi=10.3390/biology9090295 |doi-access=free |issn=2079-7737 |pmc=7565776 |pmid=32962098}}</ref>
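As a minimal sketch of how annotation coordinates relate to the assembled sequence, the Python snippet below slices annotated features out of a toy reference. The sequence, the feature list and the helper `feature_sequence` are hypothetical. It assumes the 1-based, inclusive coordinate convention used by annotation formats such as GFF3, and for brevity it ignores chromosome names and strand (only plus-strand features are handled).

```python
# Toy reference sequence (hypothetical; real references are read from FASTA files).
reference = "ATGGCGTACGTTAGCCTAGGCTAA"

# Hypothetical annotation records: (feature type, start, end),
# using 1-based, inclusive coordinates as in GFF3.
annotations = [
    ("exon", 1, 9),
    ("intron", 10, 15),
    ("exon", 16, 24),
]


def feature_sequence(ref: str, start: int, end: int) -> str:
    """Return the bases of ref covered by the 1-based, inclusive interval [start, end]."""
    if not 1 <= start <= end <= len(ref):
        raise ValueError(f"interval {start}-{end} lies outside the reference")
    # Convert to Python's 0-based, half-open slice.
    return ref[start - 1:end]


for feature, start, end in annotations:
    print(feature, start, end, feature_sequence(reference, start, end))

# Joining the exon sequences (plus strand only) gives the spliced transcript sequence.
transcript = "".join(
    feature_sequence(reference, s, e) for f, s, e in annotations if f == "exon"
)
print("transcript", transcript)
```

The conversion `ref[start - 1:end]` is the usual source of off-by-one errors when moving between 1-based annotation files and 0-based programming languages.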

Reference genomes exist for a wide variety of [[species]], including [[virus]]es, [[bacteria]], [[Fungus|fungi]], [[plant]]s and [[animal]]s, and they differ in how they are constructed and represented. A reference may be derived from a single individual or from multiple individuals whose sequences are collapsed into one representative [[haplotype]]. Two main factors determine the quality of a reference genome assembly: the [[sequencing technology]], which affects sequence accuracy, and the assembly level, which indicates how complete the genome representation is.<ref>{{Cite journal |last1=Giani |first1=Alice Maria |last2=Gallo |first2=Guido Roberto |last3=Gianfranceschi |first3=Luca |last4=Formenti |first4=Giulio |date=2020-01-01 |title=Long walk to genomics: History and current approaches to genome sequencing and assembly |url=https://www.csbj.org/article/S2001-0370(19)30327-7/fulltext |journal=Computational and Structural Biotechnology Journal |language=English |volume=18 |pages=9–19 |doi=10.1016/j.csbj.2019.11.002 |issn=2001-0370 |pmc=6926122 |pmid=31890139}}</ref><ref>{{Cite journal |last1=Ballouz |first1=Sara |last2=Dobin |first2=Alexander |last3=Gillis |first3=Jesse A. |date=2019-08-09 |title=Is it time to change the reference genome? |journal=Genome Biology |volume=20 |issue=1 |pages=159 |doi=10.1186/s13059-019-1774-4 |doi-access=free |issn=1474-760X |pmc=6688217 |pmid=31399121}}</ref>

The ideal is a chromosome-level assembly: a complete DNA sequence for each chromosome with no unplaced segments. However, achieving this remains technically challenging, especially for large genomes or those dense in [[repetitive element]]s. Earlier sequencing technologies often produced assemblies only at the ''contig'' (short contiguous sequences) or ''scaffold'' (ordered sets of contigs) level, with limited chromosomal context. The exact size of these fragments depends on the sequencing platform and the bioinformatic methods available at the time.<ref>{{Cite journal |last1=Nurk |first1=Sergey |last2=Koren |first2=Sergey |last3=Rhie |first3=Arang |last4=Rautiainen |first4=Mikko |last5=Bzikadze |first5=Andrey V. |last6=Mikheenko |first6=Alla |last7=Vollger |first7=Mitchell R. |last8=Altemose |first8=Nicolas |last9=Uralsky |first9=Lev |last10=Gershman |first10=Ariel |last11=Aganezov |first11=Sergey |last12=Hoyt |first12=Savannah J. |last13=Diekhans |first13=Mark |last14=Logsdon |first14=Glennis A. |last15=Alonge |first15=Michael |date=April 2022 |title=The complete sequence of a human genome |journal=Science |language=en |volume=376 |issue=6588 |pages=44–53 |doi=10.1126/science.abj6987 |issn=0036-8075 |pmc=9186530 |pmid=35357919 |bibcode=2022Sci...376...44N }}</ref>

For assemblies that are not fully resolved, summary statistics such as N50 and L50 are commonly used to characterise contiguity and fragmentation; these metrics are explained in the [[#Contigs and scaffolds|Contigs and scaffolds]] section.

Reference genomes are central to ''[[omics]]'' research, particularly [[genomics]]. They provide a reference for "mapping" DNA sequence data from many individuals, enabling efficient identification of the genomic location of these sequences and the detection of [[Polymorphism (biology)|polymorphisms]] (sequence differences among individuals) through a process known as [[variant calling]].<ref>{{Cite journal |last1=Aganezov |first1=Sergey |last2=Yan |first2=Stephanie M. |last3=Soto |first3=Daniela C. |last4=Kirsche |first4=Melanie |last5=Zarate |first5=Samantha |last6=Avdeyev |first6=Pavel |last7=Taylor |first7=Dylan J. |last8=Shafin |first8=Kishwar |last9=Shumate |first9=Alaina |last10=Xiao |first10=Chunlin |last11=Wagner |first11=Justin |last12=McDaniel |first12=Jennifer |last13=Olson |first13=Nathan D. |last14=Sauria |first14=Michael E. G. |last15=Vollger |first15=Mitchell R. |date=April 2022 |title=A complete reference genome improves analysis of human genetic variation |journal=Science |language=en |volume=376 |issue=6588 |article-number=eabl3533 |doi=10.1126/science.abl3533 |issn=0036-8075 |pmc=9336181 |pmid=35357935}}</ref>
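The mapping and variant-calling idea can be illustrated with a deliberately simplified Python sketch: each read is placed at the reference offset with the fewest mismatches, and a position is reported as a candidate single-nucleotide variant when the most common base among the reads covering it differs from the reference. The sequences, function names and thresholds are hypothetical. Real aligners use indexed search and allow insertions and deletions, and real variant callers model base qualities and diploid genotypes; none of that is attempted here.

```python
from collections import Counter, defaultdict
from typing import List, Optional, Tuple


def map_read(reference: str, read: str, max_mismatches: int = 1) -> Optional[int]:
    """Return the 0-based offset where the read fits with the fewest mismatches.

    Only substitutions are considered (no insertions or deletions). Returns None
    if no offset has at most max_mismatches mismatches.
    """
    best_pos, best_mm = None, max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(1 for ref_base, read_base in zip(window, read) if ref_base != read_base)
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos


def call_snps(reference: str, reads: List[str], min_depth: int = 2) -> List[Tuple[int, str, str]]:
    """Pile up mapped reads and report (1-based position, ref base, alt base)
    wherever the most common read base differs from the reference."""
    pileup = defaultdict(Counter)
    for read in reads:
        pos = map_read(reference, read)
        if pos is None:  # unmapped read: skip it
            continue
        for offset, base in enumerate(read):
            pileup[pos + offset][base] += 1
    variants = []
    for pos in sorted(pileup):
        depth = sum(pileup[pos].values())
        base, _count = pileup[pos].most_common(1)[0]
        if depth >= min_depth and base != reference[pos]:
            variants.append((pos + 1, reference[pos], base))
    return variants


reference = "GATTACACGTTAGCCA"
# Reads from a hypothetical individual carrying a C->T change at position 8.
reads = ["TTACAT", "CATGTT", "ATGTTA"]
print(call_snps(reference, reads))  # [(8, 'C', 'T')]
```

The `min_depth` threshold is a crude stand-in for the statistical evidence real callers require before reporting a variant supported by only a few reads.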

The limitations of this practice, such as reference bias and under-representation of population diversity, have led to the development of population-level reference sets and [[Pan-genome|pangenomes]].<ref>{{Cite journal |last1=Miga |first1=Karen H. |last2=Wang |first2=Ting |date=2021-08-31 |title=The Need for a Human Pangenome Reference Sequence |journal=Annual Review of Genomics and Human Genetics |language=en |volume=22 |issue=1 |pages=81–102 |doi=10.1146/annurev-genom-120120-081921 |issn=1527-8204 |pmc=8410644 |pmid=33929893}}</ref>

Reference genomes and their annotations are publicly accessible through online genome browsers and archives such as [[Ensembl genome database project|Ensembl]],<ref>{{Cite journal |last1=Flicek |first1=P. |last2=Aken |first2=B. L. |last3=Beal |first3=K. |last4=Ballester |first4=B. |last5=Caccamo |first5=M. |last6=Chen |first6=Y. |last7=Clarke |first7=L. |last8=Coates |first8=G. |last9=Cunningham |first9=F. |last10=Cutts |first10=T. |last11=Down |first11=T. |last12=Dyer |first12=S. C. |last13=Eyre |first13=T. |last14=Fitzgerald |first14=S. |last15=Fernandez-Banet |first15=J. |date=2007-12-23 |title=Ensembl 2008 |journal=Nucleic Acids Research |language=en |volume=36 |issue=Database |pages=D707–D714 |doi=10.1093/nar/gkm988 |issn=0305-1048 |pmc=2238821 |pmid=18000006}}</ref> the European Nucleotide Archive (ENA) at [[EMBL-EBI]], the [[UCSC Genome Browser]], and [[National Center for Biotechnology Information|NCBI]].

==Properties of reference genomes==

=== Measures of length ===
The length of a genome can be measured in several ways.

A simple way to measure genome length is to count the number of base pairs in the assembly.<ref>{{cite web|title=Help - Glossary - Homo sapiens - Ensembl genome browser 87|url=http://www.ensembl.org/Help/Glossary?id=230|website=www.ensembl.org}}</ref>

The ''golden path'' is an alternative measure of length that omits redundant regions such as [[haplotype]]s and [[pseudoautosomal region]]s.<ref>{{cite web |title=Golden path length {{!}} VectorBase |url=https://vectorbase.org/glossary/golden-path-length |access-date=2016-12-12 |website=www.vectorbase.org|archive-url=https://web.archive.org/web/20200807004848/https://vectorbase.org/glossary/golden-path-length |archive-date=2020-08-07 }}</ref><ref>{{cite web|title=Help - Glossary - Homo sapiens - Ensembl genome browser 87|url=http://www.ensembl.org/Help/Glossary?id=229|website=www.ensembl.org}}</ref> It is usually constructed by layering sequencing information over a physical map to combine scaffold information. It is a "best estimate" of what the [[genome]] will look like and typically includes gaps, which can make it longer than a simple count of the sequenced base pairs in the assembly.<ref>{{cite web|url=http://seqanswers.com/forums/showthread.php?t=45443|title=Whole assembly vs Golden path length in Ensembl? - SEQanswers|website=seqanswers.com|date=31 July 2014 |access-date=2016-12-12}}</ref>
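The difference between counting only sequenced bases and counting a layout that includes gaps can be sketched in Python. The sketch assumes the common FASTA convention in which assembly gaps are written as runs of the placeholder base N; the function name and the example scaffold are hypothetical, and it does not reproduce how Ensembl or other archives actually compute golden path length (for example, it does not exclude haplotypes or pseudoautosomal regions).

```python
import re


def length_summary(sequence: str) -> dict:
    """Summarise a sequence in which assembly gaps are written as runs of 'N'.

    Returns the total length (gaps included), the number of non-gap bases,
    and each gap as a 1-based, inclusive (start, end) interval.
    """
    seq = sequence.upper()
    # m.start() is 0-based, so +1 gives a 1-based start; m.end() is exclusive,
    # which equals the 1-based inclusive end.
    gaps = [(m.start() + 1, m.end()) for m in re.finditer(r"N+", seq)]
    gap_bases = sum(end - start + 1 for start, end in gaps)
    return {
        "total_length": len(seq),
        "ungapped_length": len(seq) - gap_bases,
        "gaps": gaps,
    }


# Hypothetical scaffold: three sequenced stretches separated by two gaps.
scaffold = "ACGT" + "N" * 5 + "ACGTACGT" + "N" * 2 + "ACGT"
print(length_summary(scaffold))
# {'total_length': 23, 'ungapped_length': 16, 'gaps': [(5, 9), (18, 19)]}
```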

=== Contigs and scaffolds ===
[[File:Contigs and Scaffolds.png|thumb|300x300px|Diagram of how reads are arranged to form [[contig]]s, which can in turn be assembled into [[Scaffolding|scaffolds]] during the sequencing and assembly of a reference genome. The gap between contigs 1 and 2 has been sequenced, joining them into a scaffold, while the other gap has not been sequenced and separates scaffolds 1 and 2.]]
Assembling a reference genome requires overlapping reads, which are merged into [[contig]]s: contiguous DNA regions of [[consensus sequence]].<ref name="textbook">{{cite book|last1=Gibson|first1=Greg|last2=Muse|first2=Spencer V.|title=A Primer of Genome Science|edition=3rd|page=84|publisher=Sinauer Associates|year=2009|isbn=978-0-878-93236-8}}</ref> Gaps between contigs can be filled by [[Scaffolding (bioinformatics)|scaffolding]], either by PCR amplification and sequencing across the gap or by [[Bacterial artificial chromosome|bacterial artificial chromosome (BAC)]] cloning.<ref>{{Cite web |title=Help - Glossary - Homo_sapiens - Ensembl genome browser 107 |url=http://www.ensembl.org/Help/Glossary |access-date=2022-09-26 |website=www.ensembl.org}}</ref><ref name="textbook" /> Filling these gaps is not always possible, in which case the reference assembly contains multiple scaffolds.<ref>{{Cite journal |last1=Luo |first1=Junwei |last2=Wei |first2=Yawei |last3=Lyu |first3=Mengna |last4=Wu |first4=Zhengjiang |last5=Liu |first5=Xiaoyan |last6=Luo |first6=Huimin |last7=Yan |first7=Chaokun |date=2021-09-02 |title=A comprehensive review of scaffolding methods in genome assembly |journal=Briefings in Bioinformatics |volume=22 |issue=5 |article-number=bbab033 |doi=10.1093/bib/bbab033 |issn=1477-4054 |pmid=33634311}}</ref> Scaffolds are classified into three types:
# Placed: the chromosome, genomic coordinates and orientation are known.
# Unlocalised: the chromosome is known, but not the coordinates or orientation.
# Unplaced: the chromosome is not known.<ref>{{Cite web |title=Chromosomes, scaffolds and contigs |url=http://www.ensembl.org/info/genome/genebuild/chromosomes_scaffolds_contigs.html |access-date=2022-09-26 |website=www.ensembl.org}}</ref>
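The step in which overlapping reads are merged into contigs can be illustrated with a greedy overlap assembler in Python. This is a teaching sketch under strong assumptions (error-free reads from a single strand, no repeats longer than the minimum overlap); the read set and function names are hypothetical, and production assemblers rely on overlap graphs or de Bruijn graphs rather than this quadratic greedy search. Reads that share no sufficiently long overlap simply remain separate contigs, leaving gaps like those described in this section.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b, or 0 if below min_len."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the longest overlap until none remain."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Drop reads fully contained in another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != other and r in other for other in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            # No remaining overlaps: each sequence left is a separate contig.
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATTAGACC", "AGACCTGCC", "TGCCGGAAT", "GGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```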

The number of [[contig]]s and [[Scaffolding|scaffolds]], as well as their average lengths, are among the many parameters used to assess the quality of a reference genome assembly, since they indicate how continuous the final assembly is relative to the original genome. The fewer scaffolds there are per chromosome, the greater the continuity of the assembly, the ideal being a single scaffold spanning an entire chromosome.<ref>{{Cite journal |last1=Meader |first1=Stephen |last2=Hillier |first2=LaDeana W. |last3=Locke |first3=Devin |last4=Ponting |first4=Chris P. |last5=Lunter |first5=Gerton |date=May 2010 |title=Genome assembly quality: Assessment and improvement using the neutral indel model |journal=Genome Research |volume=20 |issue=5 |pages=675–684 |doi=10.1101/gr.096966.109 |issn=1088-9051 |pmc=2860169 |pmid=20305016}}</ref><ref>{{Cite journal |last1=Rice |first1=Edward S. |last2=Green |first2=Richard E. |date=2019-02-15 |title=New Approaches for Genome Assembly and Scaffolding |url=https://www.annualreviews.org/doi/10.1146/annurev-animal-020518-115344 |journal=Annual Review of Animal Biosciences |language=en |volume=7 |issue=1 |pages=17–40 |doi=10.1146/annurev-animal-020518-115344 |pmid=30485757 |s2cid=54121772 |issn=2165-8102|url-access=subscription }}</ref><ref>{{Cite journal |last1=Cao |first1=Minh Duc |last2=Nguyen |first2=Son Hoang |last3=Ganesamoorthy |first3=Devika |last4=Elliott |first4=Alysha G. |last5=Cooper |first5=Matthew A. |last6=Coin |first6=Lachlan J. M. |date=2017-02-20 |title=Scaffolding and completing genome assemblies in real-time with nanopore sequencing |journal=Nature Communications |language=en |volume=8 |issue=1 |article-number=14515 |doi=10.1038/ncomms14515 |pmid=28218240 |pmc=5321748 |bibcode=2017NatCo...814515C |issn=2041-1723|doi-access=free }}</ref> Other related parameters are [[N50, L50, and related statistics|N50]] and [[N50, L50, and related statistics|L50]].
N50 is the length of the shortest contig or scaffold such that contigs or scaffolds of that length or longer together contain at least 50% of the assembly, while L50 is the smallest number of contigs or scaffolds whose combined length makes up at least 50% of the assembly. A higher N50 and a lower L50 both indicate greater continuity in the assembly.<ref>{{Cite journal |last1=Mende |first1=Daniel R. |last2=Waller |first2=Alison S. |last3=Sunagawa |first3=Shinichi |last4=Järvelin |first4=Aino I. |last5=Chan |first5=Michelle M. |last6=Arumugam |first6=Manimozhiyan |last7=Raes |first7=Jeroen |last8=Bork |first8=Peer |date=2012-02-23 |title=Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data |journal=PLOS ONE |volume=7 |issue=2 |article-number=e31386 |doi=10.1371/journal.pone.0031386 |issn=1932-6203 |pmc=3285633 |pmid=22384016|bibcode=2012PLoSO...731386M |doi-access=free }}</ref><ref>{{Cite journal |last1=Alhakami |first1=Hind |last2=Mirebrahim |first2=Hamid |last3=Lonardi |first3=Stefano |date=2017-05-18 |title=A comparative evaluation of genome assembly reconciliation tools |journal=Genome Biology |volume=18 |issue=1 |page=93 |doi=10.1186/s13059-017-1213-3 |issn=1474-7596 |pmc=5436433 |pmid=28521789 |doi-access=free }}</ref><ref>{{Cite journal |last1=Castro |first1=Christina J. |last2=Ng |first2=Terry Fei Fan |date=2017-11-01 |title=U50: A New Metric for Measuring Assembly Output Based on Non-Overlapping, Target-Specific Contigs |journal=Journal of Computational Biology |volume=24 |issue=11 |pages=1071–1080 |doi=10.1089/cmb.2017.0013 |pmc=5783553 |pmid=28418726}}</ref>
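The N50 and L50 definitions translate directly into a short calculation. The following Python sketch is illustrative; the function name and the example contig lengths are hypothetical. Lengths are sorted from longest to shortest and accumulated until the running total reaches at least half of the assembly; N50 is the length at which this happens and L50 is the number of contigs needed to get there.

```python
from typing import List, Tuple


def n50_l50(lengths: List[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    # The running total always reaches half of the total, so this is never hit.
    raise AssertionError("unreachable")


contigs = [80, 70, 50, 40, 30, 20, 10]  # hypothetical lengths in kb, total 300 kb
print(n50_l50(contigs))  # (70, 2)
```

In this example the two longest contigs (80 and 70 kb) already cover 150 of the 300 kb, so N50 is 70 kb and L50 is 2.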

==Mammalian genomes==

The human and mouse reference genomes are maintained and improved by the [[Genome Reference Consortium]] (GRC), a group of fewer than 20 scientists from a number of genome research institutes, including the [[European Bioinformatics Institute]], the [[National Center for Biotechnology Information]], the [[Sanger Institute]] and the [[McDonnell Genome Institute]] at [[Washington University in St. Louis]]. The GRC continues to improve reference genomes by building new alignments that contain fewer gaps and by fixing misrepresentations in the sequence.

===Human reference genome===
The original human reference genome was derived from thirteen anonymous volunteers from [[Buffalo, New York]]. Donors were recruited by an advertisement in ''[[The Buffalo News]]'' on Sunday, March 23, 1997. The first ten male and ten female volunteers were invited to make an appointment with the project's [[genetic counselors]] and donate blood, from which DNA was extracted. As a result of how the DNA samples were processed, about 80 percent of the reference genome came from eight people; one male, designated ''RP11'', accounts for 66 percent of the total genome. Blood types in the [[ABO blood group system]] vary among humans, but the human reference genome contains only an [[ABO (gene)|O allele]], although the other alleles are [[Genome annotation#Genome annotation|annotated]].<ref name="Guide">{{cite book |title=A short guide to the human genome | vauthors = Scherer S |year=2008 |publisher=CSHL Press |isbn=978-0-87969-791-4 |page=135 }}</ref><ref name=Editorial>{{cite journal | vauthors =  | title = E pluribus unum | journal = Nature Methods | volume = 7 | issue = 5 | page = 331 | date = May 2010 | pmid = 20440876 | doi = 10.1038/nmeth0510-331 | doi-access = free }}</ref><ref name="Change">{{cite journal | vauthors = Ballouz S, Dobin A, Gillis JA | title = Is it time to change the reference genome? 
| journal = Genome Biology | volume = 20 | issue = 1 | article-number = 159 | date = August 2019 | pmid = 31399121 | pmc = 6688217 | doi = 10.1186/s13059-019-1774-4 | doi-access = free }}</ref><ref name="PLOS_Rosen">{{cite journal | vauthors = Rosenfeld JA, Mason CE, Smith TM | title = Limitations of the human reference genome for personalized genomics | journal = PLOS ONE | volume = 7 | issue = 7 | article-number = e40294 | date = 11 July 2012 | pmid = 22811759 | pmc = 3394790 | doi = 10.1371/journal.pone.0040294 | doi-access = free | bibcode = 2012PLoSO...740294R }}</ref><ref name="NYT"/> [[File:Cost per Genome.png|thumb|444x444px|Evolution of the cost of sequencing a human genome from 2001 to 2021, compared to [[Moore's law|Moore's Law]]]] As the cost of [[DNA sequencing]] falls and new [[full genome sequencing]] technologies emerge, more genome sequences continue to be generated. Several individuals, such as [[James D. Watson]], have had their genomes assembled using [[Massive parallel sequencing|massively parallel DNA sequencing]].<ref name="Watson" /><ref>An exception is [[J. Craig Venter]], whose DNA was sequenced and assembled using [[shotgun sequencing]] methods.</ref> A comparison between the reference (assembly NCBI36/hg18) and Watson's genome revealed 3.3 million [[single nucleotide polymorphism]] differences, while about 1.4 percent of his DNA could not be matched to the reference genome at all.<ref name="NYT">{{cite news | title=Genome of DNA Pioneer Is Deciphered | vauthors = Wade N | work=New York Times | date=May 31, 2007 | access-date=February 21, 2009 | url=https://www.nytimes.com/2007/05/31/science/31cnd-gene.html}}</ref><ref name="Watson">{{cite journal | vauthors = Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W, Chen YJ, Makhijani V, Roth GT, Gomes X, Tartaro K, Niazi F, Turcotte CL, Irzyk GP, Lupski JR, Chinault C, Song XZ, Liu Y, Yuan Y, Nazareth L, Qin X, Muzny DM, Margulies M, Weinstock GM, Gibbs RA, Rothberg JM | display-authors = 6 | title = The complete genome of an individual by massively parallel DNA sequencing | journal = Nature | volume = 452 | issue = 7189 | pages = 872–876 | date = April 2008 | pmid = 18421352 | doi = 10.1038/nature06884 | doi-access = free | bibcode = 2008Natur.452..872W }}</ref> For regions known to show large-scale variation, sets of alternate [[Locus (genetics)|loci]] are assembled alongside the reference locus. [[File:Human genome assembly GRCh38 chromosomes ideogram NCBI.png|thumb|496x496px|Chromosome ideogram of the human reference genome assembly GRCh38/hg38. Characteristic band patterns are displayed in black, grey and white, while gaps and partially assembled regions are displayed in blue and rose, respectively. Reference: Genome Data Viewer of the NCBI.<ref>{{Cite web |title=Genome Data Viewer - NCBI |url=https://www.ncbi.nlm.nih.gov/gdv/browser/genome/?id=GCF_000001405.40 |access-date=2022-08-18 |website=www.ncbi.nlm.nih.gov}}</ref>]] The latest human reference genome assembly released by the [[Genome Reference Consortium]] is GRCh38, first released in December 2013.<ref>{{cite journal | vauthors = Schneider VA, Graves-Lindsay T, Howe K, Bouk N, Chen HC, Kitts PA, Murphy TD, Pruitt KD, Thibaud-Nissen F, Albracht D, Fulton RS, Kremitzki M, Magrini V, Markovic C, McGrath S, Steinberg KM, Auger K, Chow W, Collins J, Harden G, Hubbard T, Pelan S, Simpson JT, Threadgold G, Torrance J, Wood JM, Clarke L, Koren S, Boitano M, Peluso P, Li H, Chin CS, Phillippy AM, Durbin R, Wilson RK, Flicek P, Eichler EE, Church DM | display-authors = 6 | title = Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly | journal = Genome Research | volume = 27 | issue = 5 | pages = 849–864 | date = May 2017 | pmid = 28396521 | pmc = 5411779 | doi = 10.1101/gr.213611.116 }}</ref> Several patches have since been released to update it, the latest being GRCh38.p14, published on 3 February 2022.<ref>{{Cite web |title=GRCh38.p14 - hg38 - Genome - Assembly - NCBI |url=https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.40/ |access-date=2022-08-19 |website=www.ncbi.nlm.nih.gov}}</ref><ref>{{Cite web |last=Genome Reference Consortium |date=2022-05-09 |title=GenomeRef: GRCh38.p14 is now released! 
|url=https://genomeref.blogspot.com/2022/05/grch38p14-is-now-released.html |access-date=2022-08-19 |website=GRC Blog (GenomeRef)}}</ref> This build has only 349 gaps across the entire assembly, a large improvement over the first version, which had roughly 150,000 gaps.<ref name="Editorial" /> The gaps are mostly in areas such as [[telomere]]s, [[centromere]]s, and long [[Repeated sequence (DNA)|repetitive sequences]]; the biggest gap lies along the long arm of the Y chromosome, a region about 30 Mb long (about 52% of the Y chromosome's length).<ref>{{Cite web |title=GRCh38.p14 - hg38 - Genome - Assembly - NCBI - Statistics Report |url=https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.40/ |access-date=2022-08-18 |website=www.ncbi.nlm.nih.gov}}</ref> The number of [[genomic library|genomic clone libraries]] contributing to the reference has increased steadily over the years to more than 60, although the individual ''RP11'' still accounts for 70% of the reference genome.<ref name="GRC_FAQ">{{cite web |title=How many individuals were sequenced for the human reference genome assembly? |url=https://www.ncbi.nlm.nih.gov/grc/help/faq/#human-reference-genome-individuals |access-date=7 April 2022 |website=Genome Reference Consortium}}</ref> Genomic analysis of this anonymous male suggests that he is of African-European ancestry.<ref name="GRC_FAQ" /> According to the GRC website, its next assembly release for the human genome (version GRCh39) is "indefinitely postponed".<ref name=":1">{{Cite web |title=Genome Reference Consortium |url=https://www.ncbi.nlm.nih.gov/grc |access-date=2022-08-18 |website=www.ncbi.nlm.nih.gov}}</ref>

In 2022, the Telomere-to-Telomere (T2T) Consortium,<ref>{{Cite web |title=Telomere-to-Telomere |url=https://www.genome.gov/about-genomics/telomere-to-telomere |access-date=2022-08-16 |website=NHGRI |language=en}}</ref> an open, community-based effort, published the first complete, gapless human reference genome assembly (version T2T-CHM13). It did not contain a Y chromosome until version 2.0.<ref>{{cite journal | vauthors = Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, Vollger MR, Altemose N, Uralsky L, Gershman A, Aganezov S, Hoyt SJ, Diekhans M, Logsdon GA, Alonge M, Antonarakis SE, Borchers M, Bouffard GG, Brooks SY, Caldas GV, Chen NC, Cheng H, Chin CS, Chow W, de Lima LG, Dishuck PC, Durbin R, Dvorkina T, Fiddes IT, Formenti G, Fulton RS, Fungtammasan A, Garrison E, Grady PG, Graves-Lindsay TA, Hall IM, Hansen NF, Hartley GA, Haukness M, Howe K, Hunkapiller MW, Jain C, Jain M, Jarvis ED, Kerpedjiev P, Kirsche M, Kolmogorov M, Korlach J, Kremitzki M, Li H, Maduro VV, Marschall T, McCartney AM, McDaniel J, Miller DE, Mullikin JC, Myers EW, Olson ND, Paten B, Peluso P, Pevzner PA, Porubsky D, Potapova T, Rogaev EI, Rosenfeld JA, Salzberg SL, Schneider VA, Sedlazeck FJ, Shafin K, Shew CJ, Shumate A, Sims Y, Smit AF, Soto DC, Sović I, Storer JM, Streets A, Sullivan BA, Thibaud-Nissen F, Torrance J, Wagner J, Walenz BP, Wenger A, Wood JM, Xiao C, Yan SM, Young AC, Zarate S, Surti U, McCoy RC, Dennis MY, Alexandrov IA, Gerton JL, O'Neill RJ, Timp W, Zook JM, Schatz MC, Eichler EE, Miga KH, Phillippy AM | display-authors = 6 | title = The complete sequence of a human genome | journal = Science | volume = 376 | issue = 6588 | pages = 44–53 | date = April 2022 | pmid = 35357919 | pmc = 9186530 | doi = 10.1126/science.abj6987 | s2cid = 247854936 | bibcode = 2022Sci...376...44N }}</ref><ref>{{Cite web |title=T2T-CHM13v2.0 - Genome - Assembly - NCBI |url=https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_009914755.1/ 
|access-date=2022-08-16 |website=www.ncbi.nlm.nih.gov}}</ref> This assembly makes it possible to examine the evolution of centromeric and pericentromeric sequences. The consortium employed rigorous methods to assemble, clean, and validate complex repeat regions, which are particularly difficult to sequence.<ref>{{Cite journal |last1=Altemose |first1=Nicolas |last2=Logsdon |first2=Glennis A. |last3=Bzikadze |first3=Andrey V. |last4=Sidhwani |first4=Pragya |last5=Langley |first5=Sasha A. |last6=Caldas |first6=Gina V. |last7=Hoyt |first7=Savannah J. |last8=Uralsky |first8=Lev |last9=Ryabov |first9=Fedor D. |last10=Shew |first10=Colin J. |last11=Sauria |first11=Michael E. G. |last12=Borchers |first12=Matthew |last13=Gershman |first13=Ariel |last14=Mikheenko |first14=Alla |last15=Shepelev |first15=Valery A. |date=April 2022 |title=Complete genomic and epigenetic maps of human centromeres |journal=Science |language=en |volume=376 |issue=6588 |article-number=eabl4178 |doi=10.1126/science.abl4178 |issn=0036-8075 |pmc=9233505 |pmid=35357911}}</ref> It used ultra-long-read (>100 kb) sequencing to accurately sequence [[Low copy repeats|segmental duplications]].<ref name=":2">{{Cite journal |last=Church |first=Deanna M. |date=April 2022 |title=A next-generation human genome sequence |url=https://www.science.org/doi/10.1126/science.abo5367 |journal=Science |language=en |volume=376 |issue=6588 |pages=34–35 |doi=10.1126/science.abo5367 |pmid=35357937 |bibcode=2022Sci...376...34C |issn=0036-8075|url-access=subscription }}</ref>

{{anchor|CHM13hTERT}} T2T-CHM13 was sequenced from CHM13hTERT, a cell line derived from an essentially haploid [[Molar pregnancy|hydatidiform mole]]. "CHM" stands for "complete hydatidiform mole", and "13" is its line number. "hTERT" stands for "human [[Telomerase reverse transcriptase|telomerase reverse transcriptase]]". The cell line has been transfected with the ''TERT'' gene, which maintains telomere length and thus contributes to the [[Immortalised cell line|cell line's immortality]].<ref>{{Cite journal |last1=Steinberg |first1=Karyn Meltz |last2=Schneider |first2=Valerie A. |last3=Graves-Lindsay |first3=Tina A. |last4=Fulton |first4=Robert S. |last5=Agarwala |first5=Richa |last6=Huddleston |first6=John |last7=Shiryev |first7=Sergey A. |last8=Morgulis |first8=Aleksandr |last9=Surti |first9=Urvashi |last10=Warren |first10=Wesley C. |last11=Church |first11=Deanna M. |last12=Eichler |first12=Evan E. |last13=Wilson |first13=Richard K. |date=December 2014 |title=Single haplotype assembly of the human genome from a hydatidiform mole |journal=Genome Research |volume=24 |issue=12 |pages=2066–2076 |doi=10.1101/gr.180893.114 |issn=1088-9051 |pmc=4248323 |pmid=25373144}}</ref> A complete hydatidiform mole contains two copies of the same paternal genome, which is why it is essentially haploid. This eliminates allelic variation and allows for more accurate sequencing.<ref name=":2" />

Recent genome assemblies are as follows:<ref name=":0">{{cite web|url=https://genome.ucsc.edu/FAQ/FAQreleases.html#release1|title=UCSC Genome Bioinformatics: FAQ|website=genome.ucsc.edu|access-date=2016-08-18}}</ref>
{| class="wikitable"
|-
!Release name
!Date of release
!Equivalent UCSC version
|-
|GRCh39
|Indefinitely postponed<ref name=":1" />
| -
|-
|T2T-CHM13
|January 2022
|hs1
|-
|GRCh38
|Dec 2013
|hg38
|-
|GRCh37
|Feb 2009
|hg19
|-
|NCBI Build 36.1
|Mar 2006
|hg18
|-
|NCBI Build 35
|May 2004
|hg17
|-
|NCBI Build 34
|Jul 2003
|hg16
|}

==== Limitations ====
For much of a genome, the reference provides a good approximation of the DNA of any single individual. But in regions with high [[gene pool|allelic diversity]], such as the [[major histocompatibility complex]] in humans and the [[major urinary proteins]] of mice, the reference genome may differ significantly from other individuals.<ref name="MHCsc">{{cite journal | author = MHC Sequencing Consortium | title = Complete sequence and gene map of a human major histocompatibility complex. The MHC sequencing consortium | journal = Nature | volume = 401 | issue = 6756 | pages = 921–923 | date = October 1999 | pmid = 10553908 | doi = 10.1038/44853 | s2cid = 186243515 | bibcode = 1999Natur.401..921T }}</ref><ref name="Logan">{{cite journal | vauthors = Logan DW, Marton TF, Stowers L | title = Species specificity in major urinary proteins by parallel evolution | journal = PLOS ONE | volume = 3 | issue = 9 | article-number = e3280 | date = September 2008 | pmid = 18815613 | pmc = 2533699 | doi = 10.1371/journal.pone.0003280 | veditors = Vosshall LB | doi-access = free | bibcode = 2008PLoSO...3.3280L }}</ref><ref name="Hurstchapter">{{cite book |vauthors=Hurst J, Beynon RJ, Roberts SC, Wyatt TD |title=Urinary Lipocalins in Rodenta: is there a Generic Model? |series = Chemical Signals in Vertebrates 11 |publisher= Springer New York |date=October 2007 |isbn= 978-0-387-73944-1}}</ref> Because the reference genome is a single, distinct sequence, which is what makes it useful as an index or locator of genomic features, it is limited in how faithfully it can represent the human genome and its [[Human genetic variation|variability]]. Most of the initial samples used for reference genome sequencing came from people of European ancestry. 
In 2010, ''de novo'' assemblies of genomes from African and Asian populations were found to contain about 5 Mb of sequence that did not align to any region of the NCBI reference genome (version NCBI36).<ref>{{cite journal | vauthors = Li R, Li Y, Zheng H, Luo R, Zhu H, Li Q, Qian W, Ren Y, Tian G, Li J, Zhou G, Zhu X, Wu H, Qin J, Jin X, Li D, Cao H, Hu X, Blanche H, Cann H, Zhang X, Li S, Bolund L, Kristiansen K, Yang H, Wang J, Wang J | display-authors = 6 | title = Building the sequence map of the human pan-genome | journal = Nature Biotechnology | volume = 28 | issue = 1 | pages = 57–63 | date = January 2010 | pmid = 19997067 | doi = 10.1038/nbt.1596 | s2cid = 205274447 }}</ref>

Projects that followed the Human Genome Project have sought a deeper and more diverse characterization of human genetic variability, which a single reference genome cannot represent. The [[International HapMap Project|HapMap Project]], active from 2002 to 2010, aimed to create a map of [[haplotype]]s and their most common variants across different human populations. Up to 11 populations of different ancestry were studied, including [[Han Chinese|Han]] people from China, [[Gujarati people|Gujaratis]] from India, [[Yoruba people|Yoruba]] people from Nigeria, and [[Japanese people]].<ref>{{cite journal | author = The International HapMap Consortium | title = A haplotype map of the human genome | journal = Nature | volume = 437 | issue = 7063 | pages = 1299–1320 | date = October 2005 | pmid = 16255080 | pmc = 1880871 | doi = 10.1038/nature04226 | bibcode = 2005Natur.437.1299T }}</ref><ref>{{cite journal | vauthors = Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, Pasternak S, Wheeler DA, Willis TD, Yu F, Yang H, Zeng C, Gao Y, Hu H, Hu W, Li C, Lin W, Liu S, Pan H, Tang X, Wang J, Wang W, Yu J, Zhang B, Zhang Q, Zhao H, Zhao H, Zhou J, Gabriel SB, Barry R, Blumenstiel B, Camargo A, Defelice M, Faggart M, Goyette M, Gupta S, Moore J, Nguyen H, Onofrio RC, Parkin M, Roy J, Stahl E, Winchester E, Ziaugra L, Altshuler D, Shen Y, Yao Z, Huang W, Chu X, He Y, Jin L, Liu Y, Shen Y, Sun W, Wang H, Wang Y, Wang Y, Xiong X, Xu L, Waye MM, Tsui SK, Xue H, Wong JT, Galver LM, Fan JB, Gunderson K, Murray SS, Oliphant AR, Chee MS, Montpetit A, Chagnon F, Ferretti V, Leboeuf M, Olivier JF, Phillips MS, Roumy S, Sallée C, Verner A, Hudson TJ, Kwok PY, Cai D, Koboldt DC, Miller RD, Pawlikowska L, Taillon-Miller P, Xiao M, Tsui LC, Mak W, Song YQ, Tam PK, Nakamura Y, Kawaguchi T, Kitamoto T, Morizono T, Nagashima A, Ohnishi Y, Sekine A, Tanaka T, Tsunoda T, 
Deloukas P, Bird CP, Delgado M, Dermitzakis ET, Gwilliam R, Hunt S, Morrison J, Powell D, Stranger BE, Whittaker P, Bentley DR, Daly MJ, de Bakker PI, Barrett J, Chretien YR, Maller J, McCarroll S, Patterson N, Pe'er I, Price A, Purcell S, Richter DJ, Sabeti P, Saxena R, Schaffner SF, Sham PC, Varilly P, Altshuler D, Stein LD, Krishnan L, Smith AV, Tello-Ruiz MK, Thorisson GA, Chakravarti A, Chen PE, Cutler DJ, Kashuk CS, Lin S, Abecasis GR, Guan W, Li Y, Munro HM, Qin ZS, Thomas DJ, McVean G, Auton A, Bottolo L, Cardin N, Eyheramendy S, Freeman C, Marchini J, Myers S, Spencer C, Stephens M, Donnelly P, Cardon LR, Clarke G, Evans DM, Morris AP, Weir BS, Tsunoda T, Mullikin JC, Sherry ST, Feolo M, Skol A, Zhang H, Zeng C, Zhao H, Matsuda I, Fukushima Y, Macer DR, Suda E, Rotimi CN, Adebamowo CA, Ajayi I, Aniagwu T, Marshall PA, Nkwodimmah C, Royal CD, Leppert MF, Dixon M, Peiffer A, Qiu R, Kent A, Kato K, Niikawa N, Adewole IF, Knoppers BM, Foster MW, Clayton EW, Watkin J, Gibbs RA, Belmont JW, Muzny D, Nazareth L, Sodergren E, Weinstock GM, Wheeler DA, Yakub I, Gabriel SB, Onofrio RC, Richter DJ, Ziaugra L, Birren BW, Daly MJ, Altshuler D, Wilson RK, Fulton LL, Rogers J, Burton J, Carter NP, Clee CM, Griffiths M, Jones MC, McLay K, Plumb RW, Ross MT, Sims SK, Willey DL, Chen Z, Han H, Kang L, Godbout M, Wallenburg JC, L'Archevêque P, Bellemare G, Saeki K, Wang H, An D, Fu H, Li Q, Wang Z, Wang R, Holden AL, Brooks LD, McEwen JE, Guyer MS, Wang VO, Peterson JL, Shi M, Spiegel J, Sung LM, Zacharia LF, Collins FS, Kennedy K, Jamieson R, Stewart J | display-authors = 6 | title = A second generation human haplotype map of over 3.1 million SNPs | journal = Nature | volume = 449 | issue = 7164 | pages = 851–861 | date = October 2007 | pmid = 17943122 | pmc = 2689609 | doi = 10.1038/nature06258 | bibcode = 2007Natur.449..851F }}</ref><ref>{{cite journal | vauthors = Altshuler DM, Gibbs RA, Peltonen L, Altshuler DM, Gibbs RA, Peltonen L, Dermitzakis E, Schaffner SF, Yu F, 
Peltonen L, Dermitzakis E, Bonnen PE, Altshuler DM, Gibbs RA, de Bakker PI, Deloukas P, Gabriel SB, Gwilliam R, Hunt S, Inouye M, Jia X, Palotie A, Parkin M, Whittaker P, Yu F, Chang K, Hawes A, Lewis LR, Ren Y, Wheeler D, Gibbs RA, Muzny DM, Barnes C, Darvishi K, Hurles M, Korn JM, Kristiansson K, Lee C, McCarrol SA, Nemesh J, Dermitzakis E, Keinan A, Montgomery SB, Pollack S, Price AL, Soranzo N, Bonnen PE, Gibbs RA, Gonzaga-Jauregui C, Keinan A, Price AL, Yu F, Anttila V, Brodeur W, Daly MJ, Leslie S, McVean G, Moutsianas L, Nguyen H, Schaffner SF, Zhang Q, Ghori MJ, McGinnis R, McLaren W, Pollack S, Price AL, Schaffner SF, Takeuchi F, Grossman SR, Shlyakhter I, Hostetter EB, Sabeti PC, Adebamowo CA, Foster MW, Gordon DR, Licinio J, Manca MC, Marshall PA, Matsuda I, Ngare D, Wang VO, Reddy D, Rotimi CN, Royal CD, Sharp RR, Zeng C, Brooks LD, McEwen JE | display-authors = 6 | title = Integrating common and rare genetic variation in diverse human populations | journal = Nature | volume = 467 | issue = 7311 | pages = 52–58 | date = September 2010 | pmid = 20811451 | pmc = 3173859 | doi = 10.1038/nature09298 | bibcode = 2010Natur.467...52T }}</ref><ref>{{Cite web |title=International HapMap Project |url=https://www.genome.gov/10001688/international-hapmap-project |access-date=2022-08-18 |website=Genome.gov |language=en}}</ref> The [[1000 Genomes Project]], carried out between 2008 and 2015, aimed to build a database covering more than 95% of the variants present in the human genome, whose results can be used in [[Genome-wide association study|genome-wide association studies]] (GWAS) of diseases such as diabetes, cardiovascular disease and autoimmune disorders. 
A total of 26 ethnic groups were studied in this project, expanding the scope of the HapMap project to new ethnic groups such as the [[Mende people]] of Sierra Leone, the [[Vietnamese people]] or the [[Bengalis|Bengali people]].<ref>{{cite journal | vauthors = Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA | display-authors = 6 | title = A map of human genome variation from population-scale sequencing | journal = Nature | volume = 467 | issue = 7319 | pages = 1061–1073 | date = October 2010 | pmid = 20981092 | pmc = 3042601 | doi = 10.1038/nature09534 | bibcode = 2010Natur.467.1061T }}</ref><ref>{{cite journal | vauthors = Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA | display-authors = 6 | title = An integrated map of genetic variation from 1,092 human genomes | journal = Nature | volume = 491 | issue = 7422 | pages = 56–65 | date = November 2012 | pmid = 23128226 | pmc = 3498066 | doi = 10.1038/nature11632 | bibcode = 2012Natur.491...56T }}</ref><ref>{{cite journal | vauthors = Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR | display-authors = 6 | title = A global reference for human genetic variation | journal = Nature | volume = 526 | issue = 7571 | pages = 68–74 | date = October 2015 | pmid = 26432245 | pmc = 4750478 | doi = 10.1038/nature15393 | bibcode = 2015Natur.526...68T }}</ref><ref>{{cite journal | vauthors = Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH, Konkel MK, Malhotra A, Stütz AM, Shi X, Casale FP, Chen J, Hormozdiari F, Dayama G, Chen K, Malig M, Chaisson MJ, Walter K, Meiers S, Kashin S, Garrison E, Auton A, Lam HY, Mu XJ, Alkan C, Antaki D, Bae T, Cerveira E, Chines P, Chong Z, Clarke L, Dal E, Ding L, Emery S, Fan X, Gujral M, Kahveci F, Kidd JM, Kong Y, Lameijer EW, McCarthy S, Flicek P, Gibbs RA, Marth G, Mason CE, Menelaou A, 
Muzny DM, Nelson BJ, Noor A, Parrish NF, Pendleton M, Quitadamo A, Raeder B, Schadt EE, Romanovitch M, Schlattl A, Sebra R, Shabalin AA, Untergasser A, Walker JA, Wang M, Yu F, Zhang C, Zhang J, Zheng-Bradley X, Zhou W, Zichner T, Sebat J, Batzer MA, McCarroll SA, Mills RE, Gerstein MB, Bashir A, Stegle O, Devine SE, Lee C, Eichler EE, Korbel JO | display-authors = 6 | title = An integrated map of structural variation in 2,504 human genomes | journal = Nature | volume = 526 | issue = 7571 | pages = 75–81 | date = October 2015 | pmid = 26432246 | pmc = 4617611 | doi = 10.1038/nature15394 | bibcode = 2015Natur.526...75. }}</ref> The [[Human Pangenome Project]], which started its initial phase in 2019 with the creation of the Human Pangenome Reference Consortium, seeks to create the largest map of human genetic variability taking the results of previous studies as a starting point.<ref>{{cite journal | vauthors = Miga KH, Wang T | title = The Need for a Human Pangenome Reference Sequence | journal = Annual Review of Genomics and Human Genetics | volume = 22 | issue = 1 | pages = 81–102 | date = August 2021 | pmid = 33929893 | pmc = 8410644 | doi = 10.1146/annurev-genom-120120-081921 }}</ref><ref>{{cite journal | vauthors = Wang T, Antonacci-Fulton L, Howe K, Lawson HA, Lucas JK, Phillippy AM, Popejoy AB, Asri M, Carson C, Chaisson MJ, Chang X, Cook-Deegan R, Felsenfeld AL, Fulton RS, Garrison EP, Garrison NA, Graves-Lindsay TA, Ji H, Kenny EE, Koenig BA, Li D, Marschall T, McMichael JF, Novak AM, Purushotham D, Schneider VA, Schultz BI, Smith MW, Sofia HJ, Weissman T, Flicek P, Li H, Miga KH, Paten B, Jarvis ED, Hall IM, Eichler EE, Haussler D | display-authors = 6 | title = The Human Pangenome Project: a global resource to map genomic diversity | journal = Nature | volume = 604 | issue = 7906 | pages = 437–446 | date = April 2022 | pmid = 35444317 | doi = 10.1038/s41586-022-04601-8 | pmc = 9402379 | bibcode = 2022Natur.604..437W | s2cid = 248297723 }}</ref>

=== Mouse reference genome ===
Recent mouse genome assemblies are as follows:<ref name=":0" />
{| class="wikitable"
|-
!Release name
!Date of release
!Equivalent UCSC version
|-
|GRCm39
|Jun 2020
|mm39
|-
|GRCm38
|Dec 2011
|mm10
|-
|NCBI Build 37
|Jul 2007
|mm9
|-
|NCBI Build 36
|Feb 2006
|mm8
|-
|NCBI Build 35
|Aug 2005
|mm7
|-
|NCBI Build 34
|Mar 2005
|mm6
|}

== Other genomes ==
Since the Human Genome Project was completed, multiple international projects have been launched to assemble reference genomes for many organisms. [[Model organism]]s (e.g., zebrafish (''[[Zebrafish|Danio rerio]]''), chicken (''[[Red junglefowl|Gallus gallus]]'') and ''[[Escherichia coli]]'') are of special interest to the scientific community, as are endangered species (e.g., the Asian arowana (''[[Asian arowana|Scleropages formosus]]'') or the American bison (''[[American bison|Bison bison]]'')). As of August 2022, the NCBI Genome database listed 71,886 partially or completely sequenced and assembled genomes from different species, including 676 [[mammal]]s, 590 [[bird]]s, 865 [[fish]]es, 1,796 [[insect]]s, 3,747 [[Fungus|fungi]], 1,025 [[plant]]s, 33,724 [[bacteria]], 26,004 [[virus]]es and 2,040 [[archaea]].<ref>{{Cite web |title=Genome List - Genome - NCBI |url=https://www.ncbi.nlm.nih.gov/genome/browse#!/overview/ |archive-url=https://web.archive.org/web/20111128200211/http://www.ncbi.nlm.nih.gov/genome/browse/#!/overview/ |archive-date=November 28, 2011 |access-date=2022-08-18 |website=www.ncbi.nlm.nih.gov}}</ref> Many of these species have annotation data associated with their reference genomes that can be publicly accessed and visualized in genome browsers such as [[Ensembl genome database project|Ensembl]] and the [[UCSC Genome Browser]].<ref>{{Cite web |title=Species List |url=https://uswest.ensembl.org/info/about/species.html |access-date=2022-08-18 |website=uswest.ensembl.org |archive-date=2022-08-06 |archive-url=https://web.archive.org/web/20220806120818/https://uswest.ensembl.org/info/about/species.html }}</ref><ref>{{Cite web |title=GenArk: UCSC Genome Archive |url=https://hgdownload.soe.ucsc.edu/hubs/ |access-date=2022-08-18 |website=hgdownload.soe.ucsc.edu}}</ref>

Examples of these international projects include the [[Chimpanzee genome project|Chimpanzee Genome Project]], carried out between 2005 and 2013 jointly by the [[Broad Institute]] and the [[McDonnell Genome Institute]] of [[Washington University in St. Louis]], which generated the first reference genomes for four subspecies of ''[[Chimpanzee|Pan troglodytes]]'';<ref>{{Cite news |date=2016-03-04 |title=Chimpanzee Genome Project |language=en |work=BCM-HGSC |url=https://www.hgsc.bcm.edu/non-human-primates/chimpanzee-genome-project |access-date=2022-08-18}}</ref><ref>{{cite journal | vauthors = Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, Veeramah KR, Woerner AE, O'Connor TD, Santpere G, Cagan A, Theunert C, Casals F, Laayouni H, Munch K, Hobolth A, Halager AE, Malig M, Hernandez-Rodriguez J, Hernando-Herraez I, Prüfer K, Pybus M, Johnstone L, Lachmann M, Alkan C, Twigg D, Petit N, Baker C, Hormozdiari F, Fernandez-Callejo M, Dabad M, Wilson ML, Stevison L, Camprubí C, Carvalho T, Ruiz-Herrera A, Vives L, Mele M, Abello T, Kondova I, Bontrop RE, Pusey A, Lankester F, Kiyang JA, Bergl RA, Lonsdorf E, Myers S, Ventura M, Gagneux P, Comas D, Siegismund H, Blanc J, Agueda-Calpena L, Gut M, Fulton L, Tishkoff SA, Mullikin JC, Wilson RK, Gut IG, Gonder MK, Ryder OA, Hahn BH, Navarro A, Akey JM, Bertranpetit J, Reich D, Mailund T, Schierup MH, Hvilsom C, Andrés AM, Wall JD, Bustamante CD, Hammer MF, Eichler EE, Marques-Bonet T | display-authors = 6 | title = Great ape genetic diversity and population history | journal = Nature | volume = 499 | issue = 7459 | pages = 471–475 | date = July 2013 | pmid = 23823723 | pmc = 3822165 | doi = 10.1038/nature12228 | bibcode = 2013Natur.499..471P }}</ref> the [[100K Pathogen Genome Project]], started in 2012 with the main goal of building a database of reference genomes for 100,000 [[pathogen]]ic microorganisms for use in public health, outbreak detection, agriculture and the environment;<ref>{{Cite web 
|title=100K Pathogen Genome Project – Genomes for Public Health & Food Safety |url=https://100kgenomes.org/ |access-date=2022-08-18 |language=en-US}}</ref> and the [[Earth BioGenome Project]], started in 2018, which aims to sequence and catalog the genomes of all eukaryotic organisms on Earth to support biodiversity conservation. This big-science project includes up to 50 smaller affiliated projects, such as the [[Africa BioGenome Project]] and the [[1000 Fungal Genomes Project]].<ref>{{cite journal | vauthors = Lewin HA, Robinson GE, Kress WJ, Baker WJ, Coddington J, Crandall KA, Durbin R, Edwards SV, Forest F, Gilbert MT, Goldstein MM, Grigoriev IV, Hackett KJ, Haussler D, Jarvis ED, Johnson WE, Patrinos A, Richards S, Castilla-Rubio JC, van Sluys MA, Soltis PS, Xu X, Yang H, Zhang G | display-authors = 6 | title = Earth BioGenome Project: Sequencing life for the future of life | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 115 | issue = 17 | pages = 4325–4333 | date = April 2018 | pmid = 29686065 | pmc = 5924910 | doi = 10.1073/pnas.1720115115 | bibcode = 2018PNAS..115.4325L | doi-access = free }}</ref><ref>{{Cite web |title=African BioGenome Project – Genomics in the service of conservation and improvement of African biological diversity |url=https://africanbiogenome.org/ |access-date=2022-08-18 |language=en-US}}</ref><ref>{{Cite web |title=1000 Fungal Genomes Project |url=https://mycocosm.jgi.doe.gov/mycocosm/home/1000-fungal-genomes |access-date=2022-08-18 |website=mycocosm.jgi.doe.gov}}</ref>

==See also==
*[[European Reference Genome Atlas]]

== References ==
{{reflist|2}}

== External links ==
*[https://www.ncbi.nlm.nih.gov/grc Genome Reference Consortium]

[[Category:Genome projects]]
[[Category:Genomics]]
[[Category:Human genetics]]
[[Category:Bioinformatics]]
[[Category:DNA sequencing]]