# KEGG

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/KEGG
> Markdown URL: https://mediated.wiki/source/KEGG.md
> Source: https://en.wikipedia.org/wiki/KEGG
> Source revision: 1317570716
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

Collection of bioinformatics databases

- Content: bioinformatics resource for deciphering the genome
- Organisms: all
- Research center: Kyoto University
- Laboratory: Kanehisa Laboratories
- Primary citation: PMID 10592173
- Release date: 1995
- Website: www.kegg.jp
- Web service: REST (see KEGG API)
- Tools: KEGG Mapper (web)

**KEGG** (**Kyoto Encyclopedia of Genes and Genomes**) is a collection of databases dealing with [genomes](/source/Genome), [biological pathways](/source/Biological_pathway), [diseases](/source/Disease), [drugs](/source/Drug), and [chemical substances](/source/Chemical_substance). KEGG is used for [bioinformatics](/source/Bioinformatics) research and education, including data analysis in [genomics](/source/Genomics), [metagenomics](/source/Metagenomics), [metabolomics](/source/Metabolomics) and other [omics](/source/Omics) studies, modeling and simulation in [systems biology](/source/Systems_biology), and [translational research](/source/Translational_research) in [drug development](/source/Drug_development).

The KEGG database project was initiated in 1995 by [Minoru Kanehisa](/source/Minoru_Kanehisa), professor at the Institute for Chemical Research, [Kyoto University](/source/Kyoto_University), under the then-ongoing Japanese [Human Genome Program](/source/Human_Genome_Project).[1][2] Foreseeing the need for a computerized resource that could be used for biological interpretation of [genome sequence data](/source/Genome_project), he started developing the KEGG PATHWAY database. It is a collection of manually drawn KEGG pathway maps representing experimental knowledge on [metabolism](/source/Metabolism) and various other functions of the [cell](/source/Cell_(biology)) and the [organism](/source/Organism). Each pathway map contains a network of molecular interactions and reactions and is designed to link [genes](/source/Gene) in the genome to [gene products](/source/Gene_product) (mostly [proteins](/source/Protein)) in the pathway. This has enabled an analysis called KEGG pathway mapping, in which the gene content of a genome is compared with the KEGG PATHWAY database to examine which pathways and associated functions are likely to be encoded in the genome.

According to the developers, KEGG is a "computer representation" of the [biological system](/source/Biological_system).[3] It integrates building blocks and wiring diagrams of the system—more specifically, genetic building blocks of genes and proteins, chemical building blocks of [small molecules](/source/Small_molecule) and reactions, and wiring diagrams of molecular interaction and reaction networks. This concept is realized in the following databases of KEGG, which are categorized into systems, genomic, chemical, and health information.[4]

- Systems information
  - PATHWAY: [pathway](/source/Biological_pathway) maps for cellular and organismal functions
  - MODULE: modules or functional units of genes
  - BRITE: hierarchical classifications of biological entities

- Genomic information
  - GENOME: complete [genomes](/source/Genome)
  - GENES: [genes](/source/Gene) and [proteins](/source/Protein) in the complete genomes
  - ORTHOLOGY: [ortholog](/source/Ortholog) groups of genes in the complete genomes

- Chemical information
  - COMPOUND, GLYCAN: [chemical compounds](/source/Chemical_compound) and [glycans](/source/Glycan)
  - REACTION, RPAIR, RCLASS: [chemical reactions](/source/Chemical_reaction)
  - ENZYME: [enzyme nomenclature](/source/Enzyme_nomenclature)

- Health information
  - DISEASE: human [diseases](/source/Disease)
  - DRUG: [approved drugs](/source/Approved_drugs)
  - ENVIRON: [crude drugs](/source/Crude_drug) and health-related substances
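The infobox notes that KEGG offers a REST web service. As a minimal sketch, the helper below only builds request URLs and performs no network access. The base URL `https://rest.kegg.jp`, the operation names, and the use of `+` to join several entries are assumptions that are not stated in this article and should be checked against the KEGG API documentation. The function name `kegg_rest_url` is hypothetical.

```python
# Assumed base URL and operation names; verify against the KEGG API documentation.
KEGG_REST_BASE = "https://rest.kegg.jp"
OPERATIONS = {"info", "list", "find", "get", "conv", "link", "ddi"}


def kegg_rest_url(operation, *parts):
    """Build a KEGG-style REST URL of the form <base>/<operation>/<part>/...

    Each part is either a string or a list of strings; a list is joined
    with '+', the assumed separator for multiple entries in one request.
    """
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    if not parts:
        raise ValueError("at least one argument is required")
    segments = []
    for part in parts:
        if isinstance(part, str):
            segments.append(part)
        else:
            segments.append("+".join(part))
    return "/".join([KEGG_REST_BASE, operation, *segments])


if __name__ == "__main__":
    # Under these assumptions, this lists entries of the PATHWAY database.
    print(kegg_rest_url("list", "pathway"))
```

Keeping URL construction separate from the actual HTTP call makes the request format easy to inspect and test before any data is downloaded.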

## Databases

### Systems information

The KEGG PATHWAY database, the wiring diagram database, is the core of the KEGG resource. It is a collection of pathway maps integrating many entities including genes, proteins, RNAs, chemical compounds, glycans, and chemical reactions, as well as disease genes and drug targets, which are stored as individual entries in the other databases of KEGG. The pathway maps are classified into the following sections:

- [Metabolism](/source/Metabolism)

- Genetic information processing ([transcription](/source/Transcription_(genetics)), [translation](/source/Translation_(biology)), [replication](/source/DNA_replication) and [repair](/source/DNA_repair), etc.)

- Environmental information processing ([membrane transport](/source/Membrane_transport), [signal transduction](/source/Signal_transduction), etc.)

- Cellular processes ([cell growth](/source/Cell_growth), [cell death](/source/Cell_death), [cell membrane](/source/Cell_membrane) functions, etc.)

- Organismal systems ([immune system](/source/Immune_system), [endocrine system](/source/Endocrine_system), [nervous system](/source/Nervous_system), etc.)

- Human [diseases](/source/Disease)

- [Drug development](/source/Drug_development)

The metabolism section contains aesthetically drawn global maps showing an overall picture of metabolism, in addition to regular metabolic pathway maps. The low-resolution global maps can be used, for example, to compare metabolic capacities of different organisms in genomics studies and different environmental samples in metagenomics studies. In contrast, KEGG modules in the KEGG MODULE database are higher-resolution, localized wiring diagrams, representing tighter functional units within a pathway map, such as subpathways conserved among specific organism groups and molecular complexes. KEGG modules are defined as characteristic gene sets that can be linked to specific metabolic capacities and other [phenotypic](/source/Phenotype) features, so that they can be used for automatic interpretation of genome and metagenome data.
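The idea of a module as a characteristic gene set can be illustrated with a small sketch. It assumes a simplified representation in which a module is an ordered list of steps and each step is a set of alternative ortholog identifiers, any one of which satisfies the step. This is not KEGG's actual module definition format, and all identifiers below are hypothetical placeholders.

```python
def module_completeness(steps, genome_kos):
    """Return (fraction of satisfied steps, indices of unsatisfied steps).

    steps      -- list of sets; each set holds alternative identifiers for one step
    genome_kos -- set of identifiers assigned to genes in the genome or metagenome
    """
    missing = [
        index
        for index, alternatives in enumerate(steps)
        if not (alternatives & genome_kos)  # no alternative is present
    ]
    satisfied = len(steps) - len(missing)
    fraction = satisfied / len(steps) if steps else 0.0
    return fraction, missing


# Hypothetical three-step module; the second step has two alternatives.
TOY_MODULE = [{"KO_a"}, {"KO_b", "KO_c"}, {"KO_d"}]

if __name__ == "__main__":
    genome = {"KO_a", "KO_c", "KO_x"}
    fraction, missing = module_completeness(TOY_MODULE, genome)
    print(f"{fraction:.2f} complete, missing steps: {missing}")
```

Reporting the missing steps alongside the fraction shows which functions would have to be found before the corresponding metabolic capacity could be inferred.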

Another database that supplements KEGG PATHWAY is the KEGG BRITE database. It is an [ontology](/source/Ontology_(information_science)) database containing hierarchical classifications of various entities including genes, proteins, organisms, diseases, drugs, and chemical compounds. While KEGG PATHWAY is limited to molecular interactions and reactions of these entities, KEGG BRITE incorporates many different types of relationships.
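A hierarchical classification of this kind can be sketched as a nested dictionary, with a lookup that returns every classification path leading to an entity. The hierarchy and the entry names below are hypothetical and do not reproduce any actual BRITE file or its format.

```python
# Hypothetical hierarchy: each key is a category or entity, each value its children.
TOY_HIERARCHY = {
    "Enzymes": {
        "Transferases": {"enzX": {}, "enzY": {}},
        "Hydrolases": {"enzZ": {}},
    },
}


def find_paths(tree, target, prefix=()):
    """Return all root-to-node paths (as tuples) whose last element is target."""
    paths = []
    for name, children in tree.items():
        current = prefix + (name,)
        if name == target:
            paths.append(current)
        # Keep searching below this node; an entity may appear in several branches.
        paths.extend(find_paths(children, target, current))
    return paths


if __name__ == "__main__":
    for path in find_paths(TOY_HIERARCHY, "enzZ"):
        print(" > ".join(path))
```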

### Genomic information

Several months after the KEGG project was initiated in 1995, the first report of a completely sequenced [bacterial](/source/Bacteria) genome was published.[5] Since then, all published complete genomes have been accumulated in KEGG, for both [eukaryotes](/source/Eukaryote) and [prokaryotes](/source/Prokaryote). The KEGG GENES database contains gene/protein-level information and the KEGG GENOME database contains organism-level information for these genomes. The KEGG GENES database consists of gene sets for the complete genomes, and the genes in each set are [annotated](/source/Genome_annotation) by establishing correspondences to the wiring diagrams of KEGG pathway maps, KEGG modules, and BRITE hierarchies.

These correspondences are made using the concept of [orthologs](/source/Ortholog). The KEGG pathway maps are drawn based on experimental evidence in specific organisms but they are designed to be applicable to other organisms as well, because different organisms, such as human and mouse, often share identical pathways consisting of functionally identical genes, called orthologous genes or orthologs. All the genes in the KEGG GENES database are being grouped into such orthologs in the KEGG ORTHOLOGY (KO) database. Because the nodes (gene products) of KEGG pathway maps, as well as KEGG modules and BRITE hierarchies, are given KO identifiers, the correspondences are established once genes in the genome are annotated with KO identifiers by the [genome annotation](/source/Genome_annotation) procedure in KEGG.[4]
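The KO-based correspondence described above can be sketched as two lookups: genes are first assigned KO identifiers, and pathway nodes carrying the same identifiers are then matched. The data structures, function names, and all identifiers below are hypothetical placeholders, not real KEGG entries or KEGG's own annotation procedure.

```python
from collections import defaultdict

# Hypothetical annotation result: gene -> KO identifier (None if unassigned).
GENE_TO_KO = {
    "geneA": "KO_1",
    "geneB": "KO_2",
    "geneC": "KO_2",  # several genes may belong to one ortholog group
    "geneD": None,
}

# Hypothetical pathway maps: pathway -> KO identifiers attached to its nodes.
PATHWAY_NODES = {
    "map_alpha": {"KO_1", "KO_2", "KO_3"},
    "map_beta": {"KO_4"},
}


def map_genes_to_pathways(gene_to_ko, pathway_nodes):
    """Return {pathway: {ko: sorted genes}} for pathway nodes matched by the genome."""
    ko_to_genes = defaultdict(list)
    for gene, ko in gene_to_ko.items():
        if ko is not None:
            ko_to_genes[ko].append(gene)
    mapping = {}
    for pathway, kos in pathway_nodes.items():
        hits = {ko: sorted(ko_to_genes[ko]) for ko in kos if ko in ko_to_genes}
        if hits:
            mapping[pathway] = hits
    return mapping


def pathway_coverage(gene_to_ko, pathway_nodes):
    """Return {pathway: fraction of its KO nodes present in the genome}."""
    present = {ko for ko in gene_to_ko.values() if ko is not None}
    return {
        pathway: len(kos & present) / len(kos)
        for pathway, kos in pathway_nodes.items()
        if kos
    }


if __name__ == "__main__":
    print(map_genes_to_pathways(GENE_TO_KO, PATHWAY_NODES))
    print(pathway_coverage(GENE_TO_KO, PATHWAY_NODES))
```

Because the pathway maps refer only to KO identifiers, the same maps can be reused for any genome once its genes have been annotated.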

### Chemical information

The KEGG metabolic pathway maps are drawn to represent the dual aspects of the metabolic network: the genomic network of how genome-encoded [enzymes](/source/Enzyme) are connected to catalyze consecutive reactions and the chemical network of how chemical structures of [substrates](/source/Enzyme_substrate_(biology)) and [products](/source/Product_(biology)) are transformed by these reactions.[6] A set of enzyme genes in the genome will identify enzyme relation networks when superimposed on the KEGG pathway maps, which in turn characterize chemical structure transformation networks allowing interpretation of [biosynthetic](/source/Biosynthetic) and [biodegradation](/source/Biodegradation) potentials of the organism. Alternatively, a set of [metabolites](/source/Metabolite) identified in the metabolome will lead to the understanding of enzymatic pathways and enzyme genes involved.
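The second direction, from observed metabolites to reactions and enzyme genes, can be sketched with a toy reaction table. The table layout, the function names, and every identifier are hypothetical. The rule used here, that a reaction is supported only when all its substrates and products were observed, is a simplifying assumption rather than KEGG's method.

```python
# Hypothetical reaction table linking the chemical and genomic networks.
REACTIONS = {
    "rxn1": {"substrates": {"cpdA"}, "products": {"cpdB"}, "enzymes": {"enzX"}},
    "rxn2": {"substrates": {"cpdB"}, "products": {"cpdC"}, "enzymes": {"enzY", "enzZ"}},
    "rxn3": {"substrates": {"cpdD"}, "products": {"cpdE"}, "enzymes": {"enzW"}},
}


def supported_reactions(metabolites, reactions):
    """Return sorted reaction IDs whose substrates and products were all observed."""
    return sorted(
        reaction_id
        for reaction_id, data in reactions.items()
        if data["substrates"] <= metabolites and data["products"] <= metabolites
    )


def candidate_enzymes(metabolites, reactions):
    """Return sorted enzyme genes that catalyze the supported reactions."""
    found = set()
    for reaction_id in supported_reactions(metabolites, reactions):
        found |= reactions[reaction_id]["enzymes"]
    return sorted(found)


if __name__ == "__main__":
    observed = {"cpdA", "cpdB", "cpdC"}
    print(supported_reactions(observed, REACTIONS))
    print(candidate_enzymes(observed, REACTIONS))
```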

The databases in the chemical information category, which are collectively called KEGG LIGAND, are organized by capturing knowledge of the chemical network. In the beginning of the KEGG project, KEGG LIGAND consisted of three databases: KEGG COMPOUND for chemical compounds, KEGG REACTION for chemical reactions, and KEGG ENZYME for reactions in the enzyme nomenclature.[7] Currently, there are additional databases: KEGG GLYCAN for glycans[8] and two auxiliary reaction databases called RPAIR (reactant pair alignments) and RCLASS (reaction class).[9] KEGG COMPOUND has also been expanded to contain various compounds such as [xenobiotics](/source/Xenobiotic), in addition to metabolites.

### Health information

In KEGG, diseases are viewed as perturbed states of the biological system caused by perturbants of genetic factors and environmental factors, and drugs are viewed as different types of perturbants.[10] The KEGG PATHWAY database includes not only the normal states but also the perturbed states of the biological systems. However, disease pathway maps cannot be drawn for most diseases because molecular mechanisms are not well understood. An alternative approach is taken in the KEGG DISEASE database, which simply catalogs known genetic factors and environmental factors of diseases. These catalogs may eventually lead to more complete wiring diagrams of diseases.

The KEGG DRUG database contains [active ingredients](/source/Active_ingredient) of [approved drugs](/source/Approved_drug) in Japan, the US, and Europe. They are distinguished by chemical structures and/or chemical components and associated with [target](/source/Drug_target) molecules, [metabolizing enzymes](/source/Drug_metabolism), and other molecular interaction network information in the KEGG pathway maps and the BRITE hierarchies. This enables an integrated analysis of drug interactions with genomic information. [Crude drugs](/source/Crude_drug) and other health-related substances, which are outside the category of approved drugs, are stored in the KEGG ENVIRON database. The databases in the health information category are collectively called KEGG MEDICUS, which also includes [package inserts](/source/Package_insert) of all marketed drugs in Japan.

## Subscription model

In July 2011, KEGG introduced a subscription model for FTP download because of a significant cutback in government funding. KEGG remains freely available through its website, but the subscription model has prompted discussion about the sustainability of bioinformatics databases.[11][12]

## See also

- [Comparative Toxicogenomics Database](/source/Comparative_Toxicogenomics_Database) - CTD integrates KEGG pathways with toxicogenomic and disease data

- [ConsensusPathDB](/source/ConsensusPathDB), a molecular functional interaction database, integrating information from KEGG

- [Gene Ontology](/source/Gene_Ontology) (GO)

- [PubMed](/source/PubMed)

- [UniProt](/source/Uniprot)

- [Gene Disease Database](/source/Gene_Disease_Database)

## References

1. **[^](#cite_ref-pmid10592173_1-0)** Kanehisa M, Goto S (2000). ["KEGG: Kyoto Encyclopedia of Genes and Genomes"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC102409). *Nucleic Acids Res*. **28** (1): 27–30. [doi](/source/Doi_(identifier)):[10.1093/nar/28.1.27](https://doi.org/10.1093%2Fnar%2F28.1.27). [PMC](/source/PMC_(identifier)) [102409](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC102409). [PMID](/source/PMID_(identifier)) [10592173](https://pubmed.ncbi.nlm.nih.gov/10592173).

1. **[^](#cite_ref-pmid9287494_2-0)** Kanehisa M (1997). "A database for post-genome analysis". *Trends Genet*. **13** (9): 375–6. [doi](/source/Doi_(identifier)):[10.1016/S0168-9525(97)01223-7](https://doi.org/10.1016%2FS0168-9525%2897%2901223-7). [PMID](/source/PMID_(identifier)) [9287494](https://pubmed.ncbi.nlm.nih.gov/9287494).

1. **[^](#cite_ref-pmid16381885_3-0)** Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M (2006). ["From genomics to chemical genomics: new developments in KEGG"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1347464). *Nucleic Acids Res*. **34** (Database issue): D354–7. [doi](/source/Doi_(identifier)):[10.1093/nar/gkj102](https://doi.org/10.1093%2Fnar%2Fgkj102). [PMC](/source/PMC_(identifier)) [1347464](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1347464). [PMID](/source/PMID_(identifier)) [16381885](https://pubmed.ncbi.nlm.nih.gov/16381885).

1. ^ [***a***](#cite_ref-pmid24214961_4-0) [***b***](#cite_ref-pmid24214961_4-1) Kanehisa M, Goto S, Sato Y, Kawashima M, Furumichi M, Tanabe M (2014). ["Data, information, knowledge and principle: back to metabolism in KEGG"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965122). *Nucleic Acids Res*. **42** (Database issue): D199–205. [doi](/source/Doi_(identifier)):[10.1093/nar/gkt1076](https://doi.org/10.1093%2Fnar%2Fgkt1076). [PMC](/source/PMC_(identifier)) [3965122](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965122). [PMID](/source/PMID_(identifier)) [24214961](https://pubmed.ncbi.nlm.nih.gov/24214961).

1. **[^](#cite_ref-pmid7542800_5-0)** Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM, et al. (1995). "Whole-genome random sequencing and assembly of Haemophilus influenzae Rd". *Science*. **269** (5223): 496–512. [Bibcode](/source/Bibcode_(identifier)):[1995Sci...269..496F](https://ui.adsabs.harvard.edu/abs/1995Sci...269..496F). [doi](/source/Doi_(identifier)):[10.1126/science.7542800](https://doi.org/10.1126%2Fscience.7542800). [PMID](/source/PMID_(identifier)) [7542800](https://pubmed.ncbi.nlm.nih.gov/7542800). [S2CID](/source/S2CID_(identifier)) [10423613](https://api.semanticscholar.org/CorpusID:10423613).

1. **[^](#cite_ref-pmid23816707_6-0)** Kanehisa M (2013). "Chemical and genomic evolution of enzyme-catalyzed reaction networks". *FEBS Lett*. **587** (17): 2731–7. [doi](/source/Doi_(identifier)):[10.1016/j.febslet.2013.06.026](https://doi.org/10.1016%2Fj.febslet.2013.06.026). [hdl](/source/Hdl_(identifier)):[2433/178762](https://hdl.handle.net/2433%2F178762). [PMID](/source/PMID_(identifier)) [23816707](https://pubmed.ncbi.nlm.nih.gov/23816707). [S2CID](/source/S2CID_(identifier)) [40074657](https://api.semanticscholar.org/CorpusID:40074657).

1. **[^](#cite_ref-pmid9847234_7-0)** Goto S, Nishioka T, Kanehisa M (1999). ["LIGAND database for enzymes, compounds and reactions"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC148189). *Nucleic Acids Res*. **27** (1): 377–9. [doi](/source/Doi_(identifier)):[10.1093/nar/27.1.377](https://doi.org/10.1093%2Fnar%2F27.1.377). [PMC](/source/PMC_(identifier)) [148189](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC148189). [PMID](/source/PMID_(identifier)) [9847234](https://pubmed.ncbi.nlm.nih.gov/9847234).

1. **[^](#cite_ref-pmid16014746_8-0)** Hashimoto K, Goto S, Kawano S, Aoki-Kinoshita KF, Ueda N, Hamajima M, Kawasaki T, Kanehisa M (2006). ["KEGG as a glycome informatics resource"](https://doi.org/10.1093%2Fglycob%2Fcwj010). *Glycobiology*. **16** (5): 63R–70R. [doi](/source/Doi_(identifier)):[10.1093/glycob/cwj010](https://doi.org/10.1093%2Fglycob%2Fcwj010). [PMID](/source/PMID_(identifier)) [16014746](https://pubmed.ncbi.nlm.nih.gov/16014746).

1. **[^](#cite_ref-pmid23384306_9-0)** Muto A, Kotera M, Tokimatsu T, Nakagawa Z, Goto S, Kanehisa M (2013). ["Modular architecture of metabolic pathways revealed by conserved sequences of reactions"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3632090). *J Chem Inf Model*. **53** (3): 613–22. [doi](/source/Doi_(identifier)):[10.1021/ci3005379](https://doi.org/10.1021%2Fci3005379). [PMC](/source/PMC_(identifier)) [3632090](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3632090). [PMID](/source/PMID_(identifier)) [23384306](https://pubmed.ncbi.nlm.nih.gov/23384306).

1. **[^](#cite_ref-pmid19880382_10-0)** Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M (2010). ["KEGG for representation and analysis of molecular networks involving diseases and drugs"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808910). *Nucleic Acids Res*. **38** (Database issue): D355–60. [doi](/source/Doi_(identifier)):[10.1093/nar/gkp896](https://doi.org/10.1093%2Fnar%2Fgkp896). [PMC](/source/PMC_(identifier)) [2808910](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808910). [PMID](/source/PMID_(identifier)) [19880382](https://pubmed.ncbi.nlm.nih.gov/19880382).

1. **[^](#cite_ref-pmid22144685_11-0)** Galperin MY, Fernández-Suárez XM (2012). ["The 2012 Nucleic Acids Research Database Issue and the online Molecular Biology Database Collection"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245068). *Nucleic Acids Res*. **40** (Database issue): D1–8. [doi](/source/Doi_(identifier)):[10.1093/nar/gkr1196](https://doi.org/10.1093%2Fnar%2Fgkr1196). [PMC](/source/PMC_(identifier)) [3245068](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245068). [PMID](/source/PMID_(identifier)) [22144685](https://pubmed.ncbi.nlm.nih.gov/22144685).

1. **[^](#cite_ref-NatureNews_12-0)** Hayden, EC (2013). ["Popular plant database set to charge users"](https://www.nature.com/news/popular-plant-database-set-to-charge-users-1.13642). *Nature*. [doi](/source/Doi_(identifier)):[10.1038/nature.2013.13642](https://doi.org/10.1038%2Fnature.2013.13642). [S2CID](/source/S2CID_(identifier)) [211729309](https://api.semanticscholar.org/CorpusID:211729309).

## External links

- [KEGG website](http://www.kegg.jp/)

- [GenomeNet mirror site](https://www.genome.jp/kegg/)

- The [entry for KEGG](https://web.archive.org/web/20120402032246/http://metadatabase.org/wiki/KEGG) in MetaBase

---
Adapted from the Wikipedia article [KEGG](https://en.wikipedia.org/wiki/KEGG) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/KEGG?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.