KEGG (Kyoto Encyclopedia of Genes and Genomes) is a collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances. KEGG is utilized for bioinformatics research and education, including data analysis in genomics, metagenomics, metabolomics and other omics studies, modeling and simulation in systems biology, and translational research in drug development.

The KEGG database project was initiated in 1995 by Minoru Kanehisa, professor at the Institute for Chemical Research, Kyoto University, under the then ongoing Japanese Human Genome Program. Foreseeing the need for a computerized resource that can be used for biological interpretation of genome sequence data, he started developing the KEGG PATHWAY database. It is a collection of manually drawn KEGG pathway maps representing experimental knowledge on metabolism and various other functions of the cell and the organism. Each pathway map contains a network of molecular interactions and reactions and is designed to link genes in the genome to gene products (mostly proteins) in the pathway. This has enabled the analysis called KEGG pathway mapping, whereby the gene content in the genome is compared with the KEGG PATHWAY database to examine which pathways and associated functions are likely to be encoded in the genome.

According to the developers, KEGG is a "computer representation" of the biological system. It integrates building blocks and wiring diagrams of the system: more specifically, genetic building blocks of genes and proteins, chemical building blocks of small molecules and reactions, and wiring diagrams of molecular interaction and reaction networks. This concept is realized in the following databases of KEGG, which are categorized into systems, genomic, chemical, and health information.

* Systems information
** PATHWAY: pathway maps for cellular and organismal functions
** MODULE: modules or functional units of genes
** BRITE: hierarchical classifications of biological entities
* Genomic information
** GENOME: complete genomes
** GENES: genes and proteins in the complete genomes
** ORTHOLOGY: ortholog groups of genes in the complete genomes
* Chemical information
** COMPOUND, GLYCAN: chemical compounds and glycans
** REACTION, RPAIR, RCLASS: chemical reactions
** ENZYME: enzyme nomenclature
* Health information
** DISEASE: human diseases
** DRUG: approved drugs
** ENVIRON: crude drugs and health-related substances

Databases

Systems information

The KEGG PATHWAY database, the wiring diagram database, is the core of the KEGG resource. It is a collection of pathway maps integrating many entities including genes, proteins, RNAs, chemical compounds, glycans, and chemical reactions, as well as disease genes and drug targets, which are stored as individual entries in the other databases of KEGG. The pathway maps are classified into the following sections:

* Metabolism
* Genetic information processing (transcription, translation, replication and repair, etc.)
* Environmental information processing (membrane transport, signal transduction, etc.)
* Cellular processes (cell growth, cell death, cell membrane functions, etc.)
* Organismal systems (immune system, endocrine system, nervous system, etc.)
* Human diseases
* Drug development

The metabolism section contains aesthetically drawn global maps showing an overall picture of metabolism, in addition to regular metabolic pathway maps. The low-resolution global maps can be used, for example, to compare metabolic capacities of different organisms in genomics studies and different environmental samples in metagenomics studies. In contrast, KEGG modules in the KEGG MODULE database are higher-resolution, localized wiring diagrams, representing tighter functional units within a pathway map, such as subpathways conserved among specific organism groups and molecular complexes. KEGG modules are defined as characteristic gene sets that can be linked to specific metabolic capacities and other phenotypic features, so that they can be used for automatic interpretation of genome and metagenome data.

Another database that supplements KEGG PATHWAY is the KEGG BRITE database. It is an ontology database containing hierarchical classifications of various entities including genes, proteins, organisms, diseases, drugs, and chemical compounds. While KEGG PATHWAY is limited to molecular interactions and reactions of these entities, KEGG BRITE incorporates many different types of relationships.
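The use of modules as characteristic gene sets for automatic interpretation can be illustrated with a minimal sketch. It assumes, purely for illustration, that a module is a list of steps and that each step is satisfied by any one of several alternative genes; the module names, gene identifiers, and the `module_completeness` helper are hypothetical and are not taken from KEGG itself.

```python
from typing import Dict, List, Set

# Hypothetical toy modules: each module is a list of steps, and each step is
# a set of alternative gene identifiers, any one of which satisfies the step.
MODULES: Dict[str, List[Set[str]]] = {
    "module_A": [{"g1"}, {"g2", "g3"}, {"g4"}],
    "module_B": [{"g5"}, {"g6"}],
}


def module_completeness(genes: Set[str], steps: List[Set[str]]) -> float:
    """Return the fraction of steps satisfied by at least one gene in `genes`."""
    if not steps:
        return 0.0
    satisfied = sum(1 for step in steps if step & genes)
    return satisfied / len(steps)


def interpret(genes: Set[str], modules: Dict[str, List[Set[str]]]) -> Dict[str, float]:
    """Score every module against the gene content of a genome or sample."""
    return {name: module_completeness(genes, steps) for name, steps in modules.items()}


# Gene content of a hypothetical genome (or metagenome sample).
sample_genes = {"g1", "g3", "g4", "g5"}
scores = interpret(sample_genes, MODULES)
print(scores)
```

A module scoring 1.0 would be read as a capacity the genome is likely to encode, while a partial score flags missing steps. Real module definitions may be more expressive than this step list, so the scoring rule here is only one possible choice.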

Genomic information

Several months after the KEGG project was initiated in 1995, the first report of a completely sequenced bacterial genome was published. Since then, all published complete genomes have been accumulated in KEGG for both eukaryotes and prokaryotes. The KEGG GENES database contains gene/protein-level information and the KEGG GENOME database contains organism-level information for these genomes. The KEGG GENES database consists of gene sets for the complete genomes, and genes in each set are given annotations in the form of establishing correspondences to the wiring diagrams of KEGG pathway maps, KEGG modules, and BRITE hierarchies.

These correspondences are made using the concept of orthologs. The KEGG pathway maps are drawn based on experimental evidence in specific organisms but they are designed to be applicable to other organisms as well, because different organisms, such as human and mouse, often share identical pathways consisting of functionally identical genes, called orthologous genes or orthologs. All the genes in the KEGG GENES database are being grouped into such orthologs in the KEGG ORTHOLOGY (KO) database. Because the nodes (gene products) of KEGG pathway maps, as well as KEGG modules and BRITE hierarchies, are given KO identifiers, the correspondences are established once genes in the genome are annotated with KO identifiers by the genome annotation procedure in KEGG.
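The way KO identifiers connect annotated genes to pathway map nodes can be sketched as a simple set intersection. This is a minimal illustration under stated assumptions: the gene names, the `KO_n` identifiers, the map names, and the `map_genome_to_pathways` function are all hypothetical placeholders, not real KEGG entries or a KEGG API.

```python
from typing import Dict, Set

# Hypothetical result of genome annotation: gene identifier -> KO-style identifier.
GENE_TO_KO: Dict[str, str] = {
    "geneA": "KO_1",
    "geneB": "KO_2",
    "geneC": "KO_4",
}

# Hypothetical pathway maps: map name -> KO-style identifiers assigned to its nodes.
PATHWAY_NODES: Dict[str, Set[str]] = {
    "map_X": {"KO_1", "KO_2", "KO_3"},
    "map_Y": {"KO_5"},
}


def map_genome_to_pathways(
    gene_to_ko: Dict[str, str],
    pathway_nodes: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return, for each pathway with at least one hit, the matched KO identifiers."""
    genome_kos = set(gene_to_ko.values())
    hits: Dict[str, Set[str]] = {}
    for name, nodes in pathway_nodes.items():
        matched = nodes & genome_kos
        if matched:
            hits[name] = matched
    return hits


mapped = map_genome_to_pathways(GENE_TO_KO, PATHWAY_NODES)
print(mapped)
```

Because the lookup goes through the ortholog identifier rather than the organism-specific gene name, the same pathway definitions can be reused for any annotated genome, which is the point made in the text above.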

Chemical information

The KEGG metabolic pathway maps are drawn to represent the dual aspects of the metabolic network: the genomic network of how genome-encoded enzymes are connected to catalyze consecutive reactions and the chemical network of how chemical structures of substrates and products are transformed by these reactions. A set of enzyme genes in the genome will identify enzyme relation networks when superimposed on the KEGG pathway maps, which in turn characterize chemical structure transformation networks allowing interpretation of biosynthetic and biodegradation potentials of the organism. Alternatively, a set of metabolites identified in the metabolome will lead to the understanding of enzymatic pathways and enzyme genes involved.

The databases in the chemical information category, which are collectively called KEGG LIGAND, are organized by capturing knowledge of the chemical network. In the beginning of the KEGG project, KEGG LIGAND consisted of three databases: KEGG COMPOUND for chemical compounds, KEGG REACTION for chemical reactions, and KEGG ENZYME for reactions in the enzyme nomenclature. Currently, there are additional databases: KEGG GLYCAN for glycans and two auxiliary reaction databases called RPAIR (reactant pair alignments) and RCLASS (reaction class). KEGG COMPOUND has also been expanded to contain various compounds such as xenobiotics, in addition to metabolites.
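The dual view of the metabolic network, where a set of enzyme genes selects reactions and those reactions in turn define which compounds can be produced, can be sketched as an iterative expansion from a set of starting compounds. The reaction table, enzyme names, compound labels, and the `reachable_compounds` function below are hypothetical and chosen only to illustrate the idea; they do not reproduce KEGG data or any KEGG tool.

```python
from typing import Dict, Set, Tuple

# Hypothetical reactions: reaction id -> (substrates, products).
REACTIONS: Dict[str, Tuple[Set[str], Set[str]]] = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"C", "X"}, {"D"}),
}

# Hypothetical link from genome-encoded enzymes to the reactions they catalyze.
ENZYME_TO_REACTIONS: Dict[str, Set[str]] = {
    "enzyme_1": {"R1"},
    "enzyme_2": {"R2"},
    "enzyme_3": {"R3"},
}


def reachable_compounds(enzymes: Set[str], seeds: Set[str]) -> Set[str]:
    """Expand from seed compounds using only reactions the enzymes can catalyze."""
    active: Set[str] = set()
    for enzyme in enzymes:
        active |= ENZYME_TO_REACTIONS.get(enzyme, set())

    known = set(seeds)
    changed = True
    while changed:
        changed = False
        for reaction_id in active:
            substrates, products = REACTIONS[reaction_id]
            # A reaction fires only when all of its substrates are available.
            if substrates <= known and not products <= known:
                known |= products
                changed = True
    return known


all_enzymes = {"enzyme_1", "enzyme_2", "enzyme_3"}
compounds = reachable_compounds(all_enzymes, {"A"})
print(sorted(compounds))
```

In this toy network, compound `D` is not reached from `A` alone because reaction `R3` also needs `X`. Running the same expansion in reverse, starting from observed metabolites and asking which reactions and enzymes could explain them, corresponds to the metabolome-driven direction mentioned in the text.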

Health information

In KEGG, diseases are viewed as perturbed states of the biological system caused by perturbants of genetic factors and environmental factors, and drugs are viewed as different types of perturbants. The KEGG PATHWAY database includes not only the normal states but also the perturbed states of the biological systems. However, disease pathway maps cannot be drawn for most diseases because molecular mechanisms are not well understood. An alternative approach is taken in the KEGG DISEASE database, which simply catalogs known genetic factors and environmental factors of diseases. These catalogs may eventually lead to more complete wiring diagrams of diseases.

The KEGG DRUG database contains active ingredients of approved drugs in Japan, the US, and Europe. They are distinguished by chemical structures and/or chemical components and associated with target molecules, metabolizing enzymes, and other molecular interaction network information in the KEGG pathway maps and the BRITE hierarchies. This enables an integrated analysis of drug interactions with genomic information.
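One simple form of such an analysis is to look for drug pairs that share a target molecule or a metabolizing enzyme. The sketch below assumes a flat record per drug with `targets` and `metabolizing_enzymes` fields; the record layout, all identifiers, and the `shared_links` function are hypothetical and are not the KEGG DRUG entry format.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical drug records linking each drug to targets and metabolizing enzymes.
DRUGS: Dict[str, Dict[str, Set[str]]] = {
    "drug_P": {"targets": {"T1"}, "metabolizing_enzymes": {"E1"}},
    "drug_Q": {"targets": {"T2"}, "metabolizing_enzymes": {"E1", "E2"}},
    "drug_R": {"targets": {"T1"}, "metabolizing_enzymes": {"E3"}},
}


def shared_links(
    drugs: Dict[str, Dict[str, Set[str]]],
) -> List[Tuple[str, str, Set[str], Set[str]]]:
    """List drug pairs that share at least one target or metabolizing enzyme."""
    pairs = []
    for first, second in combinations(sorted(drugs), 2):
        shared_targets = drugs[first]["targets"] & drugs[second]["targets"]
        shared_enzymes = (
            drugs[first]["metabolizing_enzymes"]
            & drugs[second]["metabolizing_enzymes"]
        )
        if shared_targets or shared_enzymes:
            pairs.append((first, second, shared_targets, shared_enzymes))
    return pairs


candidates = shared_links(DRUGS)
for first, second, targets, enzymes in candidates:
    print(first, second, sorted(targets), sorted(enzymes))
```

A shared entry is only a candidate for closer inspection and does not by itself establish an interaction.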

Crude drugs and other health-related substances, which are outside the category of approved drugs, are stored in the KEGG ENVIRON database. The databases in the health information category are collectively called KEGG MEDICUS, which also includes package inserts of all marketed drugs in Japan.

Subscription model

In July 2011, KEGG introduced a subscription model for FTP download due to a significant cutback of government funding. KEGG continues to be freely available through its website, but the subscription model has prompted discussion about the sustainability of bioinformatics databases.

External links

* KEGG website
* GenomeNet mirror site
* The entry for KEGG in MetaBase
