GeneCards is a database of human genes that provides genomic, proteomic, transcriptomic, genetic and functional information on all known and predicted human genes.
It is being developed and maintained by the Crown Human Genome Center at the Weizmann Institute of Science, in collaboration with LifeMap Sciences.
The database aims to provide a comprehensive view of the currently available biomedical information about the searched gene, including its aliases and identifiers, the encoded proteins, associated diseases and variations, its function, relevant publications and more.
The GeneCards database provides access to free Web resources about more than 350,000 known and predicted human genes, integrated from more than 150 data resources, such as HGNC, Ensembl, and NCBI. The core gene list is based on NCBI, Ensembl and approved gene symbols published by the HUGO Gene Nomenclature Committee (HGNC).
The information is gathered and selected from these databases by the GeneCards integration engine.
Over time, the GeneCards database has developed a suite of tools (VarElect, GeneALaCart, etc.) that offer more specialised capabilities built on the database. Since 1998, the GeneCards database has been widely used by the bioinformatics, genomics and medical communities.
History
Since the 1980s, sequence information has become increasingly abundant, and many laboratories began to store it in central repositories, the primary databases.
However, the primary sequence databases (lower-level databases) each focus on different aspects of this information. To gather these scattered data, the Weizmann Institute of Science's Crown Human Genome Center developed a database called ‘GeneCards’ in 1997. This database mainly dealt with human genome information, human genes, the encoded proteins’ functions, and related diseases, though it has expanded since that time.
Growth
Initially, the GeneCards database had two main features: delivery of integrated biomedical information for a gene in ‘card’ format, and a text-based search engine. Since 1998, the database has integrated more data resources and data types, such as protein expression and gene network information. It has also improved the speed and sophistication of the search engine, and expanded from a gene-centric dogma to include gene-set analyses. Version 3 of the database gathers information from more than 90 database resources based on a consolidated gene list. It has also added a suite of GeneCards tools that serve more specific purposes: "GeneNote and GeneAnnot for transcriptome analyses, GeneLoc for genomic locations and markers, GeneALaCart for batch queries and GeneDecks for finding functional partners and for gene set distillations." The database updates on a 3-year cycle of planning, implementation, development, semi-automated quality assurance, and deployment. Technologies used include Eclipse, Apache, Perl, XML, PHP, Propel, Java, R and MySQL.
Ongoing GeneCards Expansions
*Animal models
*Tissue proteomics profiling
*RNA genes
*Gene and protein identifier mapping
*Online analytical processing (OLAP)
Availability
GeneCards can be freely accessed by non-profit institutions for educational and research purposes at https://www.genecards.org/ and academic mirror sites. Commercial usage requires a license.
GeneCards Suite
GeneDecks
GeneDecks is an analysis tool for identifying similar or partner genes. It provides a similarity metric by highlighting descriptors shared between genes, drawing on GeneCards' combinatorial annotations of human genes.
# Annotation combinatory: Using GeneDecks, one can get a set of similar genes for a particular gene with a selected combinatorial annotation. The summary table ranks the identified genes by their level of similarity to the probe gene.
# Annotation unification: Different data sources often offer annotations with heterogeneous naming systems. Annotation unification in GeneDecks relies on algorithms that detect similarity in the GeneCards gene-content space.
# Partner hunting: In GeneDecks's Partner Hunter, users give a query gene, and the system seeks similar genes based on combinatorial similarity of weighted attributes.
# Set distillation: In Set Distiller, users give a set of genes, and the system ranks attributes by their degree of sharing within that set. Like Partner Hunter, it supports investigation of gene sets of diverse origins, helping to discover and elucidate relevant biological patterns in genomics and systems biology.
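The partner-hunting and set-distillation ideas described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not give the actual GeneDecks scoring formula, so a weighted Jaccard-style score stands in for "combinatorial similarity of weighted attributes", and the gene names, descriptors and weights are hypothetical toy data.

```python
from collections import Counter


def weighted_similarity(a, b, weights):
    """Weighted Jaccard-style score between two descriptor sets (an assumed metric)."""
    total = sum(weights.get(d, 1.0) for d in a | b)
    if total == 0:
        return 0.0
    return sum(weights.get(d, 1.0) for d in a & b) / total


def partner_hunt(query, annotations, weights):
    """Rank every other gene by its similarity to the query gene."""
    q = annotations[query]
    scores = [
        (gene, weighted_similarity(q, descs, weights))
        for gene, descs in annotations.items()
        if gene != query
    ]
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


def distill_set(genes, annotations):
    """Rank descriptors by the fraction of genes in the set that carry them."""
    counts = Counter(d for gene in genes for d in annotations[gene])
    ranked = [(d, c / len(genes)) for d, c in counts.items()]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Hypothetical toy annotations; not real GeneCards data.
ANNOTATIONS = {
    "GENE_A": {"pathway:p1", "disorder:d1", "domain:x"},
    "GENE_B": {"pathway:p1", "disorder:d1"},
    "GENE_C": {"domain:x"},
    "GENE_D": {"pathway:p2"},
}
# Hypothetical attribute weights (pathways count double here).
WEIGHTS = {"pathway:p1": 2.0, "pathway:p2": 2.0, "disorder:d1": 1.0, "domain:x": 1.0}

if __name__ == "__main__":
    print(partner_hunt("GENE_A", ANNOTATIONS, WEIGHTS))
    print(distill_set(["GENE_A", "GENE_B"], ANNOTATIONS))
```

Partner hunting starts from one gene and ranks other genes, while set distillation starts from many genes and ranks the attributes they share; both operate on the same gene-to-descriptor mapping.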
GeneALaCart
GeneALaCart is a gene-set-oriented batch-querying engine built on the GeneCards database. It retrieves information about multiple genes in a single batch query.
GeneLoc
The GeneLoc suite member presents an integrated human chromosome map, based on data integrated by the GeneLoc algorithm; such a map is very important for designing a custom-made capture chip. GeneLoc includes further links to GeneCards, NCBI's Human Genome Sequencing, UniGene, and mapping resources.
Usage
Search
First, enter a search term in the search box on the homepage. Search methods include Keywords, Symbol only, Symbol/Alias/Identifier and Symbol/Alias.
The default option is searching by keywords. When a user searches by keywords, MicroCards and MiniCards are shown. However, when a user searches by Symbol only, they are directed to the GeneCard.
Searches may be refined by clicking on advanced search, where a user can choose section, category, GIFtS, Symbol Source and gene sets directly. Sections include Aliases & Descriptions, Disorders, Drugs & Compounds, Expression in Human Tissues, Function, Genomic Location, Genomic Variants, Orthologs, Paralogs, Pathways & Interactions, Protein Domains/Families, Proteins, Publications, Summaries and Transcripts. The default option is searching all sections.
Categories include Protein-coding, Pseudogenes, RNA genes, Genetic Loci, Gene clusters and Uncategorized. The default option is searching all categories.
GIFtS (GeneCards Inferred Functionality Scores) gives an objective number indicating the level of knowledge about the functionality of a human gene. The filter options are High, Medium, Low, and a custom range.
Symbol Sources include HGNC (HUGO Gene Nomenclature Committee), EntrezGene (gene-centered information at NCBI), Ensembl, GeneCards RNA genes, CroW21 and others.
The user can also choose to search All GeneCards or Within Gene Subset; the latter narrows the search to a more specific set of genes.
Second, the search result page shows all relevant MiniCards. Symbol, Description, Category, GIFtS, GC id and Score are displayed on the page.
A user may click the plus button next to a MiniCard to expand it, or click directly on the symbol to see the details of a particular GeneCard.
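The result fields and advanced-search filters described above can be modelled with a small record type. The sketch below is a hypothetical illustration, not the GeneCards implementation or API: the `MiniCard` class, the `refine` function and all sample values (including the placeholder GC ids) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class MiniCard:
    """One row of a search result page, using the fields named in the text."""
    symbol: str
    description: str
    category: str
    gifts: int
    gc_id: str
    score: float


def refine(
    results: Iterable[MiniCard],
    categories: Optional[Set[str]] = None,
    gifts_range: Optional[Tuple[int, int]] = None,
) -> List[MiniCard]:
    """Filter by category and GIFtS range, then sort by descending score."""
    kept = []
    for card in results:
        if categories is not None and card.category not in categories:
            continue
        if gifts_range is not None and not (gifts_range[0] <= card.gifts <= gifts_range[1]):
            continue
        kept.append(card)
    return sorted(kept, key=lambda card: card.score, reverse=True)


# Hypothetical sample rows; symbols, ids and scores are placeholders.
RESULTS = [
    MiniCard("GENE_A", "hypothetical gene A", "Protein-coding", 74, "GC-PLACEHOLDER-1", 12.5),
    MiniCard("GENE_B", "hypothetical gene B", "Pseudogene", 20, "GC-PLACEHOLDER-2", 30.0),
    MiniCard("GENE_C", "hypothetical gene C", "Protein-coding", 40, "GC-PLACEHOLDER-3", 8.0),
    MiniCard("GENE_D", "hypothetical gene D", "Protein-coding", 90, "GC-PLACEHOLDER-4", 20.0),
]

if __name__ == "__main__":
    for card in refine(RESULTS, categories={"Protein-coding"}, gifts_range=(50, 100)):
        print(card.symbol, card.gifts, card.score)
```

Leaving both filters as `None` mirrors the default behaviour described in the text, where all categories are searched.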
GeneCards Content
A GeneCard consists of the following sections.
# Header: The header is made up of the gene's symbol, category (e.g. protein-coding), GIFtS (e.g. 74) and GCID (e.g. GC19M041837). Each category is shown in a different color: protein-coding, pseudogene, RNA gene, gene cluster, genetic locus, and uncategorized. The background indicates the symbol source: HGNC Approved Genes, EntrezGene Database, Ensembl Gene Database, or GeneCards Generated Genes.
# Aliases: This section shows synonyms and aliases of the gene according to diverse sources such as HGNC. The right column displays how the aliases are associated with the resources and gives previous GC identifiers.
# Summaries: As in the Aliases section, the left column shows the sources. The right column gives brief summaries of the gene's function, localization and effect on phenotype from various sources.
# Genomic Views: In addition to sources, this section gives the reference DNA sequence, regulatory elements, epigenetics, chromosome band and genomic location from different sources. The red line on the image indicates the GeneLoc integrated location. Green is shown when the GeneLoc integrated location differs from the location in Entrez Gene, and blue when it differs from the location in Ensembl. Additional details can be accessed through the links in the section.
# Proteins: This section presents annotated information about the gene's proteins, including recommended name, size, subunit, subcellular location and secondary accessions. It also covers post-translational modifications, protein expression data, REF SEQ proteins, ENSEMBL proteins, Reactome Protein details, Human Recombinant Protein Products, Gene Ontology, Antibody Products and Assay Products.
# Protein Domains/Families: This section shows annotated information of protein domains and families.
# Function: The function section describes gene function, including: Human phenotypes, bound Targets, shRNA for human and/or mouse/rat, miRNA Gene Targets, RNAi products, microRNA for human and/or mouse/rat orthologs, Gene Editing, Clones, Cell Lines, Animal models, and in situ hybridization assays.
# Pathways & Interactions: This section shows unified GeneCards pathways and interactions drawn from different sources. Unified GeneCards pathways are collected into super-pathways, which display the connections between different pathways. Interactions show interactants and interaction details.
# Drugs & Compounds: This section connects GeneCards with drugs and compounds. Compounds show chemical compound, action and CAS number. DrugBank compound gives compound, synonyms, CAS number (Chemical Abstracts Service registry number), type (transporter/target/carrier/enzyme), actions and PubMed IDs. HMDB and Novoseek show the relationships of chemical compounds, including compound, synonyms, CAS number and PubMed IDs (articles related to the compound). BitterDB displays compound, CAS number and SMILES (Simplified Molecular Input Line Entry System). PharmGKB gives the drug/compound and its annotation.
# Transcripts: This section consists of reference sequence mRNAs, UniGene Cluster and representative Sequence, miRNA products, inhib.RNA products, Clone products, primer products and additional mRNA sequences. The user can also obtain the exon structure from GeneLoc.
# Expression: The left column shows the resources of the data. Expression images and data, similar genes, PCR arrays, primers for human and in situ hybridization assays are included in this section.
# Orthologs: This section gives orthologs for a particular gene from a number of species. The table displays the corresponding organism, taxonomic classification, gene, description, human similarity, orthology type and details. It is connected to ENSEMBL Gene Tree, TreeFam Gene Tree, and Aminode.
# Paralogs: This section displays paralogs and pseudogenes for a particular gene.
# Genomic Variants: This section shows results from NCBI SNPs/Variants, the HapMap linkage disequilibrium report, structural variations, the Human Gene Mutation Database (HGMD), QIAGEN SeqTarget long-range PCR primers in human, mouse and rat, and SABiosciences cancer mutation PCR arrays. The table in this section shows SNP ID, Valid, Clinical significance, Chr pos and Sequence for genomic data; AAChg, Type and More for transcription-related data; and Allele freq, Pop, Total sample and More for allele frequencies. In the Valid column, each character represents a validation method: ‘C’ is by-cluster, ‘A’ is by-2hit-2allele, ‘F’ is by-frequency, ‘H’ is by-hapmap and ‘O’ is by-other-pop. Clinical significance can be one of the following: non-pathogenic, pathogenic, drug-response, histocompatibility, probable-non-pathogenic, probable-pathogenic, untested, unknown and other. Type is one of the following: nonsynon, syn, cds, spl, utr, int, exc, loc, stg, ds500, spa, spd, us2k, us5k, PupaSUITE Designations.
# Disorders/Diseases: Shows disorders/diseases associated with the gene.
# Publications: Displays publications associated with the gene.
# External Searches: Searches for more information in PubMed, OMIM and NCBI.
# Genome Databases: Other Databases, and specialized Databases.
# Intellectual Property: This section gives patent information and licensable technologies.
# Products
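The single-letter validation codes listed under Genomic Variants lend themselves to a simple lookup table. The sketch below decodes a Valid field into method names. The letter-to-method mapping comes from the text above; the assumption that a field may hold several letters, possibly separated by punctuation, and the `decode_valid` helper itself are hypothetical.

```python
# Mapping taken from the Genomic Variants description.
VALIDATION_CODES = {
    "C": "by-cluster",
    "A": "by-2hit-2allele",
    "F": "by-frequency",
    "H": "by-hapmap",
    "O": "by-other-pop",
}


def decode_valid(field):
    """Expand a string of validation letters (e.g. 'CF' or 'C,F') into method names.

    Non-alphabetic separators are skipped; unrecognised letters are reported
    explicitly rather than silently dropped.
    """
    methods = []
    for ch in field.upper():
        if not ch.isalpha():
            continue  # assumed separator such as ',' or a space
        methods.append(VALIDATION_CODES.get(ch, f"unknown({ch})"))
    return methods


if __name__ == "__main__":
    print(decode_valid("C,F"))
    print(decode_valid("HO"))
```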
Applications
GeneCards is used widely in the biological and biomedical fields. For example, S.H. Shah extracted data on early-onset coronary artery disease from GeneCards to identify genes that contribute to the disease. Chromosome regions 3q13, 1q25 and others were confirmed to have an effect, and the paper further discussed the relationship between morbid genes and serum lipoproteins with the help of GeneCards.
Another example is a research study on synthetic lethality in cancer. Synthetic lethality occurs when a mutation in a single gene has no effect on the function of a cell but a mutation in an additional gene leads to cell death. This study aimed to find novel methods of treating cancer through blocking the lethality of drugs. GeneCards was used to compare data for a given target gene with all possible genes. In this process, the annotation sharing score was calculated using GeneDecks Partner Hunter (now called Genes Like Me) to give paralogy. Inactivation targets were extracted after the microarray experiments of resistant and non-resistant neuroblastoma cell lines.
External links
*Official website: https://www.genecards.org/