Single-nucleotide variant

In genetics, a single-nucleotide polymorphism (SNP; plural SNPs) is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently large fraction of the population (e.g. 1% or more), many publications do not apply such a frequency threshold. For example, at a specific base position in the human genome, the G nucleotide may appear in most individuals, but in a minority of individuals the position is occupied by an A. This means that there is a SNP at this specific position, and the two possible nucleotide variations, G or A, are said to be the alleles for this specific position.

SNPs can pinpoint differences in our susceptibility to a wide range of diseases, for example age-related macular degeneration (a common SNP in the CFH gene is associated with increased risk of the disease) or nonalcoholic fatty liver disease (a SNP in the PNPLA3 gene is associated with increased risk of the disease). The severity of illness and the way the body responds to treatments are also manifestations of genetic variations caused by SNPs. For example, the APOE E4 allele, which is determined by two common SNPs (rs429358 and rs7412) in the APOE gene, is associated not only with increased risk for Alzheimer's disease but also with a younger age at onset of the disease.

A single-nucleotide variant (SNV) is a general term for a single-nucleotide change in a DNA sequence. A SNV can therefore be a common SNP or a rare mutation, can be germline or somatic, and can be caused by cancer, whereas a SNP has to segregate in a species' population of organisms. SNVs also commonly come up in molecular diagnostics, such as when designing PCR primers to detect viruses, where the viral RNA or DNA sample may contain SNVs.


Types

Single-nucleotide polymorphisms may fall within coding sequences of genes, non-coding regions of genes, or in the intergenic regions (regions between genes). SNPs within a coding sequence do not necessarily change the amino acid sequence of the protein that is produced, due to degeneracy of the genetic code.

SNPs in the coding region are of two types: synonymous SNPs and nonsynonymous SNPs. Synonymous SNPs do not affect the protein sequence, while nonsynonymous SNPs change the amino acid sequence of the protein.

* SNPs in non-coding regions can manifest in a higher risk of cancer, and may affect mRNA structure and disease susceptibility. Non-coding SNPs can also alter the level of expression of a gene, as an eQTL (expression quantitative trait locus).
* SNPs in coding regions:
** synonymous substitutions by definition do not result in a change of amino acid in the protein, but can still affect its function in other ways. An example is a seemingly silent mutation in the multidrug resistance gene 1 (MDR1), which codes for a cellular membrane pump that expels drugs from the cell: the mutation can slow down translation and allow the peptide chain to fold into an unusual conformation, making the mutant pump less functional. In the MDR1 protein, for example, the C1236T polymorphism changes a GGC codon to GGT at amino acid position 412 of the polypeptide (both encode glycine), and the C3435T polymorphism changes ATC to ATT at position 1145 (both encode isoleucine).
** nonsynonymous substitutions:
*** missense: a single base change results in a change of amino acid in the protein, which can cause the protein to malfunction and lead to disease. For example, in the c.1580G>T SNP in the LMNA gene, the guanine at position 1580 (nt) of the DNA sequence (CGT codon) is replaced with thymine, yielding a CTT codon; at the protein level, arginine is replaced by leucine at position 527; at the phenotype level, this manifests as overlapping mandibuloacral dysplasia and progeria syndrome.
*** nonsense: a point mutation in a sequence of DNA that results in a premature stop codon, or a nonsense codon in the transcribed mRNA, and in a truncated, incomplete, and usually nonfunctional protein product (e.g. cystic fibrosis caused by the G542X mutation in the cystic fibrosis transmembrane conductance regulator gene).

SNPs that are not in protein-coding regions may still affect gene splicing, transcription factor binding, messenger RNA degradation, or the sequence of noncoding RNA. Gene expression affected by this type of SNP is referred to as an eSNP (expression SNP) and may be upstream or downstream from the gene.
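
The distinction between synonymous, missense, and nonsense changes can be illustrated with a small sketch. It assumes the standard genetic code written with DNA letters and compares a reference codon with a variant codon that differs at exactly one base. The function name `classify_coding_snp` is hypothetical, and the GGA to TGA case is only a generic glycine-to-stop illustration, not a claim about the exact codon involved in any particular gene.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (DNA alphabet).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-base codon change as synonymous, missense, or nonsense.

    Simplification: a change that removes a stop codon (stop-loss) is not
    given its own category here and would be reported as "missense".
    """
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be exactly three bases long")
    differences = sum(r != a for r, a in zip(ref_codon, alt_codon))
    if differences != 1:
        raise ValueError("codons must differ at exactly one position")

    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid (degeneracy of the code)
    if alt_aa == "*":
        return "nonsense"     # premature stop codon
    return "missense"         # different amino acid


if __name__ == "__main__":
    print(classify_coding_snp("GGC", "GGT"))  # glycine in both codons
    print(classify_coding_snp("CGT", "CTT"))  # arginine replaced by leucine
    print(classify_coding_snp("GGA", "TGA"))  # glycine replaced by a stop
```

A classification like this says nothing about functional impact: as the MDR1 example shows, a synonymous change can still alter how the protein behaves.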


Frequency

More than 335 million SNPs have been found across humans from multiple populations. A typical genome differs from the reference human genome at 4 to 5 million sites, most of which (more than 99.9%) consist of SNPs and short indels.


Within a genome

The genomic distribution of SNPs is not homogeneous; SNPs occur in non-coding regions more frequently than in coding regions or, in general, than in regions where natural selection is acting and "fixing" the allele (eliminating other variants) of the SNP that constitutes the most favorable genetic adaptation. Other factors, like genetic recombination and mutation rate, can also determine SNP density. SNP density can be predicted by the presence of microsatellites: AT microsatellites in particular are potent predictors of SNP density, with long (AT)n repeat tracts tending to be found in regions of significantly reduced SNP density and low GC content.


Within a population

There are variations between human populations, so a SNP allele that is common in one geographical or ethnic group may be much rarer in another. However, this pattern of variation is relatively rare; in a global sample of 67.3 million SNPs, the Human Genome Diversity Project "found no such private variants that are fixed in a given continent or major region. The highest frequencies are reached by a few tens of variants present at >70% (and a few thousands at >50%) in Africa, the Americas, and Oceania. By contrast, the highest frequency variants private to Europe, East Asia, the Middle East, or Central and South Asia reach just 10 to 30%."

Within a population, SNPs can be assigned a minor allele frequency (MAF), the lowest allele frequency at a locus that is observed in a particular population. For single-nucleotide polymorphisms, this is simply the lesser of the two allele frequencies.

With this knowledge, scientists have developed new methods for analyzing population structure in less-studied species. Pooling techniques significantly lower the cost of the analysis. These techniques are based on sequencing a population in a pooled sample instead of sequencing every individual within the population separately. With new bioinformatics tools, it is possible to investigate population structure, gene flow, and gene migration by observing the allele frequencies within the entire population. These protocols make it possible to combine the advantages of SNPs with those of microsatellite markers. However, some information is lost in the process, such as linkage disequilibrium and zygosity information.
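
As a minimal sketch of the minor allele frequency described above, the following assumes a biallelic SNP with alleles labelled A and B. The function names and the example counts are hypothetical. The pooled variant works from allele read counts only, which reflects why zygosity information is lost when samples are pooled.

```python
from typing import Tuple


def allele_frequencies(n_hom_a: int, n_het: int, n_hom_b: int) -> Tuple[float, float]:
    """Return (freq_A, freq_B) from genotype counts at a biallelic SNP.

    Each individual carries two alleles, so the allele total is twice the
    number of individuals; heterozygotes contribute one copy of each allele.
    """
    total_alleles = 2 * (n_hom_a + n_het + n_hom_b)
    if total_alleles == 0:
        raise ValueError("at least one genotyped individual is required")
    freq_a = (2 * n_hom_a + n_het) / total_alleles
    return freq_a, 1.0 - freq_a


def minor_allele_frequency(n_hom_a: int, n_het: int, n_hom_b: int) -> float:
    """The lesser of the two allele frequencies."""
    return min(allele_frequencies(n_hom_a, n_het, n_hom_b))


def pooled_minor_allele_frequency(reads_a: int, reads_b: int) -> float:
    """Estimate the MAF from allele read counts in a pooled sample.

    Assumption: read counts are proportional to allele counts in the pool.
    Individual genotypes (zygosity) cannot be recovered from these counts.
    """
    total_reads = reads_a + reads_b
    if total_reads == 0:
        raise ValueError("at least one read is required")
    return min(reads_a, reads_b) / total_reads


if __name__ == "__main__":
    # Hypothetical counts: 60 AA, 30 AB, and 10 BB individuals.
    print(minor_allele_frequency(60, 30, 10))
    # Hypothetical pooled sample: 900 reads with A, 100 reads with B.
    print(pooled_minor_allele_frequency(900, 100))
```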


Applications

* Association studies can determine whether a genetic variant is associated with a disease or trait.
* A tag SNP is a representative single-nucleotide polymorphism in a region of the genome with high linkage disequilibrium (the non-random association of alleles at two or more loci). Tag SNPs are useful in whole-genome SNP association studies, in which hundreds of thousands of SNPs across the entire genome are genotyped.
* Haplotype mapping: sets of alleles or DNA sequences can be clustered so that a single SNP can identify many linked SNPs.
* Linkage disequilibrium (LD), a term used in population genetics, indicates non-random association of alleles at two or more loci, not necessarily on the same chromosome. It refers to the phenomenon that SNP alleles or DNA sequences that are close together in the genome tend to be inherited together. LD can be affected by two parameters (among other factors, such as population stratification): 1) the distance between the SNPs (the larger the distance, the lower the LD); 2) the recombination rate (the lower the recombination rate, the higher the LD).
* In genetic epidemiology, SNPs are used to estimate transmission clusters.
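
The non-random association that defines linkage disequilibrium can be quantified in several ways. The sketch below uses two common measures that are not spelled out in the text above: the coefficient D (observed haplotype frequency minus the frequency expected under independence) and the squared correlation r². It assumes phased haplotypes are available as pairs of alleles; the function name and the example data are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def linkage_disequilibrium(
    haplotypes: List[Tuple[str, str]], allele_1: str, allele_2: str
) -> Tuple[float, float]:
    """Return (D, r_squared) for two biallelic SNPs.

    haplotypes: one (allele at SNP 1, allele at SNP 2) pair per phased
                chromosome copy.
    allele_1, allele_2: the alleles at SNP 1 and SNP 2 whose association
                is measured.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    counts = Counter(haplotypes)

    # Marginal allele frequencies and the joint haplotype frequency.
    p_1 = sum(c for (a, _), c in counts.items() if a == allele_1) / n
    p_2 = sum(c for (_, b), c in counts.items() if b == allele_2) / n
    p_12 = counts[(allele_1, allele_2)] / n

    # D is zero when the alleles are associated at random.
    d = p_12 - p_1 * p_2
    denominator = p_1 * (1 - p_1) * p_2 * (1 - p_2)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d, d * d / denominator


if __name__ == "__main__":
    # Hypothetical sample in which G at SNP 1 always travels with C at SNP 2.
    linked = [("G", "C")] * 50 + [("A", "T")] * 50
    print(linkage_disequilibrium(linked, "G", "C"))
```

In the hypothetical `linked` sample, knowing the allele at one SNP fully determines the allele at the other, which is the situation in which a single tag SNP can stand in for its neighbours.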


Importance

Variations in the DNA sequences of humans can affect how humans develop diseases and respond to pathogens, chemicals, drugs, vaccines, and other agents. SNPs are also critical for personalized medicine. Examples include biomedical research, forensics, pharmacogenetics, and disease causation, as outlined below.


Clinical research

Genome-wide association study (GWAS)

One of the main contributions of SNPs to clinical research is the genome-wide association study (GWAS). Genome-wide genetic data can be generated by multiple technologies, including SNP arrays and whole-genome sequencing. GWAS has been commonly used to identify SNPs associated with diseases or clinical phenotypes or traits. Since GWAS is a genome-wide assessment, a large sample size is required to obtain sufficient statistical power to detect all possible associations. Some SNPs have a relatively small effect on diseases or clinical phenotypes or traits. To estimate study power, the genetic model for the disease needs to be considered, such as dominant, recessive, or additive effects. Due to genetic heterogeneity, GWAS analysis must be adjusted for race.

Candidate gene association study

Candidate gene association studies were commonly used in genetic research before the invention of high-throughput genotyping or sequencing technologies. A candidate gene association study investigates a limited number of pre-specified SNPs for association with diseases or clinical phenotypes or traits, so it is a hypothesis-driven approach. Since only a limited number of SNPs are tested, a relatively small sample size is sufficient to detect the association. The candidate gene association approach is also commonly used to confirm findings from GWAS in independent samples.

Homozygosity mapping in disease

Genome-wide SNP data can be used for homozygosity mapping. Homozygosity mapping is a method used to identify homozygous autosomal recessive loci, which can be a powerful tool to map genomic regions or genes that are involved in disease pathogenesis.
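
The dominant, recessive, and additive genetic models mentioned in the GWAS discussion above are often expressed as different numeric codings of the same genotype before an association test is run. The following is a minimal sketch of one conventional coding, assuming each genotype is given as the number of copies of the risk allele (0, 1, or 2); the function name is hypothetical, and the association test itself is not shown.

```python
def encode_genotype(risk_allele_count: int, model: str) -> int:
    """Code a genotype for an association test under a genetic model.

    risk_allele_count: copies of the risk allele carried (0, 1, or 2).
    model: "additive", "dominant", or "recessive".
    """
    if risk_allele_count not in (0, 1, 2):
        raise ValueError("risk_allele_count must be 0, 1, or 2")
    if model == "additive":
        # Effect scales with the number of risk alleles.
        return risk_allele_count
    if model == "dominant":
        # One copy is enough to show the effect.
        return int(risk_allele_count >= 1)
    if model == "recessive":
        # Two copies are needed to show the effect.
        return int(risk_allele_count == 2)
    raise ValueError(f"unknown genetic model: {model!r}")


if __name__ == "__main__":
    for model in ("additive", "dominant", "recessive"):
        print(model, [encode_genotype(count, model) for count in (0, 1, 2)])
```

Because heterozygotes are coded differently under each model, the choice of model changes which individuals count as exposed, and therefore the sample size needed to detect a given effect.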


Forensic sciences

SNPs have historically been used to match a forensic DNA sample to a suspect, but this use has been made obsolete by advances in STR-based DNA fingerprinting techniques. However, the development of next-generation sequencing (NGS) technology may allow for more opportunities to use SNPs for phenotypic clues such as ethnicity, hair color, and eye color with a good probability of a match. This can additionally be applied to increase the accuracy of facial reconstructions by providing information that may otherwise be unknown, and this information can be used to help identify suspects even without an STR DNA profile match.

One drawback of using SNPs rather than STRs is that SNPs yield less information than STRs, and therefore more SNPs are needed for analysis before a profile of a suspect can be created. Additionally, SNPs rely heavily on the presence of a database for comparative analysis of samples. However, in instances with degraded or small-volume samples, SNP techniques are an excellent alternative to STR methods. SNPs (as opposed to STRs) offer an abundance of potential markers, can be fully automated, and may reduce the required fragment length to less than 100 bp.


Pharmacogenetics

Pharmacogenetics focuses on identifying genetic variations, including SNPs, that are associated with differential responses to treatment. Many drug-metabolizing enzymes, drug targets, or target pathways can be influenced by SNPs. SNPs that affect drug-metabolizing enzyme activity can change drug pharmacokinetics, while SNPs that affect a drug target or its pathway can change drug pharmacodynamics. Therefore, SNPs are potential genetic markers that can be used to predict drug exposure or the effectiveness of a treatment. A genome-wide pharmacogenetic study is called pharmacogenomics. Pharmacogenetics and pharmacogenomics are important in the development of precision medicine, especially for life-threatening diseases such as cancers.


Disease

Only a small number of SNPs in the human genome may have an impact on human diseases. Large-scale GWAS have been done for the most important human diseases, including heart diseases, metabolic diseases, autoimmune diseases, and neurodegenerative and psychiatric disorders. Most of the SNPs with relatively large effects on these diseases have been identified. These findings have significantly improved understanding of disease pathogenesis and molecular pathways, and facilitated the development of better treatments. Further GWAS with larger sample sizes will reveal the SNPs with relatively small effects on diseases. For common and complex diseases, such as type 2 diabetes, rheumatoid arthritis, and Alzheimer's disease, multiple genetic factors are involved in disease etiology. In addition, gene-gene interaction and gene-environment interaction also play an important role in disease initiation and progression.


Examples

* rs6311 and rs6313 are SNPs in the serotonin 5-HT2A receptor gene on human chromosome 13.
* The SNP −3279C/A (rs3761548) is among the SNPs located in the promoter region of the Foxp3 gene and might be involved in cancer progression.
* A SNP in the F5 gene causes Factor V Leiden thrombophilia.
* rs3091244 is an example of a triallelic SNP in the CRP gene on human chromosome 1.
* TAS2R38 codes for PTC tasting ability, and contains 6 annotated SNPs.
* rs148649884 and rs138055828 in the FCN1 gene encoding M-ficolin crippled the ligand-binding capability of the recombinant M-ficolin.
* An intronic SNP in the DNA mismatch repair gene PMS2 (rs1059060, Ser775Asn) is associated with increased sperm DNA damage and risk of male infertility.


Databases

As there are for genes, bioinformatics databases exist for SNPs.

* dbSNP is a SNP database from the National Center for Biotechnology Information (NCBI). dbSNP listed 149,735,377 SNPs in humans.
* Kaviar is a compendium of SNPs from multiple data sources including dbSNP.
* SNPedia is a wiki-style database supporting personal genome annotation, interpretation and analysis.
* The OMIM database describes the association between polymorphisms and diseases (e.g., gives diseases in text form).
* dbSAP is a single amino-acid polymorphism database for protein variation detection.
* The Human Gene Mutation Database provides gene mutations causing or associated with human inherited diseases and functional SNPs.
* The International HapMap Project, where researchers are identifying tag SNPs to be able to determine the collection of haplotypes present in each subject.
* GWAS Central allows users to visually interrogate the actual summary-level association data in one or more genome-wide association studies.

The International SNP Map working group mapped the sequence flanking each SNP by alignment to the genomic sequence of large-insert clones in GenBank. These alignments were converted to chromosomal coordinates that are shown in Table 1. This list has greatly increased since, with, for instance, the Kaviar database now listing 162 million single nucleotide variants (SNVs).


Nomenclature

The nomenclature for SNPs includes several variations for an individual SNP and lacks a common consensus.

The rs### standard is the one adopted by dbSNP; it uses the prefix "rs", for "reference SNP", followed by a unique and arbitrary number. SNPs are frequently referred to by their dbSNP rs number, as in the examples above.

The Human Genome Variation Society (HGVS) uses a standard which conveys more information about the SNP. Examples are:

* c.76A>T: "c." for coding region, followed by a number for the position of the nucleotide, followed by a one-letter abbreviation for the nucleotide (A, C, G, T or U), followed by a greater than sign (">") to indicate substitution, followed by the abbreviation of the nucleotide which replaces the former.
* p.Ser123Arg: "p." for protein, followed by a three-letter abbreviation for the amino acid, followed by a number for the position of the amino acid, followed by the abbreviation of the amino acid which replaces the former.
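
The three naming patterns described above can be recognized with simple regular expressions. This is a minimal sketch that covers only the exact forms shown here (an rs number, a coding-region substitution, and a protein substitution); it is not a full parser for the HGVS recommendations, and the function name and returned dictionary keys are hypothetical.

```python
import re
from typing import Dict, Union

# Patterns for the three forms described in the text only.
RS_PATTERN = re.compile(r"^rs(\d+)$")
CODING_PATTERN = re.compile(r"^c\.(\d+)([ACGTU])>([ACGTU])$")
PROTEIN_PATTERN = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_snp_name(name: str) -> Dict[str, Union[str, int]]:
    """Split a SNP name into its parts, or raise ValueError if unrecognized."""
    match = RS_PATTERN.match(name)
    if match:
        return {"kind": "rs", "number": int(match.group(1))}

    match = CODING_PATTERN.match(name)
    if match:
        position, ref, alt = match.groups()
        return {"kind": "coding", "position": int(position), "ref": ref, "alt": alt}

    match = PROTEIN_PATTERN.match(name)
    if match:
        ref, position, alt = match.groups()
        return {"kind": "protein", "position": int(position), "ref": ref, "alt": alt}

    raise ValueError(f"unrecognized SNP name: {name!r}")


if __name__ == "__main__":
    for example in ("rs429358", "c.76A>T", "p.Ser123Arg"):
        print(example, parse_snp_name(example))
```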


SNP analysis

SNPs are easy to assay because they typically have only two possible alleles and therefore three possible genotypes: homozygous A, homozygous B and heterozygous AB. This has led to many possible techniques for analysis, including DNA sequencing; capillary electrophoresis; mass spectrometry; single-strand conformation polymorphism (SSCP); single base extension; electrochemical analysis; denaturing HPLC and gel electrophoresis; restriction fragment length polymorphism; and hybridization analysis.
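As an illustration of the three genotype classes named above, the sketch below labels a pair of alleles and tallies allele frequencies across a handful of samples. The allele labels "A" and "B" follow the text; the function names and the sample data are made up for the example and do not come from any particular assay or tool.

```python
from collections import Counter


def classify_genotype(allele1, allele2):
    """Label a diploid genotype at a biallelic SNP."""
    if allele1 == allele2:
        return "homozygous " + allele1
    # Sort so that ("B", "A") and ("A", "B") receive the same label.
    return "heterozygous " + "".join(sorted((allele1, allele2)))


def allele_frequencies(genotypes):
    """Return the fraction of all observed allele copies for each allele."""
    counts = Counter()
    for allele1, allele2 in genotypes:
        counts[allele1] += 1
        counts[allele2] += 1
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical genotype calls for five individuals.
    samples = [("A", "A"), ("A", "B"), ("B", "A"), ("B", "B"), ("A", "A")]
    print(Counter(classify_genotype(a, b) for a, b in samples))
    print(allele_frequencies(samples))
```

Each individual contributes two allele copies, so five samples give ten copies in total. In the made-up data above, six of them are A and four are B.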


Programs for prediction of SNP effects

An important group of SNPs are those that correspond to missense mutations, which cause an amino acid change at the protein level. A point mutation of a particular residue can have different effects on protein function, from no effect to complete disruption of function. Usually, a change between amino acids of similar size and physico-chemical properties (e.g. substitution of leucine by valine) has a mild effect, whereas a change between dissimilar amino acids tends to have a stronger one. Similarly, if a SNP disrupts secondary structure elements (e.g. substitution to proline in an alpha helix region), the mutation may affect the whole protein structure and function. Using these simple rules and many other machine learning-derived rules, a group of programs for the prediction of SNP effects was developed:
* SIFT: provides insight into how a laboratory-induced missense or nonsynonymous mutation will affect protein function, based on the physical properties of the amino acid and sequence homology.
* LIST (Local Identity and Shared Taxa): estimates the potential deleteriousness of mutations resulting from alterations to protein function. It is based on the assumption that variations observed in closely related species are more significant when assessing conservation than those in distantly related species.
* SNAP2
* PolyPhen-2
* PredictSNP
* MutationTaster
* Variant Effect Predictor from the Ensembl project
* SNPViz: provides a 3D representation of the affected protein, highlighting the amino acid change so that doctors can determine the pathogenicity of the mutant protein.
* PROVEAN
* PhyreRisk: a database which maps variants to experimental and predicted protein structures.
* Missense3D: a tool which provides a stereochemical report on the effect of missense variants on protein structure.
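The simple rules described at the start of this section (similar amino acids usually give a mild effect, dissimilar ones a stronger effect) can be sketched in a few lines. The example below is a toy heuristic and is not how any of the listed programs works. It translates a codon with the standard genetic code, applies a single-base substitution, and labels the result. The five-way grouping in `PROPERTY_CLASS`, the labels returned, and the function name `classify_codon_change` are assumptions made for illustration only.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Coarse physico-chemical grouping: an illustrative assumption of this
# sketch, not a classification taken from any specific prediction tool.
PROPERTY_CLASS = {}
for label, members in {
    "hydrophobic": "AVLIMFW",
    "polar": "STNQYC",
    "positive": "KRH",
    "negative": "DE",
    "special": "GP",
}.items():
    for amino_acid in members:
        PROPERTY_CLASS[amino_acid] = label


def classify_codon_change(codon, position, new_base):
    """Classify a single-base substitution at 0-based `position` in `codon`."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-lost"
    if PROPERTY_CLASS[ref_aa] == PROPERTY_CLASS[alt_aa]:
        return "missense (conservative)"
    return "missense (non-conservative)"


if __name__ == "__main__":
    print(classify_codon_change("CTT", 0, "G"))  # leucine to valine
    print(classify_codon_change("CTT", 1, "C"))  # leucine to proline
    print(classify_codon_change("CTT", 2, "C"))  # leucine to leucine
    print(classify_codon_change("TGG", 2, "A"))  # tryptophan to stop
```

Real predictors combine many more signals, such as sequence conservation and structural context, so the labels produced here should be read only as a demonstration of the rule of thumb.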


See also

* Affymetrix
* HapMap
* Illumina
* International HapMap Project
* Short tandem repeat (STR)
* Single-base extension
* SNP array
* SNP genotyping
* SNPedia
* Snpstr
* SNV calling from NGS data
* Suspension array technology
* Tag SNP
* TaqMan
* Variome


Further reading

* Human Genome Project Information – SNP Fact Sheet


External links



* Introduction to SNPs from NCBI
* The SNP Consortium LTD – SNP search
* NCBI dbSNP database – "a central repository for both single base nucleotide substitutions and short deletion and insertion polymorphisms"
* HGMD – the Human Gene Mutation Database, includes rare mutations and functional SNPs
* GWAS Central – a central database of summary-level genetic association findings
* 1000 Genomes Project – A Deep Catalog of Human Genetic Variation
* WatCut – an online tool for the design of SNP-RFLP assays
* SNPStats – a web tool for analysis of genetic association studies
* Restriction HomePage – a set of tools for DNA restriction and SNP detection, including design of mutagenic primers
* American Association for Cancer Research Cancer Concepts Factsheet on SNPs
* PharmGKB – The Pharmacogenetics and Pharmacogenomics Knowledge Base, a resource for SNPs associated with drug response and disease outcomes
* GEN-SNiP – online tool that identifies polymorphisms in test DNA sequences
* Rules for Nomenclature of Genes, Genetic Markers, Alleles, and Mutations in Mouse and Rat
* SNP effect predictor with Galaxy integration
* Open SNP – a portal for sharing one's own SNP test results
* SNP database for protein variation detection