A haplotype (haploid genotype) is a group of alleles in an organism
that are inherited together from a single parent. However, there
are other uses of this term. First, it is used to mean a collection of
specific alleles (that is, specific
DNA sequences) in a cluster of
tightly linked genes on a chromosome that are likely to be inherited
together—that is, they are likely to be conserved as a sequence that
survives the descent of many generations of reproduction. A
second use is to mean a set of linked single-nucleotide polymorphism
(SNP) alleles that tend to always occur together (i.e., that are
associated statistically). It is thought that identifying these
statistical associations and few alleles of a specific haplotype
sequence can facilitate identifying all other such polymorphic sites
that are nearby on the chromosome. Such information is critical for
investigating the genetics of common diseases; which in fact have been
investigated in humans by the International HapMap Project.
Thirdly, many human genetic testing companies use the term in a third
way: to refer to an individual collection of specific mutations within
a given genetic segment; (see short tandem repeat mutation).
The term 'haplogroup' refers to the SNP/unique-event polymorphism
(UEP) mutations that represent the clade to which a collection of
particular human haplotypes belong. (
Clade here refers to a set of
haplotypes sharing a common ancestor.) A haplogroup is a group of
similar haplotypes that share a common ancestor with a
single-nucleotide polymorphism mutation. Mitochondrial DNA
passes along a maternal lineage that can date back thousands of
DNA haplotypes from genealogical
2.1 UEP results (SNP results)
4 See also
7 External links
An organism's genotype may not define its haplotype uniquely. For
example, consider a diploid organism and two bi-allelic loci (such as
SNPs) on the same chromosome. Assume the first locus has alleles A or
T and the second locus G or C. Both loci, then, have three possible
genotypes: (AA, AT, and TT) and (GG, GC, and CC), respectively. For a
given individual, there are nine possible configurations (haplotypes)
at these two loci (shown in the
Punnett square below). For individuals
who are homozygous at one or both loci, the haplotypes are unambiguous
- meaning that there is not any differentiation of haplotype T1T2 vs
haplotype T2T1; where T1 and T2 are labeled to show that they are the
same locus, but labeled as such to show it doesn't matter which order
you consider them in, the end result is two T loci. For individuals
heterozygous at both loci, the gametic phase is ambiguous - in these
cases, you don't know which haplotype you have, e.g., TA vs AT.
The only unequivocal method of resolving phase ambiguity is by
sequencing. However, it is possible to estimate the probability of a
particular haplotype when phase is ambiguous using a sample of
Given the genotypes for a number of individuals, the haplotypes can be
inferred by haplotype resolution or haplotype phasing techniques.
These methods work by applying the observation that certain haplotypes
are common in certain genomic regions. Therefore, given a set of
possible haplotype resolutions, these methods choose those that use
fewer different haplotypes overall. The specifics of these methods
vary - some are based on combinatorial approaches (e.g., parsimony),
whereas others use likelihood functions based on different models and
assumptions such as the Hardy-Weinberg principle, the coalescent
theory model, or perfect phylogeny. The parameters in these models are
then estimated using algorithms such as the expectation-maximization
Markov chain Monte Carlo
Markov chain Monte Carlo (MCMC), or hidden Markov
Microfluidic whole genome haplotyping
Microfluidic whole genome haplotyping is a technique for the physical
separation of individual chromosomes from a metaphase cell followed by
direct resolution of the haplotype for each allele.
DNA haplotypes from genealogical
Main article: Genealogical
Unlike other chromosomes, Y chromosomes generally do not come in
pairs. Every human male (excepting those with XYY syndrome) has only
one copy of that chromosome. This means that there is not any chance
variation of which copy is inherited, and also (for most of the
chromosome) not any shuffling between copies by recombination; so,
unlike autosomal haplotypes, there is effectively not any
randomisation of the Y-chromosome haplotype between generations. A
human male should largely share the same Y chromosome as his father,
give or take a few mutations; thus Y chromosomes tend to pass largely
intact from father to son, with a small but accumulating number of
mutations that can serve to differentiate male lineages. In
particular, the Y-
DNA represented as the numbered results of a Y-DNA
DNA test should match, except for mutations.
UEP results (SNP results)
Unique-event polymorphisms (UEPs) such as SNPs represent haplogroups.
STRs represent haplotypes. The results that comprise the full Y-DNA
haplotype from the Y chromosome
DNA test can be divided into two
parts: the results for UEPs, sometimes loosely called the SNP results
as most UEPs are single-nucleotide polymorphisms, and the results for
microsatellite short tandem repeat sequences (Y-STRs).
The UEP results represent the inheritance of events it is believed can
be assumed to have happened only once in all human history. These can
be used to identify the individual's Y-
DNA haplogroup, his place in
the "family tree" of the whole of humanity. Different Y-DNA
haplogroups identify genetic populations that are often distinctly
associated with particular geographic regions; their appearance in
more recent populations located in different regions represents the
migrations tens of thousands of years ago of the direct patrilineal
ancestors of current individuals.
Genetic results also include the
Y-STR haplotype, the set of results
Y-STR markers tested.
Unlike the UEPs, the Y-STRs mutate much more easily, which allows them
to be used to distinguish recent genealogy. But it also means that,
rather than the population of descendants of a genetic event all
sharing the same result, the
Y-STR haplotypes are likely to have
spread apart, to form a cluster of more or less similar results.
Typically, this cluster will have a definite most probable center, the
modal haplotype (presumably similar to the haplotype of the original
founding event), and also a haplotype diversity — the degree to
which it has become spread out. The further in the past the defining
event occurred, and the more that subsequent population growth
occurred early, the greater the haplotype diversity will be for a
particular number of descendants. However, if the haplotype diversity
is smaller for a particular number of descendants, this may indicate a
more recent common ancestor, or a recent population expansion.
It is important to note that, unlike for UEPs, two individuals with a
Y-STR haplotype may not necessarily share a similar ancestry.
Y-STR events are not unique. Instead, the clusters of
results inherited from different events and different histories tend
In most cases, it is a long time since the haplogroups' defining
events, so typically the cluster of
Y-STR haplotype results associated
with descendents of that event has become rather broad. These results
will tend to significantly overlap the (similarly broad) clusters of
Y-STR haplotypes associated with other haplogroups. This makes it
impossible for researchers to predict with absolute certainty to which
DNA haplogroup a
Y-STR haplotype would point. If the UEPs are not
tested, the Y-STRs may be used only to predict probabilities for
haplogroup ancestry, but not certainties.
A similar scenario exists in trying to evaluate whether shared
surnames indicate shared genetic ancestry. A cluster of similar Y-STR
haplotypes may indicate a shared common ancestor, with an identifiable
modal haplotype, but only if the cluster is sufficiently distinct from
what may have happened by chance from different individuals who
historically adopted the same name independently. Many names were
adopted from common occupations, for instance, or were associated with
habitation of particular sites. More extensive haplotype typing is
needed to establish genetic genealogy. Commercial DNA-testing
companies now offer their customers testing of more numerous sets of
markers to improve definition of their genetic ancestry. The number of
sets of markers tested has increased from 12 during the early years to
111 more recently.
Establishing plausible relatedness between different surnames
data-mined from a database is significantly more difficult. The
researcher must establish that the very nearest member of the
population in question, chosen purposely from the population for that
reason, would be unlikely to match by accident. This is more than
establishing that a randomly selected member of the population is
unlikely to have such a close match by accident. Because of the
difficulty, establishing relatedness between different surnames as in
such a scenario is likely to be impossible, except in special cases
where there is specific information to drastically limit the size of
the population of candidates under consideration.
Haplotype diversity is a measure of the uniqueness of a particular
haplotype in a given population. The haplotype diversity (H) is
displaystyle H= frac N N-1 (1-sum _ i x_ i ^ 2 )
displaystyle x_ i
is the (relative) haplotype frequency of each haplotype in the sample
is the sample size.
Haplotype diversity is given for each sample.
International HapMap Project
FAMHAP — FAMHAP is a software for single-marker analysis and, in
particular, joint analysis of unphased genotype data from tightly
linked markers (haplotype analysis).
Fugue — EM based haplotype estimation and association tests in
unrelated and nuclear families.
HPlus — A software package for imputation and testing of
haplotypes in association studies using a modified method that
incorporates the expectation-maximization algorithm and a Bayesian
method known as progressive ligation.
HaploBlockFinder — A software package for analyses of haplotype
Haploscribe — Reconstruction of whole-chromosome haplotypes
based on all genotyped positions in a nuclear family, including rare
Haploview — Visualisation of linkage disequilibrium, haplotype
estimation and haplotype tagging (Homepage).
Haplotype analysis software -
Haplotype Trend Regression
(HTR), haplotypic association tests, and haplotype frequency
estimation using both the expectation-maximization (EM) algorithm and
composite haplotype method (CHM).
PHASE — A software for haplotype reconstruction, and recombination
rate estimation from population data.
SHAPEIT — SHAPEIT2 is a program for haplotype estimation of SNP
genotypes in large cohorts across whole chromosome.
SNPHAP — EM based software for estimating haplotype frequencies from
WHAP — haplotype based association analysis.
^ By C. Barry Cox, Peter D. Moore, Richard Ladle. Wiley-Blackwell,
2016. ISBN 978-1-118-96858-1 p106. Biogeography: An Ecological
and Evolutionary Approach
^ Editorial Board, V&S Publishers, 2012, ISBN 9381588643
p137.Concise Dictionary of Science
^ BiologyPages/H/Haplotypes.html Kimball's Biology Pages (Creative
Commons Attribution 3.0)
^ Nature SciTable
^ The International HapMap Consortium (2003). "The International
HapMap Project" (PDF). Nature. 426 (6968): 789–796.
doi:10.1038/nature02168. PMID 14685227.
^ The International HapMap Consortium (2005). "A haplotype map of the
human genome" (PDF). Nature. 437 (7063): 1299–1320.
doi:10.1038/nature04226. PMC 1880871 .
^ Facts & Genes. Volume 7, Issue 3 Archived 2008-05-09 at the
^ a b Arora, Devender; Singh, Ajeet; Sharma, Vikrant; Bhaduria,
Harvendra Singh; Patel, Ram Bahadur (2015). "Hgs Db: Haplogroups
Database to understand migration and molecular risk assessment".
Bioinformation. 11 (6): 272–5. doi:10.6026/97320630011272.
PMC 4512000 . PMID 26229286.
^ International Society of Genetic Genealogy 2015 Genetics Glossary
^ Masatoshi Nei and Fumio Tajima, "
DNA polymorphism detectable by
restriction endonucleases", Genetics 97:145 (1981)
^ Becker T.; Knapp M. (2004). "Maximum-likelihood estimation of
haplotype frequencies in nuclear families". Genetic Epidemiology. 27
(1): 21–32. doi:10.1002/gepi.10323. PMID 15185400.
^ Li S.S.; Khalid N.; Carlson C.; Zhao L.P. (2003). "Estimating
haplotype frequencies and standard errors for multiple single
nucleotide polymorphisms". Biostatistics. 4 (4): 513–522.
doi:10.1093/biostatistics/4.4.513. PMID 14557108.
^ Roach J.C.; Glusman G.; Hubley R.; Montsaroff S.Z.; Holloway A.K.;
Mauldin D.E.; Srivastava D.; Garg V.; Pollard K.S.; Galas D.J.; Hood
L.; Smit A.F.A. (2011). "Chromosomal Haplotypes by Genetic Phasing of
Human Families". American Journal of Human Genetics. 89 (3):
382–397. doi:10.1016/j.ajhg.2011.07.023. PMC 3169815 .
^ Barrett J.C.; Fry B.; Maller J.; Daly M.J. (2005). "Haploview:
analysis and visualization of LD and haplotype maps". Bioinformatics.
21 (2): 263–265. doi:10.1093/bioinformatics/bth457.
^ Delaneau O, Zagury JF, Marchini J (2013). "Improved whole chromosome
phasing for disease and population genetic studies". Nature Methods.
10 (1): 5–6. doi:10.1038/nmeth.2307. PMID 23269371.
^ Purcell S.; Daly M. J.; Sham P. C. (2007). "WHAP: haplotype-based
association analysis". Bioinformatics. 23 (2): 255–256.
doi:10.1093/bioinformatics/btl580. PMID 17118959.
HapMap — homepage for the International HapMap Project.
Haplogroup — the difference between haplogroup