A haplotype (haploid genotype) is a group of alleles in an organism that are inherited together from a single parent.

Many organisms contain genetic material (DNA) which is inherited from two parents. Normally these organisms have their DNA organized in two sets of pairwise similar chromosomes. The offspring gets one chromosome in each pair from each parent. A set of pairs of chromosomes is called diploid and a set of only one half of each pair is called haploid. The haploid genotype (haplotype) is a genotype that considers the singular chromosomes rather than the pairs of chromosomes. It can be all the chromosomes from one of the parents or a minor part of a chromosome, for example a sequence of 9000 base pairs.

However, there are other uses of this term. First, it is used to mean a collection of specific alleles (that is, specific DNA sequences) in a cluster of tightly linked genes on a chromosome that are likely to be inherited together; that is, they are likely to be conserved as a sequence that survives the descent of many generations of reproduction. A second use is to mean a set of linked single-nucleotide polymorphism (SNP) alleles that tend to always occur together (i.e., that are associated statistically). It is thought that identifying these statistical associations and a few alleles of a specific haplotype sequence can facilitate identifying ''all other such'' polymorphic sites that are nearby on the chromosome. Such information is critical for investigating the genetics of common diseases, which have been investigated in humans by the International HapMap Project. Third, many human genetic testing companies use the term to refer to an individual collection of specific mutations within a given genetic segment (see short tandem repeat mutation).

The term "haplogroup" refers to the SNP/unique-event polymorphism (UEP) mutations that represent the clade to which a collection of particular human haplotypes belong. (Clade here refers to a set of haplotypes sharing a common ancestor.) A haplogroup is a group of similar haplotypes that share a common ancestor with a mutation. Mitochondrial DNA passes along a maternal lineage that can date back thousands of years.

Haplotype resolution

An organism's genotype may not define its haplotype uniquely. For example, consider a diploid organism and two bi-allelic loci (such as SNPs) on the same chromosome. Assume the first locus has alleles ''A'' or ''T'' and the second locus ''G'' or ''C''. Both loci, then, have three possible genotypes: (''AA'', ''AT'', and ''TT'') and (''GG'', ''GC'', and ''CC''), respectively. For a given individual, there are nine possible configurations (haplotypes) at these two loci (shown in the Punnett square below). For individuals who are homozygous at one or both loci, the haplotypes are unambiguous: the two copies of a homozygous allele are interchangeable, so labeling them T1 and T2 and swapping them (T1T2 versus T2T1) gives the same result, two ''T'' alleles. For individuals heterozygous at both loci, the gametic phase is ambiguous: the genotype alone does not reveal which pair of haplotypes is present, e.g., ''AG'' and ''TC'' versus ''AC'' and ''TG''.

The only unequivocal method of resolving phase ambiguity is by sequencing. However, it is possible to estimate the probability of a particular haplotype when phase is ambiguous using a sample of individuals. Given the genotypes for a number of individuals, the haplotypes can be inferred by haplotype resolution or haplotype phasing techniques. These methods work by applying the observation that certain haplotypes are common in certain genomic regions. Therefore, given a set of possible haplotype resolutions, these methods choose those that use fewer different haplotypes overall. The specifics of these methods vary: some are based on combinatorial approaches (e.g., parsimony), whereas others use likelihood functions based on different models and assumptions such as the Hardy–Weinberg principle, the coalescent theory model, or perfect phylogeny. The parameters in these models are then estimated using algorithms such as the expectation-maximization algorithm (EM), Markov chain Monte Carlo (MCMC), or hidden Markov models (HMM).
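The phase ambiguity and its statistical resolution can be illustrated with a small sketch. The code below is only an illustration under stated assumptions, not the algorithm of any particular phasing program: it handles exactly two bi-allelic loci, assumes random mating (Hardy–Weinberg proportions), and uses the hypothetical function names `compatible_phasings` and `em_haplotype_frequencies`. The sample genotypes are made up for the example.

```python
def compatible_phasings(g1, g2):
    """List the unordered haplotype pairs consistent with a two-locus genotype.

    g1 and g2 are two-letter genotype strings, e.g. "AT" and "GC".
    """
    pairs = set()
    # Try both ways of pairing the first-locus alleles with the second-locus alleles.
    for a1, a2 in ((g1[0], g1[1]), (g1[1], g1[0])):
        pairs.add(tuple(sorted((a1 + g2[0], a2 + g2[1]))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=50):
    """Estimate haplotype frequencies from unphased genotypes with a basic EM loop.

    Assumption: haplotypes pair at random (Hardy-Weinberg proportions).
    """
    phasings = [compatible_phasings(g1, g2) for g1, g2 in genotypes]
    haplotypes = sorted({h for ps in phasings for pair in ps for h in pair})
    # Start from uniform frequencies over every haplotype seen in any phasing.
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    for _ in range(n_iter):
        counts = dict.fromkeys(haplotypes, 0.0)
        for ps in phasings:
            # E-step: weight each candidate phasing by the product of its
            # current haplotype frequencies. The factor 2 for heterozygous
            # pairs is shared by all phasings of one genotype and cancels.
            weights = [freq[h1] * freq[h2] for h1, h2 in ps]
            total = sum(weights)
            for (h1, h2), w in zip(ps, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        n_chromosomes = 2 * len(genotypes)
        freq = {h: c / n_chromosomes for h, c in counts.items()}
    return freq


# Made-up sample: eight unambiguous individuals and two double heterozygotes.
sample = [("AA", "GG")] * 4 + [("TT", "CC")] * 4 + [("AT", "GC")] * 2
freqs = em_haplotype_frequencies(sample)
```

In this sample the double heterozygotes are resolved almost entirely as ''AG'' and ''TC'', because those haplotypes are already common among the unambiguous individuals. This mirrors the preference for resolutions that reuse frequent haplotypes.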

Microfluidic whole genome haplotyping is a technique for the physical separation of individual chromosomes from a metaphase cell followed by direct resolution of the haplotype for each allele.

Y-DNA haplotypes from genealogical DNA tests

Unlike other chromosomes, Y chromosomes generally do not come in pairs. Every human male (excepting those with XYY syndrome) has only one copy of that chromosome. This means that there is no chance variation in which copy is inherited, and also (for most of the chromosome) no shuffling between copies by recombination; so, unlike autosomal haplotypes, there is effectively no randomisation of the Y-chromosome haplotype between generations. A human male should largely share the same Y chromosome as his father, give or take a few mutations; thus Y chromosomes tend to pass largely intact from father to son, with a small but accumulating number of mutations that can serve to differentiate male lineages. In particular, the Y-DNA represented as the numbered results of a Y-DNA genealogical DNA test should match, except for mutations.

UEP results (SNP results)

Unique-event polymorphisms (UEPs) such as SNPs represent haplogroups. STRs represent haplotypes. The results that comprise the full Y-DNA haplotype from the Y chromosome DNA test can be divided into two parts: the results for UEPs, sometimes loosely called the SNP results as most UEPs are single-nucleotide polymorphisms, and the results for microsatellite short tandem repeat sequences (Y-STRs).

The UEP results represent the inheritance of events that are believed to have happened only once in all human history. These can be used to identify the individual's Y-DNA haplogroup, his place in the "family tree" of the whole of humanity. Different Y-DNA haplogroups identify genetic populations that are often distinctly associated with particular geographic regions; their appearance in more recent populations located in different regions represents the migrations tens of thousands of years ago of the direct patrilineal ancestors of current individuals.

Y-STR haplotypes

Genetic results also include the Y-STR haplotype, the set of results from the Y-STR markers tested. Unlike the UEPs, the Y-STRs mutate much more easily, which allows them to be used to distinguish recent genealogy. But it also means that, rather than the population of descendants of a genetic event all sharing the ''same'' result, the Y-STR haplotypes are likely to have spread apart, to form a ''cluster'' of more or less similar results. Typically, this cluster will have a definite most probable center, the modal haplotype (presumably similar to the haplotype of the original founding event), and also a haplotype diversity: the degree to which it has become spread out. The further in the past the defining event occurred, and the more that subsequent population growth occurred early, the greater the haplotype diversity will be for a particular number of descendants. However, if the haplotype diversity is smaller for a particular number of descendants, this may indicate a more recent common ancestor, or a recent population expansion.

Unlike for UEPs, two individuals with a similar Y-STR haplotype may not necessarily share a similar ancestry, because Y-STR events are not unique. Instead, the clusters of Y-STR haplotype results inherited from different events and different histories tend to overlap. In most cases, it is a long time since the haplogroups' defining events, so typically the cluster of Y-STR haplotype results associated with descendants of that event has become rather broad. These results will tend to significantly overlap the (similarly broad) clusters of Y-STR haplotypes associated with other haplogroups. This makes it impossible for researchers to predict with absolute certainty to which Y-DNA haplogroup a Y-STR haplotype would point. If the UEPs are not tested, the Y-STRs may be used only to predict probabilities for haplogroup ancestry, but not certainties.
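The idea of a modal haplotype at the center of a cluster can be made concrete with a short sketch. It assumes that each Y-STR haplotype is recorded as a mapping from marker name to repeat count, takes the most common value per marker as the modal haplotype, and measures spread with a simple sum of absolute repeat-count differences. That distance is one convention among several and is an assumption here. The function names and the repeat counts are illustrative, not real test results.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Return, for each marker, the most common repeat count in the cluster."""
    markers = haplotypes[0].keys()
    return {
        m: Counter(h[m] for h in haplotypes).most_common(1)[0][0]
        for m in markers
    }


def step_distance(h1, h2):
    """Sum of absolute repeat-count differences across markers (assumed metric)."""
    return sum(abs(h1[m] - h2[m]) for m in h1)


# Illustrative cluster of four haplotypes typed at three markers.
cluster = [
    {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    {"DYS393": 13, "DYS390": 23, "DYS19": 14},
    {"DYS393": 13, "DYS390": 24, "DYS19": 15},
    {"DYS393": 12, "DYS390": 24, "DYS19": 14},
]
modal = modal_haplotype(cluster)
distances = [step_distance(h, modal) for h in cluster]
```

Here the first haplotype coincides with the modal haplotype and the other three each differ from it by one repeat at a single marker, which is the kind of tight cluster expected from a fairly recent founding event.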
A similar scenario exists in trying to evaluate whether shared surnames indicate shared genetic ancestry. A cluster of similar Y-STR haplotypes may indicate a shared common ancestor, with an identifiable modal haplotype, but only if the cluster is sufficiently distinct from what may have happened by chance from different individuals who historically adopted the same name independently. Many names were adopted from common occupations, for instance, or were associated with habitation of particular sites. More extensive haplotype typing is needed to establish genetic genealogy. Commercial DNA-testing companies now offer their customers testing of more numerous sets of markers to improve definition of their genetic ancestry. The number of markers tested has increased from 12 during the early years to 111 more recently.

Establishing plausible relatedness between different surnames data-mined from a database is significantly more difficult. The researcher must establish that the ''very nearest'' member of the population in question, chosen purposely from the population for that reason, would be unlikely to match by accident. This is more than establishing that a ''randomly selected'' member of the population is unlikely to have such a close match by accident. Because of this difficulty, establishing relatedness between different surnames in such a scenario is likely to be impossible, except in special cases where there is specific information to drastically limit the size of the population of candidates under consideration.
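The gap between a randomly selected match and the very nearest match can be shown with a back-of-the-envelope calculation. The sketch below assumes, as a simplification, that every candidate in the database matches by accident independently and with the same probability; real populations violate this, so the numbers are only indicative. The function name and the example values are hypothetical.

```python
def prob_any_chance_match(p_single, n_candidates):
    """Probability that at least one of n independent candidates matches by chance.

    Assumption: each candidate matches independently with probability p_single.
    """
    return 1.0 - (1.0 - p_single) ** n_candidates


# Hypothetical values: a 1-in-1000 chance match per person, 5000 people searched.
single = prob_any_chance_match(0.001, 1)
nearest = prob_any_chance_match(0.001, 5000)
```

With these illustrative numbers a chance match with one randomly chosen person is rare, yet a chance match with the nearest of 5000 candidates is more likely than not by a wide margin. This is why limiting the candidate population matters so much.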

Diversity

Haplotype diversity is a measure of the uniqueness of a particular haplotype in a given population. The haplotype diversity (H) is computed as:

H = \frac{N}{N-1}\left(1 - \sum_{i} x_i^2\right)

where x_i is the (relative) frequency of each haplotype in the sample and N is the sample size. Haplotype diversity is given for each sample.
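As a worked illustration, the sketch below computes haplotype diversity assuming the usual form of the formula, H = N/(N-1) · (1 − Σ x_i²), with x_i taken as the relative frequency of each distinct haplotype in the sample. The function name `haplotype_diversity` and the toy samples are assumptions made for the example.

```python
from collections import Counter


def haplotype_diversity(sample):
    """Compute H = N/(N-1) * (1 - sum(x_i^2)) for a list of haplotype labels."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two haplotypes are needed")
    counts = Counter(sample)
    # x_i is the relative frequency of each distinct haplotype in the sample.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


all_same = haplotype_diversity(["h1", "h1", "h1", "h1"])
all_different = haplotype_diversity(["h1", "h2", "h3", "h4"])
two_pairs = haplotype_diversity(["h1", "h1", "h2", "h2"])
```

A sample in which every haplotype is identical gives H = 0, and a sample in which every haplotype is distinct gives H = 1. The N/(N-1) factor is what lets the all-distinct case reach exactly 1 in a finite sample.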

Software

FAMHAP
— Software for single-marker analysis and, in particular, joint analysis of unphased genotype data from tightly linked markers (haplotype analysis).

— EM based haplotype estimation and association tests in unrelated and nuclear families.
HPlus
— A software package for imputation and testing of haplotypes in association studies using a modified method that incorporates the expectation-maximization algorithm and a Bayesian method known as progressive ligation.
HaploBlockFinder
— A software package for analyses of haplotype block structure.
Haploscribe
— Reconstruction of whole-chromosome haplotypes based on all genotyped positions in a nuclear family, including rare variants.

Haploview
— Visualisation of linkage disequilibrium, haplotype estimation and haplotype tagging.

— Haplotype analysis software - Haplotype Trend Regression (HTR), haplotypic association tests, and haplotype frequency estimation using both the expectation-maximization (EM) algorithm and composite haplotype method (CHM).

— A software for haplotype reconstruction, and recombination rate estimation from population data.
SHAPEIT
— SHAPEIT2 is a program for haplotype estimation of SNP genotypes in large cohorts across whole chromosomes.
SNPHAP
— EM based software for estimating haplotype frequencies from unphased genotypes.
WHAP
— Haplotype-based association analysis.

External links

HapMap
— homepage for the International HapMap Project.

— the difference between haplogroup & haplotype explained.