In population genetics, the Hardy–Weinberg principle, also known as the Hardy–Weinberg equilibrium, model, theorem, or law, states that allele and genotype frequencies in a population will remain constant from generation to generation in the absence of other evolutionary influences. These influences include ''genetic drift'', ''mate choice'', ''assortative mating'', ''natural selection'', ''sexual selection'', ''mutation'', ''gene flow'', ''meiotic drive'', ''genetic hitchhiking'', ''population bottleneck'', ''founder effect'', ''inbreeding'' and ''outbreeding depression''.

In the simplest case of a single locus with two alleles denoted ''A'' and ''a'' with frequencies f(A) = ''p'' and f(a) = ''q'', respectively, the expected genotype frequencies under random mating are f(AA) = ''p''² for the AA homozygotes, f(aa) = ''q''² for the aa homozygotes, and f(Aa) = 2''pq'' for the heterozygotes. In the absence of selection, mutation, genetic drift, or other forces, allele frequencies ''p'' and ''q'' are constant between generations, so equilibrium is reached.

The principle is named after G. H. Hardy and Wilhelm Weinberg, who first demonstrated it mathematically. Hardy's paper was focused on debunking the view that a dominant allele would automatically tend to increase in frequency (a view possibly based on a misinterpreted question at a lecture). Today, tests for Hardy–Weinberg genotype frequencies are used primarily to test for population stratification and other forms of non-random mating.


Derivation

Consider a population of monoecious diploids, where each organism produces male and female gametes at equal frequency, and has two alleles at each gene locus. We assume that the population is so large that it can be treated as infinite. Organisms reproduce by random union of gametes (the "gene pool" population model). A locus in this population has two alleles, A and a, that occur with initial frequencies f_0(\text{A}) = p and f_0(\text{a}) = q, respectively. The allele frequencies at each generation are obtained by pooling together the alleles from each genotype of the same generation according to the expected contribution from the homozygote and heterozygote genotypes, which are 1 and 1/2, respectively:
: \begin{align} f_t(\text{A}) &= f_t(\text{AA}) + \tfrac{1}{2} f_t(\text{Aa}) \tag{1} \\ f_t(\text{a}) &= f_t(\text{aa}) + \tfrac{1}{2} f_t(\text{Aa}) \tag{2} \end{align}

The different ways to form genotypes for the next generation can be shown in a Punnett square, where the proportion of each genotype is equal to the product of the row and column allele frequencies from the current generation. With A (frequency p) and a (frequency q) on both the rows and the columns, the four cells are p^2 (AA), pq and qp (both Aa), and q^2 (aa). The sum of the entries is p^2 + 2pq + q^2 = 1, as the genotype frequencies must sum to one. Note again that as p + q = 1, the binomial expansion of (p + q)^2 = p^2 + 2pq + q^2 = 1 gives the same relationships.

Summing the elements of the Punnett square or the binomial expansion, we obtain the expected genotype proportions among the offspring after a single generation:
: \begin{align} f_1(\text{AA}) &= p^2 = f_0(\text{A})^2 \tag{3} \\ f_1(\text{Aa}) &= pq + qp = 2pq = 2 f_0(\text{A}) f_0(\text{a}) \tag{4} \\ f_1(\text{aa}) &= q^2 = f_0(\text{a})^2 \tag{5} \end{align}

These frequencies define the Hardy–Weinberg equilibrium. The genotype frequencies after the first generation need not equal the genotype frequencies from the initial generation, e.g. f_1(\text{AA}) \neq f_0(\text{AA}). However, the genotype frequencies for all ''future'' times will equal the Hardy–Weinberg frequencies, e.g. f_t(\text{AA}) = f_1(\text{AA}) for t > 1. This follows since the genotype frequencies of the next generation depend only on the allele frequencies of the current generation which, as calculated by equations (1) and (2), are preserved from the initial generation:
: \begin{align} f_1(\text{A}) &= f_1(\text{AA}) + \tfrac{1}{2} f_1(\text{Aa}) = p^2 + p q = p (p+q) = p = f_0(\text{A}) \\ f_1(\text{a}) &= f_1(\text{aa}) + \tfrac{1}{2} f_1(\text{Aa}) = q^2 + p q = q (p + q) = q = f_0(\text{a}) \end{align}

For the more general case of dioecious diploids (organisms are either male or female) that reproduce by random mating of individuals, it is necessary to calculate the genotype frequencies from the nine possible matings between each parental genotype (''AA'', ''Aa'', and ''aa'') in either sex, weighted by the expected genotype contributions of each such mating. Equivalently, one considers the six unique diploid–diploid combinations:
: \left[ (\text{AA},\text{AA}), (\text{AA},\text{Aa}), (\text{AA},\text{aa}), (\text{Aa},\text{Aa}), (\text{Aa},\text{aa}), (\text{aa},\text{aa}) \right]
and constructs a Punnett square for each, so as to calculate its contribution to the next generation's genotypes. These contributions are weighted according to the probability of each diploid–diploid combination, which follows a multinomial distribution with k = 3. For example, the probability of the mating combination (\text{AA},\text{aa}) is 2 f_t(\text{AA}) f_t(\text{aa}) and it can only result in the \text{Aa} genotype: [0, 1, 0]. Overall, the resulting genotype frequencies are calculated as:
: \begin{align} &\left[ f_{t+1}(\text{AA}), f_{t+1}(\text{Aa}), f_{t+1}(\text{aa}) \right] = \\ &\qquad= f_t(\text{AA}) f_t(\text{AA}) \left[ 1, 0, 0 \right] + 2 f_t(\text{AA}) f_t(\text{Aa}) \left[ \tfrac{1}{2}, \tfrac{1}{2}, 0 \right] + 2 f_t(\text{AA}) f_t(\text{aa}) \left[ 0, 1, 0 \right] \\ &\qquad\qquad+ f_t(\text{Aa}) f_t(\text{Aa}) \left[ \tfrac{1}{4}, \tfrac{1}{2}, \tfrac{1}{4} \right] + 2 f_t(\text{Aa}) f_t(\text{aa}) \left[ 0, \tfrac{1}{2}, \tfrac{1}{2} \right] + f_t(\text{aa}) f_t(\text{aa}) \left[ 0, 0, 1 \right] \\ &\qquad= \left[ \left(f_t(\text{AA}) + \tfrac{1}{2} f_t(\text{Aa}) \right)^2, 2 \left(f_t(\text{AA}) + \tfrac{1}{2} f_t(\text{Aa}) \right) \left(f_t(\text{aa}) + \tfrac{1}{2} f_t(\text{Aa}) \right), \left(f_t(\text{aa}) + \tfrac{1}{2} f_t(\text{Aa}) \right)^2 \right] \\ &\qquad= \left[ f_t(\text{A})^2, 2 f_t(\text{A}) f_t(\text{a}), f_t(\text{a})^2 \right] \end{align}

As before, one can show that the allele frequencies at time t+1 equal those at time t, and so are constant in time. Similarly, the genotype frequencies depend only on the allele frequencies, and so, after time t = 1, are also constant in time.

If in either monoecious or dioecious organisms, either the allele or genotype proportions are initially unequal in either sex, it can be shown that constant proportions are obtained after one generation of random mating. If dioecious organisms are heterogametic and the gene locus is located on the X chromosome, it can be shown that if the allele frequencies are initially unequal in the two sexes (''e.g.'', XX females and XY males, as in humans), f(\text{a}) in the heterogametic sex 'chases' f(\text{a}) in the homogametic sex of the previous generation, until an equilibrium is reached at the weighted average of the two initial frequencies.
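
The six-mating calculation for dioecious diploids can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names and the starting frequencies are hypothetical, and exact fractions are used only so that the comparison is free of rounding.

```python
from fractions import Fraction

HALF, QUARTER = Fraction(1, 2), Fraction(1, 4)

# Offspring shares [AA, Aa, aa] for each of the six unordered parental pairs.
OFFSPRING = {
    ("AA", "AA"): (1, 0, 0),
    ("AA", "Aa"): (HALF, HALF, 0),
    ("AA", "aa"): (0, 1, 0),
    ("Aa", "Aa"): (QUARTER, HALF, QUARTER),
    ("Aa", "aa"): (0, HALF, HALF),
    ("aa", "aa"): (0, 0, 1),
}

def next_generation(f):
    """Genotype frequencies after one round of random mating of individuals."""
    out = [Fraction(0)] * 3
    for (g1, g2), shares in OFFSPRING.items():
        # Mixed pairs can occur in two orders, hence the factor 2.
        weight = f[g1] * f[g2] * (1 if g1 == g2 else 2)
        for i, share in enumerate(shares):
            out[i] += weight * share
    return dict(zip(("AA", "Aa", "aa"), out))

def allele_freq_A(f):
    """Frequency of allele A: homozygotes count fully, heterozygotes by half."""
    return f["AA"] + f["Aa"] / 2

# Hypothetical starting population far from Hardy-Weinberg proportions.
f0 = {"AA": Fraction(1, 2), "Aa": Fraction(0), "aa": Fraction(1, 2)}
f1 = next_generation(f0)
f2 = next_generation(f1)
```

With these made-up starting values, `f1` already has the proportions p², 2pq, q² for p = 1/2, `f2` is identical to `f1`, and the allele frequency is unchanged throughout.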


Deviations from Hardy–Weinberg equilibrium

The seven assumptions underlying Hardy–Weinberg equilibrium are as follows (Hartl DL, Clarke AG (2007) Principles of population genetics. Sunderland, MA: Sinauer):
* organisms are diploid
* only sexual reproduction occurs
* generations are nonoverlapping
* mating is random
* population size is infinitely large
* allele frequencies are equal in the sexes
* there is no migration, gene flow, admixture, mutation or selection

Violations of the Hardy–Weinberg assumptions can cause deviations from expectation. How this affects the population depends on the assumptions that are violated.
* Random mating. The Hardy–Weinberg principle (HWP) states that the population will have the given genotypic frequencies (called Hardy–Weinberg proportions) after a single generation of random mating within the population. When the random mating assumption is violated, the population will not have Hardy–Weinberg proportions. A common cause of non-random mating is inbreeding, which causes an increase in homozygosity for all genes.

If a population violates one of the following four assumptions, the population may continue to have Hardy–Weinberg proportions each generation, but the allele frequencies will change over time.
* Selection, in general, causes allele frequencies to change, often quite rapidly. While directional selection eventually leads to the loss of all alleles except the favored one (unless one allele is dominant, in which case recessive alleles can survive at low frequencies), some forms of selection, such as balancing selection, lead to equilibrium without loss of alleles.
* Mutation will have a very subtle effect on allele frequencies through the introduction of new alleles into a population. Mutation rates are of the order 10⁻⁴ to 10⁻⁸, and the change in allele frequency will be, at most, the same order. Recurrent mutation will maintain alleles in the population, even if there is strong selection against them.
* Migration genetically links two or more populations together. In general, allele frequencies will become more homogeneous among the populations. Some models for migration inherently include nonrandom mating (the Wahlund effect, for example). For those models, the Hardy–Weinberg proportions will normally not be valid.
* Small population size can cause a random change in allele frequencies. This is due to a sampling effect, and is called genetic drift. Sampling effects are most important when the allele is present in a small number of copies.

In real-world genotype data, deviations from Hardy–Weinberg equilibrium may be a sign of genotyping error.
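
The sampling effect behind genetic drift can be illustrated with a small simulation. This sketch assumes a simple binomial-sampling model (each of the 2N gene copies in the next generation is drawn independently from the current allele frequency); the model choice, the function name and the parameter values are assumptions made for illustration and are not specified in the text above.

```python
import random

def drift_trajectory(p0, n_individuals, generations, seed=0):
    """Allele frequency over time when 2N gene copies are resampled each generation."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    copies = 2 * n_individuals
    p = p0
    trajectory = [p]
    for _generation in range(generations):
        # Draw each gene copy of the next generation from the current frequency.
        count = sum(1 for _copy in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory

# Hypothetical population sizes: drift is expected to be much stronger for N = 10.
small = drift_trajectory(0.5, n_individuals=10, generations=50)
large = drift_trajectory(0.5, n_individuals=2000, generations=50)
```

Plotting or printing `small` and `large` side by side gives a feel for how the size of the random fluctuations depends on the number of gene copies being sampled.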


Sex linkage

Where the A gene is sex linked, the heterogametic sex (''e.g.'', mammalian males; avian females) has only one copy of the gene (and is termed hemizygous), while the homogametic sex (''e.g.'', human females) has two copies. The genotype frequencies at equilibrium are ''p'' and ''q'' for the heterogametic sex but ''p''², 2''pq'' and ''q''² for the homogametic sex.

For example, in humans red–green colorblindness is an X-linked recessive trait. In western European males, the trait affects about 1 in 12 (''q'' = 0.083), whereas it affects about 1 in 200 females (0.005, compared to ''q''² = 0.007), very close to Hardy–Weinberg proportions.

If a population is brought together with males and females with a different allele frequency in each subpopulation (males or females), the allele frequency of the male population in the next generation will follow that of the female population because each son receives its X chromosome from its mother. The population converges on equilibrium very quickly.
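
The convergence for an X-linked locus can be sketched as a simple recursion. The male update follows the statement above that sons take their X chromosome from their mother; the female update (the average of the two parental frequencies, since a daughter receives one X from each parent) and the equilibrium value (q_male + 2 q_female)/3 are standard results added here as assumptions. Function names and starting values are hypothetical.

```python
def x_linked_step(q_male, q_female):
    """One generation of random mating at an X-linked locus."""
    next_male = q_female                    # sons inherit their X from the mother
    next_female = (q_male + q_female) / 2   # daughters inherit one X from each parent
    return next_male, next_female

def x_linked_trajectory(q_male, q_female, generations):
    """List of (q_male, q_female) pairs, starting with the initial values."""
    trajectory = [(q_male, q_female)]
    for _ in range(generations):
        q_male, q_female = x_linked_step(q_male, q_female)
        trajectory.append((q_male, q_female))
    return trajectory

# Hypothetical extreme start: allele fixed in males, absent in females.
trajectory = x_linked_trajectory(1.0, 0.0, 20)
# Females carry two of the three X chromosomes, hence the 1:2 weighting.
equilibrium = (1.0 + 2 * 0.0) / 3

# Colorblindness figures quoted above: q = 1/12 in males.
q = 1 / 12
expected_female_frequency = q ** 2
```

The difference between the male and female frequencies halves and changes sign each generation, which is why the male value appears to chase the female value of the previous generation.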


Generalizations

The simple derivation above can be generalized for more than two alleles and polyploidy.


Generalization for more than two alleles

Consider an extra allele frequency, ''r''. The two-allele case is the binomial expansion of (''p'' + ''q'')², and thus the three-allele case is the trinomial expansion of (''p'' + ''q'' + ''r'')².
:(p+q+r)^2=p^2 + q^2 + r^2 + 2pq +2pr + 2qr\,
More generally, consider the alleles A_1, \ldots, A_n given by the allele frequencies p_1 to p_n;
:(p_1 + \cdots + p_n)^2\,
giving for all homozygotes:
:f(A_i A_i) = p_i^2\,
and for all heterozygotes:
:f(A_i A_j) = 2p_ip_j\,


Generalization for polyploidy

The Hardy–Weinberg principle may also be generalized to polyploid systems, that is, for organisms that have more than two copies of each chromosome. Consider again only two alleles. The diploid case is the binomial expansion of:
:(p + q)^2\,
and therefore the polyploid case is the binomial expansion of:
:(p + q)^c\,
where ''c'' is the ploidy, for example with tetraploid (''c'' = 4):
:(p + q)^4 = p^4 + 4p^3q + 6p^2q^2 + 4pq^3 + q^4\,
whose terms are the expected frequencies of the genotypes AAAA, AAAa, AAaa, Aaaa and aaaa, respectively. Whether the organism is a 'true' tetraploid or an amphidiploid will determine how long it will take for the population to reach Hardy–Weinberg equilibrium.


Complete generalization

For n distinct alleles in c-ploids, the genotype frequencies in the Hardy–Weinberg equilibrium are given by individual terms in the multinomial expansion of (p_1 + \cdots + p_n)^c:
:(p_1 + \cdots + p_n)^c = \sum_{k_1, \ldots, k_n \in \mathbb{N} : \, k_1 + \cdots + k_n = c} \binom{c}{k_1, \ldots, k_n} p_1^{k_1} \cdots p_n^{k_n}
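
The multinomial expansion can be evaluated term by term. The sketch below enumerates every genotype for a given ploidy and computes its expected frequency; the function name, the representation of a genotype as a sorted tuple of allele indices, and the example frequencies are choices made for this illustration.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial, prod

def hw_genotype_frequencies(allele_freqs, ploidy=2):
    """Map each genotype (sorted tuple of allele indices) to its expected frequency."""
    frequencies = {}
    alleles = range(len(allele_freqs))
    for genotype in combinations_with_replacement(alleles, ploidy):
        counts = Counter(genotype)  # k_i: copies of allele i in this genotype
        # Multinomial coefficient c! / (k_1! ... k_n!)
        coefficient = factorial(ploidy)
        for k in counts.values():
            coefficient //= factorial(k)
        frequencies[genotype] = coefficient * prod(
            allele_freqs[i] ** k for i, k in counts.items()
        )
    return frequencies

# Hypothetical allele frequencies for illustration.
diploid = hw_genotype_frequencies([0.6, 0.4])              # p^2, 2pq, q^2
triallelic = hw_genotype_frequencies([0.5, 0.3, 0.2])      # six genotypes
tetraploid = hw_genotype_frequencies([0.5, 0.5], ploidy=4) # five genotypes
```

For two alleles and `ploidy=2` this reduces to the familiar p², 2pq, q²; in every case the frequencies sum to one because the allele frequencies do.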


Significance tests for deviation

Testing deviation from the HWP is generally performed using Pearson's chi-squared test, using the observed genotype frequencies obtained from the data and the expected genotype frequencies obtained using the HWP. For systems with large numbers of alleles, this may result in data with many empty possible genotypes and low genotype counts, because there are often not enough individuals in the sample to adequately represent all genotype classes. In that case, the asymptotic assumption of the chi-squared distribution no longer holds, and it may be necessary to use a form of Fisher's exact test, which requires a computer to solve. More recently a number of MCMC methods of testing for deviations from HWP have been proposed (Guo & Thompson, 1992; Wigginton ''et al.'' 2005).


Example chi-squared test for deviation

This data is from E. B. Ford (1971) on the scarlet tiger moth, for which the phenotypes of a sample of the population were recorded. The genotype–phenotype distinction is assumed to be negligibly small. The null hypothesis is that the population is in Hardy–Weinberg proportions, and the alternative hypothesis is that the population is not in Hardy–Weinberg proportions.

The sample comprised 1469 white-spotted (AA), 138 intermediate (Aa) and 5 little-spotting (aa) moths, 1612 in total. From this, allele frequencies can be calculated:
: \begin{align} p & = \frac{2 \times \mathrm{obs}(\text{AA}) + \mathrm{obs}(\text{Aa})}{2 \times (\mathrm{obs}(\text{AA}) + \mathrm{obs}(\text{Aa}) + \mathrm{obs}(\text{aa}))} \\ \\ & = \frac{2 \times 1469 + 138}{2 \times (1469 + 138 + 5)} \\ \\ & = \frac{3076}{3224} \\ \\ & = 0.954 \end{align}
and
: \begin{align} q & = 1 - p \\ & = 1 - 0.954 \\ & = 0.046 \end{align}

So the Hardy–Weinberg expectation is:
: \begin{align} \mathrm{Exp}(\text{AA}) & = p^2n = 0.954^2 \times 1612 = 1467.4 \\ \mathrm{Exp}(\text{Aa}) & = 2pqn = 2 \times 0.954 \times 0.046 \times 1612 = 141.2 \\ \mathrm{Exp}(\text{aa}) & = q^2n = 0.046^2 \times 1612 = 3.4 \end{align}

Pearson's chi-squared test states:
: \begin{align} \chi^2 & = \sum \frac{(O - E)^2}{E} \\ & = \frac{(1469 - 1467.4)^2}{1467.4} + \frac{(138 - 141.2)^2}{141.2} + \frac{(5 - 3.4)^2}{3.4} \\ & = 0.002 + 0.073 + 0.756 \\ & = 0.83 \end{align}

There is 1 degree of freedom (degrees of freedom for the test for Hardy–Weinberg proportions are # genotypes − # alleles). The 5% significance level for 1 degree of freedom is 3.84, and since the χ² value is less than this, the null hypothesis that the population is in Hardy–Weinberg frequencies is not rejected.
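
The arithmetic of this example can be reproduced in a few lines. The sketch below assumes genotype counts of 1469 (AA), 138 (Aa) and 5 (aa), which are consistent with the n = 1612, p = 0.954 and expected values quoted above; the function name is hypothetical, and the critical value 3.84 is taken from the text rather than computed.

```python
def hwe_chi_squared(n_AA, n_Aa, n_aa):
    """Return (p, expected counts, chi-squared statistic) for one biallelic locus."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, expected, chi2

# Assumed counts for the scarlet tiger moth sample (see the note above).
p, expected, chi2 = hwe_chi_squared(1469, 138, 5)

CRITICAL_5_PERCENT_1_DF = 3.84  # value quoted in the text
reject_null = chi2 > CRITICAL_5_PERCENT_1_DF
```

Working with unrounded intermediate values gives a statistic of about 0.83, in line with the hand calculation, so `reject_null` is false.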


Fisher's exact test (probability test)

Fisher's exact test can be applied to testing for Hardy–Weinberg proportions. Since the test is conditional on the allele frequencies, ''p'' and ''q'', the problem can be viewed as testing for the proper number of heterozygotes. In this way, the hypothesis of Hardy–Weinberg proportions is rejected if the number of heterozygotes is too large or too small. The conditional probabilities for the heterozygote, given the allele frequencies, are given in Emigh (1980) as
:\operatorname{prob}[n_{12} \mid n_1] = \frac{\binom{n}{n_{11}, n_{12}, n_{22}}}{\binom{2n}{n_1}} 2^{n_{12}},
where n_{11}, n_{12}, n_{22} are the observed numbers of the three genotypes, AA, Aa, and aa, respectively, and n_1 is the number of A alleles, where n_1 = 2 n_{11} + n_{12}.

As an example, using one of the examples from Emigh (1980), we can consider the case where ''n'' = 100, and ''p'' = 0.34. The possible observed heterozygotes and their exact significance level are given in Table 4. Using this table, one must look up the significance level of the test based on the observed number of heterozygotes. For example, if one observed 20 heterozygotes, the significance level for the test is 0.007. As is typical for Fisher's exact test for small samples, the gradation of significance levels is quite coarse.

However, a table like this has to be created for every experiment, since the tables are dependent on both ''n'' and ''p''.
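
Such a table can be generated directly from the conditional probability formula. The sketch below is illustrative only: the function names are hypothetical, the sample values (n = 10 individuals, n_1 = 6 copies of allele A) are made up, and the significance level is computed as the total probability of all outcomes no more likely than the observed one, which is one common convention and is assumed here rather than taken from the source.

```python
from math import comb, factorial

def hwe_exact_probabilities(n, n1):
    """Probability of each possible heterozygote count n12, given n and n1."""
    probabilities = {}
    # n12 has the same parity as n1 and is capped by the rarer allele's count.
    for n12 in range(n1 % 2, min(n1, 2 * n - n1) + 1, 2):
        n11 = (n1 - n12) // 2
        n22 = n - n11 - n12
        multinomial = factorial(n) // (
            factorial(n11) * factorial(n12) * factorial(n22)
        )
        probabilities[n12] = multinomial * 2 ** n12 / comb(2 * n, n1)
    return probabilities

def exact_significance(n, n1, observed_n12):
    """Sum the probabilities of outcomes no more likely than the observed one."""
    probabilities = hwe_exact_probabilities(n, n1)
    threshold = probabilities[observed_n12] * (1 + 1e-9)  # tolerate float ties
    return sum(pr for pr in probabilities.values() if pr <= threshold)

# Hypothetical small sample: 10 individuals carrying 6 copies of allele A.
table = hwe_exact_probabilities(10, 6)
```

Because only a handful of heterozygote counts are possible, the resulting significance levels take just a few distinct values, which is the coarseness mentioned above.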


Equivalence tests

Equivalence tests are developed in order to establish sufficiently good agreement between the observed genotype frequencies and Hardy–Weinberg equilibrium. Let \mathcal{M} denote the family of genotype distributions under the assumption of Hardy–Weinberg equilibrium. The distance between a genotype distribution p and Hardy–Weinberg equilibrium is defined by d(p,\mathcal{M}) = \min_{q \in \mathcal{M}} d(p,q), where d is some distance. The equivalence test problem is given by H_0 = \{d(p,\mathcal{M}) \geq \varepsilon\} and H_1 = \{d(p,\mathcal{M}) < \varepsilon\}, where \varepsilon > 0 is a tolerance parameter. If the hypothesis H_0 can be rejected then the population is close to Hardy–Weinberg equilibrium with a high probability. The equivalence tests for the biallelic case are developed, among others, in Wellek (2004). The equivalence tests for the case of multiple alleles are proposed in Ostrovski (2020).


Inbreeding coefficient

The inbreeding coefficient, F (see also ''F''-statistics), is one minus the observed frequency of heterozygotes over that expected from Hardy–Weinberg equilibrium.
: F = \frac{\operatorname{E}(f(\text{Aa})) - \operatorname{O}(f(\text{Aa}))}{\operatorname{E}(f(\text{Aa}))} = 1 - \frac{\operatorname{O}(f(\text{Aa}))}{\operatorname{E}(f(\text{Aa}))}
where the expected value from Hardy–Weinberg equilibrium is given by
: \operatorname{E}(f(\text{Aa})) = 2 p q
For example, for Ford's data above (138 observed heterozygotes against 141.2 expected):
: F = 1 - \frac{138}{141.2} = 0.023
For two alleles, the chi-squared goodness of fit test for Hardy–Weinberg proportions is equivalent to the test for inbreeding, F = 0.

The inbreeding coefficient is unstable as the expected value approaches zero, and thus not useful for rare and very common alleles. For \operatorname{E} = 0 and \operatorname{O} > 0, F = -\infty; for \operatorname{E} = 0 and \operatorname{O} = 0, F is undefined.
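
The definition translates directly into code. This sketch estimates the allele frequency from the genotype counts themselves and then compares observed and expected heterozygosity; the function name is hypothetical, and the counts 1469, 138 and 5 are assumed values consistent with the figures quoted for Ford's data.

```python
def inbreeding_coefficient(n_AA, n_Aa, n_aa):
    """F = 1 - observed heterozygosity / Hardy-Weinberg expected heterozygosity."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    expected_het = 2 * p * (1 - p)
    if expected_het == 0:
        # Only one allele is present, so F is not defined.
        raise ValueError("expected heterozygosity is zero; F is undefined")
    observed_het = n_Aa / n
    return 1 - observed_het / expected_het

# Assumed counts for the scarlet tiger moth sample.
F = inbreeding_coefficient(1469, 138, 5)
```

A small positive `F`, as here, indicates slightly fewer heterozygotes than expected; a negative value would indicate an excess.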


History

Mendelian genetics Mendelian inheritance (also known as Mendelism) is a type of biological inheritance following the principles originally proposed by Gregor Mendel in 1865 and 1866, re-discovered in 1900 by Hugo de Vries and Carl Correns, and later popularized ...
were rediscovered in 1900. However, it remained somewhat controversial for several years as it was not then known how it could cause continuous characteristics.
Udny Yule George Udny Yule, CBE, FRS (18 February 1871 – 26 June 1951), usually known as Udny Yule, was a British statistician, particularly known for the Yule distribution and proposing the preferential attachment model for random graphs. Perso ...
(1902) argued against Mendelism because he thought that dominant alleles would increase in the population. The American William E. Castle (1903) showed that without
selection Selection may refer to: Science * Selection (biology), also called natural selection, selection in evolution ** Sex selection, in genetics ** Mate selection, in mating ** Sexual selection in humans, in human sexuality ** Human mating strat ...
, the genotype frequencies would remain stable.
Karl Pearson Karl Pearson (; born Carl Pearson; 27 March 1857 – 27 April 1936) was an English biostatistician and mathematician. He has been credited with establishing the discipline of mathematical statistics. He founded the world's first university ...
(1903) found one equilibrium position with values of ''p'' = ''q'' = 0.5.
Reginald Punnett, unable to counter Yule's point, introduced the problem to G. H. Hardy, a British mathematician, with whom he played cricket. Hardy was a pure mathematician and held applied mathematics
in some contempt; his view of biologists' use of mathematics comes across in his 1908 paper, in which he describes the point as "very simple":

:''To the Editor of Science: I am reluctant to intrude in a discussion concerning matters of which I have no expert knowledge, and I should have expected the very simple point which I wish to make to have been familiar to biologists. However, some remarks of Mr. Udny Yule, to which Mr. R. C. Punnett has called my attention, suggest that it may still be worth making...''

:''Suppose that Aa is a pair of Mendelian characters, A being dominant, and that in any given generation the number of pure dominants (AA), heterozygotes (Aa), and pure recessives (aa) are as'' ''p'':2''q'':''r''. ''Finally, suppose that the numbers are fairly large, so that mating may be regarded as random, that the sexes are evenly distributed among the three varieties, and that all are equally fertile. A little mathematics of the multiplication-table type is enough to show that in the next generation the numbers will be as'' (''p'' + ''q'')²:2(''p'' + ''q'')(''q'' + ''r''):(''q'' + ''r'')², ''or as'' ''p''₁:2''q''₁:''r''₁, ''say.''

:''The interesting question is: in what circumstances will this distribution be the same as that in the generation before? It is easy to see that the condition for this is'' ''q''² = ''pr''. ''And since'' ''q''₁² = ''p''₁''r''₁, ''whatever the values of'' ''p'', ''q'', ''and'' ''r'' ''may be, the distribution will in any case continue unchanged after the second generation.''

The principle was thus known as ''Hardy's law'' in the
English-speaking world until 1943, when Curt Stern pointed out that it had first been formulated independently in 1908 by the German physician Wilhelm Weinberg. William Castle in 1903 also derived the ratios for the special case of equal allele frequencies, and it is sometimes (but rarely) called the Hardy–Weinberg–Castle Law.


Derivation of Hardy's equations

Hardy's statement begins with a recurrence relation for the frequencies ''p'', 2''q'', and ''r''. These recurrence relations follow from fundamental concepts in probability, specifically independence and conditional probability. For example, consider the probability of an offspring from generation \textstyle t being homozygous dominant. Alleles are inherited independently from each parent. A dominant allele can be inherited from a homozygous dominant parent with probability 1, or from a heterozygous parent with probability 0.5. To represent this reasoning in an equation, let \textstyle A_t represent inheritance of a dominant allele from a parent. Furthermore, let \textstyle AA_{t-1} and \textstyle Aa_{t-1} represent potential parental genotypes in the preceding generation.

: \begin{align} p_t & = P(A_t, A_t) = P(A_t)^2 \\ & = \left(P(A_t \mid AA_{t-1})P(AA_{t-1}) + P(A_t \mid Aa_{t-1})P(Aa_{t-1})\right)^2 \\ & = \left((1)p_{t-1} + (0.5) 2q_{t-1}\right)^2 \\ & = \left(p_{t-1} + q_{t-1}\right)^2 \end{align}

The same reasoning, applied to the other genotypes, yields the two remaining recurrence relations. Equilibrium occurs when each proportion is constant between subsequent generations. More formally, a population is at equilibrium at generation \textstyle t when

: \textstyle 0 = p_t - p_{t-1}, \textstyle 0 = q_t - q_{t-1}, and \textstyle 0 = r_t - r_{t-1}

By solving these equations, necessary and sufficient conditions for equilibrium can be determined. Again, consider the frequency of homozygous dominant animals. Equilibrium implies

: \begin{align} 0 & = p_t - p_{t-1} \\ & = p_{t-1}^2 + 2p_{t-1}q_{t-1} + q_{t-1}^2 - p_{t-1} \end{align}

First consider the case where \textstyle p_{t-1} = 0, and note that it implies that \textstyle q_{t-1} = 0 and \textstyle r_{t-1} = 1. Now consider the remaining case, where \textstyle p_{t-1} \ne 0:

: \begin{align} 0 & = p_{t-1}\left(p_{t-1} + 2q_{t-1} + q_{t-1}^2/p_{t-1} - 1\right) \\ & = p_{t-1}\left(q_{t-1}^2/p_{t-1} - r_{t-1}\right) \end{align}

where the final equality holds because the genotype proportions must sum to one, \textstyle p_{t-1} + 2q_{t-1} + r_{t-1} = 1. Dividing by \textstyle p_{t-1} \ne 0 gives \textstyle q_{t-1}^2/p_{t-1} = r_{t-1}. In both cases, \textstyle q_{t-1}^2 = p_{t-1}r_{t-1}. It can be shown that the other two equilibrium conditions imply the same equation, so Hardy's condition is necessary for equilibrium. It is also sufficient: substituting \textstyle q_{t-1}^2 = p_{t-1}r_{t-1} into the recurrence gives \textstyle p_t = p_{t-1}(p_{t-1} + 2q_{t-1} + r_{t-1}) = p_{t-1}, and similarly for the other two genotypes.
Since the condition always holds for the second generation, all succeeding generations have the same proportions.
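
The statement that the condition always holds for the second generation can be checked directly from the three recurrence relations. The following is a short worked step in the subscript notation of this section; the forms for \textstyle q_t and \textstyle r_t are the ones quoted from Hardy's paper, rewritten with generation indices.

```latex
\begin{align}
p_t &= (p_{t-1} + q_{t-1})^2, \qquad
q_t = (p_{t-1} + q_{t-1})(q_{t-1} + r_{t-1}), \qquad
r_t = (q_{t-1} + r_{t-1})^2 \\
q_t^2 &= (p_{t-1} + q_{t-1})^2 \, (q_{t-1} + r_{t-1})^2 = p_t \, r_t
\end{align}
```

The second line holds for any starting values of \textstyle p_{t-1}, \textstyle q_{t-1} and \textstyle r_{t-1}, which is why a single round of random mating is enough to reach equilibrium proportions.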


Numerical example


Estimation of genotype distribution

An example computation of the genotype distribution given by Hardy's original equations is instructive. The phenotype distribution from Table 3 above will be used to compute Hardy's initial genotype distribution. Note that the ''p'' and ''q'' values used by Hardy are not the same as those used above.

: \begin{align} \text{sum} & = \text{obs(AA)} + 2 \times \text{obs(Aa)} + \text{obs(aa)} = 1469 + 2 \times 138 + 5 \\ & = 1750 \end{align}

: \begin{align} p & = \frac{1469}{1750} = 0.83943 \\ 2q & = \frac{2 \times 138}{1750} = 0.15771 \\ r & = \frac{5}{1750} = 0.00286 \end{align}

As checks on the distribution, compute

: p + 2q + r = 0.83943 + 0.15771 + 0.00286 = 1.00000 \,

and

: E_0 = q^2 - pr = 0.00382. \,

For the next generation, Hardy's equations give

: \begin{align} q & = \frac{0.15771}{2} = 0.07886 \\ p_1 & = (p + q)^2 = 0.84325 \\ 2q_1 & = 2(p + q)(q + r) = 0.15007 \\ r_1 & = (q + r)^2 = 0.00668. \end{align}

Again as checks on the distribution, compute

: p_1 + 2q_1 + r_1 = 0.84325 + 0.15007 + 0.00668 = 1.00000 \,

and

: E_1 = q_1^2 - p_1 r_1 = 0.00000 \,

which are the expected values. The reader may demonstrate that subsequent use of the second-generation values for a third generation will yield identical results.
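
The arithmetic above can be reproduced with a few lines of code. The following is a minimal sketch in Python, not part of Hardy's paper: the function names `next_generation` and `imbalance` are illustrative, and the starting values are the rounded proportions quoted in the example, so the last printed digit may differ slightly from the figures in the text.

```python
def next_generation(p, q, r):
    """One round of random mating for proportions p : 2q : r (p + 2q + r = 1)."""
    return (p + q) ** 2, (p + q) * (q + r), (q + r) ** 2


def imbalance(p, q, r):
    """Hardy's check value E = q^2 - p*r, which is zero at equilibrium."""
    return q * q - p * r


# Rounded starting proportions quoted in the example above (p, 2q, r).
p0, two_q0, r0 = 0.83943, 0.15771, 0.00286
q0 = two_q0 / 2

p1, q1, r1 = next_generation(p0, q0, r0)
p2, q2, r2 = next_generation(p1, q1, r1)

generations = [("gen 0", (p0, q0, r0)),
               ("gen 1", (p1, q1, r1)),
               ("gen 2", (p2, q2, r2))]
for label, (p, q, r) in generations:
    # Print p, 2q, r and the check value E for each generation.
    print(f"{label}: p={p:.5f} 2q={2 * q:.5f} r={r:.5f} E={imbalance(p, q, r):.5f}")
```

The first and second generations differ, while the second and third agree to floating-point precision, which is the behaviour the example asks the reader to verify.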


Estimation of carrier frequency

The Hardy–Weinberg principle can also be used to estimate the frequency of carriers of an autosomal recessive condition in a population from the frequency of sufferers. Let us assume that an estimated \textstyle \frac{1}{2500} babies are born with cystic fibrosis; this is about the frequency of homozygous individuals observed in Northern European populations. We can use the Hardy–Weinberg equations to estimate the carrier frequency, that is, the frequency of heterozygous individuals, \textstyle 2pq.

: \begin{align} & q^2 = \frac{1}{2500} \\ & q = \frac{1}{50} \\ & p = 1 - q \end{align}

As \textstyle \frac{1}{50} is small, we can take ''p'', \textstyle 1 - \frac{1}{50}, to be approximately 1.

: \begin{align} 2pq & = 2 \cdot \frac{1}{50} \\ 2pq & = \frac{1}{25} \end{align}

We therefore estimate the carrier rate to be \textstyle \frac{1}{25}, which is about the frequency observed in Northern European populations. This can be simplified to the carrier frequency being about twice the square root of the birth frequency.
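
The size of the error introduced by taking ''p'' to be 1 can be seen by computing both values. The sketch below is illustrative only: the function name `carrier_frequency` is hypothetical, and the input of 1 in 2500 is an assumed birth frequency of the order commonly quoted for cystic fibrosis in Northern European populations.

```python
import math


def carrier_frequency(birth_frequency):
    """Return (exact 2pq, approximate 2q) for an autosomal recessive condition.

    Assumes Hardy-Weinberg proportions, so the birth frequency equals q^2.
    """
    q = math.sqrt(birth_frequency)  # recessive allele frequency
    p = 1.0 - q                     # dominant allele frequency
    exact = 2.0 * p * q             # heterozygote (carrier) frequency
    approximate = 2.0 * q           # shortcut that takes p to be 1
    return exact, approximate


# Assumed birth frequency of 1 in 2500.
exact, approximate = carrier_frequency(1 / 2500)
print(f"exact 2pq = {exact:.4f}, approximation 2q = {approximate:.4f}")
```

The shortcut slightly overestimates the carrier frequency, and the gap grows as the recessive allele becomes more common.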


Graphical representation

It is possible to represent the distribution of genotype frequencies for a bi-allelic locus within a population graphically using a de Finetti diagram. This uses a triangular plot (also known as a trilinear, triaxial or ternary plot) to represent the distribution of the three genotype frequencies in relation to each other. It differs from many other such plots in that the direction of one of the axes has been reversed. The curved line in the diagram is the Hardy–Weinberg parabola and represents the state where alleles are in Hardy–Weinberg equilibrium. The effects of natural selection on allele frequency can also be represented on such graphs. The de Finetti diagram was developed and used extensively by A. W. F. Edwards in his book ''Foundations of Mathematical Genetics'' (Edwards, 1977).
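
The points that make up the Hardy–Weinberg parabola can be generated numerically. The following sketch is an illustration under stated assumptions rather than a plotting recipe: the function names are hypothetical, genotype frequencies are given in the order (AA, Aa, aa), and the membership test uses the relation that the squared heterozygote frequency equals four times the product of the two homozygote frequencies, which is Hardy's condition ''q''² = ''pr'' written for genotype frequencies.

```python
def hardy_weinberg_point(p):
    """Genotype frequencies (AA, Aa, aa) at equilibrium for allele frequency p."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def on_hardy_weinberg_parabola(f_AA, f_Aa, f_aa, tol=1e-9):
    """True when f_Aa^2 equals 4 * f_AA * f_aa within a small tolerance."""
    return abs(f_Aa * f_Aa - 4.0 * f_AA * f_aa) <= tol


# Sample the parabola at allele frequencies 0.0, 0.1, ..., 1.0.
parabola = [hardy_weinberg_point(i / 10) for i in range(11)]
for f_AA, f_Aa, f_aa in parabola:
    print(f"AA={f_AA:.2f} Aa={f_Aa:.2f} aa={f_aa:.2f}")
```

A population made up only of the two homozygotes, such as (0.5, 0.0, 0.5), fails the test and would plot away from the curve.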


See also

* F-statistics
* Fixation index
* QST (genetics)
* Wahlund effect
* Regression toward the mean
* Multinomial distribution (Hardy–Weinberg is a trinomial distribution with probabilities \textstyle (\theta^2, 2\theta(1-\theta), (1-\theta)^2))
* Additive disequilibrium and z statistic
* Population genetics
* Genetic diversity
* Founder effect
* Population bottleneck
* Genetic drift
* Inbreeding depression
* Coefficient of inbreeding
* Coefficient of relationship
* Natural selection
* Fitness
* Genetic load


References


Sources

* Edwards, A. W. F. (1977). ''Foundations of Mathematical Genetics''. Cambridge University Press, Cambridge (2nd ed., 2000).
* Ford, E. B. (1971). ''Ecological Genetics''. London.


External links


''EvolutionSolution'' (at bottom of page)



genetics Population Genetics Simulator

HARDY C implementation of Guo & Thompson 1992

Source code (C/C++/Fortran/R) for Wigginton ''et al.'' 2005

Online de Finetti Diagram Generator and Hardy–Weinberg equilibrium tests

Online Hardy–Weinberg equilibrium tests and drawing of de Finetti diagrams

