In population genetics, Ewens's sampling formula describes the probabilities associated with counts of how many different alleles are observed a given number of times in the sample.
Definition
Ewens's sampling formula, introduced by Warren Ewens, states that under certain conditions (specified below), if a random sample of ''n'' gametes is taken from a population and classified according to the gene at a particular locus, then the probability that there are ''a''<sub>1</sub> alleles represented once in the sample, ''a''<sub>2</sub> alleles represented twice, and so on, is
:<math>\operatorname{Pr}(a_1,\dots,a_n; \theta) = {n! \over \theta(\theta+1)\cdots(\theta+n-1)} \prod_{j=1}^n {\theta^{a_j} \over j^{a_j} a_j!},</math>
for some positive number ''θ'' representing the population mutation rate, whenever ''a''<sub>1</sub>, ..., ''a''<sub>''n''</sub> is a sequence of nonnegative integers such that
:<math>a_1 + 2a_2 + 3a_3 + \cdots + na_n = \sum_{i=1}^{n} i a_i = n.</math>
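The formula can be evaluated directly. The following is a minimal Python sketch, assuming the standard form of the formula (written out in the docstring); the function name `ewens_probability` and the list layout of the counts are illustrative choices, not part of the original presentation.

```python
from math import factorial, prod


def ewens_probability(a, theta):
    """Evaluate Ewens's sampling formula for allele counts a = [a_1, ..., a_n].

    a[j - 1] is the number of alleles observed exactly j times, and
    theta > 0 is the population mutation rate.  The value returned is
        n! / (theta (theta + 1) ... (theta + n - 1))
           * prod_j theta**a_j / (j**a_j * a_j!).
    """
    n = len(a)
    if any(a_j < 0 for a_j in a):
        raise ValueError("counts must be nonnegative")
    if sum(j * a_j for j, a_j in enumerate(a, start=1)) != n:
        raise ValueError("counts must satisfy a_1 + 2*a_2 + ... + n*a_n = n")
    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = prod(theta + i for i in range(n))
    # Product over j of theta**a_j / (j**a_j * a_j!).
    weight = prod(
        theta ** a_j / (j ** a_j * factorial(a_j))
        for j, a_j in enumerate(a, start=1)
    )
    return factorial(n) * weight / rising


# Sample of n = 3 gametes with theta = 1: one allele seen once, one seen twice.
print(ewens_probability([1, 1, 0], 1.0))
```

For a sample of three gametes with ''θ'' = 1, the three possible count vectors `[3, 0, 0]`, `[1, 1, 0]` and `[0, 0, 1]` receive probabilities 1/6, 1/2 and 1/3 respectively.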
The phrase "under certain conditions" used above is made precise by the following assumptions:
* The sample size ''n'' is small by comparison to the size of the whole population; and
* The population is in statistical equilibrium under mutation and genetic drift and the role of selection at the locus in question is negligible; and
* Every mutant allele is novel.
This is a probability distribution on the set of all partitions of the integer ''n''. Among probabilists and statisticians it is often called the multivariate Ewens distribution.
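That the probabilities sum to 1 over all partitions of ''n'' can be checked numerically. The sketch below is illustrative only: the helper names `partitions` and `ewens_pmf` are hypothetical, partitions are represented as tuples of part sizes rather than as count vectors, and exact rational arithmetic is used so that no rounding error enters.

```python
from fractions import Fraction
from math import factorial


def partitions(n, largest=None):
    """Yield each partition of n as a non-increasing tuple of part sizes."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def ewens_pmf(parts, theta):
    """Ewens probability of a partition given as a tuple of part sizes."""
    n = sum(parts)
    result = Fraction(factorial(n))
    for i in range(n):
        result /= theta + i  # rising factorial in the denominator
    for j in set(parts):
        a_j = parts.count(j)  # number of alleles seen exactly j times
        result *= Fraction(theta) ** a_j / (j ** a_j * factorial(a_j))
    return result


theta = Fraction(3, 2)  # arbitrary positive value chosen for the check
total = sum(ewens_pmf(p, theta) for p in partitions(6))
print(total)  # exact sum over the 11 partitions of 6
```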
Mathematical properties
When ''θ'' = 0, the probability is 1 that all ''n'' genes are the same. When ''θ'' = 1, then the distribution is precisely that of the integer partition induced by a uniformly distributed random permutation. As ''θ'' → ∞, the probability that no two of the ''n'' genes are the same approaches 1.
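These special cases can be illustrated with a small brute-force check. The sketch below, with hypothetical helper names `cycle_type` and `ewens_pmf`, enumerates every permutation of five elements, tabulates the partition given by its cycle lengths, and compares the resulting frequencies with the formula at ''θ'' = 1. It also evaluates the formula at a very small and a very large ''θ'' to show the two limiting behaviours.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial


def cycle_type(perm):
    """Return the cycle lengths of a permutation of 0..n-1, largest first."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, k = 0, start
        while k not in seen:
            seen.add(k)
            k = perm[k]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))


def ewens_pmf(parts, theta):
    """Ewens probability of a partition given as a tuple of part sizes."""
    n = sum(parts)
    result = Fraction(factorial(n))
    for i in range(n):
        result /= theta + i
    for j in set(parts):
        a_j = parts.count(j)
        result *= Fraction(theta) ** a_j / (j ** a_j * factorial(a_j))
    return result


n = 5
counts = Counter(cycle_type(p) for p in permutations(range(n)))
uniform = {t: Fraction(c, factorial(n)) for t, c in counts.items()}
# theta = 1: the formula should agree with the cycle-type frequencies.
matches = all(ewens_pmf(t, 1) == prob for t, prob in uniform.items())
print(matches)

# Limiting behaviour: one shared allele as theta -> 0, all distinct as theta -> infinity.
print(float(ewens_pmf((n,), Fraction(1, 10**9))))
print(float(ewens_pmf((1,) * n, 10**9)))
```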
This family of probability distributions enjoys the property that if after the sample of ''n'' is taken, ''m'' of the ''n'' gametes are chosen without replacement, then the resulting probability distribution on the set of all partitions of the smaller integer ''m'' is just what the formula above would give if ''m'' were put in place of ''n''.
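The subsampling property can be checked one step at a time: choosing ''m'' of the ''n'' gametes without replacement is equivalent to deleting ''n'' − ''m'' uniformly chosen gametes one after another. The sketch below, which assumes that equivalence and uses hypothetical helper names, deletes a single gamete from a sample of size ''n'' and compares the induced distribution with the formula for ''n'' − 1, using exact rational arithmetic.

```python
from collections import defaultdict
from fractions import Fraction
from math import factorial


def partitions(n, largest=None):
    """Yield each partition of n as a non-increasing tuple of part sizes."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def ewens_pmf(parts, theta):
    """Ewens probability of a partition given as a tuple of part sizes."""
    n = sum(parts)
    result = Fraction(factorial(n))
    for i in range(n):
        result /= theta + i
    for j in set(parts):
        a_j = parts.count(j)
        result *= Fraction(theta) ** a_j / (j ** a_j * factorial(a_j))
    return result


def delete_one_gamete(n, theta):
    """Distribution on partitions of n - 1 after deleting one random gamete."""
    dist = defaultdict(Fraction)
    for parts in partitions(n):
        p = ewens_pmf(parts, theta)
        for idx, size in enumerate(parts):
            # The deleted gamete carries this allele with probability size / n.
            smaller = list(parts)
            smaller[idx] -= 1
            key = tuple(sorted((s for s in smaller if s > 0), reverse=True))
            dist[key] += p * Fraction(size, n)
    return dict(dist)


theta = Fraction(5, 2)  # arbitrary positive value chosen for the check
reduced = delete_one_gamete(6, theta)
consistent = all(reduced[q] == ewens_pmf(q, theta) for q in partitions(5))
print(consistent)
```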
The Ewens distribution arises naturally from the Chinese restaurant process.
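A minimal simulation sketch of this connection follows. It assumes the usual one-parameter form of the Chinese restaurant process, in which the customer arriving when ''k'' others are seated opens a new table with probability ''θ''/(''k'' + ''θ'') and otherwise joins an existing table with probability proportional to its size; the function name and its `rng` parameter are illustrative.

```python
import random


def chinese_restaurant_partition(n, theta, rng=None):
    """Seat n customers one by one; return table sizes, largest first."""
    rng = rng or random.Random()
    tables = []
    for seated in range(n):
        # New table with probability theta / (seated + theta); otherwise
        # join an existing table with probability proportional to its size.
        u = rng.random() * (seated + theta)
        if u < theta:
            tables.append(1)
            continue
        u -= theta
        for i, size in enumerate(tables):
            if u < size:
                tables[i] += 1
                break
            u -= size
        else:
            tables[-1] += 1  # guard against floating-point round-off
    return sorted(tables, reverse=True)


rng = random.Random(0)
sample = chinese_restaurant_partition(20, 1.5, rng)
print(sample, sum(sample))
```

Each table plays the role of an allele and each customer the role of a sampled gamete, so the sorted table sizes form a random partition of ''n''. With ''n'' = 3 and ''θ'' = 1, the long-run frequency of a single table of size 3 should be close to 1/3, the value given by the sampling formula.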
See also
* Chinese restaurant table distribution
* Coalescent theory
* Unified neutral theory of biodiversity
* Biomathematics
Notes
* Warren Ewens, "The sampling theory of selectively neutral alleles", ''Theoretical Population Biology'', volume 3, pages 87–112, 1972.
* H. Crane, "The Ubiquitous Ewens Sampling Formula", ''Statistical Science'', volume 31, number 1, February 2016. This article introduces a series of seven articles about Ewens sampling in a special issue of the journal.
* J. F. C. Kingman, "Random partitions in population genetics", ''Proceedings of the Royal Society of London, Series A, Mathematical and Physical Sciences'', volume 361, number 1704, 1978.
* S. Tavaré and W. J. Ewens, "The Multivariate Ewens Distribution", 1997, Chapter 41 of the reference below.
* N. L. Johnson, S. Kotz, and N. Balakrishnan, ''Discrete Multivariate Distributions'', Wiley, 1997.