
The fixation index (FST) is a measure of population differentiation due to genetic structure. It is frequently estimated from genetic polymorphism data, such as single-nucleotide polymorphisms (SNPs) or microsatellites. Developed as a special case of Wright's F-statistics, it is one of the most commonly used statistics in population genetics. Its values range from 0 to 1, with 0 meaning no differentiation and 1 meaning complete differentiation.
Interpretation
This comparison of genetic variability within and between populations is frequently used in applied population genetics. The values range from 0 to 1. A value of zero implies complete panmixia; that is, the two populations are interbreeding freely. A value of one implies that all genetic variation is explained by the population structure, and that the two populations do not share any genetic diversity.
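For idealized models such as Wright's finite island model, FST can be used to estimate migration rates. Under that model, with a large number of subpopulations, FST is approximately related to the migration rate by
$$F_{ST} \approx \frac{1}{1 + 4N(m + \mu)},$$
where $N$ is the effective size of each subpopulation, $m$ is the migration rate per generation, and $\mu$ is the mutation rate per generation.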
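For illustration, the island-model relationship can be turned around to give a rough number of migrants per generation. The sketch below assumes the standard approximation for Wright's island model with many subpopulations and infinitely many alleles, $F_{ST} \approx 1/(1 + 4N(m + \mu))$, where $N$ is the effective size of each subpopulation; setting $\mu = 0$ and solving for $Nm$ gives $Nm \approx (1/F_{ST} - 1)/4$. The function names are hypothetical, and the inversion is only as reliable as the model's idealized assumptions (equal and constant subpopulation sizes, symmetric migration, equilibrium between migration and drift).

```python
def island_model_fst(n_e: float, m: float, mu: float = 0.0) -> float:
    """Approximate equilibrium FST under Wright's island model.

    Assumes F_ST ~ 1 / (1 + 4 N (m + mu)), with n_e the effective size of
    each subpopulation, m the migration rate and mu the mutation rate per
    generation (many-subpopulation, infinite-alleles approximation).
    """
    if n_e <= 0 or m < 0 or mu < 0:
        raise ValueError("n_e must be positive; m and mu must be non-negative")
    return 1.0 / (1.0 + 4.0 * n_e * (m + mu))


def migrants_from_fst(fst: float) -> float:
    """Estimate Nm (migrants per generation) by inverting F_ST ~ 1 / (1 + 4 N m).

    Mutation is assumed to be negligible compared with migration.
    """
    if not 0.0 < fst <= 1.0:
        raise ValueError("fst must lie in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


if __name__ == "__main__":
    # 100 individuals per subpopulation, 1% migrants per generation (Nm = 1)
    fst = island_model_fst(n_e=100, m=0.01)
    print(f"equilibrium FST ~ {fst:.3f}")                  # 0.200
    print(f"recovered Nm ~ {migrants_from_fst(fst):.2f}")  # 1.00
```

Note that only the product $Nm$ can be recovered this way, not $N$ and $m$ separately.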
The interpretation of FST can be difficult when the data analyzed are highly polymorphic. In this case, the probability of identity by descent is very low and FST can have an arbitrarily low upper bound, which might lead to misinterpretation of the data. Also, strictly speaking, FST is not a distance in the mathematical sense, as it does not satisfy the triangle inequality.
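Both caveats above can be checked numerically. The sketch below uses Nei's multi-allele form $G_{ST} = (H_T - H_S)/H_T$ as a stand-in for FST, where $H_S$ is the mean within-population heterozygosity and $H_T$ the heterozygosity of the pooled allele frequencies, and it assumes two populations of equal size; the helper names are hypothetical. Because $H_T \le 1$, the statistic can never exceed $1 - H_S$, so two populations that share no alleles at all still give a small value when each is internally very diverse. The second part shows three populations whose pairwise values violate the triangle inequality.

```python
def heterozygosity(freqs: list[float]) -> float:
    """Probability that two alleles drawn at random (with replacement) differ."""
    return 1.0 - sum(f * f for f in freqs)


def gst(freqs_a: list[float], freqs_b: list[float]) -> float:
    """Nei's G_ST for two equally sized populations at one multi-allele locus."""
    if len(freqs_a) != len(freqs_b):
        raise ValueError("allele frequency vectors must have equal length")
    h_s = (heterozygosity(freqs_a) + heterozygosity(freqs_b)) / 2.0
    pooled = [(a + b) / 2.0 for a, b in zip(freqs_a, freqs_b)]
    h_t = heterozygosity(pooled)
    if h_t == 0.0:
        raise ValueError("G_ST is undefined when the pooled locus is monomorphic")
    return (h_t - h_s) / h_t


def private_alleles(k: int) -> tuple[list[float], list[float]]:
    """Two populations, each with k equally common alleles and none in common."""
    return [1.0 / k] * k + [0.0] * k, [0.0] * k + [1.0 / k] * k


if __name__ == "__main__":
    # No shared alleles, yet G_ST = 1 / (2k - 1) shrinks as within-population diversity grows
    for k in (1, 2, 5, 10, 50):
        a, b = private_alleles(k)
        bound = 1.0 - (heterozygosity(a) + heterozygosity(b)) / 2.0  # 1 - H_S
        print(f"k={k:>2}  G_ST={gst(a, b):.4f}  bound 1-H_S={bound:.4f}")

    # Biallelic populations: fixed for allele 1, at 50/50, fixed for allele 2
    pop_a, pop_b, pop_c = [1.0, 0.0], [0.5, 0.5], [0.0, 1.0]
    ab, bc, ac = gst(pop_a, pop_b), gst(pop_b, pop_c), gst(pop_a, pop_c)
    print(f"AB={ab:.3f} BC={bc:.3f} AC={ac:.3f}; AB+BC < AC: {ab + bc < ac}")
```

In the second example each neighbouring pair gives 1/3 while the outer pair gives 1, so the sum of the two shorter "sides" is smaller than the third.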
Definition
Two of the most commonly used definitions for FST at a given locus are based on 1) the variance of allele frequencies among populations, and 2) the probability of identity by descent.
If $\bar{p}$ is the average frequency of an allele in the total population, $\sigma^2_S$ is the variance in the frequency of the allele among different subpopulations, weighted by the sizes of the subpopulations, and $\sigma^2_T$ is the variance of the allelic state in the total population, FST is defined as
$$F_{ST} = \frac{\sigma^2_S}{\sigma^2_T} = \frac{\sigma^2_S}{\bar{p}(1-\bar{p})}.$$
Wright's definition illustrates that FST measures the amount of genetic variance that can be explained by population structure. This can also be thought of as the fraction of total diversity that is not a consequence of the average diversity within subpopulations, where diversity is measured by the probability that two randomly selected alleles are different, namely $2p(1-p)$. If the allele frequency in the $i$th population is $p_i$ and the relative size of the $i$th population is $c_i$, then
$$F_{ST} = \frac{\bar{p}(1-\bar{p}) - \sum_i c_i p_i (1-p_i)}{\bar{p}(1-\bar{p})}.$$
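As a sketch of the two readings of FST described above for a single biallelic locus, the code below computes it both as Wright's variance ratio $\sigma^2_S / \sigma^2_T$ and as the share of total diversity not found within subpopulations, $(H_T - H_S)/H_T$ with $H = 2p(1-p)$. The function names are hypothetical, and the relative sizes $c_i$ are assumed to sum to 1, so that $\bar{p} = \sum_i c_i p_i$.

```python
from typing import Sequence


def _mean_frequency(p: Sequence[float], c: Sequence[float]) -> float:
    """Validate inputs and return the size-weighted mean allele frequency p_bar."""
    if len(p) != len(c) or not p:
        raise ValueError("p and c must be non-empty and of equal length")
    if abs(sum(c) - 1.0) > 1e-9:
        raise ValueError("relative subpopulation sizes must sum to 1")
    p_bar = sum(ci * pi for pi, ci in zip(p, c))
    if p_bar in (0.0, 1.0):
        raise ValueError("FST is undefined when the allele is absent or fixed overall")
    return p_bar


def fst_variance_ratio(p: Sequence[float], c: Sequence[float]) -> float:
    """Wright's form: weighted variance among subpopulations over p_bar (1 - p_bar)."""
    p_bar = _mean_frequency(p, c)
    var_s = sum(ci * (pi - p_bar) ** 2 for pi, ci in zip(p, c))
    return var_s / (p_bar * (1.0 - p_bar))


def fst_heterozygosity(p: Sequence[float], c: Sequence[float]) -> float:
    """Diversity form: (H_T - H_S) / H_T with H = 2p(1 - p)."""
    p_bar = _mean_frequency(p, c)
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    h_s = sum(ci * 2.0 * pi * (1.0 - pi) for pi, ci in zip(p, c))
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    cases = {
        "identical subpopulations": ([0.3, 0.3], [0.5, 0.5]),
        "fixed for different alleles": ([0.0, 1.0], [0.5, 0.5]),
        "intermediate, unequal sizes": ([0.2, 0.6, 0.9], [0.5, 0.3, 0.2]),
    }
    for label, (p, c) in cases.items():
        print(f"{label}: {fst_variance_ratio(p, c):.4f} {fst_heterozygosity(p, c):.4f}")
```

The two columns agree because $\bar{p}(1-\bar{p}) - \sum_i c_i p_i(1-p_i) = \sum_i c_i p_i^2 - \bar{p}^2$, which is exactly the weighted variance $\sigma^2_S$; the first two cases reproduce the end points 0 and 1.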
Alternatively,
$$F_{ST} = \frac{f_0 - \bar{f}}{1 - \bar{f}},$$
where $f_0$ is the probability of identity by descent of two individuals given that the two individuals are in the same subpopulation, and $\bar{f}$ is the probability that two individuals from the total population are identical by descent. Using this definition, FST can be interpreted as measuring how much closer two individuals from the same subpopulation are, compared to the total population. If the mutation rate is small, this interpretation can be made more explicit by linking the probability of identity by descent to coalescent times: let $T_0$ and $T$ denote the average time to coalescence for individuals from the same subpopulation and the total population, respectively. Then
$$F_{ST} \approx 1 - \frac{T_0}{T}.$$
This formulation has the advantage that the expected time to coalescence can easily be estimated from genetic data, which led to the development of various estimators for FST.
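A brief sketch of why the coalescent approximation holds, assuming the infinite-alleles view that two sampled alleles are identical by descent only if no mutation has occurred on either lineage since their most recent common ancestor. With mutation rate $\mu$ per generation and coalescence time $T$, that probability is $(1-\mu)^{2T} \approx 1 - 2\mu T$ when $\mu T$ is small, and the same first-order expression holds after averaging over coalescence times. Writing $f_0$ for pairs within a subpopulation and $\bar{f}$ for pairs from the total population:

$$f_0 \approx 1 - 2\mu T_0, \qquad \bar{f} \approx 1 - 2\mu T,$$
$$F_{ST} = \frac{f_0 - \bar{f}}{1 - \bar{f}} \approx \frac{2\mu T - 2\mu T_0}{2\mu T} = 1 - \frac{T_0}{T}.$$

The mutation rate cancels, which is why the result depends only on the ratio of coalescence times.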
Estimation
In practice, none of the quantities used for the definitions can be easily measured. As a consequence, various estimators have been proposed. A particularly simple estimator applicable to DNA sequence data is
$$F_{ST} = \frac{\pi_\text{Between} - \pi_\text{Within}}{\pi_\text{Between}},$$
where $\pi_\text{Between}$ and $\pi_\text{Within}$ represent the average number of pairwise differences between two individuals sampled from different subpopulations ($\pi_\text{Between}$) or from the same subpopulation ($\pi_\text{Within}$). The average pairwise difference within a population can be calculated as the sum of the pairwise differences divided by the number of pairs. However, this estimator is biased when sample sizes are small or vary between populations. Therefore, more elaborate methods are used to compute FST in practice. Two of the most widely used procedures are the estimator of Weir & Cockerham (1984) and analysis of molecular variance (AMOVA). A list of implementations is available at the end of this article.
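The simple pairwise-difference estimator can be sketched in a few lines for aligned sequences. This is an illustration under stated assumptions: sequences are equal-length strings, differences are counted site by site, and $\pi_\text{Within}$ is taken as the unweighted mean of the two populations' within-population averages (one possible choice; other weightings are possible). It includes none of the corrections provided by the Weir & Cockerham estimator or AMOVA, and all names are hypothetical.

```python
from itertools import combinations, product
from statistics import mean
from typing import Sequence


def n_differences(x: str, y: str) -> int:
    """Number of aligned sites at which two sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(x, y))


def pi_within(seqs: Sequence[str]) -> float:
    """Sum of pairwise differences within one sample divided by the number of pairs."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences per population")
    return mean(n_differences(x, y) for x, y in combinations(seqs, 2))


def pi_between(seqs_a: Sequence[str], seqs_b: Sequence[str]) -> float:
    """Average differences between one sequence drawn from each population."""
    return mean(n_differences(x, y) for x, y in product(seqs_a, seqs_b))


def pairwise_fst(seqs_a: Sequence[str], seqs_b: Sequence[str]) -> float:
    """(pi_between - pi_within) / pi_between, with pi_within averaged over both samples."""
    within = (pi_within(seqs_a) + pi_within(seqs_b)) / 2.0
    between = pi_between(seqs_a, seqs_b)
    if between == 0:
        raise ValueError("no differences between samples; FST is undefined")
    return (between - within) / between


if __name__ == "__main__":
    pop_a = ["AAAA", "AAAT"]
    pop_b = ["TTTA", "TTTT"]
    print(f"FST ~ {pairwise_fst(pop_a, pop_b):.3f}")  # 0.714
```

With samples this small the estimate is unstable: comparing `pop_a` with itself, for example, gives -1.0, because the between-sample average then includes each sequence compared with itself. This is one illustration of the small-sample bias described above.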
FST in humans

FST values depend strongly on the choice of populations.
Closely related ethnic groups, such as the Danes vs. the Dutch, or the Portuguese vs. the Spaniards, show values significantly below 1%, indistinguishable from panmixia.
Within Europe, the most divergent ethnic groups have been found to have values of the order of 7% (Sámi vs. Sardinians).
Larger values are found if highly divergent homogeneous groups are compared: the highest such value found was close to 46%, between Mbuti and Papuans.
A genetic distance of 0.125 implies that kinship between unrelated individuals of the same ancestry relative to the world population is equivalent to kinship between half siblings in a randomly mating population. This also implies that if a human from a given ancestral population has a mixed half-sibling, that human is closer genetically to an unrelated individual of their ancestral population than to their mixed half-sibling.
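The value 0.125 is the kinship coefficient of half-siblings, which can be checked with the path-counting formula for the kinship coefficient. Assuming the shared parent $A$ is not inbred ($F_A = 0$) and each half-sibling is one generation removed from $A$ ($n_1 = n_2 = 1$):

$$\phi_{XY} = \left(\tfrac{1}{2}\right)^{n_1 + n_2 + 1}(1 + F_A) = \left(\tfrac{1}{2}\right)^{3} = \tfrac{1}{8} = 0.125$$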
Genetic distances in human populations
Autosomal genetic distances based on classical markers
In their study The History and Geography of Human Genes (1994), Cavalli-Sforza, Menozzi and Piazza provide some of the most detailed and comprehensive estimates of genetic distances between human populations, within and across continents. Their initial database contains 76,676 gene frequencies (using 120 blood polymorphisms), corresponding to 6,633 samples in different locations. By culling and pooling such samples, they restrict their analysis to 491 populations.

They focus on "aboriginal populations" that were at their present location at the end of the 15th century, when the great European migrations began. When studying genetic difference at the world level, the number is reduced to 42 representative populations, aggregating subpopulations characterized by a high level of genetic similarity.
For these 42 populations, Cavalli-Sforza and coauthors report bilateral distances computed from 120 alleles. Among this set of 42 world populations, the greatest genetic distance observed is between Mbuti Pygmies and Papua New Guineans, where the Fst distance is 0.4573, while the smallest genetic distance (0.0021) is between the Danish and the English.
When considering more disaggregated data for 26 European populations, the smallest genetic distance (0.0009) is between the Dutch and the Danes, and the largest (0.0667) is between the Lapps and the Sardinians. The mean genetic distance among the 861 available pairings of the 42 selected populations was found to be 0.1338.
The following table shows Fst calculated by Cavalli-Sforza (1994) for some populations:
Autosomal genetic distances based on SNPs
A 2012 study based on International HapMap Project data estimated FST between the three major "continental" populations of Europeans (combined from Utah residents of Northern and Western European ancestry from the CEPH collection and Italians from Tuscany), East Asians (combining Han Chinese from Beijing, Chinese from metropolitan Denver and Japanese from Tokyo, Japan) and Sub-Saharan Africans (combining Luhya of Webuye, Kenya, Maasai of Kinyawa, Kenya and Yoruba of Ibadan, Nigeria). It reported a value close to 12% between continental populations, and values close to panmixia (smaller than 1%) within continental populations.
Autosomal genetic distances based on whole exome sequencing (WES)
Pairwise Fst values among several populations based on whole exome sequencing (WES) in 2016:
Programs for calculating FST
* Arlequin
* Fstat
* SMOGD
* diveRsity (R package)
* hierfstat (R package)
* FinePop (R package)
* DnaSP
* Popoolation2
Modules for calculating FST
* BioPerl
* BioPython
Further reading
* Evolution and the Genetics of Populations Volume 2: the Theory of Gene Frequencies, pg 294–295, S. Wright, Univ. of Chicago Press, Chicago, 1969
* A haplotype map of the human genome, The International HapMap Consortium, Nature 2005
See also
* Genetic distance
* F-statistics
* QST (genetics)
* Coefficient of inbreeding
* Coefficient of relationship
* Hardy-Weinberg principle
* Wahlund effect
External links
BioPerl - Bio::PopGen::PopStats