Galton–Watson Process

The Galton–Watson process is a branching stochastic process arising from Francis Galton's statistical investigation of the extinction of family names. The process models family names as patrilineal (passed from father to son), while offspring are randomly either male or female, and a name becomes extinct if the family name line dies out (the holders of the family name die without male descendants). This is an accurate description of Y chromosome transmission in genetics, and the model is thus useful for understanding human Y-chromosome DNA haplogroups. Likewise, since mitochondria are inherited only on the maternal line, the same mathematical formulation describes the transmission of mitochondria. The model is of limited usefulness in understanding actual family name distributions, since in practice family names change for many other reasons, and the dying out of a name line is only one factor.


History

There was concern amongst the Victorians that aristocratic surnames were becoming extinct. Galton originally posed a mathematical question regarding the distribution of surnames in an idealized population in an 1873 issue of ''The Educational Times'', and the Reverend Henry William Watson replied with a solution. Together, they then wrote an 1874 paper titled "On the probability of the extinction of families" in the ''Journal of the Anthropological Institute of Great Britain and Ireland'' (now the ''Journal of the Royal Anthropological Institute''). Galton and Watson appear to have derived their process independently of the earlier work by I. J. Bienaymé; see Heyde and Seneta (1977). For a detailed history, see Kendall (1966 and 1975).


Concepts

Assume, for the sake of the model, that surnames are passed on to all male children by their father. Suppose the number of a man's sons to be a random variable distributed on the set {0, 1, 2, 3, ...}. Further suppose the numbers of different men's sons to be independent random variables, all having the same distribution. Then the simplest substantial mathematical conclusion is that if the average number of a man's sons is 1 or less, then their surname will almost surely die out, and if it is more than 1, then there is a positive probability that it will survive for any given number of generations.

Modern applications include the survival probabilities for a new mutant gene, the initiation of a nuclear chain reaction, the dynamics of disease outbreaks in their first generations of spread, and the chances of extinction of small populations of organisms. The model also helps explain (perhaps closest to Galton's original interest) why only a handful of males in the deep past of humanity now have ''any'' surviving male-line descendants, reflected in a rather small number of distinctive human Y-chromosome DNA haplogroups.

A corollary of high extinction probabilities is that if a lineage ''has'' survived, it is likely to have experienced, purely by chance, an unusually high growth rate in its early generations, at least when compared to the rest of the population.


Mathematical definition

A Galton–Watson process is a stochastic process $\{X_n\}$ which evolves according to the recurrence formula $X_0 = 1$ and

$$X_{n+1} = \sum_{j=1}^{X_n} \xi_j^{(n)}$$

where $\{\xi_j^{(n)} : n, j \in \mathbb{N}\}$ is a set of independent and identically distributed natural number-valued (including zero) random variables.

In the analogy with family names, $X_n$ can be thought of as the number of descendants (along the male line) in the $n$th generation, and $\xi_j^{(n)}$ can be thought of as the number of (male) children of the $j$th of these descendants. The recurrence relation states that the number of descendants in the $(n+1)$th generation is the sum, over all $n$th-generation descendants, of the number of children of that descendant.

The extinction probability (i.e. the probability of final extinction) is given by

$$\lim_{n \to \infty} \Pr(X_n = 0).$$

This is clearly equal to zero if each member of the population has exactly one descendant. Excluding this case (usually called the trivial case), there exists a simple necessary and sufficient condition, which is given in the next section.
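The recurrence can be illustrated with a short simulation. The following is a minimal Python sketch, not a reference implementation: the function name `simulate_galton_watson` and the representation of the offspring distribution as a list `offspring_probs` (where `offspring_probs[k]` is the probability of having k children) are assumptions made for this example.

```python
import random


def simulate_galton_watson(offspring_probs, generations, rng=None):
    """Return [X_0, X_1, ..., X_generations] for one realisation.

    offspring_probs[k] is the (assumed) probability that one individual
    has exactly k children; the list is a hypothetical input format.
    """
    rng = rng or random.Random()
    child_counts = range(len(offspring_probs))
    sizes = [1]  # X_0 = 1: the process starts from a single individual
    for _ in range(generations):
        current = sizes[-1]
        # One independent draw per individual of the current generation;
        # their sum is the size of the next generation.
        draws = rng.choices(child_counts, weights=offspring_probs, k=current)
        sizes.append(sum(draws))
    return sizes


if __name__ == "__main__":
    # Illustrative distribution with mean exactly 1 (0, 1 or 2 children).
    print(simulate_galton_watson([0.25, 0.5, 0.25], 20, random.Random(0)))
```

Once a generation has size zero, every later generation is empty as well, which is the extinction event whose probability is defined above. For offspring distributions with mean well above 1, the population can grow quickly, so the number of generations should be kept small when experimenting.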


Extinction criterion for Galton–Watson process

In the non-trivial case, the probability of final extinction is equal to 1 if $E[\xi] \le 1$ and strictly less than 1 if $E[\xi] > 1$, where $E[\xi]$ is the expected number of children of a single individual.

The process can be treated analytically using the method of probability generating functions. If the number of children $\xi_j$ at each node follows a Poisson distribution with parameter $\lambda$, a particularly simple recurrence can be found for the total extinction probability $x_n$ by generation $n$, for a process starting with a single individual at time $n = 0$:

$$x_{n+1} = e^{\lambda (x_n - 1)},$$

with $x_0 = 0$. Iterating this recurrence gives the extinction probability as a function of the generation number.
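The Poisson recurrence $x_{n+1} = e^{\lambda (x_n - 1)}$ is easy to iterate numerically. The sketch below assumes the starting value $x_0 = 0$ (a single living individual at time 0, so extinction has not yet occurred); the function name `poisson_extinction_by_generation` is hypothetical.

```python
import math


def poisson_extinction_by_generation(lam, generations):
    """Return [x_0, ..., x_generations] for Poisson(lam) offspring.

    x_n is the probability that the line is extinct by generation n,
    computed from x_0 = 0 and x_{n+1} = exp(lam * (x_n - 1)).
    """
    xs = [0.0]
    for _ in range(generations):
        xs.append(math.exp(lam * (xs[-1] - 1.0)))
    return xs


if __name__ == "__main__":
    for lam in (0.5, 2.0):
        xs = poisson_extinction_by_generation(lam, 200)
        print(f"lambda = {lam}: extinction probability after 200 generations = {xs[-1]:.4f}")
```

For $\lambda \le 1$ the iterates approach 1, in line with the extinction criterion, while for $\lambda > 1$ they settle at a value strictly below 1. For $\lambda$ exactly equal to 1 the convergence towards 1 is slow, so many iterations are needed to see it numerically.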


Bisexual Galton–Watson process

In the classical family surname Galton–Watson process described above, only men need to be considered, since only males transmit their family name to descendants. This effectively means that reproduction can be modeled as asexual. (Likewise, if mitochondrial transmission is analyzed, only women need to be considered, since only females transmit their mitochondria to descendants.)

A model more closely following actual sexual reproduction is the so-called "bisexual Galton–Watson process", where only couples reproduce. (''Bisexual'' in this context refers to the number of sexes involved, not sexual orientation.) In this process, each child is assumed to be male or female, independently of the others, with a specified probability, and a so-called "mating function" determines how many couples will form in a given generation. As before, the reproduction of different couples is considered to be independent of each other. The analogue of the trivial case now corresponds to each male and female reproducing in exactly one couple and having one male and one female descendant, with the mating function taking the value of the minimum of the number of males and females (which are then the same from the next generation onwards).

Since the total reproduction within a generation now depends strongly on the mating function, there exists in general no simple necessary and sufficient condition for final extinction, as there is in the classical Galton–Watson process. However, excluding the trivial case, the concept of the averaged reproduction mean (Bruss (1984)) allows for a general sufficient condition for final extinction, treated in the next section.


Extinction criterion

If, in the non-trivial case, the averaged reproduction mean per couple stays bounded over all generations and does not exceed 1 for sufficiently large population sizes, then the probability of final extinction is always 1.
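The bisexual process can also be explored by simulation. The following Python sketch rests on several assumptions that go beyond the text: the process starts from a single couple, each couple's number of children is drawn from a list `offspring_probs`, each child is male with probability `p_male`, and the default mating function is the minimum of the number of males and females. The function name `simulate_bisexual_gw` is hypothetical.

```python
import random


def simulate_bisexual_gw(offspring_probs, p_male, generations, rng=None, mating=min):
    """Return the number of couples in generations 0..generations.

    offspring_probs[k] is the (assumed) probability that one couple has
    k children; mating(males, females) gives the number of new couples.
    """
    rng = rng or random.Random()
    child_counts = range(len(offspring_probs))
    couples = [1]  # assumption: the process starts from one couple
    for _ in range(generations):
        # Couples reproduce independently of each other.
        draws = rng.choices(child_counts, weights=offspring_probs, k=couples[-1])
        n_children = sum(draws)
        # Each child is male with probability p_male, independently.
        males = sum(1 for _ in range(n_children) if rng.random() < p_male)
        females = n_children - males
        couples.append(mating(males, females))
    return couples


if __name__ == "__main__":
    print(simulate_bisexual_gw([0.2, 0.2, 0.3, 0.3], 0.5, 10, random.Random(2)))
```

Passing a different `mating` function changes how many couples form from the same numbers of males and females, which is why the extinction behaviour depends so strongly on that choice.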


Examples

Citing historical examples of the Galton–Watson process is complicated because the history of family names often deviates significantly from the theoretical model. Notably, new names can be created, existing names can be changed over a person's lifetime, and people historically have often assumed the names of unrelated persons, particularly nobility. Thus, a small number of family names at present is not in itself evidence that names have become extinct over time, or that they did so due to the dying out of family name lines. That conclusion requires that there were more names in the past ''and'' that they died out because the line died out, rather than because the name changed for other reasons, such as vassals assuming the name of their lord.

Chinese names are a well-studied example of surname extinction: there are currently only about 3,100 surnames in use in China, compared with close to 12,000 recorded in the past, with 22% of the population sharing the names Li, Wang and Zhang (numbering close to 300 million people), and the top 200 names covering 96% of the population. Names have changed or become extinct for various reasons, such as people taking the names of their rulers, orthographic simplifications, and taboos against using characters from an emperor's name, among others. While family name lines dying out may be a factor in the surname extinction, it is by no means the only or even a significant factor. Indeed, the most significant factor affecting the surname frequency is other ethnic groups identifying as Han and adopting Han names. Further, while new names have arisen for various reasons, this has been outweighed by old names disappearing.

By contrast, some nations have adopted family names only recently. This means both that they have not experienced surname extinction for an extended period, and that the names were adopted when the nation had a relatively large population, rather than the smaller populations of ancient times. Further, these names have often been chosen creatively and are very diverse. Examples include:

* Japanese names, which in general use date only to the Meiji restoration in the late 19th century (when the population was over 30,000,000), number over 100,000 family names; surnames are very varied, and the government restricts married couples to using the same surname.
* Many Dutch names have included a formal family name only since the Napoleonic Wars in the early 19th century. Earlier, surnames originated from patronyms (e.g., Jansen = John's son), personal qualities (e.g., de Rijke = the rich one), geographical locations (e.g., van Rotterdam), and occupations (e.g., Visser = the fisherman), sometimes even combined (e.g., Jan Jansz van Rotterdam). There are over 68,000 Dutch family names.
* Thai names have included a family name only since 1920, and only a single family can use a given family name; hence there are a great number of Thai names. Furthermore, Thai people change their family names with some frequency, complicating the analysis.

On the other hand, some examples of high concentration of family names are not primarily due to the Galton–Watson process:

* Vietnamese names comprise about 100 family names, and 60% of the population shares three family names. The name Nguyễn alone is estimated to be used by almost 40% of the Vietnamese population, and 90% share 15 names. However, as the history of the Nguyễn name makes clear, this is in no small part due to names being forced on people or adopted for reasons unrelated to genetic relation.


See also

* Branching process
* Resource-dependent branching process
* Pedigree collapse


Further reading

* F. Thomas Bruss (1984). "A Note on Extinction Criteria for Bisexual Galton–Watson Processes". ''Journal of Applied Probability'' 21: 915–919.
* C. C. Heyde and E. Seneta (1977). ''I. J. Bienaymé: Statistical Theory Anticipated''. Berlin, Germany.


External links


"Survival of a Single Mutant" by Peter M. Lee of the University of York

The simple Galton-Watson process: Classical approach, University of Muenster