Genetic Correlation

In multivariate quantitative genetics, a genetic correlation (denoted r_g or r_a) is the proportion of variance that two traits share due to genetic causes, that is, the correlation between the genetic influences on a trait and the genetic influences on a different trait (Plomin et al., p. 123), estimating the degree of pleiotropy or causal overlap. A genetic correlation of 0 implies that the genetic effects on one trait are independent of the other, while a correlation of 1 implies that all of the genetic influences on the two traits are identical. The bivariate genetic correlation can be generalized to inferring genetic latent variable factors across more than two traits using factor analysis. Genetic correlation models were introduced into behavioral genetics in the 1970s–1980s.

Genetic correlations have applications in validation of genome-wide association study (GWAS) results, breeding, prediction of traits, and discovering the etiology of traits and diseases. They can be estimated using individual-level data from twin studies and molecular genetics, or even with GWAS summary statistics. Genetic correlations have been found to be common in non-human genetics and to be broadly similar to their respective phenotypic correlations; they are also found extensively among human traits, collectively dubbed the 'phenome'. This finding of widespread pleiotropy has implications for artificial selection in agriculture, interpretation of phenotypic correlations, social inequality, attempts to use Mendelian randomization in causal inference, the understanding of the biological origins of complex traits, and the design of GWASes.

A genetic correlation is to be contrasted with the environmental correlation between the environments affecting two traits (e.g. if poor nutrition in a household caused both lower IQ and lower height). A genetic correlation between two traits can contribute to the observed (phenotypic) correlation between them, but a genetic correlation can also have the opposite sign to the observed phenotypic correlation if the environmental correlation is sufficiently strong in the other direction, perhaps due to tradeoffs or specialization. The observation that genetic correlations usually mirror phenotypic correlations is known as "Cheverud's Conjecture"; it has been confirmed in animals and humans, where the two kinds of correlation are also of similar size. For example, in the UK Biobank, of 118 continuous human traits, only 29% of their intercorrelations have opposite signs, and a later analysis of 17 high-quality UKBB traits reported a correlation near unity.


Interpretation

Genetic correlations are not the same as heritability: a genetic correlation concerns the overlap between the two sets of genetic influences, not their absolute magnitude. Two traits could both be highly heritable but not genetically correlated, or have small heritabilities and be completely correlated (as long as the heritabilities are non-zero). For example, consider two traits, dark skin and black hair. Each may individually have a very high heritability (most of the population-level variation in the trait is due to genetic differences), yet they may still have a very low genetic correlation if, for instance, the two traits are controlled by different, non-overlapping, non-linked genetic loci.

A genetic correlation between two traits will tend to produce a phenotypic correlation; e.g. the genetic correlation between intelligence and SES, or between education and family SES, implies that intelligence and SES will also correlate phenotypically. The phenotypic correlation will be limited by the degree of genetic correlation and also by the heritability of each trait. The expected phenotypic correlation due to shared genetics is the ''bivariate heritability'' and can be calculated as the product of the square roots of the two heritabilities and the genetic correlation. (Using a Plomin example, for two traits with heritabilities of 0.60 and 0.23, r_g = 0.75, and a phenotypic correlation of ''r'' = 0.45, the bivariate heritability would be \sqrt{0.60} \cdot 0.75 \cdot \sqrt{0.23} = 0.28, so 0.28/0.45 = 62% of the observed phenotypic correlation is due to genetics.)
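
The arithmetic of the Plomin example can be checked with a short sketch. The function name `bivariate_heritability` and its parameter names are illustrative choices rather than part of any library; the inputs are the values quoted in the text.

```python
import math


def bivariate_heritability(h2_x: float, h2_y: float, r_g: float) -> float:
    """Genetically mediated part of the phenotypic correlation:
    sqrt(h2_x) * r_g * sqrt(h2_y)."""
    if not (0.0 <= h2_x <= 1.0 and 0.0 <= h2_y <= 1.0):
        raise ValueError("heritabilities must lie in [0, 1]")
    if not (-1.0 <= r_g <= 1.0):
        raise ValueError("genetic correlation must lie in [-1, 1]")
    return math.sqrt(h2_x) * r_g * math.sqrt(h2_y)


# Values quoted in the text: heritabilities 0.60 and 0.23, r_g = 0.75,
# observed phenotypic correlation r = 0.45.
biv_h2 = bivariate_heritability(0.60, 0.23, 0.75)
share = biv_h2 / 0.45  # fraction of the phenotypic correlation due to genetics

print(f"bivariate heritability = {biv_h2:.2f}")  # 0.28
print(f"genetic share of r     = {share:.0%}")   # 62%
```

Note that a high genetic correlation can still yield a small bivariate heritability when either heritability is low, which is why the two quantities are reported separately.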


Cause

Genetic correlations can arise due to:
# linkage disequilibrium (two neighboring genes tend to be inherited together, each affecting a different trait)
# biological pleiotropy (a single gene having multiple otherwise unrelated biological effects, or shared regulation of multiple genes)
# mediated pleiotropy (a gene causes trait ''X'' and trait ''X'' causes trait ''Y'')
# biases: population stratification such as ancestry or assortative mating (sometimes called "gametic phase disequilibrium"), spurious stratification such as ascertainment bias/self-selection or Berkson's paradox, or misclassification of diagnoses


Uses


Causes of changes in traits

Genetic correlations are scientifically useful because they can be analyzed over time within an individual longitudinally (e.g. intelligence is stable over a lifetime due to the same genetic influences: childhood intelligence genetically correlates r_g = 0.62 with intelligence in old age), across studies, populations, or ethnic groups/races, or across diagnoses. This allows discovery of whether different genes influence a trait over a lifetime (typically, they do not); whether different genes influence a trait in different populations due to differing local environments; whether there is disease heterogeneity across times, places, or sex (particularly in psychiatric diagnoses, where there is uncertainty whether one country's 'autism' or 'schizophrenia' is the same as another's, or whether diagnostic categories have shifted over time or place, leading to different levels of ascertainment bias); and to what degree traits like autoimmune or psychiatric disorders or cognitive functioning meaningfully cluster due to sharing a biological basis and genetic architecture.

For example, reading and mathematics disability genetically correlate, consistent with the Generalist Genes Hypothesis, and these genetic correlations explain the observed phenotypic correlations or 'co-morbidity'. Likewise, IQ and specific measures of cognitive performance such as verbal, spatial, and memory tasks, reaction time, long-term memory, executive function, etc. all show high genetic correlations, as do neuroanatomical measurements, and the correlations may increase with age, with implications for the etiology and nature of intelligence.

This can be an important constraint on conceptualizations of the two traits: traits which seem different phenotypically but which share a common genetic basis require an explanation for how these genes can influence both traits.


Boosting GWASes

Genetic correlations can be used in GWASes by using polygenic scores or genome-wide hits for one (often more easily measured) trait to increase the prior probability of variants for a second trait. For example, since intelligence and years of education are highly genetically correlated, a GWAS for education will inherently also be a GWAS for intelligence and be able to predict variance in intelligence as well, and the strongest SNP candidates can be used to increase the statistical power of a smaller GWAS. Alternatively, a combined analysis can be done on the latent trait, where each measured genetically correlated trait helps reduce measurement error and boosts the GWAS's power considerably (e.g. Krapohl et al. 2017, using elastic net and multiple polygenic scores, improved intelligence prediction from 3.6% of variance to 4.8%; Hill et al. 2017b used MTAG to combine three ''g''-loaded traits, namely education, household income, and a cognitive test score, finding 107 hits and doubling the predictive power for intelligence), or one could do a GWAS for multiple traits jointly.

Genetic correlations can also quantify the contribution of correlations <1 across datasets, which might create a false "missing heritability", by estimating the extent to which differing measurement methods, ancestral influences, or environments create only partially overlapping sets of relevant genetic variants.


Breeding

Genetic correlations are also useful in applied contexts such as plant/animal breeding by allowing substitution of more easily measured but highly genetically correlated characteristics (particularly in the case of sex-linked or binary traits under the liability-threshold model, where differences in the phenotype can rarely be observed but another highly correlated measure, perhaps an endophenotype, is available in all individuals); compensating for environments different from the one in which the breeding was carried out; making more accurate predictions of breeding value using the multivariate breeder's equation, as compared to predictions based on the univariate breeder's equation using only per-trait heritability and assuming independence of traits; and avoiding unexpected consequences by taking into consideration that artificial selection for/against trait ''X'' will also increase/decrease all traits which positively/negatively correlate with ''X''. The limits to selection set by the inter-correlation of traits, and the possibility for genetic correlations to change over long-term breeding programs, lead to Haldane's dilemma limiting the intensity of selection and thus progress.

Breeding experiments on genetically correlated traits can measure the extent to which correlated traits are inherently developmentally linked, so that response is constrained, and which can be dissociated. Some traits, such as the size of eyespots on the butterfly ''Bicyclus anynana'' ...

The genetic correlation is obtained from a genetic covariance matrix by standardizing it, i.e., by converting the covariance matrix to a correlation matrix. Generally, if \Sigma is a genetic covariance matrix and D = \sqrt{\operatorname{diag}(\Sigma)}, then the correlation matrix is D^{-1} \Sigma D^{-1}. For a given genetic covariance \operatorname{cov}_g between two traits, one with genetic variance V_{g1} and the other with genetic variance V_{g2}, the genetic correlation is computed in the same way as the correlation coefficient: r_g = \frac{\operatorname{cov}_g}{\sqrt{V_{g1} \cdot V_{g2}}}.
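
The standardization step, dividing each genetic covariance by the product of the two genetic standard deviations, can be sketched in a few lines of plain Python. The helper name `cov_to_cor` and the numeric matrix are hypothetical; the values were chosen only so that the off-diagonal entry comes out near 0.55 and are not taken from the text.

```python
import math


def cov_to_cor(sigma):
    """Standardize a covariance matrix as D^{-1} * Sigma * D^{-1},
    where D holds the square roots of the diagonal of Sigma."""
    n = len(sigma)
    if any(len(row) != n for row in sigma):
        raise ValueError("covariance matrix must be square")
    sd = [math.sqrt(sigma[i][i]) for i in range(n)]
    if any(s == 0.0 for s in sd):
        raise ValueError("variances must be positive")
    # Entry (i, j) is cov_ij / (sd_i * sd_j).
    return [[sigma[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


# Hypothetical additive genetic covariance matrix for two traits
# (illustrative numbers only).
G = [[36.0, 36.0],
     [36.0, 117.0]]

R = cov_to_cor(G)
r_g = R[0][1]  # genetic correlation between trait 1 and trait 2
print(round(r_g, 2))  # 0.55
```

The same function works for more than two traits, in which case every off-diagonal entry of the result is a pairwise genetic correlation.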


Computing the genetic correlation

Genetic correlations require a genetically informative sample. They can be estimated in breeding experiments on two traits of known heritability by selecting on one trait and measuring the change in the other (which allows the genetic correlation to be inferred); in family/adoption/twin studies (analyzed using SEMs or DeFries–Fulker extremes analysis); by molecular estimation of relatedness such as GCTA; by methods employing polygenic scores like HDL (High-Definition Likelihood), LD score regression, BOLT-REML, CPBayes, or HESS; by comparison of genome-wide SNP hits in GWASes (as a loose lower bound); and from phenotypic correlations of populations with at least some related individuals. For estimating both SNP heritability and genetic correlation, better computational scaling and the ability to work from established summary association statistics alone are particular advantages of HDL and LD score regression over competing methods. Combined with the increasing availability of GWAS summary statistics or polygenic scores from datasets like the UK Biobank, such summary-level methods have led to an explosion of genetic correlation research since 2015. The methods are related to Haseman–Elston regression and PCGC regression. Such methods are typically genome-wide, but it is also possible to estimate genetic correlations for specific variants or genome regions.

One way to think about the twin design is as using trait X in twin 1 to predict trait Y in twin 2, for monozygotic and dizygotic twins (e.g. using twin 1's IQ to predict twin 2's brain volume). If this cross-correlation is larger for the more genetically similar monozygotic twins than for the dizygotic twins, the difference indicates that the traits are not genetically independent and that some common genetics influences both IQ and brain volume. (Statistical power can be boosted by using siblings as well.)

Genetic correlations are affected by methodological concerns: underestimation of heritability, such as that due to assortative mating, will lead to overestimates of longitudinal genetic correlation, and moderate levels of misdiagnosis can create pseudo-correlations. Because they are affected by the heritabilities of both traits, genetic correlations have low statistical power, especially in the presence of measurement errors biasing heritability downwards: "estimates of genetic correlations are usually subject to rather large sampling errors and therefore seldom very precise". The standard error of an estimate r_g is \sigma(r_g) = \frac{1 - r_g^2}{\sqrt{2}} \cdot \sqrt{\frac{\sigma(h_x^2) \cdot \sigma(h_y^2)}{h_x^2 \cdot h_y^2}}. (Larger genetic correlations and heritabilities will be estimated more precisely.) However, inclusion of genetic correlations in an analysis of a pleiotropic trait can boost power for the same reason that multivariate regressions are more powerful than separate univariate regressions.

Twin methods have the advantage of being usable without detailed biological data (human genetic correlations were calculated as far back as the 1970s, and animal/plant genetic correlations in the 1930s) and need sample sizes only in the hundreds to be well powered. Their disadvantages are that they make assumptions which have been criticized, that for rare traits like anorexia nervosa it may be difficult to find enough twins with a diagnosis to make meaningful cross-twin comparisons, and that they can only be estimated with access to the twin data. Molecular genetic methods like GCTA or LD score regression have the advantage of not requiring specific degrees of relatedness, so they can easily study rare traits using case-control designs, which also reduces the number of assumptions they rely on. However, those methods could not be run until recently, require large sample sizes in the thousands or hundreds of thousands (to obtain precise SNP heritability estimates; see the standard error formula), and may require individual-level genetic data (in the case of GCTA but not LD score regression).

More concretely, for two traits such as height and weight, standardizing their additive genetic variance-covariance matrix yields the genetic correlation as the off-diagonal entry of the standardized matrix (for example, 0.55). In practice, structural equation modeling applications such as Mx or OpenMx (and before that, historically, LISREL) are used to calculate both the genetic covariance matrix and its standardized form. In R, a function such as cov2cor() will standardize the matrix.

Typically, published reports will provide genetic variance components that have been standardized as a proportion of total variance (for instance, in an ACE twin study model, standardized as a proportion of V-total = A+C+E). In this case, the metric for computing the genetic covariance (the variance within the genetic covariance matrix) is lost because of the standardizing process, so the genetic correlation of two traits cannot readily be estimated from such published models. Multivariate models (such as the Cholesky decomposition) will, however, allow the reader to see shared genetic effects (as opposed to the genetic correlation) by following path rules. It is therefore important to provide the unstandardised path coefficients in publications.
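
The dependence of the precision of r_g on the two heritabilities, discussed above, can be illustrated numerically. The sketch below assumes the commonly cited approximation \sigma(r_g) = \frac{1 - r_g^2}{\sqrt{2}} \cdot \sqrt{\frac{\sigma(h_x^2) \cdot \sigma(h_y^2)}{h_x^2 \cdot h_y^2}}; the function name `se_genetic_correlation`, its argument names, and all input values are hypothetical.

```python
import math


def se_genetic_correlation(r_g, h2_x, h2_y, se_h2_x, se_h2_y):
    """Approximate standard error of a genetic correlation estimate,
    assuming the form (1 - r_g^2) / sqrt(2)
    * sqrt(se_h2_x * se_h2_y / (h2_x * h2_y))."""
    if h2_x <= 0.0 or h2_y <= 0.0:
        raise ValueError("heritabilities must be positive")
    if se_h2_x < 0.0 or se_h2_y < 0.0:
        raise ValueError("standard errors must be non-negative")
    return (1.0 - r_g ** 2) / math.sqrt(2.0) * math.sqrt(
        (se_h2_x * se_h2_y) / (h2_x * h2_y)
    )


# Hypothetical inputs: the same heritability standard errors (0.05),
# but low versus high heritabilities and moderate versus high r_g.
se_low_h2 = se_genetic_correlation(0.5, 0.2, 0.2, 0.05, 0.05)
se_high_h2 = se_genetic_correlation(0.5, 0.6, 0.6, 0.05, 0.05)
se_high_rg = se_genetic_correlation(0.9, 0.2, 0.2, 0.05, 0.05)

print(f"low h2, r_g=0.5:  SE = {se_low_h2:.3f}")
print(f"high h2, r_g=0.5: SE = {se_high_h2:.3f}")
print(f"low h2, r_g=0.9:  SE = {se_high_rg:.3f}")
```

Under this approximation the standard error shrinks as either the heritabilities or the genetic correlation grow, which matches the remark that larger genetic correlations and heritabilities are estimated more precisely.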


See also

* Gene-environment correlation
* Heritability of intelligence; g factor (psychometrics)
* Cognitive epidemiology
* Lothian birth-cohort studies
* Mendelian randomization




External links


The G-matrix Online