In statistics, latent variables (from Latin: present participle of ''lateo'', “lie hidden”) are variables that can only be inferred indirectly, through a mathematical model, from other variables that can be directly observed or measured. Such ''latent variable models'' are used in many disciplines, including political science, demography, engineering, medicine, ecology, physics, machine learning/artificial intelligence, bioinformatics, chemometrics, natural language processing, management and the social sciences.
Latent variables may correspond to aspects of physical reality. These could in principle be measured, but may not be for practical reasons. In this situation, the term ''hidden variables'' is commonly used (reflecting the fact that the variables are meaningful, but not observable). Other latent variables correspond to abstract concepts, like categories, behavioral or mental states, or data structures. The terms ''hypothetical variables'' or ''hypothetical constructs'' may be used in these situations.
The use of latent variables can serve to reduce the dimensionality of data. Many observable variables can be aggregated in a model to represent an underlying concept, making it easier to understand the data. In this sense, they serve a function similar to that of scientific theories. At the same time, latent variables link observable "sub-symbolic" data in the real world to symbolic data in the modeled world.
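As a rough illustration of this aggregation idea, the sketch below simulates one hidden trait per person together with six noisy indicators of it, then collapses the indicators into a single composite score. The data are synthetic, and every name and parameter (`simulate_indicators`, `composite_score`, the noise level, the sample size) is a hypothetical choice made for this example. A plain average is used as the simplest possible aggregate; it is not a fitted latent variable model.

```python
import random
import statistics


def simulate_indicators(n_people=500, n_indicators=6, noise_sd=0.5, seed=0):
    """Simulate one hidden trait per person and several noisy indicators of it."""
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in range(n_people)]
    observed = [
        [z + rng.gauss(0.0, noise_sd) for _ in range(n_indicators)]
        for z in latent
    ]
    return latent, observed


def composite_score(row):
    """Collapse many observed indicators into a single summary value."""
    return sum(row) / len(row)


def correlation(xs, ys):
    """Pearson correlation, computed from scratch."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


latent, observed = simulate_indicators()
scores = [composite_score(row) for row in observed]  # six columns -> one
r = correlation(latent, scores)
print(f"correlation between hidden trait and composite: {r:.3f}")
```

In real data the hidden trait is not available for comparison; it is only accessible here because the data were simulated.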
Examples
Psychology
Latent variables, as created by factor analytic methods, generally represent "shared" variance, or the degree to which variables "move" together. Variables that have no correlation cannot result in a latent construct based on the common factor model.
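The point about shared variance can be made explicit with a one-factor version of the common factor model. The notation below ($x_i$ for observed variables, $f$ for the factor, $\lambda_i$ for loadings, $\varepsilon_i$ for unique errors) and the standardization $\operatorname{Var}(f) = 1$ are assumptions chosen for illustration, not notation taken from this article.

```latex
\begin{aligned}
x_i &= \lambda_i f + \varepsilon_i, \qquad
\operatorname{Var}(f) = 1, \quad
\operatorname{Cov}(f, \varepsilon_i) = 0, \quad
\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0 \ (i \neq j) \\
\operatorname{Cov}(x_i, x_j) &= \lambda_i \lambda_j \qquad (i \neq j)
\end{aligned}
```

Under these assumptions, if every pair of observed variables is uncorrelated, then $\lambda_i \lambda_j = 0$ for all $i \neq j$, so at most one loading can be non-zero and no factor is shared between variables.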
* The "Big Five personality traits" have been inferred using factor analysis.
* extraversion
* spatial ability
* wisdom (“Two of the more predominant means of assessing wisdom include wisdom-related performance and latent variable measures.”)
* Spearman's ''g'', or the general intelligence factor in psychometrics
Economics
Examples of latent variables from the field of economics include quality of life, business confidence, morale, happiness and conservatism: these are all variables that cannot be measured directly. By linking these latent variables to other, observable variables, the values of the latent variables can be inferred from measurements of the observable variables. Quality of life, for example, is inferred from observable variables such as wealth, employment, environment, physical and mental health, education, recreation and leisure time, and social belonging.
Medicine
Latent-variable methodology is used in many branches of medicine. One class of problems that naturally lends itself to latent-variable approaches is longitudinal studies where the time scale (e.g. age of participant or time since study baseline) is not synchronized with the trait being studied. For such studies, an unobserved time scale that is synchronized with the trait being studied can be modeled as a transformation of the observed time scale using latent variables. Examples of this include disease progression modeling and modeling of growth (see box).
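A minimal sketch of the time-scale idea, assuming the simplest possible transformation: each subject's latent time is the observed age plus one subject-specific shift. The logistic `trajectory`, its constants, the grid of candidate shifts, and the noise-free measurements are all hypothetical choices made for this example. Models used in practice typically estimate the shared curve and all subject shifts jointly, with measurement noise.

```python
import math


def trajectory(t):
    """Hypothetical shared trait curve on the latent time scale (logistic)."""
    return 1.0 / (1.0 + math.exp(-(t - 70.0) / 5.0))


def estimate_shift(ages, values, candidates):
    """Pick the time shift that best aligns one subject with the shared curve."""

    def sum_squared_error(shift):
        return sum(
            (trajectory(age + shift) - value) ** 2
            for age, value in zip(ages, values)
        )

    return min(candidates, key=sum_squared_error)


# One subject observed at four ages; the shift is hidden from the estimator.
true_shift = 8.0
ages = [60.0, 62.0, 64.0, 66.0]
values = [trajectory(age + true_shift) for age in ages]  # noise-free by assumption

# Candidate shifts from -20 to +20 years in half-year steps.
candidates = [step * 0.5 for step in range(-40, 41)]
estimated = estimate_shift(ages, values, candidates)
print(f"estimated latent time shift: {estimated} years")
```

Here the latent variable is the shift itself: it is never observed, but it is recoverable because it determines where the subject's measurements sit on the shared curve.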
Inferring latent variables
A range of model classes and methods make use of latent variables and allow inference in their presence. Models include:
* linear mixed-effects models and nonlinear mixed-effects models
* Hidden Markov models
* Factor analysis
* Item response theory
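To make one of the model classes above concrete, the following sketch runs forward filtering in a small hidden Markov model: after each observation it updates the probability of each hidden state. The two states (0 = healthy, 1 = ill), the two observation symbols (0 = normal test, 1 = abnormal test), and all probabilities are invented for illustration.

```python
def forward_filter(observations, initial, transition, emission):
    """Return P(hidden state at time t | observations up to t) for each t."""
    n_states = len(initial)
    beliefs = []
    prior = list(initial)
    for obs in observations:
        # Weight each state's prior probability by how well it explains obs.
        unnormalized = [prior[s] * emission[s][obs] for s in range(n_states)]
        total = sum(unnormalized)
        posterior = [u / total for u in unnormalized]
        beliefs.append(posterior)
        # Propagate the belief one step forward through the transition matrix.
        prior = [
            sum(posterior[r] * transition[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return beliefs


# Hypothetical parameters: rows are hidden states, columns are next states
# (transition) or observation symbols (emission).
initial = [0.9, 0.1]
transition = [[0.9, 0.1], [0.2, 0.8]]
emission = [[0.8, 0.2], [0.3, 0.7]]

beliefs = forward_filter([0, 1, 1], initial, transition, emission)
for t, belief in enumerate(beliefs):
    print(f"t={t}: P(healthy)={belief[0]:.3f}, P(ill)={belief[1]:.3f}")
```

The hidden state is never observed directly; the test results only shift the belief about it, which is the sense in which the state is latent.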
Analysis and inference methods include:
* Principal component analysis
* Instrumented principal component analysis (Kelly, Bryan T.; Pruitt, Seth; Su, Yinan. "Instrumented Principal Component Analysis", December 17, 2020. Available at SSRN: https://ssrn.com/abstract=2983919 or http://dx.doi.org/10.2139/ssrn.2983919)
* Partial least squares regression
* Latent semantic analysis and probabilistic latent semantic analysis
* EM algorithms
* Metropolis–Hastings algorithm
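As an example of one inference method from the list above, here is a small EM sketch for a two-component Gaussian mixture in one dimension, where the latent variable is the unobserved component each data point came from. The synthetic data, the starting means, the iteration count, and the floor on the standard deviation are assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, start_means, n_iter=50):
    """Fit weights, means and standard deviations of a 2-component mixture."""
    weights, means, sds = [0.5, 0.5], list(start_means), [1.0, 1.0]
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate parameters from the soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component
    return weights, means, sds


# Synthetic data: the component labels are discarded after sampling.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
data += [rng.gauss(5.0, 1.0) for _ in range(200)]

weights, means, sds = em_two_gaussians(data, start_means=[-1.0, 6.0])
print("weights:", [round(w, 2) for w in weights])
print("means:  ", [round(m, 2) for m in means])
```

The E-step computes a distribution over the latent labels given the current parameters, and the M-step treats those soft labels as if they were observed.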
Bayesian algorithms and methods
Bayesian statistics is often used for inferring latent variables.
* Latent Dirichlet allocation
* The Chinese restaurant process is often used to provide a prior distribution over assignments of objects to latent categories.
* The Indian buffet process is often used to provide a prior distribution over assignments of latent binary features to objects.
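The Chinese restaurant process mentioned above can be sketched as a sequential sampler: each new object joins an existing latent category with probability proportional to that category's current size, or opens a new one with probability proportional to a concentration parameter. The function name, the value of `alpha`, and the seed are hypothetical choices for this example.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Sample a partition of n_customers objects into latent categories."""
    rng = random.Random(seed)
    assignments = []   # category index for each object, in arrival order
    table_sizes = []   # number of objects currently in each category
    for _ in range(n_customers):
        # Existing category k has weight table_sizes[k]; a new one has weight alpha.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)      # open a new category
        else:
            table_sizes[table] += 1    # join an existing category
        assignments.append(table)
    return assignments, table_sizes


assignments, table_sizes = chinese_restaurant_process(n_customers=50, alpha=1.5)
print("number of latent categories:", len(table_sizes))
print("category sizes:", table_sizes)
```

Because the number of categories is not fixed in advance, a prior of this kind lets the data determine how many latent categories are used.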
See also
* Confounding
* Dependent and independent variables
* Errors-in-variables models
* Evidence lower bound
* Factor analysis
* Intervening variable
* Latent variable model
* Item response theory
* Partial least squares path modeling
* Partial least squares regression
* Proxy (statistics)
* Rasch model
* Structural equation modeling