A random variable, also called a random quantity, aleatory variable, or stochastic variable, is described informally as a variable whose values depend on the outcomes of a random phenomenon.
The formal mathematical treatment of random variables is a topic in probability theory. In that context, a random variable is understood as a measurable function defined on a probability space that maps from the sample space to the real numbers.
A random variable's possible values might represent the possible outcomes of a yet-to-be-performed experiment, or the possible outcomes of a past experiment whose already-existing value is uncertain (for example, because of imprecise measurements or quantum uncertainty). They may also conceptually represent either the results of an "objectively" random process (such as rolling a die) or the "subjective" randomness that results from incomplete knowledge of a quantity. The meaning of the probabilities assigned to the potential values of a random variable is not part of probability theory itself, but is instead related to philosophical arguments over the interpretation of probability. The mathematics works the same regardless of the particular interpretation in use.
As a function, a random variable is required to be measurable, which allows for probabilities to be assigned to sets of its potential values. It is common that the outcomes depend on some physical variables that are not predictable. For example, when tossing a fair coin, the final outcome of heads or tails depends on the uncertain physical conditions, so the outcome being observed is uncertain. The coin could get caught in a crack in the floor, but such a possibility is excluded from consideration.
The domain of a random variable is called a ''sample space'', defined as the set of possible outcomes of a non-deterministic event. For example, in the event of a coin toss, only two outcomes are possible: heads or tails.
A random variable has a probability distribution, which specifies the probability of Borel subsets of its range. Random variables can be discrete, that is, taking any of a specified finite or countable list of values (having a countable range), endowed with a probability mass function that is characteristic of the random variable's probability distribution; or continuous, taking any numerical value in an interval or collection of intervals (having an uncountable range), via a probability density function that is characteristic of the random variable's probability distribution; or a mixture of both.
Two random variables with the same probability distribution can still differ in terms of their associations with, or independence from, other random variables. The realizations of a random variable, that is, the results of randomly choosing values according to the variable's probability distribution function, are called random variates.
Although the idea was originally introduced by Christiaan Huygens, the first person to think systematically in terms of random variables was Pafnuty Chebyshev.
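A random variable <math>X</math> is a measurable function <math>X \colon \Omega \to E</math> from a set of possible outcomes <math>\Omega</math> to a measurable space <math>E</math>. The technical axiomatic definition requires <math>\Omega</math> to be a sample space of a probability triple <math>(\Omega, \mathcal{F}, \operatorname{P})</math> (see the measure-theoretic definition). A random variable is often denoted by capital roman letters such as <math>X</math>, <math>Y</math>, <math>Z</math>.
The probability that <math>X</math> takes on a value in a measurable set <math>S \subseteq E</math> is written as
:<math>\operatorname{P}(X \in S) = \operatorname{P}(\{\omega \in \Omega \mid X(\omega) \in S\}).</math>
In many cases, <math>X</math> is real-valued, i.e. <math>E = \mathbb{R}</math>. In some contexts, the term random element is used to denote a random variable not of this form.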
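The definition can be made concrete on a finite sample space. The following is a minimal sketch, not a general implementation: the two-coin-toss space, the names `omega`, `P`, `X`, and `prob_X_in`, and the choice of "number of heads" as the random variable are all illustrative assumptions. It computes the probability that the variable lands in a set by summing the probabilities of the outcomes in the preimage of that set.

```python
from fractions import Fraction

# Hypothetical finite probability space: two fair coin tosses.
omega = ["HH", "HT", "TH", "TT"]          # sample space
P = {w: Fraction(1, 4) for w in omega}    # probability of each outcome


def X(w):
    """Random variable: the number of heads in outcome w."""
    return w.count("H")


def prob_X_in(S):
    """P(X in S), computed as the probability of {w in omega : X(w) in S}."""
    return sum((P[w] for w in omega if X(w) in S), Fraction(0))


# Probability of at least one head.
p_at_least_one_head = prob_X_in({1, 2})
```

On a finite space every subset is measurable, so the measurability requirement is automatically satisfied in this sketch.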
When the image (or range) of <math>X</math> is finite or countably infinite, the random variable is called a discrete random variable and its distribution is a discrete probability distribution, i.e. can be described by a probability mass function that assigns a probability to each value in the image of <math>X</math>. If the image is uncountably infinite (usually an interval) then <math>X</math> is called a continuous random variable. In the special case that it is absolutely continuous, its distribution can be described by a probability density function, which assigns probabilities to intervals; in particular, each individual point must necessarily have probability zero for an absolutely continuous random variable. Not all continuous random variables are absolutely continuous; a mixture distribution is one such counterexample, and such random variables cannot be described by a probability density or a probability mass function.
Any random variable can be described by its cumulative distribution function, which describes the probability that the random variable will be less than or equal to a certain value.
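For a discrete random variable, the cumulative distribution function can be obtained directly from the probability mass function. The sketch below assumes a fair six-sided die as the example; the names `pmf` and `cdf` are illustrative.

```python
from fractions import Fraction

# PMF of a fair six-sided die (an illustrative choice of distribution).
pmf = {k: Fraction(1, 6) for k in range(1, 7)}


def cdf(x):
    """F(x) = P(X <= x), found by summing the PMF over all values <= x."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))
```

The resulting function is a step function: it is 0 below the smallest value, jumps by 1/6 at each face, and equals 1 from the largest value onward.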
The term "random variable" in statistics is traditionally limited to the real-valued case (<math>E = \mathbb{R}</math>). In this case, the structure of the real numbers makes it possible to define quantities such as the expected value of a random variable, its cumulative distribution function, and the moments of its distribution.
However, the definition above is valid for any measurable space <math>E</math> of values. Thus one can consider random elements of other sets <math>E</math>, such as random boolean values, categorical values, complex numbers, and functions. One may then specifically refer to a ''random variable of type <math>E</math>'', or an ''<math>E</math>-valued random variable''.
This more general concept of a random element is particularly useful in disciplines such as graph theory, machine learning, natural language processing, and other fields in discrete mathematics and computer science, where one is often interested in modeling the random variation of non-numerical data structures. In some cases, it is nonetheless convenient to represent each element of <math>E</math> using one or more real numbers. In this case, a random element may optionally be represented as a vector of real-valued random variables (all defined on the same underlying probability space <math>\Omega</math>, which allows the different random variables to covary). For example:
*A random word may be represented as a random integer that serves as an index into the vocabulary of possible words. Alternatively, it can be represented as a random indicator vector, whose length equals the size of the vocabulary, where the only values of positive probability are <math>(1 \ 0 \ 0 \ \cdots)</math>, <math>(0 \ 1 \ 0 \ \cdots)</math>, <math>(0 \ 0 \ 1 \ \cdots)</math>, and so on, and the position of the 1 indicates the word.
*A random sentence of given length <math>N</math> may be represented as a vector of <math>N</math> random words.
*A random graph on <math>N</math> given vertices may be represented as an <math>N \times N</math> matrix of random variables, whose values specify the adjacency matrix of the random graph.
*A random function <math>F</math> may be represented as a collection of random variables <math>F(x)</math>, giving the function's values at the various points <math>x</math> in the function's domain. The <math>F(x)</math> are ordinary real-valued random variables provided that the function is real-valued. For example, a stochastic process is a random function of time, a random vector is a random function of some index set such as <math>1, 2, \ldots, n</math>, and a random field is a random function on any set (typically time, space, or a discrete set).
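The first three representations in the list above can be sketched in a few lines. Everything specific here is an assumption made for illustration: the three-word `vocabulary`, the uniform choice of words, and the independent edges with probability `p` in `random_graph` are hypothetical modeling choices, not part of the definitions.

```python
import random

vocabulary = ["cat", "dog", "fish"]  # hypothetical vocabulary


def random_word_index(rng):
    """A random word as a random integer index into the vocabulary."""
    return rng.randrange(len(vocabulary))


def indicator_vector(index):
    """The same word as an indicator vector with a single 1."""
    return [1 if i == index else 0 for i in range(len(vocabulary))]


def random_sentence(n, rng):
    """A random sentence of length n as a vector of n random word indices."""
    return [random_word_index(rng) for _ in range(n)]


def random_graph(n, p, rng):
    """An n x n symmetric 0/1 adjacency matrix; each edge appears with probability p."""
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = a[j][i] = 1 if rng.random() < p else 0
    return a


rng = random.Random(0)  # seeded generator so that runs are repeatable
idx = random_word_index(rng)
vec = indicator_vector(idx)
sentence = random_sentence(4, rng)
adj = random_graph(5, 0.5, rng)
```

Each entry of `vec`, `sentence`, and `adj` is itself a real-valued random variable, and all of them are driven by the same generator, which plays the role of the shared underlying probability space.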
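If a random variable <math>X \colon \Omega \to \mathbb{R}</math> defined on the probability space <math>(\Omega, \mathcal{F}, \operatorname{P})</math> is given, we can ask questions like "How likely is it that the value of <math>X</math> is equal to 2?". This is the same as the probability of the event <math>\{\omega : X(\omega) = 2\}</math>, which is often written as <math>\operatorname{P}(X = 2)</math> or <math>p_X(2)</math> for short.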
Recording all these probabilities of output ranges of a real-valued random variable <math>X</math> yields the probability distribution of <math>X</math>. The probability distribution "forgets" about the particular probability space used to define <math>X</math> and only records the probabilities of various values of <math>X</math>. Such a probability distribution can always be captured by its cumulative distribution function
:<math>F_X(x) = \operatorname{P}(X \le x)</math>
and sometimes also using a probability density function, <math>f_X</math>. In measure-theoretic terms, we use the random variable <math>X</math> to "push-forward" the measure <math>\operatorname{P}</math> on <math>\Omega</math> to a measure <math>p_X</math> on <math>\mathbb{R}</math>.
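On a finite probability space the push-forward can be computed by accumulating outcome probabilities by value. The sketch below is illustrative only: the function name `pushforward` and the two example spaces (a fair coin and a fair die) are assumptions. It also shows the sense in which the distribution "forgets" the underlying space: two random variables defined on different spaces end up with the same distribution.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(P, X):
    """Distribution of X: p_X(x) is the total probability of {w : X(w) = x}."""
    p_X = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for w, prob in P.items():
        p_X[X(w)] += prob
    return dict(p_X)


# Two different probability spaces.
coin = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
die = {k: Fraction(1, 6) for k in range(1, 7)}

# One random variable on each space: an indicator of heads, and an indicator of an even face.
dist_coin = pushforward(coin, lambda w: 1 if w == "heads" else 0)
dist_die = pushforward(die, lambda k: 1 if k % 2 == 0 else 0)
```

Both `dist_coin` and `dist_die` assign probability 1/2 to each of the values 0 and 1, even though the sample spaces have different sizes.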
The underlying probability space <math>\Omega</math> is a technical device used to guarantee the existence of random variables, sometimes to construct them, and to define notions such as correlation and dependence or independence based on a joint distribution of two or more random variables on the same probability space. In practice, one often disposes of the space <math>\Omega</math> altogether and just puts a measure on <math>\mathbb{R}</math> that assigns measure 1 to the whole real line, i.e., one works with probability distributions instead of random variables. See the article on quantile functions for fuller development.
Discrete random variable
In an experiment a person may be chosen at random, and one random variable may be the person's height. Mathematically, the random variable is interpreted as a function which maps the person to the person's height. Associated with the random variable is a probability distribution that allows the computation of the probability that the height is in any subset of possible values, such as the probability that the height is between 180 and 190 cm, or the probability that the height is either less than 150 or more than 200 cm.
Another random variable may be the person's number of children; this is a discrete random variable with non-negative integer values. It allows the computation of probabilities for individual integer values – the probability mass function (PMF) – or for sets of values, including infinite sets. For example, the event of interest may be "an even number of children". For both finite and infinite event sets, their probabilities can be found by adding up the PMFs of the elements; that is, the probability of an even number of children is the infinite sum <math>\operatorname{PMF}(0) + \operatorname{PMF}(2) + \operatorname{PMF}(4) + \cdots</math>.
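The infinite sum over even values can be approximated numerically once a PMF is fixed. The sketch below assumes, purely for illustration, that the number of children follows a Poisson distribution with a hypothetical mean `lam`; the text does not specify any particular distribution. For the Poisson case the partial sum can be compared with the closed form (1 + e^(−2λ))/2.

```python
import math


def poisson_pmf(k, lam):
    """PMF of a Poisson(lam) variable; the Poisson model is an assumption."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def prob_even(lam, terms=50):
    """Approximate PMF(0) + PMF(2) + PMF(4) + ... by its first `terms` terms."""
    return sum(poisson_pmf(k, lam) for k in range(0, 2 * terms, 2))


lam = 1.8  # hypothetical mean number of children
approx = prob_even(lam)
closed_form = (1 + math.exp(-2 * lam)) / 2  # equals exp(-lam) * cosh(lam)
```

Because the terms decay factorially, a few dozen terms already agree with the closed form to within floating-point precision.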
In examples such as these, the sample space is often suppressed, since it is mathematically hard to describe, and the possible values of the random variables are then treated as a sample space. But when two random variables are measured on the same sample space of outcomes, such as the height and number of children being computed on the same random persons, it is easier to track their relationship if it is acknowledged that both height and number of children come from the same random person. This makes it possible, for example, to ask whether such random variables are correlated.
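Keeping the common sample space explicit is what makes joint quantities such as covariance computable. The following toy sketch uses four equally likely persons with made-up heights and numbers of children; the data, the names `persons`, `height`, `children`, `expectation`, and `covariance` are all hypothetical and serve only to show the mechanics.

```python
# Toy sample space: four equally likely persons (all values are made up).
persons = {
    "a": {"height_cm": 160, "children": 0},
    "b": {"height_cm": 170, "children": 2},
    "c": {"height_cm": 180, "children": 1},
    "d": {"height_cm": 190, "children": 3},
}


def height(w):
    """Random variable: height of person w in centimetres."""
    return persons[w]["height_cm"]


def children(w):
    """Random variable: number of children of person w."""
    return persons[w]["children"]


def expectation(f):
    """Expected value of f under the uniform measure on persons."""
    return sum(f(w) for w in persons) / len(persons)


def covariance(f, g):
    """Cov(f, g) = E[(f - E f)(g - E g)], evaluated person by person."""
    mf, mg = expectation(f), expectation(g)
    return expectation(lambda w: (f(w) - mf) * (g(w) - mg))


cov_height_children = covariance(height, children)
```

The product inside `covariance` is evaluated on the same person `w` for both variables, which is only possible because both are functions on one sample space; the two marginal distributions alone would not determine this value.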
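If <math>\{a_n\}, \{b_n\}</math> are countable sets of real numbers, <math>b_n > 0</math> and <math>\sum_n b_n = 1</math>, then <math>F = \sum_n b_n \delta_{a_n}(x)</math> is a discrete distribution function. Here <math>\delta_t(x) = 0</math> for <math>x < t</math> and <math>\delta_t(x) = 1</math> for <math>x \ge t</math>. Taking for instance an enumeration of all rational numbers as <math>\{a_n\}</math>, one gets a discrete distribution function that is not a step function or piecewise constant.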
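The possible outcomes for one coin toss can be described by the sample space <math>\Omega = \{\text{heads}, \text{tails}\}</math>. We can introduce a real-valued random variable <math>Y</math> that models a $1 payoff for a successful bet on heads as follows:
:<math>Y(\omega) = \begin{cases} 1, & \text{if } \omega = \text{heads}, \\ 0, & \text{if } \omega = \text{tails}. \end{cases}</math>
If the coin is a fair coin, ''Y'' has a probability mass function <math>f_Y</math> given by:
:<math>f_Y(y) = \begin{cases} \tfrac{1}{2}, & \text{if } y = 1, \\ \tfrac{1}{2}, & \text{if } y = 0. \end{cases}</math>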
A random variable can also be used to describe the process of rolling dice and the possible outcomes. The most obvious representation for the two-dice case is to take the set of pairs of numbers ''n''<sub>1</sub> and ''n''<sub>2</sub> from {1, 2, 3, 4, 5, 6} (representing the numbers on the two dice) as the sample space. The total number rolled (the sum of the numbers in each pair) is then a random variable ''X'' given by the function that maps the pair to the sum:
:<math>X((n_1, n_2)) = n_1 + n_2</math>
and (if the dice are fair) has a probability mass function ''ƒ''<sub>''X''</sub> given by:
:<math>f_X(S) = \frac{\min(S - 1, 13 - S)}{36}, \text{ for } S \in \{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\}.</math>
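The probability mass function of the two-dice total can be checked by enumerating the sample space directly. This is a small sketch assuming fair dice, so that all 36 ordered pairs are equally likely; the names `sample_space`, `X`, and `f_X` are illustrative. The enumerated values can be compared with the closed form min(''S'' − 1, 13 − ''S'')/36.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)
sample_space = list(product(faces, repeat=2))  # 36 equally likely ordered pairs


def X(pair):
    """Random variable: the total shown by the two dice."""
    n1, n2 = pair
    return n1 + n2


# Count how many pairs map to each total, then divide by the number of pairs.
counts = Counter(X(pair) for pair in sample_space)
f_X = {s: Fraction(c, len(sample_space)) for s, c in counts.items()}
```

The most likely total is 7, reached by six of the 36 pairs, and the probabilities fall off symmetrically toward 2 and 12.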
Continuous random variable
Formally, a continuous random variable is a random variable whose cumulative distribution function is continuous everywhere. There are no "gaps", which would correspond to numbers which have a nonzero probability of occurring. Instead, continuous random variables almost never take an exact prescribed value ''c'' (formally, <math>\forall c \in \mathbb{R}: \; \Pr(X = c) = 0</math>) but there is a positive probability that their value will lie in particular intervals, which can be arbitrarily small. Continuous random variables usually admit probability density functions (PDF), which characterize their CDF and probability measures; such distributions are also called absolutely continuous. Some continuous distributions, however, are singular, or mixes of an absolutely continuous part and a singular part.
An example of a continuous random variable would be one based on a spinner that can choose a horizontal direction. Then the values taken by the random variable are directions. We could represent these directions by North, West, East, South, Southeast, etc. However, it is commonly more convenient to map the sample space to a random variable which takes values which are real numbers. This can be done, for example, by mapping a direction to a bearing in degrees clockwise from North. The random variable then takes values which are real numbers from the interval [0, 360), with all parts of the range being "equally likely". In this case, ''X'' = the angle spun. Any real number has probability zero of being selected, but a positive probability can be assigned to any ''range'' of values. For example, the probability of choosing a number in [0, 180] is 1/2. Instead of speaking of a probability mass function, we say that the probability ''density'' of ''X'' is 1/360. The probability of a subset of [0, 360) can be calculated by multiplying the measure of the set by 1/360. In general, the probability of a set for a given continuous random variable can be calculated by integrating the density over the given set.
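The spinner calculation can be sketched both exactly and by simulation. The function names `prob_interval` and `spin`, the seed, and the number of simulated spins are illustrative assumptions; the simulation uses Python's standard pseudo-random generator as a stand-in for the physical spinner.

```python
import random

DENSITY = 1 / 360  # constant probability density of the bearing on [0, 360)


def prob_interval(a, b):
    """P(a <= X <= b) for 0 <= a <= b <= 360: interval length times the density."""
    return (b - a) * DENSITY


def spin(rng):
    """One simulated spin: a bearing in degrees drawn uniformly from [0, 360)."""
    return rng.random() * 360.0


rng = random.Random(0)  # seeded so that the estimate is repeatable
n = 100_000
hits = sum(1 for _ in range(n) if 0 <= spin(rng) <= 180)
estimate = hits / n  # fraction of spins that landed in [0, 180]
```

The exact value `prob_interval(0, 180)` is 1/2, and the simulated fraction should be close to it, while `prob_interval(c, c)` is 0 for any single bearing `c`, matching the statement that each individual value has probability zero.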
More formally, given any interval
, a random variable