Univariate is a term commonly used in statistics to describe data that consists of observations on only a single characteristic or attribute. A simple example of univariate data would be the salaries of workers in an industry. Like other data, univariate data can be visualized using graphs, images or other analysis tools after the data has been measured, collected, reported, and analyzed.
Univariate data types
Some univariate data consists of numbers (such as a height of 65 inches or a weight of 100 pounds), while other data is non-numerical (such as eye colors of brown or blue). Generally, the terms categorical univariate data and numerical univariate data are used to distinguish between these types.
Categorical univariate data
Categorical univariate data consists of non-numerical observations that may be placed in categories. It includes labels or names used to identify an attribute of each element. Categorical univariate data usually uses either a nominal or an ordinal scale of measurement.
Numerical univariate data
Numerical univariate data consists of observations that are numbers. They are obtained using either an interval or a ratio scale of measurement. This type of univariate data can be classified further into two subcategories: discrete and continuous. A numerical univariate data set is discrete if the set of all possible values is finite or countably infinite. Discrete univariate data are usually associated with counting (such as the number of books read by a person). A numerical univariate data set is continuous if the set of all possible values is an interval of numbers. Continuous univariate data are usually associated with measuring (such as the weights of people).
Data analysis and applications
Univariate analysis is the simplest form of data analysis. "Uni" means "one"; in other words, the data has only one variable.
Univariate analysis requires each variable to be analyzed separately. Data is gathered for the purpose of answering a question, or more specifically, a research question. Univariate data does not answer research questions about relationships between variables; rather, it is used to describe one characteristic or attribute that varies from observation to observation.
A researcher usually has one of two purposes. The first is to answer a research question with a descriptive study; the second is to learn how an attribute varies with the individual effect of a variable in regression analysis. Patterns found in univariate data can be described using graphical methods, measures of central tendency and measures of variability.
Graphical methods
The most frequently used graphical illustrations for univariate data are:
Frequency distribution tables
Frequency is how many times a value occurs. In statistics, the frequency of an observation is the number of times the observation occurs in the data. For example, if the number 9 occurs 5 times in a list of numbers, then the frequency of 9 in that data set is 5.
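As a rough illustration of a frequency distribution table, the following Python sketch counts how often each value occurs. The function name `frequency_table` and the sample list are made up for this example and are not taken from any source data set.

```python
from collections import Counter

def frequency_table(observations):
    """Return (value, frequency, relative frequency) rows sorted by value."""
    counts = Counter(observations)
    total = len(observations)
    return [(value, count, count / total)
            for value, count in sorted(counts.items())]

# Hypothetical sample in which the number 9 occurs 5 times.
sample = [3, 9, 9, 4, 9, 7, 9, 3, 9, 4]

print(f"{'value':>5} {'frequency':>9} {'relative':>8}")
for value, count, rel in frequency_table(sample):
    print(f"{value:>5} {count:>9} {rel:>8.2f}")
```

The relative frequency column is simply each count divided by the number of observations, which is the quantity that bar charts and pie charts typically display.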
Bar charts
A bar chart is a graph consisting of rectangular bars. These bars represent the number or percentage of observations in each category of a variable. The length or height of the bars gives a visual representation of the proportional differences among categories.
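The idea can be sketched without a plotting library by drawing text bars whose length is proportional to each category's share. The helper `text_bar_chart`, its `width` parameter and the eye-color sample below are assumptions made for illustration only.

```python
from collections import Counter

def text_bar_chart(observations, width=20):
    """Return one line per category; bar length is proportional to its share."""
    counts = Counter(observations)
    total = len(observations)
    lines = []
    for category, count in counts.most_common():
        # Scale the bar so that a category holding every observation fills `width`.
        bar = "#" * round(width * count / total)
        lines.append(f"{category:<6} {bar} {count} ({count / total:.0%})")
    return lines

# Hypothetical categorical sample.
eye_colors = ["brown", "blue", "brown", "green", "brown",
              "blue", "brown", "green", "blue", "brown"]
print("\n".join(text_bar_chart(eye_colors)))
```

A graphical library would draw the same counts as rectangles; the text version only makes the proportionality explicit.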
Histograms
Histograms are used to estimate the distribution of the data, with the frequency of values assigned to a value range called a bin.
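A minimal sketch of binning, assuming equal-width bins that span the minimum to the maximum of the data (other binning rules exist). The function `histogram_counts` and the weights below are hypothetical.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are equal; equal-width bins are undefined")
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value is placed in the last bin instead of a new one.
        index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(num_bins + 1)]
    return edges, counts

# Hypothetical continuous sample (weights in kilograms).
weights = [52.0, 58.5, 61.0, 63.5, 67.0, 70.0, 71.5, 74.0, 79.0, 92.0]
edges, counts = histogram_counts(weights, 4)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"[{left:5.1f}, {right:5.1f}) {'#' * count}")
```

Each bin is closed on the left and open on the right, except that the largest observation is counted in the final bin.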
Pie charts
A pie chart is a circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories.
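The portions of a pie chart follow directly from relative frequencies: each category receives that fraction of the full 360 degrees. The sketch below assumes a hypothetical sample of commuting modes; `pie_slices` is an illustrative name.

```python
from collections import Counter

def pie_slices(observations):
    """Map each category to (relative frequency, slice angle in degrees)."""
    counts = Counter(observations)
    total = len(observations)
    return {category: (n / total, 360 * n / total)
            for category, n in counts.items()}

# Hypothetical categorical sample.
transport = ["bus"] * 5 + ["car"] * 3 + ["bike"] * 2
for category, (share, angle) in pie_slices(transport).items():
    print(f"{category:<5} {share:.0%} of observations, {angle:.0f} degrees")
```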
Measures of central tendency
Central tendency is one of the most common numerical descriptive measures. It is used to estimate the central location of univariate data by calculating the mean, median and mode. Each of these calculations has its own advantages and limitations. The mean has the advantage that its calculation includes each value of the data set, but it is particularly susceptible to the influence of outliers. The median is a better measure when the data set contains outliers. The mode is simple to locate. Importantly, an analysis is not restricted to a single measure of central tendency. If the data being analyzed is categorical on a nominal scale, then the only measure of central tendency that can be used is the mode; for ordinal data the median can also be used. If the data is numerical in nature (interval or ratio), then the mode, median, or mean can all be used to describe the data. Using more than one of these measures provides a more accurate descriptive summary of central tendency for the univariate data.
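The effect of an outlier on the three measures can be seen in a short sketch using Python's standard `statistics` module. The salary figures are invented for illustration, and `central_tendency` is a hypothetical helper.

```python
import statistics

def central_tendency(values):
    """Return (mean, median, mode) of a numerical data set."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.mode(values))

# Hypothetical salaries in thousands; 250 is an outlier.
salaries = [30, 32, 35, 35, 38, 40, 250]
mean, median, mode = central_tendency(salaries)
print(f"with outlier:    mean={mean:.2f} median={median} mode={mode}")

# Dropping the outlier moves the mean a long way but leaves the median unchanged.
mean, median, mode = central_tendency(salaries[:-1])
print(f"without outlier: mean={mean:.2f} median={median} mode={mode}")

# For categorical data only the mode is meaningful.
print(statistics.mode(["brown", "blue", "brown"]))
```

In this made-up sample the single large value pulls the mean well above every other observation, while the median and mode stay at 35.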
Measures of variability
A measure of variability or dispersion (deviation from the mean) of a univariate data set can reveal the shape of a univariate data distribution more fully. It provides information about the variation among data values. The measures of variability together with the measures of central tendency give a better picture of the data than the measures of central tendency alone. The three most frequently used measures of variability are range, variance and standard deviation. The appropriateness of each measure depends on the type of data, the shape of the distribution of the data and which measure of central tendency is being used. If the data is categorical, then there is no measure of variability to report. For data that is numerical, all three measures are possible. If the distribution of data is symmetrical, then the measures of variability are usually the variance and standard deviation. However, if the data are skewed, then the measure of variability that would be appropriate for that data set is the range.
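The three measures can be computed as in the following sketch. It assumes the population forms of variance and standard deviation (`statistics.pvariance` and `statistics.pstdev`); the sample forms `statistics.variance` and `statistics.stdev` divide by n - 1 instead of n. The data values and the helper name `variability` are illustrative.

```python
import statistics

def variability(values):
    """Return (range, population variance, population standard deviation)."""
    data_range = max(values) - min(values)
    variance = statistics.pvariance(values)   # mean squared deviation from the mean
    std_dev = statistics.pstdev(values)       # square root of the variance
    return data_range, variance, std_dev

# Hypothetical numerical sample with mean 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]
data_range, variance, std_dev = variability(values)
print(f"range={data_range} variance={variance} standard deviation={std_dev}")
```

The standard deviation is expressed in the same units as the data, which is one reason it is often reported alongside the mean.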
Univariate distributions
A univariate distribution is the probability distribution of a single random variable, described either by a probability mass function (pmf) for a discrete probability distribution, or by a probability density function (pdf) for a continuous probability distribution (Samaniego, Francisco J. (2014). Stochastic Modeling and Mathematical Statistics: A Text for Statisticians and Quantitative Scientists. Boca Raton: CRC Press. p. 167. ISBN 978-1-4665-6046-8). It is not to be confused with a multivariate distribution.
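To make the pmf and pdf distinction concrete, the sketch below writes out one discrete and one continuous example using their standard textbook formulas: the binomial pmf and the normal pdf. The function names and parameter values are chosen for illustration. A pmf gives probabilities that sum to 1 over the possible values, whereas a pdf gives densities whose integral is 1.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density at x of a normal random variable with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# The pmf sums to 1 over the support {0, 1, ..., n}.
total_mass = sum(binomial_pmf(k, 10, 0.3) for k in range(11))

# The pdf integrates to about 1; here approximated by a Riemann sum on [-8, 8].
step = 0.001
total_area = sum(normal_pdf(-8 + i * step) * step for i in range(16000))

print(f"sum of pmf = {total_mass:.6f}, area under pdf = {total_area:.6f}")
```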
Common discrete distributions
Uniform distribution (discrete)
Bernoulli distribution
Binomial distribution
Geometric distribution
Negative binomial distribution
Poisson distribution
Hypergeometric distribution
Zeta distribution
Common continuous distributions
Uniform distribution (continuous)
Normal distribution
Gamma distribution
Exponential distribution
Weibull distribution
Cauchy distribution
Beta distribution
See also
* Univariate
* Univariate distribution
* Bivariate analysis
* Multivariate analysis
* List of probability distributions