In statistics, bivariate data is data on each of two variables, where each value of one variable is paired with a value of the other. It is typically of interest to investigate the possible association between the two variables. The association can be studied via a tabular or graphical display, or via sample statistics that might be used for inference. The method used to investigate the association depends on the level of measurement of the variables. An association that involves exactly two variables can be termed a bivariate correlation or bivariate association.
For two quantitative variables (interval or ratio in level of measurement), a scatterplot can be used to display the data, and a correlation coefficient or regression model can be used to quantify the association.
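As a minimal sketch of quantifying such an association, the following Python code computes a Pearson correlation coefficient and a least-squares regression line. The function names `pearson_r` and `least_squares_line` and the measurements are hypothetical and chosen only for illustration; Pearson's coefficient is assumed here as one common choice of correlation coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements in centimetres, made up for illustration.
leg_length = [78, 82, 85, 88, 91, 95]
stride_length = [62, 66, 67, 71, 73, 77]

r = pearson_r(leg_length, stride_length)
slope, intercept = least_squares_line(leg_length, stride_length)
print(f"r = {r:.3f}; stride = {slope:.3f} * leg + {intercept:.3f}")
```

The correlation coefficient summarizes the strength and direction of the linear association, while the fitted line describes how the second variable changes, on average, with the first.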
For two qualitative variables (nominal or ordinal in level of measurement), a contingency table can be used to view the data, and a measure of association or a test of independence could be used.
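The following Python sketch cross-tabulates paired categorical observations and computes Pearson's chi-square statistic, which is assumed here as one common basis for a test of independence. The helper names and the counts are hypothetical.

```python
from collections import Counter


def contingency_table(pairs):
    """Cross-tabulate a list of (row_category, column_category) pairs.

    Returns (row_labels, col_labels, counts), where counts[i][j] is the
    number of pairs with row_labels[i] and col_labels[j].
    """
    tally = Counter(pairs)
    row_labels = sorted({row for row, _ in pairs})
    col_labels = sorted({col for _, col in pairs})
    counts = [[tally[(row, col)] for col in col_labels] for row in row_labels]
    return row_labels, col_labels, counts


def chi_square_statistic(counts):
    """Pearson chi-square statistic and degrees of freedom for a table.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical survey of 100 people, made up for illustration.
pairs = (
    [("female", "left-handed")] * 4
    + [("female", "right-handed")] * 46
    + [("male", "left-handed")] * 6
    + [("male", "right-handed")] * 44
)
rows, cols, counts = contingency_table(pairs)
stat, dof = chi_square_statistic(counts)
print(rows, cols, counts)
print(f"chi-square = {stat:.4f} with {dof} degree(s) of freedom")
```

To complete a test of independence, the statistic would be compared with a chi-square distribution with `dof` degrees of freedom; a large value relative to that distribution is evidence against independence.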
If the variables are quantitative, the pairs of values of the two variables are often represented as individual points in a plane using a scatter plot, so that the relationship (if any) between the variables is easily seen. For example, a scatter plot of bivariate data could be used to study the relationship between stride length and leg length.
In a bivariate correlation, outliers can be especially problematic when they involve extreme scores on both variables. The best way to look for such outliers is to inspect the scatterplot and see whether any data points stand apart from the rest.
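Visual inspection can be complemented by a simple numeric screen. The Python sketch below is not the scatterplot method described above but a swapped-in z-score check: it flags pairs that lie more than a chosen number of standard deviations from the mean on both variables. The function name, the threshold of 3, and the data are assumptions made for illustration.

```python
from statistics import mean, stdev


def flag_bivariate_outliers(xs, ys, threshold=3.0):
    """Indices of pairs whose z-scores exceed `threshold` on both variables."""
    mean_x, mean_y = mean(xs), mean(ys)
    sd_x, sd_y = stdev(xs), stdev(ys)
    return [
        i
        for i, (x, y) in enumerate(zip(xs, ys))
        if abs((x - mean_x) / sd_x) > threshold
        and abs((y - mean_y) / sd_y) > threshold
    ]


# Hypothetical data: 20 ordinary pairs followed by one extreme pair.
xs = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10] * 2 + [60]
ys = [20, 22, 18, 20, 24, 16, 20, 22, 18, 20] * 2 + [120]

print(flag_bivariate_outliers(xs, ys))
```

Because an extreme point inflates the standard deviation it is measured against, a screen like this can miss outliers in small samples, which is one reason the scatterplot remains the primary tool.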
Dependent and independent variables
In some instances of bivariate data, one variable is taken to influence or determine the other, and the terms dependent variable and independent variable are used to distinguish the two. In the example above, the length of a person's legs is the independent variable. Stride length is determined by leg length, so it is the dependent variable. Having long legs increases stride length, but increasing stride length does not lengthen a person's legs.
Correlations between the two variables are classified as strong or weak and are rated on a scale of –1 to 1, where 1 is a perfect direct correlation, –1 is a perfect inverse correlation, and 0 is no correlation. In the case of long legs and long strides, there would be a strong direct correlation.
[Pierce, Rod. (4 Jan 2013). "Correlation". Math Is Fun. Retrieved 7 Aug 2013 from http://www.mathsisfun.com/data/correlation.html]
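Assuming the scale described above refers to the Pearson product-moment correlation coefficient (the text does not name a specific coefficient), it can be written for n paired observations (x_i, y_i) with sample means x̄ and ȳ as:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

By the Cauchy–Schwarz inequality the numerator can never exceed the denominator in absolute value, so r always lies between –1 and 1, and the extremes are reached only when all points fall exactly on a straight line.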
Analysis of bivariate data
In the analysis of bivariate data, one typically either compares summary statistics of each of the variables or uses regression analysis to find the strength and direction of a specific relationship between the variables. If each variable can take only one of a small number of values, such as only "male" or "female", or only "left-handed" or "right-handed", then the joint frequency distribution can be displayed in a contingency table, which can be analyzed for the strength of the relationship between the two variables.
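One way to express the strength of the relationship in a contingency table as a single number is Cramér's V, which ranges from 0 (no association) to 1 (perfect association). The text does not name a specific measure, so this choice, the function name `cramers_v`, and the counts below are assumptions for illustration.

```python
from math import sqrt


def cramers_v(counts):
    """Cramér's V for a table of counts with at least 2 rows and 2 columns.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k))


# Hypothetical joint frequencies: rows are "female" and "male",
# columns are "left-handed" and "right-handed".
table = [[4, 46], [6, 44]]
print(f"Cramér's V = {cramers_v(table):.3f}")
```

Cramér's V rescales the chi-square statistic by the sample size and table dimensions, so values from tables of different sizes can be compared.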
References