A scatter plot (also called a scatterplot, scatter graph, scatter chart, scattergram, or scatter diagram) is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data. If the points are coded (color/shape/size), one additional variable can be displayed. The data are displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis.
Overview
A scatter plot can be used either when one continuous variable is under the control of the experimenter and the other depends on it, or when both continuous variables are independent. If a parameter exists that is systematically incremented and/or decremented by the experimenter, it is called the ''control parameter'' or independent variable and is customarily plotted along the horizontal axis. The measured or dependent variable is customarily plotted along the vertical axis. If no dependent variable exists, either type of variable can be plotted on either axis, and a scatter plot will illustrate only the degree of correlation (not causation) between the two variables.
A scatter plot can suggest various kinds of correlations between variables with a certain confidence interval. For example, for weight and height, weight would be on the y-axis and height would be on the x-axis. Correlations may be positive (rising), negative (falling), or null (uncorrelated). If the pattern of dots slopes from lower left to upper right, it indicates a positive correlation between the variables being studied. If the pattern of dots slopes from upper left to lower right, it indicates a negative correlation. A line of best fit (alternatively called a "trendline") can be drawn to study the relationship between the variables. An equation for the correlation between the variables can be determined by established best-fit procedures. For a linear correlation, the best-fit procedure is known as linear regression and is guaranteed to generate a correct solution in a finite time. No universal best-fit procedure is guaranteed to generate a correct solution for arbitrary relationships. A scatter plot is also very useful for seeing how two comparable data sets agree and for showing nonlinear relationships between variables. The ability to do this can be enhanced by adding a smooth line such as LOESS. Furthermore, if the data are represented by a mixture model of simple relationships, these relationships will be visually evident as superimposed patterns.
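As a rough illustration of the linear case described above, the following sketch computes the Pearson correlation coefficient and an ordinary least-squares line for paired data. The function names `pearson_r` and `fit_line` are hypothetical helpers chosen for this example, and the input values are made up; it is a minimal sketch using only the Python standard library, not a reference implementation.

```python
from math import sqrt


def _centered_sums(xs, ys):
    """Return (sxx, syy, sxy, mean_x, mean_y) for paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy, mean_x, mean_y


def pearson_r(xs, ys):
    """Pearson correlation: positive for rising patterns, negative for falling."""
    sxx, syy, sxy, _, _ = _centered_sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept (simple linear regression)."""
    sxx, _, sxy, mean_x, mean_y = _centered_sums(xs, ys)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
r = pearson_r(xs, ys)
```

The closed-form solution is what makes the linear case terminate in a fixed number of steps; nonlinear fits generally need iterative methods with no such guarantee.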
The scatter diagram is one of the seven basic tools of quality control.
Scatter charts can be built in the form of bubble, marker, and/or line charts.
Example
For example, to display a link between a person's lung capacity and how long that person could hold their breath, a researcher would choose a group of people to study, then measure each one's lung capacity (first variable) and how long that person could hold their breath (second variable). The researcher would then plot the data in a scatter plot, assigning "lung capacity" to the horizontal axis and "time holding breath" to the vertical axis.
A person with a lung capacity of 400 cl who held their breath for 21.7 s would be represented by a single dot on the scatter plot at the point (400, 21.7) in Cartesian coordinates. The scatter plot of all the people in the study would enable the researcher to obtain a visual comparison of the two variables in the data set and would help to determine what kind of relationship there might be between the two variables.
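The mapping from measurements to dots can be sketched in a few lines. The example below renders (x, y) pairs as a crude text grid, with the first value on the horizontal axis and the second on the vertical axis. The function `ascii_scatter` is a hypothetical helper written for this illustration, and apart from the point (400, 21.7) the sample values are invented, not data from any study.

```python
def ascii_scatter(points, width=40, height=10):
    """Render (x, y) pairs as a text grid; x runs left to right, y bottom to top."""
    if not points:
        raise ValueError("need at least one point")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Fall back to a span of 1 so a constant variable does not divide by zero.
    x_span = (max(xs) - x_min) or 1
    y_span = (max(ys) - y_min) or 1

    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # Grid row 0 is the top line of the output, so flip the vertical index.
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line) for line in grid)


# (lung capacity, time holding breath); only (400, 21.7) comes from the text,
# the remaining pairs are invented for illustration.
sample = [(400, 21.7), (350, 18.0), (450, 25.1), (500, 30.2)]
print(ascii_scatter(sample))
```

A plotting library would normally replace the text grid, but the scaling step, from data range to position along each axis, is the same idea.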
Scatter plot matrices
For a set of data variables (dimensions) $X_1, X_2, \ldots, X_k$, the scatter plot matrix shows all the pairwise scatter plots of the variables on a single view with multiple scatterplots in a matrix format. For $k$ variables, the scatterplot matrix will contain $k$ rows and $k$ columns. A plot located at the intersection of the $i$th row and $j$th column is a plot of variables $X_i$ versus $X_j$. This means that each row and column is one dimension, and each cell plots a scatter plot of two dimensions.
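The row-and-column layout can be made concrete with a short sketch that builds the point list for every cell of the matrix. The helper name `scatter_matrix_cells` is hypothetical, and the sketch assumes that "X_i versus X_j" means X_i on the vertical axis and X_j on the horizontal axis, which is a common reading rather than something the text specifies.

```python
def scatter_matrix_cells(columns):
    """Map each (row, col) index pair to the (x, y) points drawn in that cell.

    `columns` maps a variable name to its list of values; all lists must
    have the same length, one entry per observation.
    """
    names = list(columns)
    if len({len(values) for values in columns.values()}) > 1:
        raise ValueError("all variables need the same number of observations")

    cells = {}
    for i, row_name in enumerate(names):
        for j, col_name in enumerate(names):
            # Assumed convention: the column variable X_j supplies x,
            # the row variable X_i supplies y.
            cells[(i, j)] = list(zip(columns[col_name], columns[row_name]))
    return cells


# Three made-up variables with two observations each give a 3 x 3 matrix.
data = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
cells = scatter_matrix_cells(data)
```

With k variables the dictionary holds k × k entries; the diagonal cells pair a variable with itself, which is why many implementations draw a histogram or label there instead.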
A generalized scatter plot matrix offers a range of displays of paired combinations of categorical and quantitative variables. A mosaic plot, fluctuation diagram, or faceted bar chart may be used to display two categorical variables. Other plots are used for one categorical and one quantitative variable.
See also
* Rug plot
* Bar graph
* Line chart
* Scagnostics
External links
* What is a scatterplot?
* Correlation scatter-plot matrix for ordered-categorical data: Explanation and R code
* Density scatterplot for large datasets (hundreds of millions of points)