![Bagplot](https://upload.wikimedia.org/wikipedia/commons/9/9f/Bagplot.png)
A bagplot, or starburst plot,
is a method in
robust statistics
Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. Robust statistical methods have been developed for many common problems, suc ...
for visualizing
two- or three-dimensional statistical data, analogous to the one-dimensional
box plot
In descriptive statistics, a box plot or boxplot is a method for graphically demonstrating the locality, spread and skewness groups of numerical data through their quartiles. In addition to the box on a box plot, there can be lines (which are ca ...
. Introduced in 1999 by
Rousseuw et al., the bagplot allows one to visualize the location, spread,
skewness
In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined.
For a unimodal d ...
, and
outliers
In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to a variability in the measurement, an indication of novel data, or it may be the result of experimental error; the latter are ...
of a data set.
Construction
The bagplot consists of three nested
polygon
In geometry, a polygon () is a plane figure that is described by a finite number of straight line segments connected to form a closed ''polygonal chain'' (or ''polygonal circuit''). The bounded plane region, the bounding circuit, or the two toge ...
s, called the "bag", the "fence", and the "loop".
*The inner polygon, called the ''bag'', is constructed on the basis of
Tukey depth
In computational geometry, the Tukey depth is a measure of the depth of a point in a fixed set of points. The concept is named after its inventor, John Tukey. Given a set of points P in ''d''-dimensional space, a point ''p'' has Tukey depth ''k' ...
, the smallest number of observations that can be contained by a
half-plane that also contains a given point.
It contains at most 50% of the data points
*The outermost of the three polygons, called the ''fence'' is not drawn as part of the bagplot, but is used to construct it. It is formed by inflating the bag by a certain factor (usually 3). Observations outside the fence are flagged as
outlier
In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to a variability in the measurement, an indication of novel data, or it may be the result of experimental error; the latter are ...
s.
*The observations that are not marked as outliers are surrounded by a ''loop'', the
convex hull
In geometry, the convex hull or convex envelope or convex closure of a shape is the smallest convex set that contains it. The convex hull may be defined either as the intersection of all convex sets containing a given subset of a Euclidean space ...
of the observations within the fence.
An asterisk symbol (*) near the center of the graph is used to mark the depth median, the point with the highest possible Tukey depth. The observations between the bag and fence are marked by line segments, on a line to the depth median, connecting them to the bag.
The three-dimensional version consists of an inner and outer bag.
The outer bag must be drawn in transparent colors so that the inner bag remains visible.
Properties
The bagplot is invariant under
affine transformation
In Euclidean geometry, an affine transformation or affinity (from the Latin, ''affinis'', "connected with") is a geometric transformation that preserves lines and parallelism, but not necessarily Euclidean distances and angles.
More generally, ...
s of the plane, and robust against outliers.
References
{{Statistics, descriptive
Robust statistics
Statistical charts and diagrams
Statistical outliers