In statistical graphics and scientific visualization, the contour boxplot is an exploratory tool that has been proposed for visualizing ensembles of feature-sets determined by a threshold on some scalar function (e.g. level-sets, isocontours). Analogous to the classical boxplot and considered an extension of the concepts defining the functional boxplot, the descriptive statistics of a contour boxplot are: the envelope of the 50% central region, the median curve and the maximum non-outlying envelope.
To construct a contour boxplot, data ordering is the first step. In functional data analysis, each observation is a real function, so data ordering differs from the classical boxplot, where scalar data are simply ordered from the smallest sample value to the largest. More generally, data depth gives a center-outward ordering of data points and thereby provides a mechanism for constructing rank statistics of various kinds of multidimensional data. For instance, functional data can be ordered using band depth or modified band depth. In contour data analysis, each observation is a feature-set (a subset of the domain) and therefore not a function. Thus, the notions of band depth and modified band depth are further extended to accommodate features that can be expressed as sets but not necessarily as functions. Contour band depth allows feature-set data to be ordered from the center outwards and thus introduces a measure to define functional quantiles and the centrality or outlyingness of an observation. Given the ranks of feature-set data, the contour boxplot is a natural extension of the classical boxplot which, in special cases, reduces back to the traditional functional boxplot.
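As a rough illustration of center-outward ordering for functional data, the sketch below estimates band depth using bands of exactly two curves ($j = 2$). It assumes each curve is a list of values sampled on a shared grid and, following the wording used in this article, forms bands only from other curves; the function name band_depth is hypothetical, and published sample estimators may differ in details such as whether the curve itself may help form a band.

```python
from itertools import combinations


def band_depth(curves, j=2):
    """Estimate band depth for each curve using bands of exactly j other curves.

    curves: list of equal-length lists, each a function sampled on a common grid.
    Returns one depth per curve: the fraction of j-subsets of the other curves
    whose pointwise [min, max] envelope contains the curve at every grid point.
    """
    n = len(curves)
    depths = []
    for i, f in enumerate(curves):
        others = [curves[k] for k in range(n) if k != i]
        bands = list(combinations(others, j))
        if not bands:
            raise ValueError("need at least j + 1 curves")
        inside = 0
        for band in bands:
            # zip(*band) yields the j band values at each grid point.
            if all(min(vals) <= y <= max(vals) for y, vals in zip(f, zip(*band))):
                inside += 1
        depths.append(inside / len(bands))
    return depths


# Hypothetical toy ensemble: four constant curves on a three-point grid.
curves = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]]
depths = band_depth(curves)
# Center-outward ordering: deepest curves first (ties keep input order).
order = sorted(range(len(curves)), key=lambda i: depths[i], reverse=True)
```

Here the two middle curves receive the highest depth and the two extreme curves receive zero depth, so they are ranked last.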
Set/contour band depth
Set band depth (introduced in ), denoted sBD, is a method for establishing a center-outward ordering of a collection of sets. As with other band-depth data-ordering methods, set band depth computes the probability that a sample lies in the band formed by $j$ other samples from the distribution. A set $S \in E$ is said to be an element of the band of a collection of $j$ other sets $S_1, \ldots, S_j \in E$ if it is bounded by their intersection and union. That is:
$$S \in B(S_1, \ldots, S_j) \iff \bigcap_{k=1}^{j} S_k \subseteq S \subseteq \bigcup_{k=1}^{j} S_k$$
The set band depth is the sum of the probabilities of lying in bands formed by different numbers of samples ($j = 2, \ldots, J$):
$$\operatorname{sBD}_J(S) = \sum_{j=2}^{J} \operatorname{sBD}^{(j)}(S) = \sum_{j=2}^{J} P\{S \in B(S_1, \ldots, S_j)\}$$
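The definitions above translate almost directly into a sample estimate. The sketch below assumes each set is a Python frozenset of grid-cell indices and replaces each probability $P\{S \in B(S_1, \ldots, S_j)\}$ with the fraction of $j$-subsets of the other ensemble members whose band contains $S$; the names in_set_band and set_band_depth are hypothetical.

```python
from itertools import combinations


def in_set_band(s, band_sets):
    """True if intersection(band_sets) <= s <= union(band_sets)."""
    inner = frozenset.intersection(*band_sets)
    outer = frozenset.union(*band_sets)
    return inner <= s <= outer


def set_band_depth(sets, J=2):
    """Sample estimate of sBD_J: sum over j = 2..J of the fraction of
    j-subsets of the other sets whose band contains each set."""
    n = len(sets)
    depths = []
    for i, s in enumerate(sets):
        others = [sets[k] for k in range(n) if k != i]
        total = 0.0
        for j in range(2, J + 1):
            bands = list(combinations(others, j))
            if bands:
                hits = sum(in_set_band(s, band) for band in bands)
                total += hits / len(bands)
        depths.append(total)
    return depths


# Hypothetical nested sets of grid cells.
sets = [frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3}), frozenset({1, 2, 3, 4})]
depths = set_band_depth(sets, J=2)
```

Because sBD_J sums over band sizes, its values can exceed 1 when J > 2; only the resulting ordering is used to build the boxplot.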
Set band depth has been shown to be a generalization of function band depth. It also has a modified form derived from a relaxed notion of subset, which requires only a given fraction of one set to be contained in another.
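One way to sketch the relaxed test is shown below. It assumes, as a hypothetical definition, that a set $A$ counts as a relaxed subset of $B$ when at most a fraction eps of $A$ lies outside $B$; the exact relaxation and the choice of eps used in the literature may differ, and all names here are hypothetical.

```python
from itertools import combinations


def eps_subset(a, b, eps):
    """Hypothetical relaxed subset: at most a fraction eps of a lies outside b."""
    if not a:
        return True  # the empty set is contained in anything
    return len(a - b) <= eps * len(a)


def relaxed_in_band(s, band_sets, eps):
    """Band membership with both containments relaxed by eps."""
    inner = frozenset.intersection(*band_sets)
    outer = frozenset.union(*band_sets)
    return eps_subset(inner, s, eps) and eps_subset(s, outer, eps)


def modified_set_band_depth(sets, eps):
    """Depth from bands of two other sets, using the relaxed membership test."""
    depths = []
    for i, s in enumerate(sets):
        others = [t for k, t in enumerate(sets) if k != i]
        pairs = list(combinations(others, 2))
        hits = sum(relaxed_in_band(s, pair, eps) for pair in pairs)
        depths.append(hits / len(pairs))
    return depths


# A set that pokes one cell outside the band's union.
s = frozenset(range(11))
band = (frozenset(range(10)), frozenset(range(5)))
strict = relaxed_in_band(s, band, eps=0.0)   # 1 of 11 cells is outside the union
relaxed = relaxed_in_band(s, band, eps=0.1)  # 1 <= 0.1 * 11, so it is accepted
```

With eps = 0 the relaxed test reduces to the strict band condition, so the modified depth coincides with ordinary set band depth.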
Contour band depth (cBD) is a direct application of sBD in which the sets are derived from thresholded input functions, $\{x : F(x) > q\}$. In this way, an ensemble of scalar input functions and a threshold value gives rise to a collection of contours, and sorting by cBD gives a data-depth ordering of those contours (highest-to-lowest probability gives greatest-to-smallest depth). Because they rely on the set formulation, contour boxplots avoid any explicit correspondence between points on different contours.
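A minimal sketch of contour band depth follows, assuming each input function is sampled on a one-dimensional grid (a list of values) and that its contour set is the set of grid indices where $F(x) > q$. It uses only bands of two other members ($J = 2$); the helper names are hypothetical and the tent-shaped functions are illustrative only.

```python
from itertools import combinations


def threshold_set(values, q):
    """Grid indices where the sampled function exceeds the threshold q."""
    return frozenset(x for x, v in enumerate(values) if v > q)


def contour_band_depth(functions, q):
    """cBD with J = 2: sBD applied to the thresholded set of each function."""
    sets = [threshold_set(f, q) for f in functions]
    depths = []
    for i, s in enumerate(sets):
        others = [t for k, t in enumerate(sets) if k != i]
        pairs = list(combinations(others, 2))
        # A pair's band contains s if intersection <= s <= union.
        hits = sum((a & b) <= s <= (a | b) for a, b in pairs)
        depths.append(hits / len(pairs))
    return sets, depths


# Hypothetical ensemble: tent functions of increasing height on 7 grid points.
functions = [[c - abs(x - 3) for x in range(7)] for c in (1, 2, 3, 4)]
sets, depths = contour_band_depth(functions, q=0.5)
# Data-depth ordering of the contours, deepest first (ties keep input order).
order = sorted(range(len(sets)), key=lambda i: depths[i], reverse=True)
```

No point-to-point matching between contours is needed: each contour enters only through set intersections, unions and subset tests.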
Contour boxplot construction
In the classical boxplot, the box itself represents the middle 50% of the data. Since the data ordering in the contour boxplot is from the center outwards, the 50% central region is defined by the band delimited by the 50% deepest, or most central, observations. The border of this region is the envelope that plays the role of the box in a classical boxplot. Thus, the 50% central region is the analog of the interquartile range (IQR) and gives a useful indication of the spread of the central 50% of the curves. It is a robust range for interpretation because it is not affected by outliers or extreme values, and it gives a less biased visualization of the curves' spread. The observation in the box indicates the median, or most central observation, which is also a robust measure of centrality.
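Given depth values (for example from cBD), the central region and the median can be sketched as follows. This assumes sets are frozensets of grid cells, that the central region uses the ceil(n/2) deepest members, and that ties are broken by input order; central_region and its arguments are hypothetical names.

```python
import math


def central_region(sets, depths, fraction=0.5):
    """Return (median index, cells of the central band) for the deepest members.

    The band is the union minus the intersection of the ceil(fraction * n)
    deepest sets; its border is the envelope drawn as the "box".
    """
    # Deepest first; Python's sort is stable, so ties keep input order.
    order = sorted(range(len(sets)), key=lambda i: depths[i], reverse=True)
    k = max(1, math.ceil(fraction * len(sets)))
    central = [sets[i] for i in order[:k]]
    inner = frozenset.intersection(*central)
    outer = frozenset.union(*central)
    return order[0], outer - inner


# Hypothetical nested contours with precomputed depths.
sets = [frozenset({3}), frozenset({2, 3, 4}), frozenset(range(1, 6)), frozenset(range(7))]
depths = [0.0, 2 / 3, 2 / 3, 0.0]
median_index, band = central_region(sets, depths)
```

The median is always an actual ensemble member (the deepest one), which is what makes it a representative contour rather than a synthesized one.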
The "whiskers" of the classical boxplot are the lines extending from the box that indicate the extent of the data excluding outliers. In contour boxplots, the corresponding maximum non-outlying envelope is formed by the difference between the union and the intersection of all non-outlying samples. Outliers are samples whose cBD value is less than some multiplier (less than one) times the cBD of the sample at the 50% rank.
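The outlier rule and the whisker envelope can be sketched in the same style. The reference depth is taken here, as an assumption, to be the depth of the member at rank ceil(n/2) in the deepest-first ordering, and the multiplier of 0.5 is an illustrative value only; the function name and the depth values are hypothetical.

```python
import math


def outliers_and_whiskers(sets, depths, multiplier=0.5):
    """Flag low-depth outliers and return the non-outlying envelope region.

    A member is an outlier if its depth is below multiplier (assumed < 1) times
    the depth of the member at the 50% rank; the whisker region is the union
    minus the intersection of all remaining members.
    """
    order = sorted(range(len(sets)), key=lambda i: depths[i], reverse=True)
    k = max(1, math.ceil(0.5 * len(sets)))
    reference = depths[order[k - 1]]
    outliers = [i for i in range(len(sets)) if depths[i] < multiplier * reference]
    kept = [s for i, s in enumerate(sets) if i not in outliers]
    region = frozenset.union(*kept) - frozenset.intersection(*kept)
    return outliers, region


# Hypothetical contours and depths; the last contour lies far from the others.
sets = [frozenset({3}), frozenset({2, 3, 4}), frozenset(range(1, 6)),
        frozenset(range(7)), frozenset({10, 11})]
depths = [0.3, 0.7, 0.7, 0.3, 0.0]
outliers, region = outliers_and_whiskers(sets, depths)
```

Because the multiplier is below one, the deepest member can never be flagged, so the set of kept members is never empty.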
Examples
The following example is an ensemble of 40 members from a 2D incompressible Navier–Stokes simulation, where each ensemble member is a simulation run with a randomly chosen Reynolds number and inlet velocity. The inlet velocity values are drawn from a normal distribution with a mean of 1 and a standard deviation of 0.01 (in non-dimensionalized units); likewise, the Reynolds numbers are drawn from a normal distribution with a mean of 130 and a standard deviation of 3.
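For readers who want to set up a comparable ensemble, the parameter draws described above can be sketched with Python's random module. The sketch assumes the two parameters are drawn independently; the seed and dictionary keys are hypothetical, and the simulation itself is not shown.

```python
import random


def sample_parameters(n_members=40, seed=0):
    """Draw independent (inlet velocity, Reynolds number) pairs for each member."""
    rng = random.Random(seed)  # fixed seed so the ensemble is reproducible
    return [
        {
            "inlet_velocity": rng.gauss(1.0, 0.01),  # mean 1, std 0.01 (non-dimensional)
            "reynolds": rng.gauss(130.0, 3.0),       # mean 130, std 3
        }
        for _ in range(n_members)
    ]


params = sample_parameters()
```

Each dictionary would then configure one simulation run, and the resulting contours form the ensemble passed to the depth computation.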
Figure: A set of contours resulting from different parameter values of a 2D fluid simulation (flow past a cylinder, left to right).
Figure: Contour boxplots give a median, order statistics, and outliers (rendered over a line integral convolution visualization of the flow).
Figure: Averages and ±1 standard deviation of the fields give results that are not geometrically correct (consistent with no single simulation) and misestimate the possible positions of contours (rendered over a line integral convolution visualization of the flow).
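A tiny example shows why thresholding an averaged field can produce a contour that matches no ensemble member. It assumes two hypothetical one-dimensional fields sampled at four points, each with a single peak in a different place, and a threshold of 0.5.

```python
def threshold_set(values, q):
    """Grid indices where the sampled field exceeds q."""
    return frozenset(x for x, v in enumerate(values) if v > q)


# Two hypothetical members whose peaks sit at opposite ends of the domain.
members = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
q = 0.5
member_contours = [threshold_set(f, q) for f in members]  # {0} and {3}
# Pointwise mean field: [0.5, 0.0, 0.0, 0.5], which never exceeds q.
mean_field = [sum(vals) / len(vals) for vals in zip(*members)]
mean_contour = threshold_set(mean_field, q)
```

The contour of the mean field is empty even though every member has a non-empty contour; a depth-based median, by contrast, is always one of the members.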
The example below is from an ensemble of publicly available data from the National Oceanic and Atmospheric Administration (NOAA). The ensemble data are formed through different runs of a simulation model with different perturbations of the initial conditions, to account for errors in the initial conditions and/or the model parameterizations. The ensemble consists of isocontours of the temperature field (isovalue −15 °C) at the 500 mb pressure level.
Figure: An ensemble of isocontours from the NOAA web site, shown as a "spaghetti plot".
Figure: Contour boxplots of the weather data ensemble give a median, order statistics, and outliers.
Figure: Averages and ±1 standard deviation of the temperature fields associated with the NOAA forecast weather ensemble.
See also
* Boxplot
* Functional boxplot