Functional Boxplot
   HOME

TheInfoList



OR:

In
statistical graphics Statistical graphics, also known as statistical graphical techniques, are graphics used in the field of statistics for data visualization. Overview Whereas statistics and data analysis procedures generally yield their output in numeric or tabul ...
, the functional boxplot is an informative exploratory tool that has been proposed for visualizing functional data. Analogous to the classical
boxplot In descriptive statistics, a box plot or boxplot is a method for graphically demonstrating the locality, spread and skewness groups of numerical data through their quartiles. In addition to the box on a box plot, there can be lines (which are cal ...
, the descriptive statistics of a functional boxplot are: the envelope of the 50% central region, the median curve and the maximum non-outlying envelope. To construct a functional boxplot, data ordering is the first step. In
functional data analysis Functional data analysis (FDA) is a branch of statistics that analyses data providing information about curves, surfaces or anything else varying over a continuum. In its most general form, under an FDA framework, each sample element of functional ...
, each observation is a real function, therefore, different from the classical boxplot where data are simply ordered from the smallest sample value to the largest, in a functional boxplot, functional data, e.g. curves or images, are ordered by a notion of band depth or a modified band depth. It allows for ordering functional data from the center outwards and, thus, introduces a measure to define functional quantiles and the centrality or outlyingness of an observation. Having the ranks of functional data, the functional boxplot is a natural extension of the classical boxplot.


Construction

In the classical boxplot, the box itself represents the middle 50% of the data. Since the data ordering in the functional boxplot is from the center outwards, the 50% central region is defined by the band delimited by the 50% of deepest, or the most central observations. The border of the 50% central region is defined as the envelope representing the box in a classical boxplot. Thus, this 50% central region is the analog to the " interquartile range" (IQR) and gives a useful indication of the spread of the central 50% of the curves. This is a robust range for interpretation because the 50% central region is not affected by outliers or extreme values, and gives a less biased visualization of the curves' spread. The observation in the box indicates the
median In statistics and probability theory, the median is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be thought of as "the middle" value. The basic fe ...
, or the most central observation which is also a robust statistic to measure centrality. The "whiskers" of the boxplot are the vertical lines of the plot extending from the box and indicating the maximum envelope of the dataset except the outliers.


Outlier detection

Outliers can be detected in a functional boxplot by the 1.5 times the 50% central region empirical rule, analogous to the 1.5 IQR empirical rule for classical boxplots. The fences are obtained by inflating the envelope of the 50% central region by 1.5 times the height of the 50% central region. Any observations outside the fences are flagged as potential outliers. When each observation is simply a point, the functional boxplot degenerates to a classical boxplot, and it is different from the pointwise boxplots.


Enhanced functional boxplot

By introducing the concept of central regions, the functional boxplot can be generalized to an enhanced functional boxplot where the 25% and 75% central regions are provided as well.


Surface boxplot

Spatio-temporal data can be viewed as a temporal curve at each spatial location, or a spatial surface at each time. In the latter case, a volume-based surface band depth can be used to order sample surfaces and leads to a three-dimensional surface boxplot with similar characteristics as the functional boxplots. Similarly, the fences are obtained by the 1.5 times the 50% central region rule. Any surface outside the fences are flagged as outlier candidates. The surface boxplot is a natural extension of the functional boxplot to R3.


Examples

File:sstcurve.jpg, Data of monthly sea surface temperatures (SST) measured in degrees Celsius over the east-central tropical Pacific Ocean from 1951 to 2007. File:sstfbplot.jpg, The functional boxplot of SST with blue curves denoting envelopes, and a black curve representing the median curve. The red dashed curves are the outlier candidates detected by the 1.5 times the 50% central region rule. File:sstenhance.jpg, The enhanced functional boxplot of SST with dark magenta denoting the 25% central region, magenta representing the 50% central region and pink indicating the 75% central region. File:sstpoint.jpg, The pointwise boxplots of SST with medians connected by a black line. File:splottrans.jpg, The surface boxplot with the box in the middle representing the 50% central region in R3, the middle surface inside the box denoting the median surface, and the upper and lower surfaces indicating the maximum non-outlying envelope.


Statistics code

The command fbplot for functional boxplots is in fda R package, and MATLAB code is also available. The Python library statsmodels makes functional boxplots available via the fboxplot function. One could also use the boxplot function i
scikit-fda
package.


See also

*
Boxplot In descriptive statistics, a box plot or boxplot is a method for graphically demonstrating the locality, spread and skewness groups of numerical data through their quartiles. In addition to the box on a box plot, there can be lines (which are cal ...
*
Bagplot A bagplot, or starburst plot, is a method in robust statistics for visualizing two- or three-dimensional statistical data, analogous to the one-dimensional box plot. Introduced in 1999 by Rousseuw et al., the bagplot allows one to visualize the l ...


References

{{reflist Statistical charts and diagrams