Biplots are a type of exploratory graph used in statistics, a generalization of the simple two-variable scatterplot. A biplot overlays a ''score plot'' with a ''loading plot'', so that information on both the samples and the variables of a data matrix can be displayed graphically. Samples are displayed as points, while variables are displayed as vectors, linear axes or nonlinear trajectories. For categorical variables, ''category level points'' may be used to represent the levels of the variable. A ''generalised'' biplot displays information on both continuous and categorical variables.


Introduction and history

The biplot was introduced by K. Ruben Gabriel (1971). Gower and Hand (1996) wrote a monograph on biplots. Yan and Kang (2003) described various methods for visualizing and interpreting a biplot. The book by Greenacre (2010) is a practical, user-oriented guide to biplots, with scripts in the open-source R programming language for generating biplots associated with principal component analysis (PCA), multidimensional scaling (MDS), log-ratio analysis (LRA), also known as spectral mapping, discriminant analysis (DA) and various forms of correspondence analysis: simple correspondence analysis (CA), multiple correspondence analysis (MCA) and canonical correspondence analysis (CCA) (Greenacre 2016, ''Correspondence Analysis in Practice'', third edition, Chapman and Hall / CRC Press). The book by Gower, Lubbe and le Roux (2011) aims to popularize biplots as a useful and reliable method for visualizing multivariate data in the context of, for example, principal component analysis (PCA), canonical variates analysis (CVA) or various types of correspondence analysis.


Construction

A biplot is constructed by using the singular value decomposition (SVD) to obtain a low-rank approximation to a transformed version of the data matrix $\mathbf{X}$, whose $n$ rows are the samples (also called the cases, or objects) and whose $p$ columns are the variables. The transformed data matrix $\mathbf{Y}$ is obtained from the original matrix $\mathbf{X}$ by centering and optionally standardizing the columns (the variables). Using the SVD, we can write $\mathbf{Y} = \sum_{k=1}^{p} d_k \mathbf{u}_k \mathbf{v}_k^{\mathsf{T}}$, where the $\mathbf{u}_k$ are $n$-dimensional column vectors, the $\mathbf{v}_k$ are $p$-dimensional column vectors, and the $d_k$ are a non-increasing sequence of non-negative scalars.

The biplot is formed from two scatterplots that share a common set of axes and have a between-set scalar product interpretation. The first scatterplot is formed from the points $(d_1^{\alpha} u_{1i},\ d_2^{\alpha} u_{2i})$, for $i = 1, \ldots, n$. The second is formed from the points $(d_1^{1-\alpha} v_{1j},\ d_2^{1-\alpha} v_{2j})$, for $j = 1, \ldots, p$. This is the biplot formed by the dominant two terms of the SVD, which can be represented in a two-dimensional display. Typical choices of $\alpha$ are 1 (to give a distance interpretation to the row display) and 0 (to give a distance interpretation to the column display); in some rare cases $\alpha = 1/2$ is used to obtain a symmetrically scaled biplot, which gives no distance interpretation to the rows or the columns, only the scalar product interpretation. The points depicting the variables can be drawn as arrows from the origin to reinforce the idea that they represent biplot axes onto which the samples can be projected to approximate the original data.
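
The construction above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes NumPy is available, and the function name `biplot_coordinates` and its `standardize` flag are hypothetical names chosen for this example. It centres (and optionally standardizes) the columns, takes the SVD, and returns the two sets of points for a chosen α. Whatever α is used, the scalar product of a row point with a column point reproduces the same rank-two approximation of the transformed matrix.

```python
import numpy as np


def biplot_coordinates(X, alpha=1.0, standardize=False):
    """Return the transformed matrix Y, the 2-D row and column points, and d.

    Hypothetical helper for illustration; alpha splits the singular values
    between the row points (d^alpha) and the column points (d^(1 - alpha)).
    """
    X = np.asarray(X, dtype=float)
    # Centre each column; optionally scale it to unit sample standard deviation.
    Y = X - X.mean(axis=0)
    if standardize:
        Y = Y / Y.std(axis=0, ddof=1)
    # Thin SVD: Y = U @ diag(d) @ Vt, with d in non-increasing order.
    U, d, Vt = np.linalg.svd(Y, full_matrices=False)
    # Keep the two dominant terms of the decomposition.
    rows = U[:, :2] * d[:2] ** alpha           # (d_k^alpha * u_ki), one row per sample
    cols = Vt[:2].T * d[:2] ** (1.0 - alpha)   # (d_k^(1-alpha) * v_kj), one row per variable
    return Y, rows, cols, d


# Usage on synthetic data (10 samples, 4 variables); the data are made up.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(10, 4))
Y_demo, row_pts, col_pts, d_demo = biplot_coordinates(X_demo, alpha=1.0)
# Scalar products between row and column points give the rank-2 approximation of Y.
Y_rank2 = row_pts @ col_pts.T
```

Plotting `row_pts` as points and `col_pts` as arrows from the origin, on shared axes, gives the display described above. The squared Frobenius error of `Y_rank2` equals the sum of the squared singular values that were discarded, which is one way to judge how faithful the two-dimensional display is.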


Sources

* Gower, J.C., Lubbe, S. and le Roux, N. (2010). ''Understanding Biplots''. Wiley.
* Gower, J.C. and Hand, D.J. (1996). ''Biplots''. Chapman & Hall, London, UK.
* Yan, W. and Kang, M.S. (2003). ''GGE Biplot Analysis''. CRC Press, Boca Raton, Florida.
* Demey, J.R., Vicente-Villardón, J.L., Galindo-Villardón, M.P. and Zambrano, A.Y. (2008). ''Identifying molecular markers associated with classification of genotypes by External Logistic Biplots''. Bioinformatics. 24(24):2832–2838.