
Parallel Coordinates plots are a common method of visualizing high-dimensional datasets to analyze multivariate data, that is, data having multiple variables or attributes. To plot a set of points in n-dimensional space, n parallel lines representing the coordinate axes are drawn over the background, typically oriented vertically and equally spaced. Each point in n-dimensional space is represented as a polyline with n vertices, one on each parallel axis at the position given by the corresponding coordinate of the point; the vertices are connected by n − 1 polyline segments. This visualization is similar to time series visualization, except that Parallel Coordinates are applied to data whose axes do not correspond to chronological time and therefore have no natural order. As a result, different axis arrangements can be of interest, including flipping individual axes so that an attribute's range is inverted.
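The mapping from points to polylines described above can be sketched in a few lines. This is a minimal illustration under assumptions of our own: the axes sit at x = 0, 1, 2, and so on (equal spacing), every attribute is min-max scaled onto [0, 1] so the axes share a common height, and the function name `to_polylines` and its `flip` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float]


def to_polylines(data: Sequence[Sequence[float]],
                 flip: Sequence[int] = ()) -> List[List[Vertex]]:
    """Map each n-dimensional row to a polyline with n vertices, one per axis.

    Hypothetical helper: axis i is drawn at x = i, and each attribute is
    min-max scaled onto [0, 1]. Axes listed in `flip` have their range inverted.
    """
    n = len(data[0])
    lows = [min(row[i] for row in data) for i in range(n)]
    highs = [max(row[i] for row in data) for i in range(n)]
    flipped = set(flip)
    polylines = []
    for row in data:
        vertices = []
        for i, value in enumerate(row):
            span = highs[i] - lows[i]
            # Assumption: a constant attribute is placed at mid-height.
            y = (value - lows[i]) / span if span else 0.5
            if i in flipped:
                y = 1.0 - y  # inverted attribute range
            vertices.append((float(i), y))
        polylines.append(vertices)
    return polylines
```

Each returned polyline has n vertices and therefore n − 1 segments; passing its x and y values to any line-plotting call (for example matplotlib's `plot`) draws the chart.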


History

The concept of Parallel Coordinates is often said to have originated in 1885 with the French mathematician Philbert Maurice d'Ocagne. D'Ocagne sought a way to provide graphical calculation of mathematical functions using alignment diagrams called nomograms, which used parallel axes with different scales. For example, a three-variable equation could be solved using three parallel axes: the known values are marked on two of the scales, a line is drawn between them, and the unknown is read from the third scale at the point where the line intersects it. The use of Parallel Coordinates as a visualization technique to show data is also often said to have originated earlier with Henry Gannett, in work preceding the Statistical Atlas of the United States for the 1890 Census, for example his "General Summary, Showing the Rank of States, by Ratios, 1880", which shows the rank of 10 measures (population, occupations, wealth, manufacturing, agriculture, and so forth) on parallel axes connected by lines for each state. However, both d'Ocagne and Gannett were far preceded in this by André-Michel Guerry, whose Plate IV, "Influence de l'Age", showed rankings of crimes against persons by age along parallel axes, connecting the same crime across age groups. Parallel Coordinates were popularised again by Alfred Inselberg in 1985, who had begun developing them systematically as a coordinate system in 1977. Some important applications are in collision avoidance algorithms for air traffic control (1987, three US patents), data mining (US patent), computer vision (US patent), optimization, process control and, more recently, intrusion detection, among other areas.


Higher dimensions

On the plane with an XY Cartesian coordinate system, adding more dimensions in parallel coordinates (often abbreviated ||-coords, PCP, or PC) involves adding more axes. The value of parallel coordinates is that certain geometrical properties in high dimensions transform into easily seen 2D patterns. For example, a set of points on a line in n-space transforms to a set of polylines in parallel coordinates that all intersect at n − 1 points. For n = 2 this yields a point-line duality, which shows why the mathematical foundations of parallel coordinates are developed in projective rather than Euclidean space. A pair of lines intersects at a unique point, which has two coordinates and can therefore correspond to a unique line, which is also specified by two parameters (or two points). By contrast, more than two points are required to specify a curve, and a pair of curves may not have a unique intersection. Hence, by using curves in parallel coordinates instead of lines, the point-line duality is lost, together with all the other properties of projective geometry and the known higher-dimensional patterns corresponding to (hyper)planes, curves, several smooth (hyper)surfaces, proximities, convexity and, more recently, non-orientability. The goal is to map n-dimensional relations into 2D patterns. Parallel coordinates are therefore not a point-to-point mapping but an nD-subset-to-2D-subset mapping, and there is no loss of information. Note that even a point in nD is not mapped to a point in 2D, but to a polygonal line, which is itself a subset of 2D.
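The claim that collinear points in n-space map to polylines meeting at n − 1 points can be checked numerically. The sketch below is an illustration under our own assumptions: axes sit at x = 0, 1, 2, and so on (unit spacing), values are plotted unscaled, and the helper names `segment_line`, `intersect` and `dual_points` are hypothetical. For each adjacent axis pair it intersects the segments of the first two polylines and then confirms that every other segment passes through the same point. Exact `Fraction` arithmetic avoids floating-point false negatives. With unit spacing, a relation x_{i+1} = m·x_i + c gives the common point (i + 1/(1 − m), c/(1 − m)), which lies strictly between the two axes when m < 0 and outside them when m > 0; for m = 1 the segments are parallel and meet only at a point at infinity, which is where the projective setting matters.

```python
from fractions import Fraction
from typing import List, Optional, Sequence, Tuple

Line = Tuple[Fraction, Fraction]   # (slope, intercept) of y = slope * x + intercept
Point = Tuple[Fraction, Fraction]


def segment_line(i: int, u: float, w: float) -> Line:
    """Line through the vertices (i, u) and (i + 1, w) of one polyline."""
    a, b = Fraction(u), Fraction(w)
    slope = b - a
    return slope, a - slope * i


def intersect(l1: Line, l2: Line) -> Optional[Point]:
    """Unique intersection of two lines, or None if they are parallel or identical."""
    (s1, b1), (s2, b2) = l1, l2
    if s1 == s2:
        return None
    x = (b2 - b1) / (s1 - s2)
    return x, s1 * x + b1


def dual_points(points: Sequence[Sequence[float]]) -> List[Optional[Point]]:
    """For each adjacent axis pair, the point shared by all polyline segments (or None)."""
    n = len(points[0])
    result: List[Optional[Point]] = []
    for i in range(n - 1):
        lines = [segment_line(i, p[i], p[i + 1]) for p in points]
        candidate = intersect(lines[0], lines[1])
        if candidate is not None and all(
            s * candidate[0] + b == candidate[1] for s, b in lines
        ):
            result.append(candidate)
        else:
            result.append(None)
    return result


# Points p + t*v on a line in 3-space, with p = (0, 1, 2) and v = (1, 2, -1).
collinear = [(0, 1, 2), (1, 3, 1), (2, 5, 0), (3, 7, -1)]
print(dual_points(collinear))  # two points, one per adjacent axis pair
```

Here x_2 = 2·x_1 + 1 (m = 2) gives the point (−1, −1), to the left of the first axis, while x_3 = −x_2/2 + 5/2 (m = −1/2) gives (5/3, 5/3), between the second and third axes.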


Statistical considerations

When used for statistical data visualisation there are three important considerations: the order, the rotation, and the scaling of the axes. The order of the axes is critical for finding features, and in typical data analysis many reorderings will need to be tried. Some authors have devised ordering heuristics that may produce illuminating orderings. Rotating an axis corresponds to a translation in parallel coordinates: if the lines intersect outside a pair of parallel axes, rotating one of the axes can move the intersection between them. The simplest example is rotating an axis by 180 degrees, which flips its range. Scaling is necessary because the plot is based on interpolation (linear combination) of consecutive pairs of variables, so the variables must be on a common scale. There are many scaling methods to consider as part of data preparation, and choosing among them can reveal more informative views. A smooth parallel coordinate plot is achieved with splines. In the smooth plot, every observation is mapped to a parametric line (or curve) that is smooth, continuous on the axes, and orthogonal to each parallel axis. This design emphasizes the quantization level of each data attribute.
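The scaling, rotation and smoothing steps can be sketched as small column and curve transforms. These are assumptions for illustration, not the methods of any particular author: `min_max` and `z_score` are two common scaling choices among the many mentioned, `flip` models the 180-degree rotation on an axis scaled to [0, 1], and `smooth_polyline` is one simple spline (a cubic Hermite segment with zero end tangents) that crosses every vertical axis horizontally, i.e. orthogonally. All function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def min_max(column: Sequence[float]) -> List[float]:
    """Scale a column onto [0, 1]; a constant column is placed at 0.5."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(v - lo) / span if span else 0.5 for v in column]


def z_score(column: Sequence[float]) -> List[float]:
    """Standardise a column to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(column)
    sd = statistics.pstdev(column)
    return [(v - mean) / sd if sd else 0.0 for v in column]


def flip(column: Sequence[float]) -> List[float]:
    """Rotate a [0, 1]-scaled axis by 180 degrees, inverting its range."""
    return [1.0 - v for v in column]


def smooth_polyline(ys: Sequence[float], samples: int = 8) -> List[Tuple[float, float]]:
    """Sample a smooth curve through (i, ys[i]) with zero slope at every axis.

    Between axes i and i + 1 the curve follows the cubic Hermite basis
    h(t) = 3t^2 - 2t^3, whose derivative vanishes at t = 0 and t = 1, so the
    curve meets each vertical axis at a right angle.
    """
    points: List[Tuple[float, float]] = []
    for i in range(len(ys) - 1):
        for s in range(samples):
            t = s / samples
            h = 3 * t * t - 2 * t ** 3
            points.append((i + t, ys[i] + (ys[i + 1] - ys[i]) * h))
    points.append((float(len(ys) - 1), ys[-1]))
    return points
```

Min-max scaling keeps the axes directly comparable in height, while z-scores emphasise deviations from each attribute's mean; which view is more informative depends on the data.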


Reading

Inselberg made a full review of how to visually read relational patterns in parallel coordinates. When most lines between two parallel axes are roughly parallel to each other, this suggests a positive relationship between the two dimensions. When the lines cross in a superposition of X-shapes, the relationship is negative. When the lines cross at random, with neither pattern dominating, there is no particular relationship.
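These reading rules can be made quantitative by counting crossings. Because the scaling of each axis preserves (or, when flipped, reverses) the order of its values, the segments of two observations between adjacent axes cross exactly when the observations are ordered differently on the two axes. The sketch below, with hypothetical names `crossing_fraction` and `describe` and an arbitrary `tolerance` threshold of our own choosing, reports the fraction of crossing pairs: near 0 suggests a positive relationship, near 1 a negative one. For data without ties this fraction equals the share of discordant pairs, so Kendall's tau is 1 minus twice the crossing fraction.

```python
from itertools import combinations
from typing import Sequence


def crossing_fraction(left: Sequence[float], right: Sequence[float]) -> float:
    """Fraction of observation pairs whose segments cross between two adjacent axes."""
    pairs = list(combinations(range(len(left)), 2))
    crossings = sum(
        1 for i, j in pairs if (left[i] - left[j]) * (right[i] - right[j]) < 0
    )
    return crossings / len(pairs)


def describe(left: Sequence[float], right: Sequence[float],
             tolerance: float = 0.2) -> str:
    """Rough reading of the relationship; `tolerance` is an assumed cut-off."""
    f = crossing_fraction(left, right)
    if f <= tolerance:
        return "positive"
    if f >= 1 - tolerance:
        return "negative"
    return "no clear relationship"
```

Both functions need at least two observations; pairs tied on either axis touch rather than cross and are not counted.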


Limitations

In parallel coordinates, each axis can have at most two neighboring axes (one on the left and one on the right). For an n-dimensional data set, at most n − 1 relationships can be shown at a time without altering the approach. In time series visualization there is a natural predecessor and successor, so in this special case there is a preferred arrangement. However, when the axes do not have a unique order, finding a good axis arrangement requires experimentation and feature engineering. To explore more relationships, axes may be reordered or restructured. One approach arranges the axes in three-dimensional space (still parallel to each other, forming a lattice graph), so that an axis can have more than two neighbors arranged in a circle around a central attribute; the arrangement problem can then be improved by using a minimum spanning tree. A prototype of this visualization is available as an extension to the data mining software ELKI. However, the visualization is harder to interpret and interact with than a linear order.
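The minimum-spanning-tree idea can be sketched as follows. The edge weight is an assumption for illustration, not necessarily what the ELKI prototype uses: two axes are considered "close" when their Pearson correlation is strong in either direction, so the weight is 1 − |r|. The tree is then built with Prim's algorithm over the complete graph of axes, and each tree edge marks a pair of axes worth drawing as neighbors. The names `pearson` and `axis_mst` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treat as unrelated (assumption)
    return sxy / math.sqrt(sxx * syy)


def axis_mst(columns: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Prim's algorithm on the complete graph of axes with weight 1 - |r|.

    Returns the tree edges as (axis already in tree, newly added axis).
    """
    n = len(columns)

    def weight(i: int, j: int) -> float:
        return 1.0 - abs(pearson(columns[i], columns[j]))

    in_tree = {0}
    edges: List[Tuple[int, int]] = []
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree (sorted for deterministic ties).
        i, j = min(
            ((a, b) for a in sorted(in_tree) for b in range(n) if b not in in_tree),
            key=lambda e: weight(*e),
        )
        in_tree.add(j)
        edges.append((i, j))
    return edges
```

This simple form recomputes correlations on every step, which is fine for the handful of axes a plot can show; precomputing the correlation matrix would be the natural refinement for larger inputs.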


Software

While there are a large number of papers about parallel coordinates, only a few notable software packages for converting databases into parallel coordinates graphics are publicly available. Notable software includes ELKI, GGobi, Mondrian, Orange and ROOT. Libraries include Protovis.js and D3.js, the latter providing basic examples. D3.Parcoords.js, a D3-based library dedicated specifically to creating parallel coordinates graphics, has also been published. The Python data structure and analysis library Pandas implements parallel coordinates plotting using the plotting library matplotlib.
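A short example of the Pandas route follows. It assumes pandas and matplotlib are installed; the column names and values are made up purely for illustration. `pandas.plotting.parallel_coordinates` takes a DataFrame and the name of a class column that determines line colors, and returns a matplotlib Axes. As far as we know it plots column values as given, so the sketch min-max scales the numeric columns first to put them on a common scale.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is required
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

# Made-up toy records; "group" is the class column used for coloring.
df = pd.DataFrame({
    "length": [5.1, 4.9, 6.3, 5.8],
    "width": [3.5, 3.0, 3.3, 2.7],
    "height": [1.4, 1.4, 6.0, 5.1],
    "group": ["a", "a", "b", "b"],
})

# Min-max scale the numeric columns so the axes share a common range.
features = df.drop(columns="group")
scaled = (features - features.min()) / (features.max() - features.min())
scaled["group"] = df["group"]

ax = parallel_coordinates(scaled, "group", colormap="viridis")
ax.set_title("Parallel coordinates with pandas")
# plt.show()  # or plt.savefig("parcoords.png") to write the figure to disk
```

Reordering the columns of `scaled` before plotting changes the axis order, which is the simplest way to try the different arrangements discussed above.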


Other visualizations for multivariate data

* Radar chart – A visualization with coordinate axes arranged radially.
* Andrews plot – A Fourier transform of the Parallel Coordinates graph.
* Sankey diagram – A visualization that emphasizes flow, movement or change from one state to another.
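The Andrews plot mentioned above maps each observation x = (x1, x2, x3, …) to the finite Fourier series f(t) = x1/√2 + x2·sin t + x3·cos t + x4·sin 2t + x5·cos 2t + …, usually drawn for t between −π and π. A minimal sketch of that formula follows; the function name `andrews_curve` is hypothetical.

```python
import math
from typing import Sequence


def andrews_curve(x: Sequence[float], t: float) -> float:
    """Evaluate the Andrews curve of observation x at parameter t."""
    total = x[0] / math.sqrt(2)
    for k, value in enumerate(x[1:], start=1):
        harmonic = (k + 1) // 2  # k = 1, 2 -> 1; k = 3, 4 -> 2; ...
        # Odd positions use sine, even positions use cosine.
        total += value * (math.sin(harmonic * t) if k % 2 else math.cos(harmonic * t))
    return total
```

Sampling t on a fine grid and plotting one curve per observation gives the chart; as with parallel coordinates, the ordering and scaling of the variables change the picture.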




Further reading

* Heinrich, Julian and Weiskopf, Daniel (2013). State of the Art of Parallel Coordinates. Eurographics 2013 – State of the Art Reports, pp. 95–116.
* Moustafa, Rida (2011). Parallel coordinate and parallel coordinate density plots. Wiley Interdisciplinary Reviews: Computational Statistics, Vol. 3(2), pp. 134–148.
* Weidele, Daniel Karl I. (2019). Conditional Parallel Coordinates. IEEE Visualization Conference (VIS) 2019, pp. 221–225.


External links


* Alfred Inselberg's Homepage, with Visual Tutorial, History, Selected Publications and Applications
* by C. Brunsdon, A. S. Fotheringham & M. E. Charlton, University of Newcastle, UK
* Using Curves to Enhance Parallel Coordinate Visualisations (archived 2007-03-15 at https://web.archive.org/web/20070315191533/http://www.dcs.napier.ac.uk/~marting/parCoord/GrahamKennedyParallelCurvesIV03.pdf) by Martin Graham & Jessie Kennedy, Napier University, Edinburgh, UK
* Parallel Coordinates, a tutorial by Robert Kosara
* Conditional Parallel Coordinates – Recursive variant of Parallel Coordinates, where a categorical value can expand to reveal another level of Parallel Coordinates.