Data and information visualization (data viz or info viz) is an interdisciplinary field that deals with the graphic representation of data and information. It is a particularly efficient way of communicating when the data or information is numerous, as in a time series.
It is also the study of visual representations of abstract data to reinforce human cognition. The abstract data include both numerical and non-numerical data, such as text and geographic information. It is related to infographics and scientific visualization. One distinction is that it's information visualization when the spatial representation (e.g., the page layout of a graphic design) is chosen, whereas it's scientific visualization when the spatial representation is given.
From an academic point of view, this representation can be considered as a mapping between the original data (usually numerical) and graphic elements (for example, lines or points in a chart). The mapping determines how the attributes of these elements vary according to the data. In this light, a bar chart is a mapping of the magnitude of a variable to the length of a bar. Since the graphic design of the mapping can adversely affect the readability of a chart, mapping is a core competency of data visualization.
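The idea of a bar chart as a mapping from magnitude to bar length can be illustrated with a minimal Python sketch. The function names, the linear scale, and the 40-character maximum are assumptions made for illustration; they are not taken from any particular charting library.

```python
def bar_lengths(values, max_length=40):
    """Map each non-negative magnitude to a bar length (linear scale).

    The largest value is drawn at max_length characters; every other
    value is scaled proportionally. This is an illustrative choice.
    """
    largest = max(values)
    if largest <= 0:
        return [0 for _ in values]
    return [round(v / largest * max_length) for v in values]


def render_bar_chart(labels, values, max_length=40):
    """Render a plain-text bar chart, one row per (label, value) pair."""
    lengths = bar_lengths(values, max_length)
    width = max(len(label) for label in labels)
    return "\n".join(
        f"{label.ljust(width)} | {'#' * n} {v}"
        for label, n, v in zip(labels, lengths, values)
    )


if __name__ == "__main__":
    # Hypothetical data, invented for this example.
    print(render_bar_chart(["A", "B", "C"], [10, 20, 40]))
```

Changing the scale inside `bar_lengths` (for example, to a logarithmic one) changes how the same data reads, which is one way the design of the mapping can affect readability.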
Data and information visualization has its roots in the field of statistics and is therefore generally considered a branch of descriptive statistics. However, because both design skills and statistical and computing skills are required to visualize effectively, authors such as Gershon and Page argue that it is both an art and a science.
Research into how people read and misread various types of visualizations is helping to determine what types and features of visualizations are most understandable and effective in conveying information.
Overview

The field of data and information visualization has emerged "from research in human–computer interaction, computer science, graphics, visual design, psychology, and business methods. It is increasingly applied as a critical component in scientific research, digital libraries, data mining, financial data analysis, market studies, manufacturing production control, and drug discovery".
[Benjamin B. Bederson and Ben Shneiderman (2003). ''The Craft of Information Visualization: Readings and Reflections''. Morgan Kaufmann.]
Data and information visualization presumes that "visual representations and interaction techniques take advantage of the human eye’s broad bandwidth pathway into the mind to allow users to see, explore, and understand large amounts of information at once. Information visualization focused on the creation of approaches for conveying abstract information in intuitive ways."
Data analysis is an indispensable part of all applied research and problem solving in industry. The most fundamental data analysis approaches are visualization (histograms, scatter plots, surface plots, tree maps, parallel coordinate plots, etc.), statistics (hypothesis tests, regression, PCA, etc.), data mining (association mining, etc.), and machine learning methods (clustering, classification, decision trees, etc.). Among these approaches, information visualization, or visual data analysis, is the most reliant on the cognitive skills of human analysts, and allows the discovery of unstructured actionable insights that are limited only by human imagination and creativity. The analyst does not have to learn any sophisticated methods to be able to interpret the visualizations of the data. Information visualization is also a hypothesis generation scheme, which can be, and typically is, followed by more analytical or formal analysis, such as statistical hypothesis testing.
To communicate information clearly and efficiently, data visualization uses statistical graphics, plots, information graphics and other tools. Numerical data may be encoded using dots, lines, or bars, to visually communicate a quantitative message.
Effective visualization helps users analyze and reason about data and evidence. It makes complex data more accessible, understandable, and usable, but can also be reductive. Users may have particular analytical tasks, such as making comparisons or understanding causality, and the design principle of the graphic (i.e., showing comparisons or showing causality) follows the task. Tables are generally used where users will look up a specific measurement, while charts of various types are used to show patterns or relationships in the data for one or more variables.
Data visualization refers to the techniques used to communicate data or information by encoding it as visual objects (e.g., points, lines, or bars) contained in graphics. The goal is to communicate information clearly and efficiently to users. It is one of the steps in data analysis or data science. According to Vitaly Friedman (2008) the "main goal of data visualization is to communicate information clearly and effectively through graphical means. It doesn't mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex data set by communicating its key aspects in a more intuitive way. Yet designers often fail to achieve a balance between form and function, creating gorgeous data visualizations which fail to serve their main purpose – to communicate information".
Indeed, Fernanda Viegas and Martin M. Wattenberg suggested that an ideal visualization should not only communicate clearly, but stimulate viewer engagement and attention.
Data visualization is closely related to information graphics, information visualization, scientific visualization, exploratory data analysis and statistical graphics. In the new millennium, data visualization has become an active area of research, teaching and development. According to Post et al. (2002), it has united scientific and information visualization.
[Frits H. Post, Gregory M. Nielson and Georges-Pierre Bonneau (2002). ''Data Visualization: The State of the Art''. Research paper TU Delft, 2002.]
In the commercial environment data visualization is often referred to as dashboards. Infographics are another very common form of data visualization.
Principles
Characteristics of effective graphical displays
Edward Tufte has explained that users of information displays are executing particular ''analytical tasks'' such as making comparisons. The ''design principle'' of the information graphic should support the analytical task. As William Cleveland and Robert McGill show, different graphical elements accomplish this more or less effectively. For example, dot plots and bar charts outperform pie charts.
In his 1983 book ''The Visual Display of Quantitative Information'', Edward Tufte defines 'graphical displays' and principles for effective graphical display in the following passage:
"Excellence in statistical graphics consists of complex ideas communicated with clarity, precision, and efficiency. Graphical displays should:
*show the data
*induce the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production, or something else
*avoid distorting what the data has to say
*present many numbers in a small space
*make large data sets coherent
*encourage the eye to compare different pieces of data
*reveal the data at several levels of detail, from a broad overview to the fine structure
*serve a reasonably clear purpose: description, exploration, tabulation, or decoration
*be closely integrated with the statistical and verbal descriptions of a data set.
Graphics ''reveal'' data. Indeed graphics can be more precise and revealing than conventional statistical computations."
For example, the Minard diagram shows the losses suffered by Napoleon's army in the 1812–1813 period. Six variables are plotted: the size of the army, its location on a two-dimensional surface (x and y), time, the direction of movement, and temperature. The line width illustrates a comparison (size of the army at points in time), while the temperature axis suggests a cause of the change in army size. This multivariate display on a two-dimensional surface tells a story that can be grasped immediately while identifying the source data to build credibility. Tufte wrote in 1983 that: "It may well be the best statistical graphic ever drawn."
Not applying these principles may result in misleading graphs, distorting the message, or supporting an erroneous conclusion. According to Tufte, chartjunk refers to extraneous interior decoration of the graphic that does not enhance the message, or to gratuitous three-dimensional or perspective effects. Needlessly separating the explanatory key from the image itself, requiring the eye to travel back and forth from the image to the key, is a form of "administrative debris." The ratio of "data to ink" should be maximized, erasing non-data ink where feasible.
The Congressional Budget Office summarized several best practices for graphical displays in a June 2014 presentation. These included: a) Knowing your audience; b) Designing graphics that can stand alone outside the report's context; and c) Designing graphics that communicate the key messages in the report.
Quantitative messages
Author Stephen Few described eight types of quantitative messages that users may attempt to understand or communicate from a set of data and the associated graphs used to help communicate the message:
#Time-series: A single variable is captured over a period of time, such as the unemployment rate or temperature measures over a 10-year period. A line chart may be used to demonstrate the trend over time.
#Ranking: Categorical subdivisions are ranked in ascending or descending order, such as a ranking of sales performance (the ''measure'') by sales persons (the ''category'', with each sales person a ''categorical subdivision'') during a single period. A bar chart may be used to show the comparison across the sales persons.
#Part-to-whole: Categorical subdivisions are measured as a ratio to the whole (i.e., a percentage out of 100%). A pie chart or bar chart can show the comparison of ratios, such as the market share represented by competitors in a market.
#Deviation: Categorical subdivisions are compared against a reference, such as a comparison of actual vs. budget expenses for several departments of a business for a given time period. A bar chart can show comparison of the actual versus the reference amount.
#Frequency distribution: Shows the number of observations of a particular variable for a given interval, such as the number of years in which the stock market return is between intervals such as 0-10%, 11-20%, etc. A histogram, a type of bar chart, may be used for this analysis. A boxplot helps visualize key statistics about the distribution, such as median, quartiles, outliers, etc.
#Correlation: Comparison between observations represented by two variables (X,Y) to determine if they tend to move in the same or opposite directions. For example, plotting unemployment (X) and inflation (Y) for a sample of months. A scatter plot is typically used for this message.
#Nominal comparison: Comparing categorical subdivisions in no particular order, such as the sales volume by product code. A bar chart may be used for this comparison.
#Geographic or geospatial: Comparison of a variable across a map or layout, such as the unemployment rate by state or the number of persons on the various floors of a building. A cartogram is a typical graphic used.
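The pairing of message types with graphics in the list above can be kept as a simple lookup table. The following Python sketch only restates the list; the dictionary name, the key spellings, and the `suggest_charts` helper are hypothetical and chosen for illustration.

```python
# Message types and the graphics the list above associates with them.
CHART_FOR_MESSAGE = {
    "time-series": ["line chart"],
    "ranking": ["bar chart"],
    "part-to-whole": ["pie chart", "bar chart"],
    "deviation": ["bar chart"],
    "frequency distribution": ["histogram", "boxplot"],
    "correlation": ["scatter plot"],
    "nominal comparison": ["bar chart"],
    "geographic": ["cartogram"],
}


def suggest_charts(message_type):
    """Return the graphics listed for a message type, or [] if unknown."""
    key = message_type.strip().lower()
    return CHART_FOR_MESSAGE.get(key, [])


if __name__ == "__main__":
    for name in ("Correlation", "Part-to-whole", "unknown"):
        print(name, "->", suggest_charts(name))
```

A table like this is a starting point only; the choice of graphic still depends on the task and the audience.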
Analysts reviewing a set of data may consider whether some or all of the messages and graphic types above are applicable to their task and audience. The process of trial and error to identify meaningful relationships and messages in the data is part of exploratory data analysis.
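As one small instance of such exploration, the frequency-distribution message can be checked by binning observations into intervals and counting them. The sketch below is a minimal illustration in Python: the sample returns are invented, the helper names are hypothetical, and the bins are assumed to be half-open intervals of equal width, which differs slightly from the "0-10%, 11-20%" labelling used above.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Count observations per half-open interval [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def render_histogram(values, bin_width):
    """Render one text row per non-empty bin, one '#' per observation."""
    return "\n".join(
        f"[{start}, {start + bin_width}): {'#' * n}"
        for start, n in bin_counts(values, bin_width).items()
    )


if __name__ == "__main__":
    # Hypothetical annual returns in percent, invented for illustration.
    returns = [3, 7, 12, 15, 18, 22, 8, -5, 11, 27]
    print(render_histogram(returns, 10))
```

Trying several bin widths is part of the trial and error: a width that is too large hides structure, while one that is too small leaves most bins nearly empty.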
Visual perception and data visualization
A human can distinguish differences in line length, shape, orientation, distances, and color (hue) readily without significant processing effort; these are referred to as "pre-attentive attributes". For example, it may require significant time and effort ("attentive processing") to identify the number of times the digit "5" appears in a series of numbers; but if that digit is different in size, orientation, or color, instances of the digit can be noted quickly through pre-attentive processing.
Compelling graphics take advantage of pre-attentive processing and attributes and the relative strength of these attributes. For example, since humans can more easily process differences in line length than surface area, it may be more effective to use a bar chart (which takes advantage of line length to show comparison) rather than pie charts (which use surface area to show comparison).
Human perception/cognition and data visualization
Almost all data visualizations are created for human consumption. Knowledge of human perception and cognition is necessary when designing intuitive visualizations. Cognition refers to processes in human beings like perception, attention, learning, memory, thought, concept formation, reading, and problem solving. Human visual processing is efficient in detecting changes and making comparisons between quantities, sizes, shapes and variations in lightness. When properties of symbolic data are mapped to visual properties, humans can browse through large amounts of data efficiently. It is estimated that 2/3 of the brain's neurons can be involved in visual processing. Proper visualization provides a different approach to show potential connections, relationships, etc. which are not as obvious in non-visualized quantitative data. Visualization can become a means of data exploration.
Studies have shown that individuals used on average 19% fewer cognitive resources and were 4.5% better able to recall details with data visualization than with text.
History
The modern study of visualization started with computer graphics, which "has from its beginning been used to study scientific problems. However, in its early days the lack of graphics power often limited its usefulness. The recent emphasis on visualization started in 1987 with the special issue of Computer Graphics on Visualization in ''Scientific Computing''. Since then there have been several conferences and workshops, co-sponsored by the IEEE Computer Society and ACM SIGGRAPH". They have been devoted to the general topics of data visualization, information visualization and scientific visualization, and more specific areas such as volume visualization.
In 1786, William Playfair published the first presentation graphics.
There is no comprehensive 'history' of data visualization. There are no accounts that span the entire development of visual thinking and the visual representation of data, and which collate the contributions of disparate disciplines. Michael Friendly and Daniel J Denis of York University are engaged in a project that attempts to provide a comprehensive history of visualization. Contrary to general belief, data visualization is not a modern development. Since prehistory, stellar data, or information such as location of stars were visualized on the walls of caves (such as those found in Lascaux Cave in Southern France) since