interactive data visualization

Data and information visualization (data viz or info viz) is an interdisciplinary field that deals with the graphic representation of data and information. It is a particularly efficient way of communicating when the data or information is numerous, as for example in a time series. It is also the study of visual representations of abstract data to reinforce human cognition. The abstract data include both numerical and non-numerical data, such as text and geographic information. It is related to infographics and scientific visualization. One distinction is that it is information visualization when the spatial representation (e.g., the page layout of a graphic design) is chosen, whereas it is scientific visualization when the spatial representation is given.

From an academic point of view, this representation can be considered as a mapping between the original data (usually numerical) and graphic elements (for example, lines or points in a chart). The mapping determines how the attributes of these elements vary according to the data. In this light, a bar chart is a mapping of the length of a bar to a magnitude of a variable. Since the graphic design of the mapping can adversely affect the readability of a chart, mapping is a core competency of data visualization.

Data and information visualization has its roots in the field of statistics and is therefore generally considered a branch of descriptive statistics. However, because both design skills and statistical and computing skills are required to visualize effectively, it is argued by authors such as Gershon and Page that it is both an art and a science. Research into how people read and misread various types of visualizations is helping to determine what types and features of visualizations are most understandable and effective in conveying information.
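
The idea of a bar chart as a mapping from magnitude to bar length can be illustrated with a minimal Python sketch. The function names, the sample figures and the 40-character maximum are assumptions made for illustration only; they do not come from any particular charting library.

```python
def bar_lengths(values, max_length=40):
    """Map each non-negative value to a bar length (in characters)
    proportional to its magnitude; the largest value gets max_length."""
    largest = max(values)
    if largest <= 0:
        return [0 for _ in values]
    return [round(v / largest * max_length) for v in values]


def render_bar_chart(data, max_length=40):
    """Render a dict of label -> value as a plain-text horizontal bar chart."""
    labels = list(data)
    lengths = bar_lengths([data[k] for k in labels], max_length)
    width = max(len(k) for k in labels)
    return "\n".join(
        f"{k.ljust(width)} | {'#' * n} {data[k]}"
        for k, n in zip(labels, lengths)
    )


# Hypothetical sales figures, used only to exercise the mapping.
sales = {"North": 120, "South": 60, "East": 30}
chart = render_bar_chart(sales)
print(chart)
```

Because the mapping is linear and starts at zero, a value twice as large is drawn twice as long. A mapping that broke this proportionality, for instance by truncating the baseline, would be one way the design of the mapping could hurt readability.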


Overview

The field of data and information visualization has emerged "from research in human–computer interaction, computer science, graphics, visual design, psychology, and business methods. It is increasingly applied as a critical component in scientific research, digital libraries, data mining, financial data analysis, market studies, manufacturing production control, and drug discovery" (Benjamin B. Bederson and Ben Shneiderman, 2003, ''The Craft of Information Visualization: Readings and Reflections'', Morgan Kaufmann).
Data and information visualization presumes that "visual representations and interaction techniques take advantage of the human eye’s broad bandwidth pathway into the mind to allow users to see, explore, and understand large amounts of information at once. Information visualization focused on the creation of approaches for conveying abstract information in intuitive ways."

Data analysis is an indispensable part of all applied research and problem solving in industry. The most fundamental data analysis approaches are visualization (histograms, scatter plots, surface plots, tree maps, parallel coordinate plots, etc.), statistics (hypothesis tests, regression, PCA, etc.), data mining (association mining, etc.), and machine learning methods (clustering, classification, decision trees, etc.). Among these approaches, information visualization, or visual data analysis, is the most reliant on the cognitive skills of human analysts, and allows the discovery of unstructured actionable insights that are limited only by human imagination and creativity. The analyst does not have to learn any sophisticated methods to be able to interpret the visualizations of the data. Information visualization is also a hypothesis generation scheme, which can be, and typically is, followed by more analytical or formal analysis, such as statistical hypothesis testing.

To communicate information clearly and efficiently, data visualization uses statistical graphics, plots, information graphics and other tools. Numerical data may be encoded using dots, lines, or bars, to visually communicate a quantitative message. Effective visualization helps users analyze and reason about data and evidence. It makes complex data more accessible, understandable, and usable, but can also be reductive. Users may have particular analytical tasks, such as making comparisons or understanding causality, and the design principle of the graphic (i.e., showing comparisons or showing causality) follows the task. Tables are generally used where users will look up a specific measurement, while charts of various types are used to show patterns or relationships in the data for one or more variables.

Data visualization refers to the techniques used to communicate data or information by encoding it as visual objects (e.g., points, lines, or bars) contained in graphics. The goal is to communicate information clearly and efficiently to users. It is one of the steps in data analysis or data science. According to Vitaly Friedman (2008) the "main goal of data visualization is to communicate information clearly and effectively through graphical means. It doesn't mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex data set by communicating its key aspects in a more intuitive way. Yet designers often fail to achieve a balance between form and function, creating gorgeous data visualizations which fail to serve their main purpose: to communicate information". Indeed, Fernanda Viegas and Martin M. Wattenberg suggested that an ideal visualization should not only communicate clearly, but stimulate viewer engagement and attention.

Data visualization is closely related to information graphics, information visualization, scientific visualization, exploratory data analysis and statistical graphics. In the new millennium, data visualization has become an active area of research, teaching and development. According to Post et al. (2002), it has united scientific and information visualization (Frits H. Post, Gregory M. Nielson and Georges-Pierre Bonneau, 2002, ''Data Visualization: The State of the Art'', research paper, TU Delft).

In the commercial environment data visualization is often referred to as dashboards. Infographics are another very common form of data visualization.


Principles


Characteristics of effective graphical displays

Edward Tufte has explained that users of information displays are executing particular ''analytical tasks'' such as making comparisons. The ''design principle'' of the information graphic should support the analytical task. As William Cleveland and Robert McGill show, different graphical elements accomplish this more or less effectively. For example, dot plots and bar charts outperform pie charts.

In his 1983 book ''The Visual Display of Quantitative Information'', Edward Tufte defines 'graphical displays' and principles for effective graphical display in the following passage: "Excellence in statistical graphics consists of complex ideas communicated with clarity, precision, and efficiency. Graphical displays should:
*show the data
*induce the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production, or something else
*avoid distorting what the data has to say
*present many numbers in a small space
*make large data sets coherent
*encourage the eye to compare different pieces of data
*reveal the data at several levels of detail, from a broad overview to the fine structure
*serve a reasonably clear purpose: description, exploration, tabulation, or decoration
*be closely integrated with the statistical and verbal descriptions of a data set.
Graphics ''reveal'' data. Indeed graphics can be more precise and revealing than conventional statistical computations."

For example, the Minard diagram shows the losses suffered by Napoleon's army in the 1812–1813 period. Six variables are plotted: the size of the army, its location on a two-dimensional surface (x and y), time, the direction of movement, and temperature. The line width illustrates a comparison (size of the army at points in time), while the temperature axis suggests a cause of the change in army size. This multivariate display on a two-dimensional surface tells a story that can be grasped immediately while identifying the source data to build credibility. Tufte wrote in 1983 that: "It may well be the best statistical graphic ever drawn."

Not applying these principles may result in misleading graphs, distorting the message, or supporting an erroneous conclusion. According to Tufte, chartjunk refers to extraneous interior decoration of the graphic that does not enhance the message, or to gratuitous three-dimensional or perspective effects. Needlessly separating the explanatory key from the image itself, requiring the eye to travel back and forth from the image to the key, is a form of "administrative debris." The ratio of "data to ink" should be maximized, erasing non-data ink where feasible.

The Congressional Budget Office summarized several best practices for graphical displays in a June 2014 presentation. These included: a) Knowing your audience; b) Designing graphics that can stand alone outside the report's context; and c) Designing graphics that communicate the key messages in the report.
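
The "data to ink" ratio mentioned above is commonly written as a simple fraction. The form below is the usual summary of Tufte's definition rather than a quotation from this article, so treat the exact wording of each term as an assumption:

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \left(\text{proportion of the graphic that can be erased without loss of data-information}\right)
\]
```

Under this reading, removing chartjunk leaves the numerator unchanged and shrinks the denominator, which moves the ratio toward 1.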


Quantitative messages

Author Stephen Few described eight types of quantitative messages that users may attempt to understand or communicate from a set of data and the associated graphs used to help communicate the message:
#Time-series: A single variable is captured over a period of time, such as the unemployment rate or temperature measures over a 10-year period. A line chart may be used to demonstrate the trend over time.
#Ranking: Categorical subdivisions are ranked in ascending or descending order, such as a ranking of sales performance (the ''measure'') by salespersons (the ''category'', with each salesperson a ''categorical subdivision'') during a single period. A bar chart may be used to show the comparison across the salespersons.
#Part-to-whole: Categorical subdivisions are measured as a ratio to the whole (i.e., a percentage out of 100%). A pie chart or bar chart can show the comparison of ratios, such as the market share represented by competitors in a market.
#Deviation: Categorical subdivisions are compared against a reference, such as a comparison of actual vs. budget expenses for several departments of a business for a given time period. A bar chart can show comparison of the actual versus the reference amount.
#Frequency distribution: Shows the number of observations of a particular variable for a given interval, such as the number of years in which the stock market return is between intervals such as 0-10%, 11-20%, etc. A histogram, a type of bar chart, may be used for this analysis. A boxplot helps visualize key statistics about the distribution, such as median, quartiles, outliers, etc.
#Correlation: Comparison between observations represented by two variables (X,Y) to determine if they tend to move in the same or opposite directions. For example, plotting unemployment (X) and inflation (Y) for a sample of months. A scatter plot is typically used for this message.
#Nominal comparison: Comparing categorical subdivisions in no particular order, such as the sales volume by product code. A bar chart may be used for this comparison.
#Geographic or geospatial: Comparison of a variable across a map or layout, such as the unemployment rate by state or the number of persons on the various floors of a building. A cartogram is a typical graphic used.

Analysts reviewing a set of data may consider whether some or all of the messages and graphic types above are applicable to their task and audience. The process of trial and error to identify meaningful relationships and messages in the data is part of exploratory data analysis.
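
The frequency-distribution message above depends on binning observations into intervals before drawing a histogram. The following is a minimal Python sketch of that step. The function name, the half-open interval convention and the sample returns are all assumptions chosen for illustration; the figures are not real market data.

```python
def frequency_distribution(values, bin_width, start=0.0):
    """Count observations per half-open interval [lo, lo + bin_width).

    Returns a dict mapping each interval's lower bound to its count,
    ordered by lower bound. Empty intervals are omitted.
    """
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)  # floor division picks the bin
        lo = start + index * bin_width
        counts[lo] = counts.get(lo, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical annual returns in percent (illustrative only).
annual_returns = [4.0, 12.5, 7.3, 18.9, 25.1, 9.8, 14.2, 3.1]
width = 10
dist = frequency_distribution(annual_returns, bin_width=width)

# A text histogram: one row per interval, bar length equal to the count.
for lo, n in dist.items():
    print(f"[{lo:g}, {lo + width:g})  {'#' * n}  {n}")
```

Half-open intervals are used here so that every value falls into exactly one bin; labels such as "0-10%, 11-20%" leave values between 10 and 11 unassigned, which is why this sketch does not reproduce them literally.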


Visual perception and data visualization

A human can distinguish differences in line length, shape, orientation, distances, and color (hue) readily without significant processing effort; these are referred to as "pre-attentive attributes". For example, it may require significant time and effort ("attentive processing") to identify the number of times the digit "5" appears in a series of numbers; but if that digit is different in size, orientation, or color, instances of the digit can be noted quickly through pre-attentive processing. Compelling graphics take advantage of pre-attentive processing and attributes and the relative strength of these attributes. For example, since humans can more easily process differences in line length than in surface area, it may be more effective to use a bar chart (which takes advantage of line length to show comparison) rather than a pie chart (which uses surface area to show comparison).


Human perception/cognition and data visualization

Almost all data visualizations are created for human consumption. Knowledge of human perception and cognition is necessary when designing intuitive visualizations. Cognition refers to processes in human beings like perception, attention, learning, memory, thought, concept formation, reading, and problem solving. Human visual processing is efficient in detecting changes and making comparisons between quantities, sizes, shapes and variations in lightness. When properties of symbolic data are mapped to visual properties, humans can browse through large amounts of data efficiently. It is estimated that 2/3 of the brain's neurons can be involved in visual processing. Proper visualization provides a different approach to show potential connections, relationships, etc. which are not as obvious in non-visualized quantitative data. Visualization can become a means of data exploration. Studies have shown that individuals used on average 19% fewer cognitive resources, and were 4.5% better able to recall details, when comparing data visualization with text.


History

The modern study of visualization started with
computer graphics Computer graphics deals with generating images with the aid of computers. Today, computer graphics is a core technology in digital photography, film, video games, cell phone and computer displays, and many specialized applications. A great de ...
, which "has from its beginning been used to study scientific problems. However, in its early days the lack of graphics power often limited its usefulness. The recent emphasis on visualization started in 1987 with the special issue of Computer Graphics on Visualization in ''
Scientific Computing Computational science, also known as scientific computing or scientific computation (SC), is a field in mathematics that uses advanced computing capabilities to understand and solve complex problems. It is an area of science that spans many disc ...
''. Since then there have been several conferences and workshops, co-sponsored by the IEEE Computer Society and ACM SIGGRAPH". They have been devoted to the general topics of
data visualization Data and information visualization (data viz or info viz) is an interdisciplinary field that deals with the graphic representation of data and information. It is a particularly efficient way of communicating when the data or information is num ...
, information visualization and
scientific visualization Scientific visualization ( also spelled scientific visualisation) is an interdisciplinary branch of science concerned with the visualization of scientific phenomena.Michael Friendly (2008)"Milestones in the history of thematic cartography, stat ...
, and more specific areas such as
volume visualization Scientific visualization ( also spelled scientific visualisation) is an interdisciplinary branch of science concerned with the visualization of scientific phenomena.Michael Friendly (2008)"Milestones in the history of thematic cartography, sta ...
. In 1786, William Playfair published the first presentation graphics. There is no comprehensive 'history' of data visualization. There are no accounts that span the entire development of visual thinking and the visual representation of data, and which collate the contributions of disparate disciplines. Michael Friendly and Daniel J Denis of
York University York University (french: Université York), also known as YorkU or simply YU, is a public university, public research university in Toronto, Ontario, Canada. It is Canada's fourth-largest university, and it has approximately 55,700 students, 7,0 ...
are engaged in a project that attempts to provide a comprehensive history of visualization. Contrary to general belief, data visualization is not a modern development. Since prehistory, stellar data, or information such as location of stars were visualized on the walls of caves (such as those found in Lascaux Cave in Southern France) since the
Pleistocene The Pleistocene ( , often referred to as the ''Ice age'') is the geological Epoch (geology), epoch that lasted from about 2,580,000 to 11,700 years ago, spanning the Earth's most recent period of repeated glaciations. Before a change was fina ...
era. Physical artefacts such as Mesopotamian clay tokens (5500 BC), Inca
quipu ''Quipu'' (also spelled ''khipu'') are recording devices fashioned from strings historically used by a number of cultures in the region of Andean South America. A ''quipu'' usually consisted of cotton or camelid fiber strings. The Inca people u ...
s (2600 BC) and Marshall Islands stick charts (n.d.) can also be considered as visualizing quantitative information. The first documented data visualization can be tracked back to 1160 B.C. with
Turin Papyrus Map The Turin Papyrus Map is an ancient Egyptian map, generally considered the oldest surviving map of topographical interest from the ancient world. It is drawn on a papyrus reportedly discovered at Deir el-Medina in Thebes, collected by Bernardino ...
which accurately illustrates the distribution of geological resources and provides information about quarrying of those resources. Such maps can be categorized as thematic cartography, which is a type of data visualization that presents and communicates specific data and information through a geographical illustration designed to show a particular theme connected with a specific geographic area. Earliest documented forms of data visualization were various thematic maps from different cultures and ideograms and hieroglyphs that provided and allowed interpretation of information illustrated. For example,
Linear B Linear B was a syllabic script used for writing in Mycenaean Greek, the earliest attested form of Greek. The script predates the Greek alphabet by several centuries. The oldest Mycenaean writing dates to about 1400 BC. It is descended from ...
tablets of
Mycenae Mycenae ( ; grc, Μυκῆναι or , ''Mykē̂nai'' or ''Mykḗnē'') is an archaeological site near Mykines in Argolis, north-eastern Peloponnese, Greece. It is located about south-west of Athens; north of Argos; and south of Corinth. Th ...
provided a visualization of information regarding Late Bronze Age era trades in the Mediterranean. The idea of coordinates was used by ancient Egyptian surveyors in laying out towns, earthly and heavenly positions were located by something akin to latitude and longitude at least by 200 BC, and the map projection of a spherical earth into latitude and longitude by
Claudius Ptolemy Claudius Ptolemy (; grc-gre, Πτολεμαῖος, ; la, Claudius Ptolemaeus; AD) was a mathematician, astronomer, astrologer, geographer, and music theorist, who wrote about a dozen scientific treatises, three of which were of importance ...
.85–c. 165in Alexandria would serve as reference standards until the 14th century. The invention of paper and parchment allowed further development of visualizations throughout history. Figure shows a graph from the 10th or possibly 11th century that is intended to be an illustration of the planetary movement, used in an appendix of a textbook in monastery schools. The graph apparently was meant to represent a plot of the inclinations of the planetary orbits as a function of the time. For this purpose, the zone of the zodiac was represented on a plane with a horizontal line divided into thirty parts as the time or longitudinal axis. The vertical axis designates the width of the zodiac. The horizontal scale appears to have been chosen for each planet individually for the periods cannot be reconciled. The accompanying text refers only to the amplitudes. The curves are apparently not related in time. By the 16th century, techniques and instruments for precise observation and measurement of physical quantities, and geographic and celestial position were well-developed (for example, a “wall quadrant” constructed by Tycho Brahe 546–1601 covering an entire wall in his observatory). Particularly important were the development of triangulation and other methods to determine mapping locations accurately. Very early, the measure of time led scholars to develop innovative way of visualizing the data (e.g. Lorenz Codomann in 1596, Johannes Temporarius in 1596). French philosopher and mathematician
René Descartes René Descartes ( or ; ; Latinized: Renatus Cartesius; 31 March 1596 – 11 February 1650) was a French philosopher, scientist, and mathematician, widely considered a seminal figure in the emergence of modern philosophy and science. Mathem ...
and
Pierre de Fermat Pierre de Fermat (; between 31 October and 6 December 1607 – 12 January 1665) was a French mathematician who is given credit for early developments that led to infinitesimal calculus, including his technique of adequality. In particular, he ...
developed analytic geometry and two-dimensional coordinate system which heavily influenced the practical methods of displaying and calculating values. Fermat and
Blaise Pascal Blaise Pascal ( , , ; ; 19 June 1623 – 19 August 1662) was a French mathematician, physicist, inventor, philosopher, and Catholic Church, Catholic writer. He was a child prodigy who was educated by his father, a tax collector in Rouen. Pa ...
's work on statistics and probability theory laid the groundwork for what we now conceptualize as data. According to the Interaction Design Foundation, these developments allowed and helped William Playfair, who saw potential for graphical communication of quantitative data, to generate and develop graphical methods of statistics. In the second half of the 20th century, Jacques Bertin used quantitative graphs to represent information "intuitively, clearly, accurately, and efficiently". John Tukey and Edward Tufte pushed the bounds of data visualization; Tukey with his new statistical approach of exploratory data analysis and Tufte with his book "The Visual Display of Quantitative Information" paved the way for refining data visualization techniques for more than statisticians. With the progression of technology came the progression of data visualization; starting with hand-drawn visualizations and evolving into more technical applications – including interactive designs leading to software visualization. Programs like
SAS, SOFA, R, Minitab, Cornerstone and more allow for data visualization in the field of statistics. Other, more focused and individualized data visualization applications rely on programming languages and libraries such as D3, Python and JavaScript, which help make the visualization of quantitative data possible. Private schools have also developed programs to meet the demand for learning data visualization and associated programming libraries, including free programs like The Data Incubator or paid programs like General Assembly. Beginning with the symposium "Data to Discovery" in 2013, ArtCenter College of Design, Caltech and JPL in Pasadena have run an annual program on interactive data visualization. The program asks: How can interactive data visualization help scientists and engineers explore their data more effectively? How can computing, design, and design thinking help maximize research results? What methodologies are most effective for leveraging knowledge from these fields? By encoding relational information with appropriate visual and interactive characteristics to help interrogate, and ultimately gain new insight into data, the program develops new interdisciplinary approaches to complex science problems, combining design thinking and the latest methods from computing, user-centered design, interaction design and 3D graphics.


Terminology

Data visualization involves specific terminology, some of which is derived from statistics. For example, author Stephen Few defines two types of data, which are used in combination to support a meaningful analysis or visualization:
* Categorical: Represent groups of objects with a particular characteristic. Categorical variables can be either nominal or ordinal. Nominal variables, for example gender, have no order among their categories. Ordinal variables are categories with an order, for example the age group someone falls into.
* Quantitative: Represent measurements, such as the height of a person or the temperature of an environment. Quantitative variables can be either continuous or discrete. Continuous variables capture the idea that measurements can always be made more precisely, while discrete variables have only a finite number of possible values, such as a count of some outcomes or an age measured in whole years.
The distinction between quantitative and categorical variables is important because the two types require different methods of visualization. Two primary types of information displays are tables and graphs.
* A ''table'' contains quantitative data organized into rows and columns with categorical labels. It is primarily used to look up specific values. In the example above, the table might have categorical column labels representing the name (a ''qualitative variable'') and age (a ''quantitative variable''), with each row of data representing one person (the sampled ''experimental unit'' or ''category subdivision'').
* A ''graph'' is primarily used to show relationships among data and portrays values encoded as ''visual objects'' (e.g., lines, bars, or points). Numerical values are displayed within an area delineated by one or more ''axes''. These axes provide ''scales'' (quantitative and categorical) used to label and assign values to the visual objects. Many graphs are also referred to as ''charts''.
Eppler and Lengler have developed the "Periodic Table of Visualization Methods," an interactive chart displaying various data visualization methods. It includes six types of data visualization methods: data, information, concept, strategy, metaphor and compound.
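The categorical/quantitative distinction described above can be illustrated with a short sketch. The `Variable` class, the `classify` function and the type-based rule are hypothetical and only serve as an illustration; they are a simple heuristic, not Few's own procedure.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Variable:
    """A named column of values (hypothetical helper for this sketch)."""
    name: str
    values: Sequence
    ordered: bool = False  # only meaningful for categorical variables


def classify(var: Variable) -> str:
    """Label a variable using a simple type-based heuristic (an assumption)."""
    if all(isinstance(v, str) for v in var.values):
        # Text labels are treated as categories; order must be declared.
        return "categorical/ordinal" if var.ordered else "categorical/nominal"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in var.values):
        # Whole numbers, such as counts or age in whole years.
        return "quantitative/discrete"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool)
           for v in var.values):
        # Measurements that could always be made more precisely.
        return "quantitative/continuous"
    raise ValueError(f"cannot classify variable {var.name!r}")


def axis_scale(var: Variable) -> str:
    """Return which kind of axis scale the variable would need on a graph."""
    return classify(var).split("/")[0]


if __name__ == "__main__":
    for v in (Variable("gender", ["f", "m", "f"]),
              Variable("age_group", ["0-17", "18-64", "65+"], ordered=True),
              Variable("age_years", [34, 51, 29]),
              Variable("height_m", [1.62, 1.80, 1.75])):
        print(v.name, "->", classify(v), "| axis:", axis_scale(v))
```

In practice the variable type would usually be declared by the analyst instead of inferred, since a number can also be a category label (for example a postal code).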


Techniques


Other techniques

* Cartogram
* Cladogram (phylogeny)
* Concept Mapping
* Dendrogram (classification)
* Information visualization reference model
* Graph drawing
* Heatmap
* Hyperbolic tree
* Multidimensional scaling
* Parallel coordinates
* Problem solving environment
* Treemapping


Interactivity

Interactive data visualization enables direct actions on a graphical plot to change elements and link between multiple plots. Interactive data visualization has been a pursuit of statisticians since the late 1960s. Examples of the developments can be found on the American Statistical Association video lending library. Common interactions include:
* Brushing: works by using the mouse to control a paintbrush, directly changing the color or glyph of elements of a plot. The paintbrush is sometimes a pointer and sometimes works by drawing an outline of sorts around points; the outline is sometimes irregularly shaped, like a lasso. Brushing is most commonly used when multiple plots are visible and some linking mechanism exists between the plots. There are several different conceptual models for brushing and a number of common linking mechanisms. Brushing scatterplots can be a transient operation, in which points in the active plot retain their new characteristics only while they are enclosed or intersected by the brush, or it can be a persistent operation, so that points retain their new appearance after the brush has been moved away. Transient brushing is usually chosen for linked brushing.
* Painting: Persistent brushing is useful when we want to group the points into clusters and then proceed to use other operations, such as the tour, to compare the groups. It is becoming common terminology to call the persistent operation painting.
* Identification: also called labeling or label brushing, this is another plot manipulation that can be linked. Bringing the cursor near a point or edge in a scatterplot, or a bar in a barchart, causes a label to appear that identifies the plot element. It is widely available in many interactive graphics, and is sometimes called mouseover.
* Scaling: maps the data onto the window, and changes in the area of the mapping function help us learn different things from the same plot. Scaling is commonly used to zoom in on crowded regions of a scatterplot, and it can also be used to change the aspect ratio of a plot, to reveal different features of the data.
* Linking: connects elements selected in one plot with elements in another plot. The simplest kind of linking is one-to-one, where both plots show different projections of the same data, and a point in one plot corresponds to exactly one point in the other. When using area plots, brushing any part of an area has the same effect as brushing it all and is equivalent to selecting all cases in the corresponding category. Even when some plot elements represent more than one case, the underlying linking rule still links one case in one plot to the same case in other plots. Linking can also be by categorical variable, such as by a subject id, so that all data values corresponding to that subject are highlighted in all the visible plots.
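The interactions listed above can be sketched without any graphics library. The following is a minimal illustration, assuming a rectangular brush and plots that share case ids; the names `Plot`, `brush`, `link`, `expand_by_category` and `scale` are hypothetical and do not come from any particular toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class Plot:
    """One view of the shared cases: case id -> (x, y) position in this plot."""
    points: dict
    highlighted: set = field(default_factory=set)


def brush(plot, x0, y0, x1, y1):
    """Return the ids of cases whose points lie inside a rectangular brush."""
    return {cid for cid, (x, y) in plot.points.items()
            if x0 <= x <= x1 and y0 <= y <= y1}


def link(selected, plots, persistent=False):
    """One-to-one linking: highlight the same case ids in every plot.

    Transient brushing replaces the previous highlight; persistent brushing
    ("painting") adds to it, so earlier selections survive.
    """
    for p in plots:
        visible = selected & set(p.points)
        p.highlighted = (p.highlighted | visible) if persistent else visible


def expand_by_category(selected, category_of):
    """Linking by categorical variable: select every case that shares a
    category (for example a subject id) with any selected case."""
    cats = {category_of[c] for c in selected}
    return {c for c, cat in category_of.items() if cat in cats}


def scale(value, domain, window):
    """Scaling: map a data value linearly from a data range to window units.
    Narrowing the domain zooms in on a region of the plot."""
    (d0, d1), (w0, w1) = domain, window
    return w0 + (value - d0) / (d1 - d0) * (w1 - w0)


if __name__ == "__main__":
    scatter = Plot({1: (1.0, 1.0), 2: (2.0, 5.0), 3: (8.0, 9.0)})
    other = Plot({1: (0.2, 0.3), 2: (0.9, 0.1), 3: (0.5, 0.5)})
    link(brush(scatter, 0, 0, 3, 6), [scatter, other])
    print(sorted(other.highlighted))      # cases brushed in the first plot
    print(scale(4.5, (4, 6), (0, 200)))   # zoomed view of the range 4..6
```

Keeping the selection as a set of case ids, not of screen objects, is what lets the same rule link a point in one plot to a bar or area in another.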


Other perspectives

There are different approaches to the scope of data visualization. One common focus is on information presentation, such as Friedman (2008). Friendly (2008) presumes two main parts of data visualization: statistical graphics, and thematic cartography (Michael Friendly (2008), "Milestones in the history of thematic cartography, statistical graphics, and data visualization"). In this line the "Data Visualization: Modern Approaches" (2007) article gives an overview of seven subjects of data visualization:
* Articles & resources
* Displaying connections
* Displaying data
* Displaying news
* Displaying websites
* Mind maps
* Tools and services
All these subjects are closely related to graphic design and information representation. On the other hand, from a computer science perspective, Frits H. Post in 2002 categorized the field into sub-fields (Frits H. Post, Gregory M. Nielson and Georges-Pierre Bonneau (2002), ''Data Visualization: The State of the Art''):
* Information visualization
* Interaction techniques and architectures
* Modelling techniques
* Multiresolution methods
* Visualization algorithms and techniques
* Volume visualization
In the Harvard Business Review, Scott Berinato developed a framework to approach data visualisation. To start thinking visually, users must consider two questions: 1) what you have and 2) what you're doing. The first step is identifying what data you want visualised. This may be data-driven, like profit over the past ten years, or a conceptual idea, like how a specific organisation is structured. Once this question is answered, one can then focus on whether one is trying to communicate information (declarative visualisation) or trying to figure something out (exploratory visualisation). Scott Berinato combines these questions to give four types of visual communication that each have their own goals:
* Idea illustration (conceptual & declarative): used to teach, explain and/or simplify concepts. For example, organisation charts and decision trees.
* Idea generation (conceptual & exploratory): used to discover, innovate and solve problems. For example, a whiteboard after a brainstorming session.
* Visual discovery (data-driven & exploratory): used to spot trends and make sense of data. This type of visual is more common with large and complex data where the dataset is somewhat unknown and the task is open-ended.
* Everyday data visualisation (data-driven & declarative): the most common and simple type of visualisation, used for affirming and setting context. For example, a line graph of GDP over time.
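Berinato's two questions form a two-by-two grid, which can be written down as a small lookup table. The sketch below only restates the four combinations given above; the names `VISUAL_TYPES` and `visual_type` are hypothetical.

```python
# Keys are (what you have, what you're doing); values are the four types
# of visual communication named in the framework.
VISUAL_TYPES = {
    ("conceptual", "declarative"): "idea illustration",
    ("conceptual", "exploratory"): "idea generation",
    ("data-driven", "exploratory"): "visual discovery",
    ("data-driven", "declarative"): "everyday data visualisation",
}


def visual_type(nature: str, purpose: str) -> str:
    """Look up the type of visual communication for a pair of answers."""
    try:
        return VISUAL_TYPES[(nature, purpose)]
    except KeyError:
        raise ValueError(
            "nature must be 'conceptual' or 'data-driven'; "
            "purpose must be 'declarative' or 'exploratory'"
        ) from None


if __name__ == "__main__":
    # A line graph of GDP over time: data-driven and declarative.
    print(visual_type("data-driven", "declarative"))
```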


Applications

Data and information visualization insights are being applied in areas such as:
* Scientific research
* Digital libraries
* Data mining
* Information graphics
* Financial data analysis
* Health care
* Market studies
* Manufacturing production control
* Crime mapping
* eGovernance and Policy Modeling


Organization

Notable academic and industry laboratories in the field are:
* Adobe Research
* IBM Research
* Google Research
* Microsoft Research
* Panopticon Software
* Scientific Computing and Imaging Institute
* Tableau Software
* University of Maryland Human-Computer Interaction Lab
* Vvi
Conferences in this field, ranked by significance in data visualization research, are:
* IEEE Visualization: An annual international conference on scientific visualization, information visualization, and visual analytics. Conference is held in October.
* ACM SIGGRAPH: An annual international conference on computer graphics, convened by the ACM SIGGRAPH organization. Conference dates vary.
* EuroVis: An annual Europe-wide conference on data visualization, organized by the Eurographics Working Group on Data Visualization and supported by the IEEE Visualization and Graphics Technical Committee (IEEE VGTC). Conference is usually held in June.
* Conference on Human Factors in Computing Systems (CHI): An annual international conference on human–computer interaction, hosted by ACM SIGCHI. Conference is usually held in April or May.
* Eurographics: An annual Europe-wide computer graphics conference, held by the European Association for Computer Graphics. Conference is usually held in April or May.
* PacificVis: An annual visualization symposium held in the Asia-Pacific region, sponsored by the IEEE Visualization and Graphics Technical Committee (IEEE VGTC). Conference is usually held in March or April.
For further examples, see: Computer graphics organizations


Data presentation architecture

Data presentation architecture (DPA) is a skill-set that seeks to identify, locate, manipulate, format and present data in such a way as to optimally communicate meaning and proper knowledge. Historically, the term ''data presentation architecture'' is attributed to Kelly Lautt: "Data Presentation Architecture (DPA) is a rarely applied skill set critical for the success and value of Business Intelligence. Data presentation architecture weds the science of numbers, data and statistics in discovering valuable information from data and making it usable, relevant and actionable with the arts of data visualization, communications, organizational psychology and change management in order to provide business intelligence solutions with the data scope, delivery timing, format and visualizations that will most effectively support and drive operational, tactical and strategic behaviour toward understood business (or organizational) goals. DPA is neither an IT nor a business skill set but exists as a separate field of expertise. Often confused with data visualization, data presentation architecture is a much broader skill set that includes determining what data on what schedule and in what exact format is to be presented, not just the best way to present data that has already been chosen. Data visualization skills are one element of DPA."


Objectives

DPA has two main objectives:
* To use data to provide knowledge in the most efficient manner possible (minimize noise, complexity, and unnecessary data or detail given each audience's needs and roles)
* To use data to provide knowledge in the most effective manner possible (provide relevant, timely and complete data to each audience member in a clear and understandable manner that conveys important meaning, is actionable and can affect understanding, behavior and decisions)


Scope

With the above objectives in mind, the actual work of data presentation architecture consists of:
* Creating effective delivery mechanisms for each audience member depending on their role, tasks, locations and access to technology
* Defining important meaning (relevant knowledge) that is needed by each audience member in each context
* Determining the required periodicity of data updates (the currency of the data)
* Determining the right timing for data presentation (when and how often the user needs to see the data)
* Finding the right data (subject area, historical reach, breadth, level of detail, etc.)
* Utilizing appropriate analysis, grouping, visualization, and other presentation formats


Related fields

DPA work shares commonalities with several other fields, including:
* Business analysis in determining business goals, collecting requirements, mapping processes.
* Business process improvement in that its goal is to improve and streamline actions and decisions in furtherance of business goals.
* Data visualization in that it uses well-established theories of visualization to add or highlight meaning or importance in data presentation.
* Digital humanities explores more nuanced ways of visualising complex data.
* Information architecture, but information architecture's focus is on unstructured data and therefore excludes both analysis (in the statistical/data sense) and direct transformation of the actual content (data, for DPA) into new entities and combinations.
* HCI and interaction design, since many of the principles in how to design interactive data visualisation have been developed cross-disciplinary with HCI.
* Visual journalism and data-driven journalism or data journalism: Visual journalism is concerned with all types of graphic facilitation of the telling of news stories, and data-driven and data journalism are not necessarily told with data visualisation. Nevertheless, the field of journalism is at the forefront in developing new data visualisations to communicate data.
* Graphic design, conveying information through styling, typography, position, and other aesthetic concerns.


See also

* Analytics
* Big data
* Climate change art
* Color coding in data visualization
* Computational visualistics
* Information art
* Data management
* Data Presentation Architecture
* Data profiling
* Data warehouse
* Geovisualization
* Grand Tour (data visualisation)
* imc FAMOS (1987), graphical data analysis
* Infographics
* Information design
* Information management
* List of information graphics software
* List of countries by economic complexity, example of Treemapping
* Patent visualisation
* Software visualization
* Statistical analysis
* Visual analytics
* Warming stripes


Further reading

* Ben Bederson and Ben Shneiderman (2003). ''The Craft of Information Visualization: Readings and Reflections''. Morgan Kaufmann.
* Stuart K. Card, Jock D. Mackinlay and Ben Shneiderman (1999). ''Readings in Information Visualization: Using Vision to Think''. Morgan Kaufmann Publishers.
* Jeffrey Heer, Stuart K. Card, James Landay (2005). "Prefuse: a toolkit for interactive information visualization". In: ''ACM Human Factors in Computing Systems'' CHI 2005.
* Andreas Kerren, John T. Stasko, Jean-Daniel Fekete, and Chris North (2008). ''Information Visualization – Human-Centered Issues and Perspectives''. Volume 4950 of LNCS State-of-the-Art Survey, Springer.
* Riccardo Mazza (2009). ''Introduction to Information Visualization''. Springer.
* Robert Spence (2007). ''Information Visualization: Design for Interaction (2nd Edition)''. Prentice Hall.
* Colin Ware (2000). ''Information Visualization: Perception for design''. San Francisco, CA: Morgan Kaufmann.
* Kawa Nazemi (2014). ''Adaptive Semantics Visualization''. Eurographics Association.


External links


Milestones in the History of Thematic Cartography, Statistical Graphics, and Data Visualization
An illustrated chronology of innovations by Michael Friendly and Daniel J. Denis.
Duke University-Christa Kelleher Presentation-Communicating through infographics-visualizing scientific & engineering information-March 6, 2015