Data and information visualization

1

Data and information visualization (data viz/vis or info viz/vis) is the practice of designing and creating easy-to-communicate and easy-to-understand graphic or visual representations of a large amount of complex quantitative and qualitative data and information with the help of static, dynamic or interactive visual items. Typically based on data and information collected from a certain domain of expertise, these visualizations are intended for a broader audience to help them visually explore and discover, quickly understand, interpret and gain important insights into otherwise difficult-to-identify structures, relationships, correlations, local and global patterns, trends, variations, constancy, clusters, outliers and unusual groupings within data (exploratory visualization). When intended for the general public (mass communication) to convey a concise version of known, specific information in a clear and engaging manner (presentational or explanatory visualization), it is typically called information graphics. Data visualization is concerned with visually presenting sets of primarily quantitative raw data in a schematic form. The visual formats used in data visualization include tables, charts and graphs (e.g. pie charts, bar charts, line charts, area charts, cone charts, pyramid charts, donut charts, histograms, spectrograms, cohort charts, waterfall charts, funnel charts, bullet graphs, etc.), diagrams, plots (e.g. scatter plots, distribution plots, box-and-whisker plots), geospatial maps (such as proportional symbol maps, choropleth maps, isopleth maps and heat maps), figures, correlation matrices, percentage gauges, etc., which sometimes can be combined in a dashboard. Information visualization, on the other hand, deals with multiple, large-scale and complicated datasets which contain quantitative (numerical) data as well as qualitative (non-numerical, i.e. verbal or graphical) and primarily abstract information and its goal is to add value to raw data, improve the viewers' comprehension, reinforce their cognition and help them derive insights and make decisions as they navigate and interact with the computer-supported graphical display. Visual tools used in information visualization include maps (such as tree maps), animations, infographics, Sankey diagrams, flow charts, network diagrams, semantic networks, entity-relationship diagrams, venn diagrams, timelines, mind maps, etc. Emerging technologies like virtual, augmented and mixed reality have the potential to make information visualization more immersive, intuitive, interactive and easily manipulable and thus enhance the user's visual perception and cognition. In data and information visualization, the goal is to graphically present and explore abstract, non-physical and non-spatial data collected from databases, information systems, file systems, documents, business data, etc. (presentational and exploratory visualization) which is different from the field of scientific visualization, where the goal is to render realistic images based on physical and spatial scientific data to confirm or reject hypotheses (confirmatory visualization). Effective data visualization is properly sourced, contextualized, simple and uncluttered. The underlying data is accurate and up-to-date to make sure that insights are reliable. Graphical items are well-chosen for the given datasets and aesthetically appealing, with shapes, colors and other visual elements used deliberately in a meaningful and non-distracting manner. The visuals are accompanied by supporting texts (labels and titles). These verbal and graphical components complement each other to ensure clear, quick and memorable understanding. Effective information visualization is aware of the needs and concerns and the level of expertise of the target audience, deliberately guiding them to the intended conclusion. Such effective visualization can be used not only for conveying specialized, complex, big data-driven ideas to a wider group of non-technical audience in a visually appealing, engaging and accessible manner, but also to domain experts and executives for making decisions, monitoring performance, generating new ideas and stimulating research. In addition, data scientists, data analysts and data mining specialists use data visualization to check the quality of data, find errors, unusual gaps and missing values in data, clean data, explore the structures and features of data and assess outputs of data-driven models. In business, data and information visualization can constitute a part of data storytelling, where they are paired with a coherent narrative structure or storyline to contextualize the analyzed data and communicate the insights gained from analyzing the data clearly and memorably with the goal of convincing the audience into making a decision or taking an action in order to create business value. This can be contrasted with the field of statistical graphics, where complex statistical data are communicated graphically in an accurate and precise manner among researchers and analysts with statistical expertise to help them perform exploratory data analysis or to convey the results of such analyses, where visual appeal, capturing attention to a certain issue and storytelling are not as important. The field of data and information visualization is of interdisciplinary nature as it incorporates principles found in the disciplines of descriptive statistics (as early as the 18th century), visual communication, graphic design, cognitive science and, more recently, interactive computer graphics and human-computer interaction. Since effective visualization requires design skills, statistical skills and computing skills, it is argued by authors such as Gershon and Page that it is both an art and a science. The neighboring field of visual analytics marries statistical data analysis, data and information visualization and human analytical reasoning through interactive visual interfaces to help human users reach conclusions, gain actionable insights and make informed decisions which are otherwise difficult for computers to do. Research into how people read and misread various types of visualizations is helping to determine what types and features of visualizations are most understandable and effective in conveying information. On the other hand, unintentionally poor or intentionally misleading and deceptive visualizations (misinformative visualization) can function as powerful tools which disseminate misinformation, manipulate public perception and divert public opinion toward a certain agenda. Thus data visualization literacy has become an important component of data and information literacy in the information age akin to the roles played by textual, mathematical and visual literacy in the past.

Overview

The field of data and information visualization has emerged "from research in human–computer interaction, computer science, graphics, visual design, psychology, and business methods. It is increasingly applied as a critical component in scientific research, digital libraries, data mining, financial data analysis, market studies, manufacturing production control, and drug discovery". Data and information visualization presumes that "visual representations and interaction techniques take advantage of the human eye's broad bandwidth pathway into the mind to allow users to see, explore, and understand large amounts of information at once. Information visualization focused on the creation of approaches for conveying abstract information in intuitive ways." Data analysis is an indispensable part of all applied research and problem solving in industry. The most fundamental data analysis approaches are visualization (histograms, scatter plots, surface plots, tree maps, parallel coordinate plots, etc.), statistics (hypothesis test, regression, PCA, etc.), data mining (association mining, etc.), and machine learning methods (clustering, classification, decision trees, etc.). Among these approaches, information visualization, or visual data analysis, is the most reliant on the cognitive skills of human analysts, and allows the discovery of unstructured actionable insights that are limited only by human imagination and creativity. The analyst does not have to learn any sophisticated methods to be able to interpret the visualizations of the data. Information visualization is also a hypothesis generation scheme, which can be, and is typically followed by more analytical or formal analysis, such as statistical hypothesis testing. To communicate information clearly and efficiently, data visualization uses statistical graphics, plots, information graphics and other tools. Numerical data may be encoded using dots, lines, or bars, to visually communicate a quantitative message. Effective visualization helps users analyze and reason about data and evidence. It makes complex data more accessible, understandable, and usable, but can also be reductive. Users may have particular analytical tasks, such as making comparisons or understanding causality, and the design principle of the graphic (i.e., showing comparisons or showing causality) follows the task. Tables are generally used where users will look up a specific measurement, while charts of various types are used to show patterns or relationships in the data for one or more variables. Data visualization refers to the techniques used to communicate data or information by encoding it as visual objects (e.g., points, lines, or bars) contained in graphics. The goal is to communicate information clearly and efficiently to users. It is one of the steps in data analysis or data science. According to Vitaly Friedman (2008) the "main goal of data visualization is to communicate information clearly and effectively through graphical means. It doesn't mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex data set by communicating its key aspects in a more intuitive way. Yet designers often fail to achieve a balance between form and function, creating gorgeous data visualizations which fail to serve their main purpose — to communicate information". Indeed, Fernanda Viegas and Martin M. Wattenberg suggested that an ideal visualization should not only communicate clearly, but stimulate viewer engagement and attention. Data visualization is closely related to information graphics, information visualization, scientific visualization, exploratory data analysis and statistical graphics. In the new millennium, data visualization has become an active area of research, teaching and development. According to Post et al. (2002), it has united scientific and information visualization. In the commercial environment data visualization is often referred to as dashboards. Infographics are another very common form of data visualization.

Principles

Characteristics of effective graphical displays

Edward Tufte has explained that users of information displays are executing particular analytical tasks such as making comparisons. The design principle of the information graphic should support the analytical task. As William Cleveland and Robert McGill show, different graphical elements accomplish this more or less effectively. For example, dot plots and bar charts outperform pie charts. In his 1983 book The Visual Display of Quantitative Information, Edward Tufte defines 'graphical displays' and principles for effective graphical display in the following passage: "Excellence in statistical graphics consists of complex ideas communicated with clarity, precision, and efficiency. Graphical displays should: Graphics reveal data. Indeed, graphics can be more precise and revealing than conventional statistical computations." For example, the Minard diagram shows the losses suffered by Napoleon's army in the 1812–1813 period. Six variables are plotted: the size of the army, its location on a two-dimensional surface (x and y), time, the direction of movement, and temperature. The line width illustrates a comparison (size of the army at points in time), while the temperature axis suggests a cause of the change in army size. This multivariate display on a two-dimensional surface tells a story that can be grasped immediately while identifying the source data to build credibility. Tufte wrote in 1983 that: "It may well be the best statistical graphic ever drawn." Not applying these principles may result in misleading graphs, distorting the message, or supporting an erroneous conclusion. According to Tufte, chartjunk refers to the extraneous interior decoration of the graphic that does not enhance the message or gratuitous three-dimensional or perspective effects. Needlessly separating the explanatory key from the image itself, requiring the eye to travel back and forth from the image to the key, is a form of "administrative debris." The ratio of "data to ink" should be maximized, erasing non-data ink where feasible. The Congressional Budget Office summarized several best practices for graphical displays in a June 2014 presentation. These included: a) Knowing your audience; b) Designing graphics that can stand alone outside the report's context; and c) Designing graphics that communicate the key messages in the report.

Quantitative messages

Author Stephen Few described eight types of quantitative messages that users may attempt to understand or communicate from a set of data and the associated graphs used to help communicate the message: Analysts reviewing a set of data may consider whether some or all of the messages and graphic types above are applicable to their task and audience. The process of trial and error to identify meaningful relationships and messages in the data is part of exploratory data analysis.

Visual perception and data visualization

A human can distinguish differences in line length, shape, orientation, distances, and color (hue) readily without significant processing effort; these are referred to as "pre-attentive attributes". For example, it may require significant time and effort ("attentive processing") to identify the number of times the digit "5" appears in a series of numbers; but if that digit is different in size, orientation, or color, instances of the digit can be noted quickly through pre-attentive processing. Compelling graphics take advantage of pre-attentive processing and attributes and the relative strength of these attributes. For example, since humans can more easily process differences in line length than surface area, it may be more effective to use a bar chart (which takes advantage of line length to show comparison) rather than pie charts (which use surface area to show comparison).

Human perception/cognition and data visualization

Almost all data visualizations are created for human consumption. Knowledge of human perception and cognition is necessary when designing intuitive visualizations. Cognition refers to processes in human beings like perception, attention, learning, memory, thought, concept formation, reading, and problem solving. Human visual processing is efficient in detecting changes and making comparisons between quantities, sizes, shapes and variations in lightness. When properties of symbolic data are mapped to visual properties, humans can browse through large amounts of data efficiently. It is estimated that 2/3 of the brain's neurons can be involved in visual processing. Proper visualization provides a different approach to show potential connections, relationships, etc. which are not as obvious in non-visualized quantitative data. Visualization can become a means of data exploration. Studies have shown individuals used on average 19% less cognitive resources, and 4.5% better able to recall details when comparing data visualization with text.

History

The modern study of visualization started with computer graphics, which "has from its beginning been used to study scientific problems. However, in its early days the lack of graphics power often limited its usefulness. The recent emphasis on visualization started in 1987 with the special issue of Computer Graphics on Visualization in Scientific Computing. Since then there have been several conferences and workshops, co-sponsored by the IEEE Computer Society and ACM SIGGRAPH". They have been devoted to the general topics of data visualization, information visualization and scientific visualization, and more specific areas such as volume visualization. In 1786, William Playfair published the first presentation graphics. There is no comprehensive 'history' of data visualization. There are no accounts that span the entire development of visual thinking and the visual representation of data, and which collate the contributions of disparate disciplines. Michael Friendly and Daniel J Denis of York University are engaged in a project that attempts to provide a comprehensive history of visualization. Contrary to general belief, data visualization is not a modern development. Since prehistory, stellar data, or information such as location of stars were visualized on the walls of caves (such as those found in Lascaux Cave in Southern France) since the Pleistocene era. Physical artefacts such as Mesopotamian clay tokens (5500 BC), Inca quipus (2600 BC) and Marshall Islands stick charts (n.d.) can also be considered as visualizing quantitative information. The first documented data visualization can be tracked back to 1160 B.C. with Turin Papyrus Map which accurately illustrates the distribution of geological resources and provides information about quarrying of those resources. Such maps can be categorized as thematic cartography, which is a type of data visualization that presents and communicates specific data and information through a geographical illustration designed to show a particular theme connected with a specific geographic area. Earliest documented forms of data visualization were various thematic maps from different cultures and ideograms and hieroglyphs that provided and allowed interpretation of information illustrated. For example, Linear B tablets of Mycenae provided a visualization of information regarding Late Bronze Age era trades in the Mediterranean. The idea of coordinates was used by ancient Egyptian surveyors in laying out towns, earthly and heavenly positions were located by something akin to latitude and longitude at least by 200 BC, and the map projection of a spherical Earth into latitude and longitude by Claudius Ptolemy [c. 85–c. 165] in Alexandria would serve as reference standards until the 14th century. The invention of paper and parchment allowed further development of visualizations throughout history. Figure shows a graph from the 10th or possibly 11th century that is intended to be an illustration of the planetary movement, used in an appendix of a textbook in monastery schools. The graph apparently was meant to represent a plot of the inclinations of the planetary orbits as a function of the time. For this purpose, the zone of the zodiac was represented on a plane with a horizontal line divided into thirty parts as the time or longitudinal axis. The vertical axis designates the width of the zodiac. The horizontal scale appears to have been chosen for each planet individually for the periods cannot be reconciled. The accompanying text refers only to the amplitudes. The curves are apparently not related in time. By the 16th century, techniques and instruments for precise observation and measurement of physical quantities, and geographic and celestial position were well-developed (for example, a "wall quadrant" constructed by Tycho Brahe [1546–1601], covering an entire wall in his observatory). Particularly important were the development of triangulation and other methods to determine mapping locations accurately. Very early, the measure of time led scholars to develop innovative way of visualizing the data (e.g. Lorenz Codomann in 1596, Johannes Temporarius in 1596 ). French philosopher and mathematician René Descartes and Pierre de Fermat developed analytic geometry and two-dimensional coordinate system which heavily influenced the practical methods of displaying and calculating values. Fermat and Blaise Pascal's work on statistics and probability theory laid the groundwork for what we now conceptualize as data. According to the Interaction Design Foundation, these developments allowed and helped William Playfair, who saw potential for graphical communication of quantitative data, to generate and develop graphical methods of statistics. In the second half of the 20th century, Jacques Bertin used quantitative graphs to represent information "intuitively, clearly, accurately, and efficiently". John Tukey and Edward Tufte pushed the bounds of data visualization; Tukey with his new statistical approach of exploratory data analysis and Tufte with his book "The Visual Display of Quantitative Information" paved the way for refining data visualization techniques for more than statisticians. With the progression of technology came the progression of data visualization; starting with hand-drawn visualizations and evolving into more technical applications – including interactive designs leading to software visualization. Programs like SAS, SOFA, R, Minitab, Cornerstone and more allow for data visualization in the field of statistics. Other data visualization applications, more focused and unique to individuals, programming languages such as D3, Python and JavaScript help to make the visualization of quantitative data a possibility. Private schools have also developed programs to meet the demand for learning data visualization and associated programming libraries, including free programs like The Data Incubator or paid programs like General Assembly. Beginning with the symposium "Data to Discovery" in 2013, ArtCenter College of Design, Caltech and JPL in Pasadena have run an annual program on interactive data visualization. The program asks: How can interactive data visualization help scientists and engineers explore their data more effectively? How can computing, design, and design thinking help maximize research results? What methodologies are most effective for leveraging knowledge from these fields? By encoding relational information with appropriate visual and interactive characteristics to help interrogate, and ultimately gain new insight into data, the program develops new interdisciplinary approaches to complex science problems, combining design thinking and the latest methods from computing, user-centered design, interaction design and 3D graphics.

Terminology

Data visualization involves specific terminology, some of which is derived from statistics. For example, author Stephen Few defines two types of data, which are used in combination to support a meaningful analysis or visualization: The distinction between quantitative and categorical variables is important because the two types require different methods of visualization. Two primary types of information displays are tables and graphs. Eppler and Lengler have developed the "Periodic Table of Visualization Methods," an interactive chart displaying various data visualization methods. It includes six types of data visualization methods: data, information, concept, strategy, metaphor and compound. In "Visualization Analysis and Design" Tamara Munzner writes "Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively." Munzner agues that visualization "is suitable when there is a need to augment human capabilities rather than replace people with computational decision-making methods."

Techniques

Other techniques

Interactivity

Interactive data visualization enables direct actions on a graphical plot to change elements and link between multiple plots. Interactive data visualization has been a pursuit of statisticians since the late 1960s. Examples of the developments can be found on the American Statistical Association video lending library. Common interactions include:

Other perspectives

There are different approaches on the scope of data visualization. One common focus is on information presentation, such as Friedman (2008). Friendly (2008) presumes two main parts of data visualization: statistical graphics, and thematic cartography. In this line the "Data Visualization: Modern Approaches" (2007) article gives an overview of seven subjects of data visualization: All these subjects are closely related to graphic design and information representation. On the other hand, from a computer science perspective, Frits H. Post in 2002 categorized the field into sub-fields: Within The Harvard Business Review, Scott Berinato developed a framework to approach data visualisation. To start thinking visually, users must consider two questions; 1) What you have and 2) what you're doing. The first step is identifying what data you want visualised. It is data-driven like profit over the past ten years or a conceptual idea like how a specific organisation is structured. Once this question is answered one can then focus on whether they are trying to communicate information (declarative visualisation) or trying to figure something out (exploratory visualisation). Scott Berinato combines these questions to give four types of visual communication that each have their own goals. These four types of visual communication are as follows;

Applications

Data and information visualization insights are being applied in areas such as:

Organization

Notable academic and industry laboratories in the field are: Conferences in this field, ranked by significance in data visualization research, are: For further examples, see: Category:Computer graphics organizations

Data presentation architecture

Data presentation architecture (DPA) is a skill-set that seeks to identify, locate, manipulate, format and present data in such a way as to optimally communicate meaning and proper knowledge. Historically, the term data presentation architecture is attributed to Kelly Lautt: "Data Presentation Architecture (DPA) is a rarely applied skill set critical for the success and value of Business Intelligence. Data presentation architecture weds the science of numbers, data and statistics in discovering valuable information from data and making it usable, relevant and actionable with the arts of data visualization, communications, organizational psychology and change management in order to provide business intelligence solutions with the data scope, delivery timing, format and visualizations that will most effectively support and drive operational, tactical and strategic behaviour toward understood business (or organizational) goals. DPA is neither an IT nor a business skill set but exists as a separate field of expertise. Often confused with data visualization, data presentation architecture is a much broader skill set that includes determining what data on what schedule and in what exact format is to be presented, not just the best way to present data that has already been chosen. Data visualization skills are one element of DPA."

Objectives

DPA has two main objectives:

Scope

With the above objectives in mind, the actual work of data presentation architecture consists of:

Related fields

DPA work shares commonalities with several other fields, including:

This article is derived from Wikipedia and licensed under CC BY-SA 4.0. View the original article.

Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc.
Bliptext is not affiliated with or endorsed by Wikipedia or the Wikimedia Foundation.

View original