Lost in Magnitudes: Exploring the Design Space for Visualizing Data with Large Value Ranges

Read original: arXiv:2404.15150 - Published 4/24/2024 by Katerina Batziakoudi, Florent Cabric, Stéphanie Rey, Jean-Daniel Fekete


Overview

This paper explores the design space for static visualizations of datasets with quantitative attributes that vary across multiple orders of magnitude, referred to as Orders of Magnitude Values (OMVs).
The authors provide design guidelines and recommendations for effective visual encodings of OMVs, because current charts rely on linear or logarithmic scales, which limit how well readers can perform even simple tasks with OMVs.
The paper's main contributions include: 1) presenting the design space of OMVs, 2) generating a large number of OMV visualizations, some of which are novel and effective, 3) refining the definition of a scale called E+M for OMVs, and 4) providing guidelines and recommendations for designing effective OMV visualizations.

Plain English Explanation

The paper focuses on datasets that contain values that can be very different in size, sometimes by a factor of a thousand or more. For example, the dataset might include both tiny values, like 0.001, and huge values, like 1,000,000. The authors call these "Orders of Magnitude Values" or OMVs.

Current visualization techniques, such as linear or logarithmic scales, struggle to display OMVs effectively. Linear scales compress the smaller magnitudes so much that they cannot be read or compared, while logarithmic scales can be hard for the general public to understand.
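To make the limitation of linear scales concrete, here is a minimal Python sketch that maps a few values onto a 500-pixel axis with each kind of scale. The axis length, the sample values, and the function names are assumptions chosen for illustration; they do not come from the paper.

```python
import math

def linear_position(value, vmin, vmax, length_px):
    """Pixel position of value on a linear axis spanning [vmin, vmax]."""
    return (value - vmin) / (vmax - vmin) * length_px

def log_position(value, vmin, vmax, length_px):
    """Pixel position of value on a base-10 logarithmic axis (vmin must be > 0)."""
    span = math.log10(vmax) - math.log10(vmin)
    return (math.log10(value) - math.log10(vmin)) / span * length_px

# Hypothetical sample values spanning nine orders of magnitude.
values = [0.001, 1, 1_000, 1_000_000]
for v in values:
    lin = linear_position(v, 0, 1_000_000, 500)
    log = log_position(v, 0.001, 1_000_000, 500)
    print(f"{v:>12}: linear {lin:.7f} px, log {log:.1f} px")
```

On the linear axis every value except the largest lands within the first pixel, so the small values cannot be told apart. The logarithmic axis spreads them evenly, but equal distances now mean equal ratios, which is the property many readers find hard to interpret.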

The authors build on an approach that separates each OMV into two parts, as in scientific notation: the "mantissa" (the significant digits) and the "exponent" (the power of 10). For example, 4,200 is 4.2 × 10^3, with mantissa 4.2 and exponent 3. This allows them to visually encode both parts in a way that makes the data easier to understand and analyze.
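The split into mantissa and exponent can be sketched in a few lines of Python. This is a minimal illustration assuming base-10 scientific notation with a mantissa normalized to the range [1, 10); the function name `split_omv` is hypothetical and not taken from the paper.

```python
import math

def split_omv(value):
    """Return (mantissa, exponent) such that value == mantissa * 10**exponent
    and 1 <= abs(mantissa) < 10. Zero is returned as (0.0, 0) by convention."""
    if value == 0:
        return 0.0, 0
    exponent = math.floor(math.log10(abs(value)))
    mantissa = value / 10.0 ** exponent
    # Guard against floating-point rounding at exact powers of ten.
    if abs(mantissa) >= 10:
        mantissa /= 10
        exponent += 1
    elif abs(mantissa) < 1:
        mantissa *= 10
        exponent -= 1
    return mantissa, exponent

for v in [0.001, 4_200, 1_000_000]:
    mantissa, exponent = split_omv(v)
    print(f"{v} = {mantissa:.2f} x 10^{exponent}")
```

Once a value is split this way, each part can be mapped to its own visual channel, which is the idea the rest of the paper explores.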

The authors systematically generated a large number of possible visualizations for datasets with OMVs, using different types of visual elements and encodings. They then evaluated these visualizations and provide guidelines on the most effective ways to visualize OMVs.

The goal is to improve visualization tools and techniques to better support datasets with values that span multiple orders of magnitude, making it easier for people to understand and work with this type of data.

Technical Explanation

The paper starts by describing the design space for visualizing datasets with OMVs. The authors use four datasets, each with two attributes: an OMV (divided into mantissa and exponent) and a second attribute that is nominal, ordinal, time-based, or quantitative.

Starting from the design space described by the Grammar of Graphics, the authors systematically generated all possible visualizations for these datasets, using different visual marks and visual channels. They then refined this design space by enforcing integrity constraints drawn from the visualization and graphical perception literature.
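The generate-then-filter process can be sketched as follows. This is only an illustration of the general idea: the lists of marks and channels and the two constraints in `is_viable` are assumptions made up for this example, not the sets or rules used in the paper.

```python
from itertools import product

# Hypothetical marks and channels, chosen only for illustration.
MARKS = ["point", "bar", "line"]
CHANNELS = ["x", "y", "size", "color_lightness", "color_hue"]

def candidate_designs(marks=MARKS, channels=CHANNELS):
    """Yield every assignment of a mark plus one channel per data field."""
    for mark, (m_ch, e_ch, a_ch) in product(marks, product(channels, repeat=3)):
        yield {"mark": mark, "mantissa": m_ch, "exponent": e_ch, "attribute": a_ch}

def is_viable(design):
    """Toy integrity constraints (assumed, not the paper's):
    1. hue has no inherent order, so it is not used for the ordered OMV parts;
    2. the second attribute does not share a channel with either OMV part."""
    if "color_hue" in (design["mantissa"], design["exponent"]):
        return False
    if design["attribute"] in (design["mantissa"], design["exponent"]):
        return False
    return True

all_designs = list(candidate_designs())
viable = [d for d in all_designs if is_viable(d)]
print(len(all_designs), "candidates,", len(viable), "viable under these toy rules")
```

Even this small toy space shows why filtering matters: a handful of marks and channels already produces hundreds of combinations, most of which a perception-based rule set would discard before any qualitative assessment.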

Through a qualitative assessment of the viable visualizations, the authors discuss the most effective encodings for OMVs, focusing on channel effectiveness and task effectiveness. This led to a refined definition of a scale called E+M, which encodes the exponent and mantissa of the OMV separately.
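This summary does not give the formula behind the E+M scale, so the sketch below shows only one plausible reading of "separately encoding exponent and mantissa" on a positional axis: each exponent gets a band of equal width, and the mantissa is placed linearly inside its band. The function name, the parameters, and the band layout are all assumptions; the paper's actual definition may differ.

```python
def e_plus_m_position(mantissa, exponent, min_exponent, band_px):
    """Assumed layout: the exponent selects a band of width band_px,
    and a mantissa in [1, 10) is placed linearly within that band."""
    band_start = (exponent - min_exponent) * band_px
    offset = (mantissa - 1) / 9 * band_px
    return band_start + offset

# Hypothetical axis: exponents start at 0, each band is 100 px wide.
print(e_plus_m_position(1.0, 0, 0, 100))   # start of the 10^0 band
print(e_plus_m_position(5.5, 2, 0, 100))   # middle of the 10^2 band
```

Under this assumed layout, the band tells a reader the order of magnitude at a glance, while the position inside the band can be read like an ordinary linear axis.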

The paper's main contributions are:

The presentation of the design space of OMVs
The generation of a large number of OMV visualizations, some of which are novel and effective
The definition of the E+M scale for OMVs
Guidelines and recommendations for designing effective OMV visualizations

Critical Analysis

The paper provides a comprehensive exploration of the design space for visualizing OMVs and presents several novel and effective visualization techniques. The systematic approach of generating and evaluating a large number of possible visualizations is a strength of the research.

However, the paper does not provide any user studies or empirical evaluations of the proposed visualizations. While the authors discuss the effectiveness of different encodings based on visualization and perception principles, it would be valuable to see how these visualizations perform in practice with end-users.

Additionally, the paper focuses on static visualizations and does not consider the potential benefits of using interactive or animated techniques to better convey OMVs. Further research could explore how dynamic visualizations, such as those discussed in "MDD-Glyphs: Immersive Insights through Multidimensional Distribution", might enhance the understanding of OMVs.

Another potential area for future work is to investigate how the proposed visualizations and guidelines could be integrated into existing data visualization tools and systems, as mentioned in "What We Augment When We Augment Visualizations" and "Delve into Earth's Past: Visualization-Based Exhibit".

Conclusion

This paper makes a valuable contribution to the field of data visualization by exploring the design space for visualizing datasets with OMVs. The authors present a systematic approach to generating and evaluating a wide range of visualizations, leading to the definition of the E+M scale and a set of guidelines for effectively encoding OMVs.

The insights from this research can help improve the design of visualization tools and techniques, enabling better support for datasets with values that span multiple orders of magnitude. This, in turn, can enhance the ability of users to understand, analyze, and draw insights from these types of datasets, which are increasingly common in fields such as science, engineering, and finance.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Lost in Magnitudes: Exploring the Design Space for Visualizing Data with Large Value Ranges

Katerina Batziakoudi, Florent Cabric, Stéphanie Rey, Jean-Daniel Fekete

We explore the design space for the static visualization of datasets with quantitative attributes that vary over multiple orders of magnitude (we call these attributes Orders of Magnitude Values, or OMVs) and provide design guidelines and recommendations on effective visual encodings for OMVs. Current charts rely on linear or logarithmic scales to visualize values, leading to limitations in performing simple tasks for OMVs. In particular, linear scales prevent the reading of smaller magnitudes and their comparisons, while logarithmic scales are challenging for the general public to understand. Our design space leverages the approach of dividing OMVs into two different parts: mantissa and exponent, in a way similar to scientific notation. This separation allows for a visual encoding of both parts. For our exploration, we use four datasets, each with two attributes: an OMV, divided into mantissa and exponent, and a second attribute that is nominal, ordinal, time, or quantitative. We start from the original design space described by the Grammar of Graphics and systematically generate all possible visualizations for these datasets, employing different marks and visual channels. We refine this design space by enforcing integrity constraints from visualization and graphical perception literature. Through a qualitative assessment of all viable combinations, we discuss the most effective visualizations for OMVs, focusing on channel and task effectiveness. The article's main contributions are 1) the presentation of the design space of OMVs, 2) the generation of a large number of OMV visualizations, among which some are novel and effective, 3) the refined definition of a scale that we call E+M for OMVs, and 4) guidelines and recommendations for designing effective OMV visualizations. These efforts aim to enrich visualization systems to better support data with OMVs and guide future research.

4/24/2024

A Design Space for Visualization with Large Scale-Item Ratios

Mara Solen, Tamara Munzner

The scale-item ratio is the relationship between the largest scale and the smallest item in a visualization. Designing visualizations when this ratio is large can be challenging, and designers have developed many approaches to overcome this challenge. We present a design space for visualization with large scale-item ratios. The design space includes three dimensions, with eight total subdimensions. We demonstrate its descriptive power by using it to code approaches from a corpus we compiled of 54 examples, created by a mix of academics and practitioners. We then partition these examples into five strategies, which are shared approaches with respect to design space dimension choices. We demonstrate generative power by analyzing missed opportunities within the corpus of examples, identified through analysis of the design space, where we note how certain examples could have benefited from different choices. Supplemental materials: https://osf.io/wbrdm/?view_only=04389a2101a04e71a2c208a93bf2f7f2

4/3/2024


Metric Space Magnitude for Evaluating the Diversity of Latent Representations

Katharina Limbeck, Rayna Andreeva, Rik Sarkar, Bastian Rieck

The magnitude of a metric space is a novel invariant that provides a measure of the 'effective size' of a space across multiple scales, while also capturing numerous geometrical properties, such as curvature, density, or entropy. We develop a family of magnitude-based measures of the intrinsic diversity of latent representations, formalising a novel notion of dissimilarity between magnitude functions of finite metric spaces. Our measures are provably stable under perturbations of the data, can be efficiently calculated, and enable a rigorous multi-scale characterisation and comparison of latent representations. We show their utility and superior performance across different domains and tasks, including (i) the automated estimation of diversity, (ii) the detection of mode collapse, and (iii) the evaluation of generative models for text, image, and graph data.

6/24/2024


Toward the Categorical Data Map

Frederik L. Dennig, Lucas Joos, Patrick Paetzold, Daniela Blumberg, Oliver Deussen, Daniel A. Keim, Maximilian T. Fischer

Categorical data does not have an intrinsic definition of distance or order, and therefore, established visualization techniques for categorical data only allow for a set-based or frequency-based analysis, e.g., through Euler diagrams or Parallel Sets, and do not support a similarity-based analysis. We present a novel dimensionality reduction-based visualization for categorical data, which is based on defining the distance of two data items as the number of varying attributes. Our technique enables users to pre-attentively detect groups of similar data items and observe the properties of the projection, such as attributes strongly influencing the embedding. Our prototype visually encodes data properties in an enhanced scatterplot-like visualization, encoding attributes in the background to show the distribution of categories. In addition, we propose two graph-based measures to quantify the plot's visual quality, which rank attributes according to their contribution to cluster cohesion. To demonstrate the capabilities of our similarity-based approach, we compare it to Euler diagrams and Parallel Sets regarding visual scalability and show its benefits through an expert study with five data scientists analyzing the Titanic and Mushroom datasets with up to 23 attributes and 8124 category combinations. Our results indicate that the Categorical Data Map offers an effective analysis method, especially for large datasets with a high number of category combinations.

8/27/2024