Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization

Read original: arXiv:2407.06129 - Published 7/10/2024 by Hannah K. Bako, Arshnoor Bhutani, Xinyi Liu, Kwesi A. Cobbina, Zhicheng Liu

Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization

Related Work

The paper discusses prior research on using large language models (LLMs) for data visualization tasks. It cites several relevant papers, including Automated Data Visualization from Natural Language via Retrieval-Based Model, Can LLMs Generate Visualizations from Dataless Prompts?, and A Preliminary Roadmap for LLMs as Assistants for Exploring and Analyzing Data. These studies have explored the potential of LLMs to understand natural language prompts and generate appropriate data visualizations.

Collating Natural Language Utterances

The paper describes the process of collecting a dataset of natural language utterances related to data visualization. This involved crowdsourcing prompts from participants and filtering them to create a high-quality, diverse set of examples. The resulting dataset covers a wide range of visualization types and user intents, providing a comprehensive testbed for evaluating LLM capabilities.

Evaluating Semantic Profiling Abilities

The core of the paper focuses on assessing how well LLMs can understand and interpret the semantics of the natural language utterances in the dataset. This involves measuring the models' ability to accurately profile the key elements of the prompts, such as the visualization type, data characteristics, and user intent. The authors employ various evaluation metrics to quantify the models' performance and provide insights into their strengths and limitations.

Insights and Implications

The findings from the semantic profiling evaluation offer valuable insights into the current state of LLM capabilities for data visualization tasks. The paper discusses the models' strong performance on certain aspects, such as recognizing common visualization types, but also highlights areas where they struggle, such as understanding more complex user intents or data characteristics. These insights can inform the ongoing development of LLM-powered data visualization tools and guide future research directions.

Limitations and Future Work

The paper acknowledges several limitations of the study, such as the potential for bias in the crowdsourced dataset or the focus on a specific set of LLMs. It also suggests avenues for future research, such as exploring the use of LLMs for other data visualization tasks, like generation or interactive exploration, or expanding the evaluation to include a wider range of models and user scenarios.

Conclusion

Overall, this paper provides a comprehensive assessment of the semantic profiling abilities of LLMs for natural language utterances in the context of data visualization. By leveraging a curated dataset and rigorous evaluation, the authors shed light on the current capabilities and limitations of these models, laying the groundwork for further advancements in the field of natural language-driven data visualization.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization

Hannah K. Bako, Arshnoor Bhutani, Xinyi Liu, Kwesi A. Cobbina, Zhicheng Liu

Automatically generating data visualizations in response to human utterances on datasets necessitates a deep semantic understanding of the data utterance, including implicit and explicit references to data attributes, visualization tasks, and necessary data preparation steps. Natural Language Interfaces (NLIs) for data visualization have explored ways to infer such information, yet challenges persist due to inherent uncertainty in human speech. Recent advances in Large Language Models (LLMs) provide an avenue to address these challenges, but their ability to extract the relevant semantic information remains unexplored. In this study, we evaluate four publicly available LLMs (GPT-4, Gemini-Pro, Llama3, and Mixtral), investigating their ability to comprehend utterances even in the presence of uncertainty and identify the relevant data context and visual tasks. Our findings reveal that LLMs are sensitive to uncertainties in utterances. Despite this sensitivity, they are able to extract the relevant data context. However, LLMs struggle with inferring visualization tasks. Based on these results, we highlight future research directions on using LLMs for visualization generation.

7/10/2024

Generating Analytic Specifications for Data Visualization from Natural Language Queries using Large Language Models

Subham Sah, Rishab Mitra, Arpit Narechania, Alex Endert, John Stasko, Wenwen Dou

Recently, large language models (LLMs) have shown great promise in translating natural language (NL) queries into visualizations, but their black-box nature often limits explainability and debuggability. In response, we present a comprehensive text prompt that, given a tabular dataset and an NL query about the dataset, generates an analytic specification including (detected) data attributes, (inferred) analytic tasks, and (recommended) visualizations. This specification captures key aspects of the query translation process, affording both explainability and debuggability. For instance, it provides mappings from the detected entities to the corresponding phrases in the input query, as well as the specific visual design principles that determined the visualization recommendations. Moreover, unlike prior LLM-based approaches, our prompt supports conversational interaction and ambiguity detection capabilities. In this paper, we detail the iterative process of curating our prompt, present a preliminary performance evaluation using GPT-4, and discuss the strengths and limitations of LLMs at various stages of query translation. The prompt is open-source and integrated into NL4DV, a popular Python-based natural language toolkit for visualization, which can be accessed at https://nl4dv.github.io.

8/28/2024

Visualization Literacy of Multimodal Large Language Models: A Comparative Study

Zhimin Li, Haichao Miao, Valerio Pascucci, Shusen Liu

The recent introduction of multimodal large language models (MLLMs) combine the inherent power of large language models (LLMs) with the renewed capabilities to reason about the multimodal context. The potential usage scenarios for MLLMs significantly outpace their text-only counterparts. Many recent works in visualization have demonstrated MLLMs' capability to understand and interpret visualization results and explain the content of the visualization to users in natural language. In the machine learning community, the general vision capabilities of MLLMs have been evaluated and tested through various visual understanding benchmarks. However, the ability of MLLMs to accomplish specific visualization tasks based on visual perception has not been properly explored and evaluated, particularly, from a visualization-centric perspective. In this work, we aim to fill the gap by utilizing the concept of visualization literacy to evaluate MLLMs. We assess MLLMs' performance over two popular visualization literacy evaluation datasets (VLAT and mini-VLAT). Under the framework of visualization literacy, we develop a general setup to compare different multimodal large language models (e.g., GPT4-o, Claude 3 Opus, Gemini 1.5 Pro) as well as against existing human baselines. Our study demonstrates MLLMs' competitive performance in visualization literacy, where they outperform humans in certain tasks such as identifying correlations, clusters, and hierarchical structures.

7/17/2024

Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study

Yang Wu, Yao Wan, Hongyu Zhang, Yulei Sui, Wucai Wei, Wei Zhao, Guandong Xu, Hai Jin

The Natural Language to Visualization (NL2Vis) task aims to transform natural-language descriptions into visual representations for a grounded table, enabling users to gain insights from vast amounts of data. Recently, many deep learning-based approaches have been developed for NL2Vis. Despite the considerable efforts made by these approaches, challenges persist in visualizing data sourced from unseen databases or spanning multiple tables. Taking inspiration from the remarkable generation capabilities of Large Language Models (LLMs), this paper conducts an empirical study to evaluate their potential in generating visualizations, and explore the effectiveness of in-context learning prompts for enhancing this task. In particular, we first explore the ways of transforming structured tabular data into sequential text prompts, as to feed them into LLMs and analyze which table content contributes most to the NL2Vis. Our findings suggest that transforming structured tabular data into programs is effective, and it is essential to consider the table schema when formulating prompts. Furthermore, we evaluate two types of LLMs: finetuned models (e.g., T5-Small) and inference-only models (e.g., GPT-3.5), against state-of-the-art methods, using the NL2Vis benchmarks (i.e., nvBench). The experimental results reveal that LLMs outperform baselines, with inference-only models consistently exhibiting performance improvements, at times even surpassing fine-tuned models when provided with certain few-shot demonstrations through in-context learning. Finally, we analyze when the LLMs fail in NL2Vis, and propose to iteratively update the results using strategies such as chain-of-thought, role-playing, and code-interpreter. The experimental results confirm the efficacy of iterative updates and hold great potential for future study.

4/29/2024