iTBLS: A Dataset of Interactive Conversations Over Tabular Information

Read original: arXiv:2404.12580 - Published 4/22/2024 by Anirudh Sundar, Christopher Richardson, William Gay, Larry Heck

Overview

This paper introduces iTBLS, a dataset of interactive conversations over tabular information.
The dataset aims to facilitate research on language models' ability to understand and reason about tabular data through natural language interactions.
The dataset consists of human-written conversations where participants ask questions, make inferences, and collaborate to explore the information in tables.

Plain English Explanation

The researchers have created a new dataset called iTBLS that contains conversations between people about information stored in tables. The goal is to help develop AI systems that can better understand and reason about tabular data by learning from these human interactions.

In the dataset, people ask questions, make observations, and work together to explore the data in the tables. This mimics how humans might naturally interact with and make sense of tabular information. By studying these conversations, researchers can work to create AI models that are better able to comprehend and reason about tables just as humans do.

The dataset provides a rich source of examples for training and evaluating AI systems on understanding and manipulating tabular data through natural language. This could improve how large language models reason about tables, and it could support generating synthetic tabular data for model training.

Technical Explanation

The iTBLS dataset contains over 6,000 human-written dialogues focused on exploring and making sense of tabular information. The conversations involve two participants who are given a table and instructed to discuss its contents, ask questions, and share insights.

According to the paper's abstract, the tables are drawn from scientific articles. Each conversation is annotated with metadata about the table, the specific actions taken by the participants (e.g., asking a question or looking up a value), and the overall quality of the interaction.

By studying these natural language interactions around tabular data, the researchers aim to advance the state of the art in large language models' ability to understand and reason about tables. The dataset provides a testbed for developing models that can interpret, manipulate, and draw insights from tabular information through conversational interfaces.
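
The three interaction types named in the paper's abstract (interpretation, modification, and generation) can be pictured with a minimal Python sketch. Everything here is an illustrative assumption: the toy table, the column names, and the helper functions `interpret`, `modify`, and `generate` are hypothetical and are not the dataset's schema or the authors' method.

```python
from typing import Any

# Hypothetical table representation: a list of rows, each a column -> value mapping.
Table = list[dict[str, Any]]


def interpret(table: Table, column: str) -> float:
    """Interpretation: answer a question without changing the table.

    Here the "question" is the mean of a numeric column, a simple
    stand-in for mathematical reasoning over table contents.
    """
    values = [row[column] for row in table]
    return sum(values) / len(values)


def modify(table: Table, row_index: int, column: str, value: Any) -> Table:
    """Modification: return a copy of the table with one cell changed."""
    updated = [dict(row) for row in table]
    updated[row_index][column] = value
    return updated


def generate(table: Table, new_row: dict[str, Any]) -> Table:
    """Generation: return a copy of the table extended with a new row."""
    return [dict(row) for row in table] + [dict(new_row)]


# Toy table with made-up values, loosely styled after a results table
# in a scientific article.
table: Table = [
    {"model": "A", "accuracy": 80.0},
    {"model": "B", "accuracy": 90.0},
]

mean_accuracy = interpret(table, "accuracy")
corrected = modify(table, 0, "accuracy", 82.0)
extended = generate(table, {"model": "C", "accuracy": 95.0})
```

Each helper returns a new table instead of editing in place, so the original stays available for later turns in a conversation. In practice a model would have to infer which of the three operations a natural language request calls for, which this sketch does not attempt.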

Critical Analysis

The iTBLS dataset is an important step toward resources for studying how language models interact with tabular data. By capturing natural conversations, it gives a more realistic and nuanced view of how humans make sense of tables than traditional benchmark tasks do.

However, one limitation is that the dataset only includes dialogues between two participants. Expanding to multi-party conversations could yield additional insights into collaborative reasoning about tables. Additionally, the dataset is primarily in English, so expanding to other languages could broaden its applicability.

Furthermore, the researchers acknowledge that the current annotations may not capture the full complexity of the conversations. There may be opportunities to develop more sophisticated analysis techniques to better understand the thought processes and strategies employed by the participants.

Overall, the iTBLS dataset represents a valuable contribution to the field, and further research building on this foundation could lead to significant advances in empowering large language models to work effectively with tabular data.

Conclusion

The iTBLS dataset provides a rich source of human-written conversations about tabular information, with the goal of advancing the development of AI systems that can understand and reason about tables through natural language interactions. By studying these dialogues, researchers can work to create more capable and versatile language models that can effectively interpret, manipulate, and gain insights from tabular data.

The dataset represents an important step forward in bridging the gap between how humans and machines interact with and make sense of tables. Further research building on this foundation could yield significant breakthroughs in enhancing the table-related capabilities of large language models and advancing their role as versatile tools for a wide range of applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

iTBLS: A Dataset of Interactive Conversations Over Tabular Information

Anirudh Sundar, Christopher Richardson, William Gay, Larry Heck

This paper introduces Interactive Tables (iTBLS), a dataset of interactive conversations situated in tables from scientific articles. This dataset is designed to facilitate human-AI collaborative problem-solving through AI-powered multi-task tabular capabilities. In contrast to prior work that models interactions as factoid QA or procedure synthesis, iTBLS broadens the scope of interactions to include mathematical reasoning, natural language manipulation, and expansion of existing tables from natural language conversation by delineating interactions into one of three tasks: interpretation, modification, or generation. Additionally, the paper presents a suite of baseline approaches to iTBLS, utilizing zero-shot prompting and parameter-efficient fine-tuning for different computing situations. We also introduce a novel multi-step approach and show how it can be leveraged in conjunction with parameter-efficient fine-tuning to achieve the state-of-the-art on iTBLS; outperforming standard parameter-efficient fine-tuning by up to 15% on interpretation, 18% on modification, and 38% on generation.

4/22/2024

Interactive-T2S: Multi-Turn Interactions for Text-to-SQL with Large Language Models

Guanming Xiong, Junwei Bao, Hongfei Jiang, Yang Song, Wen Zhao

This study explores text-to-SQL parsing by leveraging the powerful reasoning capabilities of large language models (LLMs). Despite recent advancements, existing LLM-based methods have not adequately addressed scalability, leading to inefficiencies when processing wide tables. Furthermore, current interaction-based approaches either lack a step-by-step, interpretable SQL generation process or fail to provide an efficient and universally applicable interaction design. To address these challenges, we introduce Interactive-T2S, a framework that generates SQL queries through direct interactions with databases. This framework includes four general tools that facilitate proactive and efficient information retrieval by the LLM. Additionally, we have developed detailed exemplars to demonstrate the step-wise reasoning processes within our framework. Our experiments on the BIRD-Dev dataset, employing a setting without oracle knowledge, reveal that our method achieves state-of-the-art results with only two exemplars, underscoring the effectiveness and robustness of our framework.

8/22/2024

Uncovering Limitations of Large Language Models in Information Seeking from Tables

Chaoxu Pang, Yixuan Cao, Chunhao Yang, Ping Luo

Tables are recognized for their high information density and widespread usage, serving as essential sources of information. Seeking information from tables (TIS) is a crucial capability for Large Language Models (LLMs), serving as the foundation of knowledge-based Q&A systems. However, this field presently suffers from an absence of thorough and reliable evaluation. This paper introduces a more reliable benchmark for Table Information Seeking (TabIS). To avoid the unreliable evaluation caused by text similarity-based metrics, TabIS adopts a single-choice question format (with two options per question) instead of a text generation format. We establish an effective pipeline for generating options, ensuring their difficulty and quality. Experiments conducted on 12 LLMs reveal that while the performance of GPT-4-turbo is marginally satisfactory, both other proprietary and open-source models perform inadequately. Further analysis shows that LLMs exhibit a poor understanding of table structures, and struggle to balance between TIS performance and robustness against pseudo-relevant tables (common in retrieval-augmented systems). These findings uncover the limitations and potential challenges of LLMs in seeking information from tables. We release our data and code to facilitate further research in this field.

6/7/2024

Automatic Generation of Conversational Interfaces for Tabular Data Analysis

Marcos Gomez-Vazquez, Jordi Cabot, Robert Clarisó

Tabular data is the most common format to publish and exchange structured data online. A clear example is the growing number of open data portals published by public administrations. However, exploitation of these data sources is currently limited to technical people able to programmatically manipulate and digest such data. As an alternative, we propose the use of chatbots to offer a conversational interface to facilitate the exploration of tabular data sources, including support for data analytics questions that are responded via charts rendered by the chatbot. Moreover, our chatbots are automatically generated from the data source itself thanks to the instantiation of a configurable collection of conversation patterns matched to the chatbot intents and entities.

8/7/2024