EX-FEVER: A Dataset for Multi-hop Explainable Fact Verification

Read original: arXiv:2310.09754 - Published 6/17/2024 by Huanhuan Ma, Weizhi Xu, Yifan Wei, Liuji Chen, Liang Wang, Qiang Liu, Shu Wu, Liang Wang

Overview

Presents a new dataset called EX-FEVER for multi-hop explainable fact verification
Focuses on developing models that can not only verify facts, but also provide explanations for their decisions
Aims to advance the field of explainable AI by encouraging the development of models that are more transparent and accountable

Plain English Explanation

The paper introduces EX-FEVER, a dataset for multi-hop explainable fact verification. Fact verification is the task of determining whether a claim is supported or contradicted by the available evidence. In multi-hop fact verification, several pieces of evidence must be combined to reach a verdict.

The key innovation of the EX-FEVER dataset is that it not only requires models to correctly verify facts, but also to provide explanations for their decisions. This is an important step towards building more transparent and accountable AI systems that can justify their outputs. By encouraging the development of explainable fact verification models, the researchers hope to advance the field of explainable AI and make these systems more trustworthy and understandable to users.

The paper also discusses related work in the areas of fact verification and explainable AI, providing context for the significance of the EX-FEVER dataset.

Technical Explanation

EX-FEVER follows the task format of FEVER (Fact Extraction and VERification), a widely used benchmark for fact verification: a model predicts whether a claim is supported, refuted, or lacks enough information to decide. EX-FEVER extends this setup by also requiring a textual explanation for the decision.

According to the paper's abstract, the dataset contains over 60,000 claims that involve 2-hop and 3-hop reasoning, each created by summarizing and modifying information from hyperlinked Wikipedia documents. Each instance comes with a veracity label and an explanation that outlines the reasoning path supporting that label.
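A single record in a dataset of this kind could be modeled roughly as follows. This is a minimal sketch, not the released schema: the class name, field names, and label strings are hypothetical, and the check that a claim uses 2 or 3 evidence documents is an assumption based on the abstract's description of 2-hop and 3-hop claims.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical label strings; the released dataset may spell them differently.
LABELS = ("SUPPORT", "REFUTE", "NOT ENOUGH INFO")


@dataclass
class ExplainedClaim:
    """One claim with its veracity label, evidence documents, and explanation."""
    claim: str
    label: str
    evidence_titles: List[str]  # assumed: one Wikipedia document per hop
    explanation: str            # free-text reasoning path


def validation_errors(item: ExplainedClaim) -> List[str]:
    """Return a list of problems found in a record (empty if none)."""
    errors = []
    if item.label not in LABELS:
        errors.append(f"unknown label: {item.label}")
    if not 2 <= len(item.evidence_titles) <= 3:
        errors.append("expected 2 or 3 evidence documents")
    if not item.explanation.strip():
        errors.append("explanation is empty")
    return errors


# An invented record, used only to illustrate the structure.
example = ExplainedClaim(
    claim="Film X was directed by a person born in Paris.",
    label="SUPPORT",
    evidence_titles=["Film X", "Jane Roe"],
    explanation="Film X was directed by Jane Roe. Jane Roe was born in Paris.",
)
```

Keeping the explanation as free text next to the evidence titles makes it possible to check both the verdict and the reasoning path that leads to it.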

The paper also presents baseline models for the EX-FEVER task, including a multi-hop reasoning approach that leverages a retriever-reader architecture to first find relevant evidence from Wikipedia and then generate an explanation based on that evidence. The researchers evaluate these models on various metrics, including fact verification accuracy and the quality of the generated explanations.
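The retrieve-then-explain flow described above can be sketched with a toy example. Everything here is an illustrative assumption rather than the paper's system: the token-overlap scorer stands in for a trained retriever, concatenating evidence text stands in for a neural explanation generator, and the names `Doc`, `retrieve_path`, `draft_explanation`, and `label_accuracy` are invented for this sketch.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Doc:
    title: str
    text: str
    links: List[str] = field(default_factory=list)  # titles of hyperlinked documents


def tokens(s: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def retrieve_path(claim: str, corpus: Dict[str, Doc], hops: int = 2) -> List[Doc]:
    """Pick the document that best overlaps the claim, then follow hyperlinks."""
    claim_tokens = tokens(claim)

    def score(doc: Doc) -> int:
        return len(claim_tokens & tokens(doc.title + " " + doc.text))

    current = max(corpus.values(), key=score)
    path = [current]
    for _ in range(hops - 1):
        seen = {d.title for d in path}
        candidates = [corpus[t] for t in current.links if t in corpus and t not in seen]
        if not candidates:
            break
        current = max(candidates, key=score)
        path.append(current)
    return path


def draft_explanation(path: List[Doc]) -> str:
    """Placeholder generator: join the evidence text in hop order."""
    return " ".join(doc.text for doc in path)


def label_accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of claims whose predicted label matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# An invented three-document corpus for illustration.
corpus = {
    "Film X": Doc("Film X", "Film X is a 2001 drama directed by Jane Roe.", ["Jane Roe"]),
    "Jane Roe": Doc("Jane Roe", "Jane Roe is a director born in Paris."),
    "Rivers": Doc("Rivers", "A river is a natural stream of water."),
}
path = retrieve_path("Film X was directed by a person born in Paris.", corpus)
```

In this example the first hop selects "Film X" and the second follows its hyperlink to "Jane Roe", so the drafted explanation chains the two evidence sentences in reasoning order.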

The results show that while the baseline models are able to achieve reasonable performance on the fact verification task, they struggle to generate high-quality explanations. This highlights the challenges involved in developing AI systems that can not only make correct decisions, but also explain their reasoning in a way that is meaningful and understandable to human users.
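This summary does not name the metrics used to score explanations. As a stand-in, a simple unigram-overlap F1 between a generated explanation and a reference explanation shows how such a comparison might be computed; the function name `unigram_f1` is hypothetical, and overlap scores of this kind capture wording similarity, not whether the reasoning is sound.

```python
import re
from collections import Counter


def unigram_f1(generated: str, reference: str) -> float:
    """Unigram-overlap F1 between two texts (0.0 if either is empty)."""
    gen = re.findall(r"[a-z0-9]+", generated.lower())
    ref = re.findall(r"[a-z0-9]+", reference.lower())
    if not gen or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((Counter(gen) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A model can score well on label accuracy while scoring poorly on a measure like this, which is the gap between verification and explanation quality that the results describe.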

Critical Analysis

The EX-FEVER dataset and its baseline models represent an important step forward for explainable fact verification. By focusing on multi-hop fact verification and requiring models to provide textual explanations, the researchers push the boundaries of what current AI systems can do.

However, the paper also acknowledges several limitations of the dataset and models. For example, the explanations in the dataset may not always be complete or fully align with the model's decision-making process. Additionally, the baseline models struggle to generate high-quality explanations, suggesting that more advanced techniques may be needed to achieve this goal.

Further research is also needed to understand the broader implications of explainable fact verification systems. While the ability to explain decisions is important for building trust and accountability, it's not yet clear how these explanations will be perceived and used by human users in real-world settings.

Conclusion

The EX-FEVER dataset and associated research represent an important advancement in the field of explainable AI. By focusing on the task of multi-hop fact verification and requiring models to provide textual explanations, the researchers are paving the way for the development of more transparent and accountable AI systems.

The results of this study highlight the challenges involved in building AI systems that can not only make correct decisions, but also explain their reasoning in a meaningful way. Continued progress in this area could lead to more trustworthy and useful AI applications, particularly in high-stakes domains where transparency is crucial.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

EX-FEVER: A Dataset for Multi-hop Explainable Fact Verification

Huanhuan Ma, Weizhi Xu, Yifan Wei, Liuji Chen, Liang Wang, Qiang Liu, Shu Wu, Liang Wang

Fact verification aims to automatically probe the veracity of a claim based on several pieces of evidence. Existing work focuses mainly on improving accuracy and gives little attention to explainability, a critical capability of fact verification systems. Constructing an explainable fact verification system in a complex multi-hop scenario is consistently impeded by the absence of a relevant, high-quality dataset. Previous datasets either suffer from excessive simplification or fail to incorporate essential considerations for explainability. To address this, we present EX-FEVER, a pioneering dataset for multi-hop explainable fact verification. It contains over 60,000 claims involving 2-hop and 3-hop reasoning, each created by summarizing and modifying information from hyperlinked Wikipedia documents. Each instance is accompanied by a veracity label and an explanation that outlines the reasoning path supporting the veracity classification. Additionally, we demonstrate a novel baseline system on our EX-FEVER dataset, showcasing document retrieval, explanation generation, and claim verification, and validate the significance of our dataset. Furthermore, we highlight the potential of utilizing Large Language Models in the fact verification task. We hope our dataset could make a significant contribution by providing ample opportunities to explore the integration of natural language explanations in the domain of fact verification.

6/17/2024

Fin-Fact: A Benchmark Dataset for Multimodal Financial Fact Checking and Explanation Generation

Aman Rangapur, Haoran Wang, Ling Jian, Kai Shu

Fact-checking in the financial domain is underexplored, and there is a shortage of quality datasets in this domain. In this paper, we propose Fin-Fact, a benchmark dataset for multimodal fact-checking within the financial domain. Notably, it includes professional fact-checker annotations and justifications, providing expertise and credibility. With its multimodal nature encompassing both textual and visual content, Fin-Fact provides complementary information sources to enhance factuality analysis. Its primary objective is combating misinformation in finance, fostering transparency, and building trust in financial reporting and news dissemination. By offering insightful explanations, Fin-Fact empowers users, including domain experts and end-users, to understand the reasoning behind fact-checking decisions, validating claim credibility, and fostering trust in the fact-checking process. The Fin-Fact dataset, along with our experimental code, is available at https://github.com/IIT-DM/Fin-Fact/.

5/3/2024

Mining the Explainability and Generalization: Fact Verification Based on Self-Instruction

Guangyao Lu, Yulin Liu

Fact-checking based on commercial LLMs has become mainstream. Although these methods offer high explainability, they fall short in accuracy compared to traditional fine-tuning approaches, and data security is also a significant concern. In this paper, we propose a self-instruction based fine-tuning approach for fact-checking that balances accuracy and explainability. Our method consists of Data Augmentation and Improved DPO fine-tuning. The former starts by instructing the model to generate both positive and negative explanations based on claim-evidence pairs and labels, then samples the dataset according to our customized difficulty standards. The latter employs our proposed improved DPO to fine-tune the model using the generated samples. We fine-tune the smallest-scale LLaMA-7B model and evaluate it on the challenging fact-checking datasets FEVEROUS and HOVER, utilizing four fine-tuning methods and three few-shot learning methods for comparison. The experiments demonstrate that our approach not only retains accuracy comparable to, or even surpassing, traditional fine-tuning methods, but also generates fluent explanation text. Moreover, it also exhibits high generalization performance. Our method is the first to leverage self-supervised learning for fact-checking and innovatively combines contrastive learning and improved DPO in fine-tuning LLMs, as shown in the experiments.

5/24/2024

MAVEN-Fact: A Large-scale Event Factuality Detection Dataset

Chunyang Li, Hao Peng, Xiaozhi Wang, Yunjia Qi, Lei Hou, Bin Xu, Juanzi Li

The Event Factuality Detection (EFD) task determines the factuality of textual events, i.e., classifying whether an event is a fact, possibility, or impossibility, which is essential for faithfully understanding and utilizing event knowledge. However, due to the lack of high-quality large-scale data, event factuality detection is under-explored in event understanding research, which limits the development of the EFD community. To address these issues and provide faithful event understanding, we introduce MAVEN-Fact, a large-scale and high-quality EFD dataset based on the MAVEN dataset. MAVEN-Fact includes factuality annotations of 112,276 events, making it the largest EFD dataset. Extensive experiments demonstrate that MAVEN-Fact is challenging for both conventional fine-tuned models and large language models (LLMs). Thanks to the comprehensive annotations of event arguments and relations in MAVEN, MAVEN-Fact also supports some further analyses and we find that adopting event arguments and relations helps in event factuality detection for fine-tuned models but does not benefit LLMs. Furthermore, we preliminarily study an application case of event factuality detection and find it helps in mitigating event-related hallucination in LLMs. Our dataset and code can be obtained from https://github.com/lcy2723/MAVEN-FACT

7/23/2024