Retrieve to Explain: Evidence-driven Predictions with Language Models

Read original: arXiv:2402.04068 - Published 6/19/2024 by Ravi Patel, Angus Brayne, Rogier Hintzen, Daniel Jaroslawicz, Georgiana Neculae, Dane Corneil

Overview

The paper introduces Retrieve to Explain (R2E), a retrieval-based language model that quantitatively and faithfully compares the plausibility of answers to complex scientific research questions based on the strength of their supporting evidence.
R2E scores and ranks possible answers by retrieving relevant evidence from a document corpus, representing each answer in terms of its supporting evidence rather than the answer itself.
This allows the use of feature attribution methods to transparently explain how the supporting evidence contributes to each answer's score.
R2E can incorporate new evidence without retraining, including non-textual data templated into natural language.
R2E is evaluated on the challenging task of drug target identification, where it matches non-explainable literature-based models and surpasses a commonly used genetics-based approach.

Plain English Explanation

Retrieve to Explain (R2E) is a language model that aims to help with scientific discovery by comparing potential answers to complex research questions based on the strength of the supporting evidence. Many scientific questions have multiple plausible answers, each backed by some evidence. However, existing language models struggle to quantify and compare the strength of this evidence.

R2E addresses this by scoring and ranking possible answers based on the relevant evidence it retrieves from a corpus of scientific literature. Rather than representing an answer directly, R2E represents it only through its supporting evidence. Because each score depends only on retrieved evidence, techniques like Shapley values can explain how much each piece of evidence contributes to the score of each answer.
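The idea of scoring an answer only through its evidence can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the token-overlap scorer stands in for a learned model, the names `mask_answer`, `overlap_score`, and `rank_answers` are hypothetical, and the passages and gene names are invented. It is not the authors' implementation.

```python
import re

MASK = "[MASK]"

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def mask_answer(passage, answer):
    """Hide mentions of the candidate answer so only the surrounding evidence is scored."""
    return re.sub(re.escape(answer), MASK, passage, flags=re.IGNORECASE)

def overlap_score(query, passage):
    """Toy relevance score: fraction of query tokens found in the passage."""
    q = set(tokenize(query))
    p = set(tokenize(passage))
    return len(q & p) / len(q) if q else 0.0

def rank_answers(query, corpus, candidates, top_k=2):
    """Score each candidate from its top_k masked evidence passages, then sort."""
    ranked = []
    for answer in candidates:
        # Retrieve passages that mention the candidate, with the mention masked.
        evidence = [mask_answer(p, answer) for p in corpus
                    if answer.lower() in p.lower()]
        scores = sorted((overlap_score(query, e) for e in evidence),
                        reverse=True)[:top_k]
        # Dividing by top_k means missing evidence counts as zero support.
        ranked.append((answer, sum(scores) / top_k))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Invented toy corpus with placeholder gene names.
corpus = [
    "Inhibiting GENE_A reduced airway inflammation in an asthma model.",
    "GENE_A is a promising target for treating asthma.",
    "GENE_B expression was unchanged in asthma patients.",
]
ranking = rank_answers("promising target for treating asthma",
                       corpus, ["GENE_B", "GENE_A"])
print(ranking)
```

Because the score is a function of the retrieved passages alone, adding or removing a passage changes the score in a way that can be inspected directly, which is what makes evidence-level attribution possible.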

Importantly, R2E can incorporate new evidence without having to retrain the model, including non-textual data like experimental results or genetic information. This makes it more flexible and adaptable than traditional language models.
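Templating non-textual data into natural language can be sketched as follows. This is an illustrative assumption, not the paper's actual template: the `GeneticAssociation` record, the `to_sentence` function, the sentence wording, and the example values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class GeneticAssociation:
    """Hypothetical structured record, e.g. one row of an association table."""
    gene: str
    disease: str
    p_value: float

def to_sentence(record):
    """Render a structured record as a sentence that a text retriever can index."""
    return (f"Variants in {record.gene} are associated with "
            f"{record.disease} (p = {record.p_value:.1e}).")

# Existing text corpus (invented example passage).
corpus = ["GENE_A is expressed in airway tissue."]

# New evidence is added by appending templated sentences; no model weights change.
record = GeneticAssociation(gene="GENE_A", disease="asthma", p_value=3e-9)
corpus.append(to_sentence(record))
print(corpus[-1])
```

The design point is that new evidence enters through the retrieval corpus rather than through the model parameters, so updating the system amounts to appending passages.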

The researchers tested R2E on the task of identifying drug targets, the molecules that a drug can bind to and affect. This is a crucial but challenging process in drug development, where failures are very costly. R2E matched non-explainable literature-based models and outperformed a commonly used genetics-based approach, while also providing transparent explanations of its reasoning. This could help researchers make more informed decisions and speed up the drug discovery process.

Technical Explanation

Retrieve to Explain (R2E) is a retrieval-based language model designed to quantitatively and faithfully compare the plausibility of answers to complex scientific research questions. Many such questions have multiple answers supported by varying levels of evidence in the literature. However, existing language models lack the capability to transparently assess the strength of this supporting evidence.

The key innovation of R2E is that it scores and ranks possible answers using only the evidence retrieved from a document corpus: each answer is represented by its supporting evidence, not by the answer itself. This allows R2E to extend feature attribution methods, such as Shapley values, to explain how each piece of supporting evidence contributes to an answer's score. The architecture also enables R2E to incorporate new evidence, including non-textual data templated into natural language, without requiring retraining.
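To make evidence-level attribution concrete, here is a minimal sketch of exact Shapley values computed over a small set of evidence passages. The scoring function `noisy_or_score` and the relevance numbers are assumptions chosen for illustration and are not the scorer used in the paper; exact enumeration is also only practical for a handful of passages, since the number of subsets grows exponentially.

```python
from itertools import combinations
from math import factorial

def shapley_values(items, value_fn):
    """Exact Shapley value of each item under value_fn (exponential in len(items))."""
    n = len(items)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = [items[j] for j in subset]
                # Marginal contribution of item i when added to the coalition.
                gain = value_fn(coalition + [items[i]]) - value_fn(coalition)
                phi[i] += weight * gain
    return phi

def noisy_or_score(evidence):
    """Toy answer score: probability that at least one passage supports the answer."""
    miss = 1.0
    for _text, relevance in evidence:
        miss *= 1.0 - relevance
    return 1.0 - miss

# Invented passages with made-up relevance scores for one candidate answer.
evidence = [
    ("[MASK] is a promising target for treating asthma.", 0.8),
    ("Inhibiting [MASK] reduced airway inflammation.", 0.5),
    ("[MASK] is expressed in many tissues.", 0.1),
]
phi = shapley_values(evidence, noisy_or_score)
for (text, _), contribution in zip(evidence, phi):
    print(f"{contribution:.3f}  {text}")
```

The attributions sum to the difference between the score with all evidence and the score with none, so each passage's share of the final score can be read off directly.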

The researchers evaluate R2E on the critical task of drug target identification, a human-in-the-loop process where failures are extremely costly and explainability is paramount. When predicting which drug targets will be confirmed as efficacious in clinical trials, R2E not only matches the performance of non-explainable literature-based models but also surpasses a genetics-based target identification approach commonly used in the pharmaceutical industry.

Critical Analysis

The Retrieve to Explain (R2E) model represents an important step towards building more transparent and explainable language models for scientific research. By focusing on the supporting evidence rather than the answers themselves, R2E overcomes a key limitation of existing models and enables the use of powerful feature attribution techniques.

However, the paper does acknowledge several caveats and areas for further research. For example, the current implementation of R2E relies on a fixed corpus of scientific literature, which may not capture the full breadth of relevant evidence. Extending the model to incorporate dynamic, web-scale retrieval could further improve its performance and adaptability.

Additionally, while the drug target identification task is a critical and challenging real-world application, it would be valuable to evaluate R2E on a wider range of scientific research questions to fully assess its generalizability. Exploring the integration of R2E with other retrieval-augmented language model architectures could also lead to further improvements in performance and explanatory power.

Overall, R2E is a promising development in explainable AI for scientific discovery. By combining retrieval-based language modeling with feature attribution, it gives researchers predictions whose scores can be traced back to specific pieces of evidence, which could make model-assisted research decisions easier to trust and to check.

Conclusion

The Retrieve to Explain (R2E) model introduces a novel approach to language modeling that can quantitatively and faithfully compare the plausibility of answers to complex scientific research questions. By representing answers in terms of their supporting evidence rather than the answers themselves, R2E enables the use of powerful feature attribution methods to explain its reasoning.

Evaluated on the challenging task of drug target identification, R2E not only matches the performance of non-explainable literature-based models but also surpasses a commonly used genetics-based approach. This could have significant implications for accelerating drug discovery and other scientific processes where failures are extremely costly and transparency is paramount.

While the current implementation of R2E has some limitations, the core ideas behind the model represent an exciting development in the field of explainable AI for scientific discovery. By bridging the gap between language models and interpretability, R2E paves the way for more trustworthy and impactful AI-driven scientific research.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

What Evidence Do Language Models Find Convincing?

Alexander Wan, Eric Wallace, Dan Klein

Retrieval-augmented language models are being increasingly tasked with subjective, contentious, and conflicting queries such as "is aspartame linked to cancer". To resolve these ambiguous queries, one must search through a large range of websites and consider "which, if any, of this evidence do I find convincing?". In this work, we study how LLMs answer this question. In particular, we construct ConflictingQA, a dataset that pairs controversial queries with a series of real-world evidence documents that contain different facts (e.g., quantitative results), argument styles (e.g., appeals to authority), and answers (Yes or No). We use this dataset to perform sensitivity and counterfactual analyses to explore which text features most affect LLM predictions. Overall, we find that current models rely heavily on the relevance of a website to the query, while largely ignoring stylistic features that humans find important such as whether a text contains scientific references or is written with a neutral tone. Taken together, these results highlight the importance of RAG corpus quality (e.g., the need to filter misinformation), and possibly even a shift in how LLMs are trained to better align with human judgements.

8/12/2024

Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering

Su Hyeon Lim, Minkuk Kim, Hyeon Bae Kim, Seong Tae Kim

The Visual Question Answering with Natural Language Explanation (VQA-NLE) task is challenging due to its high demand for reasoning-based inference. Recent VQA-NLE studies focus on enhancing model networks to amplify the model's reasoning capability, but this approach is resource-consuming and unstable. In this work, we introduce a new VQA-NLE model, ReRe (Retrieval-augmented natural language Reasoning), which leverages retrieval information from the memory to aid in generating accurate answers and persuasive explanations without relying on complex networks and extra datasets. ReRe is an encoder-decoder architecture model using a pre-trained CLIP vision encoder and a pre-trained GPT-2 language model as a decoder. Cross-attention layers are added in the GPT-2 for processing retrieval features. ReRe outperforms previous methods in VQA accuracy and explanation score and shows improvement in NLE with more persuasive and reliable explanations.

9/2/2024

RAGE Against the Machine: Retrieval-Augmented LLM Explanations

Joel Rorseth, Parke Godfrey, Lukasz Golab, Divesh Srivastava, Jaroslaw Szlichta

This paper demonstrates RAGE, an interactive tool for explaining Large Language Models (LLMs) augmented with retrieval capabilities; i.e., able to query external sources and pull relevant information into their input context. Our explanations are counterfactual in the sense that they identify parts of the input context that, when removed, change the answer to the question posed to the LLM. RAGE includes pruning methods to navigate the vast space of possible explanations, allowing users to view the provenance of the produced answers.

5/24/2024