Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation

Read original: arXiv:2407.00635 - Published 7/18/2024 by Xinyu Mao, Shengyao Zhuang, Bevan Koopman, Guido Zuccon


Overview

This paper proposes a screening prioritisation method for systematic reviews that combines dense retrieval with continuous explicit relevance feedback from reviewers.
As reviewers judge each screened document, their feedback is used to update the dense query representation, which then re-ranks the documents still to be screened, without the need for costly model fine-tuning or inference.
The approach is evaluated on the CLEF TAR datasets, where it is more efficient than directly using neural models and shows promising effectiveness compared to previous methods.
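One common way to turn explicit relevance judgements into an updated dense query is a Rocchio-style rule applied to embeddings. The formula below is an illustrative assumption, not necessarily the exact update used in the paper. Here q_0 is the initial query embedding, R_t and N_t are the embeddings of documents judged relevant and non-relevant after t screening steps, alpha, beta and gamma are weighting hyperparameters, and s_{t+1}(d) is the score used to re-rank an unscreened document d.

```latex
\mathbf{q}_{t+1} = \alpha\,\mathbf{q}_0
  + \frac{\beta}{|R_t|}\sum_{\mathbf{d}\in R_t}\mathbf{d}
  - \frac{\gamma}{|N_t|}\sum_{\mathbf{d}\in N_t}\mathbf{d},
\qquad
s_{t+1}(\mathbf{d}) = \mathbf{q}_{t+1}^{\top}\mathbf{d}
```

If no document has yet been judged relevant (or non-relevant), the corresponding term is simply dropped.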

Plain English Explanation

The paper describes a new way to help researchers quickly find the most relevant scientific papers for a systematic review. Systematic reviews are studies that summarize all the available evidence on a particular topic, but they can be time-consuming as researchers must manually screen through many papers to identify the relevant ones.

The researchers developed an information retrieval method that uses dense retrieval techniques to search through a large collection of papers. Importantly, the method also incorporates continuous explicit feedback from the researchers as they screen: each time a paper is marked relevant or not relevant, the system updates its representation of what the review is looking for and re-orders the remaining papers so that the ones most likely to be relevant come first.

The researchers tested their method on the CLEF TAR datasets, which are built from real systematic review topics, and found that it ranked relevant papers early while being cheaper to run than approaches that apply neural models directly. This could save researchers considerable time and effort when conducting these important literature reviews.

Technical Explanation

The paper presents a screening prioritization method that combines dense retrieval with continuous explicit feedback. The method consists of two key components:

Dense Retrieval: A neural encoder maps the query (e.g., derived from the topic of the systematic review) and the candidate documents (e.g., the papers to be screened) into dense vector representations, and documents are ranked by their similarity to the query vector. The neural model is used without costly fine-tuning, which keeps the approach inexpensive to run.
Continuous Explicit Feedback: As reviewers screen documents in ranked order, they judge each one as relevant or not relevant. These judgements are used to update the dense query representation, and the updated query is then used to rank the documents that remain to be screened.
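The screening loop described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' released code: document and query embeddings are assumed to be precomputed by some dense encoder (not shown), the query update is a Rocchio-style rule with hypothetical weights `alpha`, `beta` and `gamma`, and `is_relevant` is a hypothetical callable standing in for the reviewer's judgement.

```python
import numpy as np


def l2_normalise(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def rocchio_update(q0, rel, nonrel, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style query update in embedding space.

    Assumption: hypothetical weights; the paper's actual update rule may differ.
    """
    q = alpha * q0
    if len(rel):
        q = q + beta * np.mean(rel, axis=0)  # pull towards judged-relevant docs
    if len(nonrel):
        q = q - gamma * np.mean(nonrel, axis=0)  # push away from non-relevant docs
    return q


def screen(query_vec, doc_vecs, is_relevant, batch_size=1):
    """Return document indices in the order a reviewer would screen them."""
    docs = l2_normalise(doc_vecs)
    q0 = l2_normalise(query_vec)
    q = q0
    remaining = list(range(len(docs)))
    order, rel, nonrel = [], [], []
    while remaining:
        # Rank only the unscreened documents against the current query.
        scores = docs[remaining] @ q
        top = np.argsort(-scores, kind="stable")[:batch_size]
        picked = [remaining[i] for i in top]
        for d in picked:
            order.append(d)
            # is_relevant is a hypothetical stand-in for the human judgement.
            (rel if is_relevant(d) else nonrel).append(docs[d])
        remaining = [d for d in remaining if d not in picked]
        # Feed all judgements so far back into the query representation.
        q = rocchio_update(q0, np.array(rel), np.array(nonrel))
    return order


if __name__ == "__main__":
    # Toy 2-d embeddings: docs 0 and 2 are relevant, doc 1 is not.
    docs = np.array([[0.8, -0.6], [0.6, 0.8], [0.5, -0.866]])
    print(screen(np.array([1.0, 0.0]), docs, lambda d: d in {0, 2}))  # [0, 2, 1]
```

Before any feedback, document 1 scores higher than document 2 against the query. Once document 0 is judged relevant, the updated query moves towards it and therefore towards document 2, so the second relevant document is screened before the non-relevant one.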

The researchers evaluate their approach on the CLEF TAR datasets for screening prioritization. They compare it with directly using neural models and with previous methods developed on these datasets. The results suggest that the dense query-driven approach is more efficient than directly using neural models and shows promising effectiveness compared to previous methods, ranking relevant documents early so that screening effort can be reduced.
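Screening prioritization is judged by how early relevant documents appear in the ranking. This summary does not list the exact measures the paper reports, so the functions below are illustrative assumptions: recall after screening the first k documents, the rank of the last relevant document, and the fraction of the collection that must be screened to reach a target recall (a simplified proxy for effort saved, not an official CLEF TAR formula). All three assume a non-empty set of relevant documents.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranking: Sequence[int], relevant: Set[int], k: int) -> float:
    """Fraction of all relevant documents found among the first k screened."""
    found = sum(1 for d in ranking[:k] if d in relevant)
    return found / len(relevant)


def last_relevant_rank(ranking: Sequence[int], relevant: Set[int]) -> int:
    """1-based rank of the final relevant document (0 if none appear)."""
    return max((i for i, d in enumerate(ranking, start=1) if d in relevant), default=0)


def fraction_screened_for_recall(
    ranking: Sequence[int], relevant: Set[int], target: float = 0.95
) -> float:
    """Share of the ranking a reviewer must screen to reach the target recall.

    Assumption: a simplified effort proxy chosen for illustration.
    """
    needed = math.ceil(target * len(relevant))
    found = 0
    for i, d in enumerate(ranking, start=1):
        if d in relevant:
            found += 1
            if found >= needed:
                return i / len(ranking)
    return 1.0  # target recall never reached within this ranking


if __name__ == "__main__":
    ranking = [3, 1, 4, 0, 2]  # screening order produced by some ranker
    relevant = {0, 1}
    print(recall_at_k(ranking, relevant, k=2))  # 0.5
    print(last_relevant_rank(ranking, relevant))  # 4
    print(fraction_screened_for_recall(ranking, relevant, target=1.0))  # 0.8
```

Lower values of the last two measures mean the ranker surfaced every relevant document sooner, which is what screening prioritization aims for.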

Critical Analysis

The researchers propose an approach to systematic review screening that combines dense retrieval with continuous explicit feedback. This is a valuable contribution to the field, as systematic reviews are essential for synthesizing the growing body of research but are labor-intensive and time-consuming.

One potential limitation of the study is the reliance on manual relevance feedback from the researchers. While this feedback is essential for refining the model's understanding of relevance, it may not be scalable for very large systematic reviews or in situations where multiple researchers are involved. Exploring ways to automate or semi-automate the feedback process could be an area for future research.

Additionally, the researchers evaluated their method only on the CLEF TAR datasets. While these are built from real systematic reviews, it would be beneficial to test the approach on a wider range of systematic review topics and collections to better understand its generalizability.

Overall, this paper presents a promising approach to improving the efficiency of systematic review screening, and the researchers have done a thorough job of evaluating and contextualizing their work. Their findings could be useful both for information retrieval research and for the development of systematic review automation tools.

Conclusion

This paper introduces a screening prioritization method that combines dense retrieval with continuous explicit feedback. By using reviewers' judgements to iteratively update the dense query representation and re-rank the remaining documents, the method surfaces relevant papers early for a given systematic review topic without costly model fine-tuning.

The researchers' evaluation on the CLEF TAR datasets suggests that the approach is more efficient than directly using neural models and shows promising effectiveness compared to previous methods, which could save researchers significant time and effort in the systematic review process. While the reliance on manual feedback is a potential limitation, the paper represents an important step forward in using dense retrieval techniques to support systematic review automation.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation

Xinyu Mao, Shengyao Zhuang, Bevan Koopman, Guido Zuccon

The goal of screening prioritisation in systematic reviews is to identify relevant documents with high recall and rank them in early positions for review. This saves reviewing effort if paired with a stopping criterion, and speeds up review completion if performed alongside downstream tasks. Recent studies have shown that neural models have good potential on this task, but their time-consuming fine-tuning and inference discourage their widespread use for screening prioritisation. In this paper, we propose an alternative approach that still relies on neural models, but leverages dense representations and relevance feedback to enhance screening prioritisation, without the need for costly model fine-tuning and inference. This method exploits continuous relevance feedback from reviewers during document screening to efficiently update the dense query representation, which is then applied to rank the remaining documents to be screened. We evaluate this approach across the CLEF TAR datasets for this task. Results suggest that the investigated dense query-driven approach is more efficient than directly using neural models and shows promising effectiveness compared to previous methods developed on the considered datasets. Our code is available at https://github.com/ielab/dense-screening-feedback.

7/18/2024


Graded Relevance Scoring of Written Essays with Dense Retrieval

Salam Albatarni, Sohaila Eltanbouly, Tamer Elsayed

Automated Essay Scoring automates the grading process of essays, providing a great advantage for improving the writing proficiency of students. While holistic essay scoring research is prevalent, a noticeable gap exists in scoring essays for specific quality traits. In this work, we focus on the relevance trait, which measures the ability of the student to stay on-topic throughout the entire essay. We propose a novel approach for graded relevance scoring of written essays that employs dense retrieval encoders. Dense representations of essays at different relevance levels then form clusters in the embeddings space, such that their centroids are potentially separate enough to effectively represent their relevance levels. We hence use the simple 1-Nearest-Neighbor classification over those centroids to determine the relevance level of an unseen essay. As an effective unsupervised dense encoder, we leverage Contriever, which is pre-trained with contrastive learning and demonstrated comparable performance to supervised dense retrieval models. We tested our approach on both task-specific (i.e., training and testing on same task) and cross-task (i.e., testing on unseen task) scenarios using the widely used ASAP++ dataset. Our method establishes a new state-of-the-art performance in the task-specific scenario, while its extension for the cross-task scenario exhibited a performance that is on par with the state-of-the-art model for that scenario. We also analyzed the performance of our approach in a more practical few-shot scenario, showing that it can significantly reduce the labeling cost while sacrificing only 10% of its effectiveness.

5/9/2024

A Comprehensive Survey on Retrieval Methods in Recommender Systems

Junjie Huang, Jizheng Chen, Jianghao Lin, Jiarui Qin, Ziming Feng, Weinan Zhang, Yong Yu

In an era dominated by information overload, effective recommender systems are essential for managing the deluge of data across digital platforms. Multi-stage cascade ranking systems are widely used in the industry, with retrieval and ranking being two typical stages. Retrieval methods sift through vast candidates to filter out irrelevant items, while ranking methods prioritize these candidates to present the most relevant items to users. Unlike studies focusing on the ranking stage, this survey explores the critical yet often overlooked retrieval stage of recommender systems. To achieve precise and efficient personalized retrieval, we summarize existing work in three key areas: improving similarity computation between user and item, enhancing indexing mechanisms for efficient retrieval, and optimizing training methods of retrieval. We also provide a comprehensive set of benchmarking experiments on three public datasets. Furthermore, we highlight current industrial applications through a case study on retrieval practices at a specific company, covering the entire retrieval process and online serving, along with practical implications and challenges. By detailing the retrieval stage, which is fundamental for effective recommendation, this survey aims to bridge the existing knowledge gap and serve as a cornerstone for researchers interested in optimizing this critical component of cascade recommender systems.

8/1/2024

Information Retrieval with Entity Linking

Dahlia Shehata

Despite the advantages of their low-resource settings, traditional sparse retrievers depend on exact matching approaches between high-dimensional bag-of-words (BoW) representations of both the queries and the collection. As a result, retrieval performance is restricted by semantic discrepancies and vocabulary gaps. On the other hand, transformer-based dense retrievers introduce significant improvements in information retrieval tasks by exploiting low-dimensional contextualized representations of the corpus. While dense retrievers are known for their relative effectiveness, they suffer from lower efficiency and lack of generalization issues, when compared to sparse retrievers. For a lightweight retrieval task, high computational resources and time consumption are major barriers encouraging the renunciation of dense models despite potential gains. In this work, I propose boosting the performance of sparse retrievers by expanding both the queries and the documents with linked entities in two formats for the entity names: 1) explicit and 2) hashed. A zero-shot end-to-end dense entity linking system is employed for entity recognition and disambiguation to augment the corpus. By leveraging the advanced entity linking methods, I believe that the effectiveness gap between sparse and dense retrievers can be narrowed. Experiments are conducted on the MS MARCO passage dataset using the original qrel set, the re-ranked qrels favoured by MonoT5 and the latter set further re-ranked by DuoT5. Since I am concerned with the early stage retrieval in cascaded ranking architectures of large information retrieval systems, the results are evaluated using recall@1000. The suggested approach is also capable of retrieving documents for query subsets judged to be particularly difficult in prior work.

4/16/2024