SyROCCo: Enhancing Systematic Reviews using Machine Learning

arXiv:2406.16527

Published 6/26/2024 by Zheng Fang, Miguel Arana-Catania, Felix-Anselm van Lier, Juliana Outes Velarde, Harry Bregazzi, Mara Airoldi, Eleanor Carter, Rob Procter

cs.CL cs.CY cs.DL cs.LG


Abstract

The sheer number of research outputs published every year makes systematic reviewing increasingly time- and resource-intensive. This paper explores the use of machine learning techniques to help navigate the systematic review process. ML has previously been used to reliably 'screen' articles for review - that is, identify relevant articles based on reviewers' inclusion criteria. The application of ML techniques to subsequent stages of a review, however, such as data extraction and evidence mapping, is in its infancy. We therefore set out to develop a series of tools that would assist in the profiling and analysis of 1,952 publications on the theme of 'outcomes-based contracting'. Tools were developed for the following tasks: assign publications into 'policy area' categories; identify and extract key information for evidence mapping, such as organisations, laws, and geographical information; connect the evidence base to an existing dataset on the same topic; and identify subgroups of articles that may share thematic content. An interactive tool using these techniques and a public dataset with their outputs have been released. Our results demonstrate the utility of ML techniques to enhance evidence accessibility and analysis within the systematic review processes. These efforts show promise in potentially yielding substantial efficiencies for future systematic reviewing and for broadening their analytical scope. Our work suggests that there may be implications for the ease with which policymakers and practitioners can access evidence. While ML techniques seem poised to play a significant role in bridging the gap between research and policy by offering innovative ways of gathering, accessing, and analysing data from systematic reviews, we also highlight their current limitations and the need to exercise caution in their application, particularly given the potential for errors and biases.


Overview

The sheer volume of research publications makes systematic reviewing time-consuming and resource-intensive.
This paper explores using machine learning (ML) techniques to assist in the systematic review process.
ML has been used to reliably identify relevant articles for review, but its application to later stages like data extraction and evidence mapping is still in its early stages.
The authors developed a suite of tools to help profile and analyze 1,952 publications on "outcomes-based contracting".

Plain English Explanation

There is a vast and ever-growing number of research papers published every year, which makes the process of systematically reviewing all the relevant literature a major challenge. This paper looks at how machine learning techniques can be used to help streamline and enhance the systematic review process.

In the past, machine learning has proven useful for quickly identifying which research articles are relevant for inclusion in a systematic review, based on the reviewers' criteria. However, applying machine learning to later stages of the review process, such as extracting key information from the articles and mapping the evidence base, is still a relatively new area.

For this paper, the researchers developed a set of tools to help analyze a collection of 1,952 publications related to "outcomes-based contracting". These tools were designed to:

Automatically categorize the publications into different policy areas
Extract important details like organizations, laws, and geographic information for use in evidence mapping
Connect the publications to an existing dataset on the same topic
Identify subgroups of articles that may cover similar thematic content

The researchers have released an interactive tool and a public dataset containing the outputs of these machine learning-powered analysis techniques. This work demonstrates the potential for machine learning to enhance the efficiency and scope of systematic reviews by making it easier to access and synthesize the evidence. However, the authors also caution that these techniques have limitations and potential biases, so care must be taken in their application.

Technical Explanation

The authors set out to develop a series of machine learning-powered tools to assist in the systematic review process for a collection of 1,952 publications on the topic of "outcomes-based contracting".

The first tool they created was designed to automatically assign each publication to a relevant "policy area" category. This helps provide an overview of the thematic landscape covered by the publications.
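To make the idea of category assignment concrete, here is a minimal sketch. It uses simple keyword-overlap scoring as a stand-in, since this summary does not say which model the authors used. The policy areas, keyword lists, and the `assign_policy_area` function are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical policy areas and keywords, chosen for illustration only.
POLICY_KEYWORDS = {
    "health": {"health", "hospital", "patients", "care"},
    "employment": {"employment", "jobs", "unemployed", "training"},
    "education": {"education", "school", "students", "teachers"},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def assign_policy_area(text, keywords=POLICY_KEYWORDS):
    """Return the area whose keywords occur most often, or None if none occur."""
    counts = Counter(tokenize(text))
    scores = {area: sum(counts[w] for w in words) for area, words in keywords.items()}
    best_area = max(scores, key=scores.get)
    return best_area if scores[best_area] > 0 else None

# Invented example abstract, not taken from the dataset.
abstract = "A social impact bond funding training and jobs for unemployed young people."
print(assign_policy_area(abstract))
```

A trained text classifier would replace the keyword lists in practice, but the input and output would look the same: one publication's text in, one policy area label out.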

Next, the researchers developed techniques to identify and extract key information from the articles, such as the organizations, laws, and geographic locations mentioned. This extracted data can be used to create detailed "evidence maps" that visualize the connections and patterns in the literature.
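The kind of output this extraction step produces can be sketched as follows. This is not the authors' pipeline: it swaps in a plain lookup list (gazetteer) for organisations and locations and a regular expression for law names, using only the Python standard library. The lists, the pattern, and the example sentence are assumptions made up for illustration.

```python
import re

# Hypothetical lookup lists; a real gazetteer would be far larger.
ORGANISATIONS = ["World Bank", "Department for Work and Pensions"]
LOCATIONS = ["United Kingdom", "Peterborough", "Colombia"]
# Assumed pattern: capitalised words ending in "Act", optionally followed by a year.
LAW_PATTERN = re.compile(r"\b(?:[A-Z][a-z]+ )+Act(?: \d{4})?\b")

def find_terms(text, terms):
    """Return the listed terms that appear in the text as whole phrases."""
    return [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", text)]

def extract_entities(text):
    """Collect organisations, locations, and law names mentioned in the text."""
    return {
        "organisations": find_terms(text, ORGANISATIONS),
        "locations": find_terms(text, LOCATIONS),
        "laws": LAW_PATTERN.findall(text),
    }

# Invented example sentence, not taken from the dataset.
sentence = ("The Peterborough pilot in the United Kingdom was later cited "
            "alongside the Social Value Act 2012 in a World Bank report.")
print(extract_entities(sentence))
```

Records in this shape (one dictionary of entity lists per publication) are the sort of structured data an evidence map can be built from. A statistical named-entity recogniser would catch names that are not in any list, at the cost of the errors the authors warn about.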

The team also wrote code to connect the publication dataset to an existing dataset on the same topic. This allows the systematic review to leverage and build upon previous work in the area.
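One simple way such a link could be made is to look for project names from the existing dataset inside each publication's text. The sketch below does this with normalised phrase matching. The summary does not describe the authors' actual linking method, so the function names, record identifiers, and project names here are hypothetical.

```python
import re

def normalise(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split())

def link_publications(publications, project_names):
    """Map each publication id to the project names mentioned in its text."""
    normalised_projects = {name: normalise(name) for name in project_names}
    links = {}
    for pub_id, text in publications.items():
        # Pad with spaces so a name only matches on whole-word boundaries.
        padded = f" {normalise(text)} "
        links[pub_id] = [name for name, norm in normalised_projects.items()
                         if f" {norm} " in padded]
    return links

# Invented records standing in for the publications and the existing dataset.
publications = {
    "pub-1": "An evaluation of the Example City Employment Bond after two years.",
    "pub-2": "A general discussion of payment-by-results contracts.",
}
project_names = ["Example City Employment Bond", "Sample Region Health Bond"]
print(link_publications(publications, project_names))
```

Exact phrase matching misses abbreviations and alternative spellings, so a real linkage step would likely need fuzzier matching plus manual checks of the resulting links.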

Finally, the authors employed clustering algorithms to identify subgroups of articles that may share common thematic content. This can help reviewers efficiently identify related publications and uncover emerging research trends.
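As a rough illustration of grouping articles by shared content, the sketch below links any two articles whose word sets have a Jaccard similarity above a threshold and then reads off the connected groups. This is a deliberately simple stand-in and not the clustering algorithm used in the paper, which this summary does not name. The stopword list, threshold, and example titles are assumptions.

```python
import re

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

def word_set(text):
    """Return the set of lowercase words in the text, minus common stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_articles(texts, threshold=0.3):
    """Group article indices into connected components of the similarity graph."""
    sets = [word_set(t) for t in texts]
    groups, seen = [], set()
    for start in range(len(texts)):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            i = stack.pop()
            group.append(i)
            # Pull in every unseen article that is similar enough to article i.
            for j in range(len(texts)):
                if j not in seen and jaccard(sets[i], sets[j]) >= threshold:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups

# Invented example titles, not taken from the dataset.
titles = [
    "Social impact bonds for youth employment",
    "Youth employment and social impact bonds",
    "Hospital payment reform in health care",
    "Payment reform for hospital care",
]
print(group_articles(titles))
```

With these four titles the two employment articles end up in one group and the two hospital articles in another. Word overlap is a crude proxy for thematic similarity, so the groups should be read as suggestions for a reviewer to inspect.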

The researchers have made an interactive tool and a public dataset available, which showcase the outputs of these various machine learning-powered analysis techniques. This work demonstrates the potential for machine learning to enhance systematic literature reviews by improving the accessibility and breadth of the evidence being synthesized.

However, the authors also note the current limitations of these approaches. They caution that machine learning techniques can introduce errors and biases, and emphasize the need for careful application and human oversight, particularly when the results may have implications for policy and practice.
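One practical form of human oversight is to have a reviewer label a sample of publications by hand and compare those labels with the model's. The sketch below shows such a spot check. The function name, publication identifiers, and labels are hypothetical, and the summary does not say how the authors validated their outputs.

```python
def spot_check(model_labels, human_labels):
    """Compare model labels with human labels on the manually checked subset."""
    checked = [k for k in human_labels if k in model_labels]
    disagreements = [k for k in checked if model_labels[k] != human_labels[k]]
    agreement = 1 - len(disagreements) / len(checked) if checked else None
    return agreement, disagreements

# Invented labels for four publications.
model_labels = {"p1": "health", "p2": "education", "p3": "employment", "p4": "health"}
human_labels = {"p1": "health", "p2": "employment", "p3": "employment", "p4": "health"}

agreement, disagreements = spot_check(model_labels, human_labels)
print(agreement, disagreements)
```

The list of disagreements is often more useful than the agreement rate itself, because it points reviewers to the specific records where the automated output needs correcting.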

Critical Analysis

The paper makes a compelling case for the utility of machine learning techniques in assisting with systematic literature reviews. The authors have developed a suite of practical tools that demonstrate how ML can be applied to tackle key challenges in the review process, such as categorizing publications, extracting relevant data, and identifying thematic connections.

One strength of the research is the scale of the dataset they worked with - 1,952 publications on a focused topic. This allowed them to thoroughly test and refine their ML-powered analysis techniques. The interactive tool and public dataset they have released also provide a valuable resource for other researchers and practitioners.

That said, the authors are right to emphasize the need for caution in applying these machine learning methods. ML models can be prone to errors and biases, which could have serious implications when they are used to inform evidence-based decision making.

Additionally, while the tools developed in this research show promise, their ultimate utility may depend on the specific needs and constraints of each systematic review project. Further research is needed to assess how well these techniques generalize to other domains and review contexts.

Overall, the paper shows that machine learning has real potential to enhance the systematic literature review process. It also highlights the importance of carefully evaluating the limitations and risks of these approaches, and of maintaining appropriate human oversight, to ensure the integrity and usefulness of the final review outputs.

Conclusion

This paper explores how machine learning techniques can be leveraged to streamline and augment the systematic literature review process. The researchers developed a suite of tools that demonstrated the ability of ML to categorize publications, extract key evidence, and identify thematic connections - all of which can potentially yield substantial efficiencies for future reviews.

The public release of the interactive tool and dataset created through this work provides a valuable resource for other researchers and practitioners interested in applying similar ML-powered techniques. At the same time, the authors rightly caution that these methods must be applied with care, as machine learning models can introduce errors and biases that must be carefully monitored.

Overall, this research suggests that machine learning is poised to play an increasingly important role in bridging the gap between academic research and real-world policy and practice. By enhancing the accessibility and analytical power of systematic reviews, these techniques hold promise for making evidence-based decision making more feasible and impactful. However, continued vigilance and further research will be needed to fully realize the benefits while mitigating the risks.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


A Systematic Literature Review on the Use of Machine Learning in Software Engineering

Nyaga Fred, I. O. Temkin

Software engineering (SE) is a dynamic field that involves multiple phases all of which are necessary to develop sustainable software systems. Machine learning (ML), a branch of artificial intelligence (AI), has drawn a lot of attention in recent years thanks to its ability to analyze massive volumes of data and extract useful patterns from data. Several studies have focused on examining, categorising, and assessing the application of ML in SE processes. We conducted a literature review on primary studies to address this gap. The study was carried out following the objective and the research questions to explore the current state of the art in applying machine learning techniques in software engineering processes. The review identifies the key areas within software engineering where ML has been applied, including software quality assurance, software maintenance, software comprehension, and software documentation. It also highlights the specific ML techniques that have been leveraged in these domains, such as supervised learning, unsupervised learning, and deep learning. Keywords: machine learning, deep learning, software engineering, natural language processing, source code

6/21/2024

cs.SE cs.LG

Data Cleaning and Machine Learning: A Systematic Literature Review

Pierre-Olivier Côté, Amin Nikanjam, Nafisa Ahmed, Dmytro Humeniuk, Foutse Khomh

Context: Machine Learning (ML) is integrated into a growing number of systems for various applications. Because the performance of an ML model is highly dependent on the quality of the data it has been trained on, there is a growing interest in approaches to detect and repair data errors (i.e., data cleaning). Researchers are also exploring how ML can be used for data cleaning; hence creating a dual relationship between ML and data cleaning. To the best of our knowledge, there is no study that comprehensively reviews this relationship. Objective: This paper's objectives are twofold. First, it aims to summarize the latest approaches for data cleaning for ML and ML for data cleaning. Second, it provides future work recommendations. Method: We conduct a systematic literature review of the papers published between 2016 and 2022 inclusively. We identify different types of data cleaning activities with and for ML: feature cleaning, label cleaning, entity matching, outlier detection, imputation, and holistic data cleaning. Results: We summarize the content of 101 papers covering various data cleaning activities and provide 24 future work recommendations. Our review highlights many promising data cleaning techniques that can be further extended. Conclusion: We believe that our review of the literature will help the community develop better approaches to clean data.

6/3/2024

cs.LG cs.DB

Literature Filtering for Systematic Reviews with Transformers

John Hawkins, David Tivey

Identifying critical research within the growing body of academic work is an essential element of quality research. Systematic review processes, used in evidence-based medicine, formalise this as a procedure that must be followed in a research program. However, it comes with an increasing burden in terms of the time required to identify the important articles of research for a given topic. In this work, we develop a method for building a general-purpose filtering system that matches a research question, posed as a natural language description of the required content, against a candidate set of articles obtained via the application of broad search terms. Our results demonstrate that transformer models, pre-trained on biomedical literature then fine tuned for the specific task, offer a promising solution to this problem. The model can remove large volumes of irrelevant articles for most research questions.

6/3/2024

cs.DL cs.AI cs.CL cs.LG

Machine learning in business process management: A systematic literature review

Sven Weinzierl, Sandra Zilker, Sebastian Dunzer, Martin Matzner

Machine learning (ML) provides algorithms to create computer programs based on data without explicitly programming them. In business process management (BPM), ML applications are used to analyse and improve processes efficiently. Three frequent examples of using ML are providing decision support through predictions, discovering accurate process models, and improving resource allocation. This paper organises the body of knowledge on ML in BPM. We extract BPM tasks from different literature streams, summarise them under the phases of a process's lifecycle, explain how ML helps perform these tasks and identify technical commonalities in ML implementations across tasks. This study is the first exhaustive review of how ML has been used in BPM. We hope that it can open the door for a new era of cumulative research by helping researchers to identify relevant preliminary work and then combine and further develop existing approaches in a focused fashion. Our paper helps managers and consultants to find ML applications that are relevant in the current project phase of a BPM initiative, like redesigning a business process. We also offer - as a synthesis of our review - a research agenda that spreads ten avenues for future research, including applying novel ML concepts like federated learning, addressing less regarded BPM lifecycle phases like process identification, and delivering ML applications with a focus on end-users.

5/28/2024

cs.LG