Mining experimental data from Materials Science literature with Large Language Models: an evaluation study

2401.11052

Published 6/3/2024 by Luca Foppiano, Guillaume Lambard, Toshiyuki Amagasa, Masashi Ishii

Mining experimental data from Materials Science literature with Large Language Models: an evaluation study

Abstract

This study is dedicated to assessing the capabilities of large language models (LLMs) such as GPT-3.5-Turbo, GPT-4, and GPT-4-Turbo in extracting structured information from scientific documents in materials science. To this end, we primarily focus on two critical tasks of information extraction: (i) a named entity recognition (NER) of studied materials and physical properties and (ii) a relation extraction (RE) between these entities. Due to the evident lack of datasets within Materials Informatics (MI), we evaluated using SuperMat, based on superconductor research, and MeasEval, a generic measurement evaluation corpus. The performance of LLMs in executing these tasks is benchmarked against traditional models based on the BERT architecture and rule-based approaches (baseline). We introduce a novel methodology for the comparative analysis of intricate material expressions, emphasising the standardisation of chemical formulas to tackle the complexities inherent in materials science information assessment. For NER, LLMs fail to outperform the baseline with zero-shot prompting and exhibit only limited improvement with few-shot prompting. However, a GPT-3.5-Turbo fine-tuned with the appropriate strategy for RE outperforms all models, including the baseline. Without any fine-tuning, GPT-4 and GPT-4-Turbo display remarkable reasoning and relationship extraction capabilities after being provided with merely a couple of examples, surpassing the baseline. Overall, the results suggest that although LLMs demonstrate relevant reasoning skills in connecting concepts, specialised models are currently a better choice for tasks requiring extracting complex domain-specific entities like materials. These insights provide initial guidance applicable to other materials science sub-domains in future work.

Create account to get full access

Overview

This paper explores using Large Language Models (LLMs) to extract experimental data from Materials Science literature.
The researchers developed a method that combines Named Entities Recognition and other techniques to automatically parse and extract relevant information from research papers.
The goal is to create a more efficient way to gather and aggregate materials data, which is crucial for advancing materials science and engineering.

Plain English Explanation

Materials science is the study of how the properties and behavior of different materials, like metals, ceramics, and polymers, can be engineered for specific applications. Researchers in this field often need to compile and analyze large amounts of experimental data from published studies. However, manually extracting this information from scientific papers can be a time-consuming and error-prone process.

The researchers in this paper explored using advanced AI language models as a solution. These models are trained on massive amounts of text data and can understand and "read" scientific papers in a way that mimics human comprehension. The researchers developed a method that combines named entity recognition - the ability to identify and extract key concepts like materials, experimental conditions, and measured properties - with other techniques to automatically parse and extract relevant information from research papers.

The goal is to create a more efficient way to gather and aggregate materials data, which is crucial for advancing materials science and engineering. By automating this process, researchers can spend more time analyzing the data and developing new materials and technologies, rather than tediously combing through papers.

Technical Explanation

The researchers developed a multi-step approach to extract experimental data from materials science literature using LLMs. First, they applied named entity recognition to identify key entities like material types, experimental conditions, and measured properties within the text of research papers.

Next, they used clustering algorithms to group related entities together and infer the experimental context. This allowed them to reconstruct the details of the materials experiments, such as the specific materials tested, the experimental procedures, and the measured outcomes.

Finally, the extracted data was structured into a standardized format, enabling it to be easily aggregated and analyzed. The researchers tested their approach on a dataset of materials science papers and found that it was able to accurately identify and extract key experimental details.

Critical Analysis

The researchers acknowledge several limitations of their approach. First, the accuracy of the named entity recognition is dependent on the quality and coverage of the training data used to fine-tune the LLM. If important material types or experimental concepts are missing from the training data, the model may fail to recognize them correctly.

Additionally, the clustering algorithms used to infer experimental context rely on heuristics and may not always be able to accurately reconstruct the full experimental setup, especially for more complex or ambiguous cases. The researchers suggest that incorporating structured knowledge bases could help address this limitation.

Another potential issue is that the extracted data may not capture all the nuances and context present in the original papers. For example, the method may struggle to extract qualitative insights or experimental details that are only described in narrative form. Integrating natural language understanding techniques could help address this challenge.

Overall, this research represents an important step towards automating the extraction of experimental data from scientific literature, which could have significant implications for materials science and other fields that rely on large, heterogeneous datasets. However, further work is needed to improve the robustness and versatility of the approach.

Conclusion

This paper presents a novel method for using Large Language Models to automatically extract experimental data from materials science literature. By combining named entity recognition, clustering algorithms, and structured data output, the researchers were able to accurately parse and reconstruct key details from research papers.

While the approach has some limitations, it demonstrates the potential for AI-powered text mining to significantly streamline the process of data collection and aggregation in materials science and other fields. As LLMs and related techniques continue to advance, we can expect to see more sophisticated tools for harvesting insights from the vast and ever-growing corpus of scientific literature.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

📊

Flexible, Model-Agnostic Method for Materials Data Extraction from Text Using General Purpose Language Models

Maciej P. Polak, Shrey Modi, Anna Latosinska, Jinming Zhang, Ching-Wen Wang, Shaonan Wang, Ayan Deep Hazra, Dane Morgan

Accurate and comprehensive material databases extracted from research papers are crucial for materials science and engineering, but their development requires significant human effort. With large language models (LLMs) transforming the way humans interact with text, LLMs provide an opportunity to revolutionize data extraction. In this study, we demonstrate a simple and efficient method for extracting materials data from full-text research papers leveraging the capabilities of LLMs combined with human supervision. This approach is particularly suitable for mid-sized databases and requires minimal to no coding or prior knowledge about the extracted property. It offers high recall and nearly perfect precision in the resulting database. The method is easily adaptable to new and superior language models, ensuring continued utility. We show this by evaluating and comparing its performance on GPT-3 and GPT-3.5/4 (which underlie ChatGPT), as well as free alternatives such as BART and DeBERTaV3. We provide a detailed analysis of the method's performance in extracting sentences containing bulk modulus data, achieving up to 90% precision at 96% recall, depending on the amount of human effort involved. We further demonstrate the method's broader effectiveness by developing a database of critical cooling rates for metallic glasses over twice the size of previous human curated databases.

6/13/2024

cs.AI cs.CL

Evaluating Large Language Models for Structured Science Summarization in the Open Research Knowledge Graph

Vladyslav Nechakhin, Jennifer D'Souza, Steffen Eger

Structured science summaries or research contributions using properties or dimensions beyond traditional keywords enhances science findability. Current methods, such as those used by the Open Research Knowledge Graph (ORKG), involve manually curating properties to describe research papers' contributions in a structured manner, but this is labor-intensive and inconsistent between the domain expert human curators. We propose using Large Language Models (LLMs) to automatically suggest these properties. However, it's essential to assess the readiness of LLMs like GPT-3.5, Llama 2, and Mistral for this task before application. Our study performs a comprehensive comparative analysis between ORKG's manually curated properties and those generated by the aforementioned state-of-the-art LLMs. We evaluate LLM performance through four unique perspectives: semantic alignment and deviation with ORKG properties, fine-grained properties mapping accuracy, SciNCL embeddings-based cosine similarity, and expert surveys comparing manual annotations with LLM outputs. These evaluations occur within a multidisciplinary science setting. Overall, LLMs show potential as recommendation systems for structuring science, but further finetuning is recommended to improve their alignment with scientific tasks and mimicry of human expertise.

5/6/2024

cs.AI cs.CL cs.IT

💬

Use of a Structured Knowledge Base Enhances Metadata Curation by Large Language Models

Sowmya S. Sundaram, Benjamin Solomon, Avani Khatri, Anisha Laumas, Purvesh Khatri, Mark A. Musen

Metadata play a crucial role in ensuring the findability, accessibility, interoperability, and reusability of datasets. This paper investigates the potential of large language models (LLMs), specifically GPT-4, to improve adherence to metadata standards. We conducted experiments on 200 random data records describing human samples relating to lung cancer from the NCBI BioSample repository, evaluating GPT-4's ability to suggest edits for adherence to metadata standards. We computed the adherence accuracy of field name-field value pairs through a peer review process, and we observed a marginal average improvement in adherence to the standard data dictionary from 79% to 80% (p<0.01). We then prompted GPT-4 with domain information in the form of the textual descriptions of CEDAR templates and recorded a significant improvement to 97% from 79% (p<0.01). These results indicate that, while LLMs may not be able to correct legacy metadata to ensure satisfactory adherence to standards when unaided, they do show promise for use in automated metadata curation when integrated with a structured knowledge base.

4/10/2024

cs.AI cs.CL cs.IR

Toward Reliable Ad-hoc Scientific Information Extraction: A Case Study on Two Materials Datasets

Satanu Ghosh, Neal R. Brodnik, Carolina Frey, Collin Holgate, Tresa M. Pollock, Samantha Daly, Samuel Carton

We explore the ability of GPT-4 to perform ad-hoc schema based information extraction from scientific literature. We assess specifically whether it can, with a basic prompting approach, replicate two existing material science datasets, given the manuscripts from which they were originally manually extracted. We employ materials scientists to perform a detailed manual error analysis to assess where the model struggles to faithfully extract the desired information, and draw on their insights to suggest research directions to address this broadly important task.

6/11/2024

cs.CL cs.AI cs.IR