From Zero to Hero: Harnessing Transformers for Biomedical Named Entity Recognition in Zero- and Few-shot Contexts

Read original: arXiv:2305.04928 - Published 8/27/2024 by Miloš Košprdić, Nikola Prodanović, Adela Ljajić, Bojana Bašaragin, Nikola Milošević

Overview

Developing named entity recognition (NER) models for the biomedical domain requires large annotated datasets, which can be time-consuming and expensive to create.
Extracting new entities often requires additional annotation and model retraining.
This paper proposes a method for zero-shot and few-shot NER in the biomedical domain to address these challenges.

Plain English Explanation

Named entity recognition (NER) is a task in natural language processing where computers try to identify and classify key terms or entities (like people, organizations, or diseases) in text. In the biomedical field, NER is important for tasks like extracting information from medical literature.

However, developing accurate NER models for the biomedical domain requires large datasets of text that has been manually labeled with the relevant entities. Creating these datasets can be time-consuming and expensive. And when researchers want to identify new types of entities, they often have to go through the whole process of labeling more data and retraining the model.

To address these challenges, the researchers in this paper propose a new method for zero-shot and few-shot NER in the biomedical domain. Their key idea is to transform the NER task from multi-class token classification into a simpler "binary classification" problem, where the model is given an entity label and just has to decide, for each token, whether it belongs to that entity type or not. They also pre-train the model on a large number of existing biomedical datasets and entity types, which helps the model learn the semantic relationships between the entity labels it has seen and potentially novel ones.

This allows the model to identify new types of entities with either no examples ("zero-shot") or just a few examples ("few-shot") - without having to fully retrain the model from scratch each time.
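To make the "zero-shot" and "few-shot" settings concrete, here is a minimal sketch of how such an experiment could be set up. It is an illustration under stated assumptions, not the authors' protocol: the function name `k_shot_split`, the example format, and the random sampling are all hypothetical choices made for this sketch.

```python
import random
from typing import List, Tuple

# Hypothetical example format: (sentence tokens, entity type annotated in it).
Example = Tuple[List[str], str]


def k_shot_split(
    examples: List[Example], held_out_type: str, k: int, seed: int = 0
) -> Tuple[List[Example], List[Example], List[Example]]:
    """Split data into pre-training, k support, and test examples.

    The pre-training set never contains the held-out entity type.
    The k support examples are the only labeled instances of that type
    the model gets to see (k = 0 corresponds to the zero-shot setting).
    """
    rng = random.Random(seed)
    pretrain = [ex for ex in examples if ex[1] != held_out_type]
    held_out = [ex for ex in examples if ex[1] == held_out_type]
    rng.shuffle(held_out)
    support = held_out[:k]   # few-shot examples for fine-tuning
    test = held_out[k:]      # remaining examples for evaluation
    return pretrain, support, test


data: List[Example] = [
    (["Aspirin", "reduces", "fever"], "Disease"),
    (["Patients", "with", "asthma"], "Disease"),
    (["Chronic", "migraine", "cases"], "Disease"),
    (["BRCA1", "is", "mutated"], "Gene"),
    (["TP53", "was", "expressed"], "Gene"),
]

zero_shot = k_shot_split(data, "Gene", k=0)
one_shot = k_shot_split(data, "Gene", k=1)
```

With `k=0` the support set is empty, so the model has to rely entirely on what it learned from the other entity types during pre-training.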

Technical Explanation

The paper's key technical contributions are:

Framing NER as binary classification: Rather than the standard multi-class token classification approach, where the model has to assign each token one of several entity types, the researchers reframe NER as a simpler binary task. Given an entity label, the model just has to determine whether each token is part of an entity of that type or not.
Pre-training on diverse biomedical data: The researchers pre-train their model on a large collection of biomedical text and entity data. This allows the model to learn general semantic relationships between different types of entities, which helps it recognize new entities during the zero- and few-shot phases.
Evaluation on diverse biomedical entities: The researchers evaluate their method on 9 different types of biomedical entities, including things like diseases, chemicals, and genes. This demonstrates the broad applicability of their approach.
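The reframing from multi-class to binary token classification can be sketched as a data transformation. This is a simplified illustration, not the authors' code: the function name `to_binary_examples` is hypothetical, tags are assumed to be plain type names with "O" for non-entity tokens (no BIO prefixes), and how the label is actually fed to the model is not described in this summary.

```python
from typing import List, Tuple

# Hypothetical binary example: (queried entity label, tokens, 0/1 per token).
BinaryExample = Tuple[str, List[str], List[int]]


def to_binary_examples(
    tokens: List[str], tags: List[str], entity_types: List[str]
) -> List[BinaryExample]:
    """Turn one multi-class tagged sentence into one binary example per label.

    For each queried entity type, a token gets 1 if its tag equals that
    type and 0 otherwise. A type that does not occur in the sentence
    simply yields an all-zero target.
    """
    examples = []
    for entity_type in entity_types:
        binary = [1 if tag == entity_type else 0 for tag in tags]
        examples.append((entity_type, tokens, binary))
    return examples


tokens = ["Aspirin", "reduces", "fever"]
tags = ["Chemical", "O", "Disease"]
examples = to_binary_examples(tokens, tags, ["Chemical", "Disease", "Gene"])

for label, toks, target in examples:
    print(label, list(zip(toks, target)))
```

Because the entity label becomes part of the input rather than a fixed output class, a new label can in principle be queried at inference time without changing the model's output layer.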

With this approach, the researchers report average F1 scores of 35.44% for zero-shot, 50.10% for one-shot, 69.94% for 10-shot, and 79.51% for 100-shot NER across the 9 evaluated entity types, using a fine-tuned PubMedBERT-based model. These results outperform previous transformer-based methods and are comparable to GPT-3-based models, while using models with over 1000 times fewer parameters.
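For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it at the token level for binary labels. This is an assumption made for illustration: the summary does not say whether the paper scores tokens or whole entity spans, and the function name `token_f1` is hypothetical.

```python
from typing import List


def token_f1(gold: List[int], pred: List[int]) -> float:
    """Token-level F1 for binary labels (1 = token belongs to the entity type)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        # No correct positive predictions: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = [1, 1, 0, 0]
pred = [1, 0, 1, 0]
score = token_f1(gold, pred)  # one true positive, one false positive, one false negative
```

Here precision and recall are both 0.5, so the F1 score is also 0.5.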

Critical Analysis

The researchers make a compelling case for their zero-shot and few-shot NER approach, and the results are impressive. However, a few potential limitations or areas for further research are worth noting:

Reliance on existing entity data: The method still requires access to a large amount of existing biomedical entity data for pre-training. This may limit its applicability in domains where such data is scarce.
Performance on rare or complex entities: While the method works well for the 9 evaluated entity types, its effectiveness on rarer or more complex biomedical concepts is unclear and warrants further investigation.
Interpretability and explainability: As with many deep learning models, the internal workings of the proposed approach may be difficult to interpret. Additional research into making the model's decision-making more transparent could be valuable.

Overall, this paper presents a promising step forward in addressing the data-hungry nature of supervised NER, with potential applications across the biomedical field and beyond. Readers are encouraged to think critically about the trade-offs and consider how the method might be further refined and extended.

Conclusion

This paper introduces a novel approach for zero-shot and few-shot named entity recognition in the biomedical domain. By reframing the task as binary classification and leveraging pre-training on diverse biomedical data, the researchers are able to achieve strong performance on identifying new entity types with limited or no labeled examples.

This work has the potential to significantly reduce the time and effort required to develop accurate NER models for emerging biomedical concepts, ultimately improving our ability to extract valuable information from the rapidly growing body of scientific literature. As the field continues to evolve, techniques like those presented in this paper will likely play an increasingly important role in making natural language processing more accessible and applicable across a wide range of domains.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

From Zero to Hero: Harnessing Transformers for Biomedical Named Entity Recognition in Zero- and Few-shot Contexts

Miloš Košprdić, Nikola Prodanović, Adela Ljajić, Bojana Bašaragin, Nikola Milošević

Supervised named entity recognition (NER) in the biomedical domain depends on large sets of annotated texts with the given named entities. The creation of such datasets can be time-consuming and expensive, while extraction of new entities requires additional annotation tasks and retraining the model. To address these challenges, this paper proposes a method for zero- and few-shot NER in the biomedical domain. The method is based on transforming the task of multi-class token classification into binary token classification and pre-training on a large amount of datasets and biomedical entities, which allow the model to learn semantic relations between the given and potentially novel named entity labels. We have achieved average F1 scores of 35.44% for zero-shot NER, 50.10% for one-shot NER, 69.94% for 10-shot NER, and 79.51% for 100-shot NER on 9 diverse evaluated biomedical entities with fine-tuned PubMedBERT-based model. The results demonstrate the effectiveness of the proposed method for recognizing new biomedical entities with no or limited number of examples, outperforming previous transformer-based methods, and being comparable to GPT3-based models using models with over 1000 times fewer parameters. We make models and developed code publicly available.

8/27/2024

Intent Detection and Entity Extraction from BioMedical Literature

Ankan Mullick, Mukur Gupta, Pawan Goyal

Biomedical queries have become increasingly prevalent in web searches, reflecting the growing interest in accessing biomedical literature. Despite recent research on large-language models (LLMs) motivated by endeavours to attain generalized intelligence, their efficacy in replacing task and domain-specific natural language understanding approaches remains questionable. In this paper, we address this question by conducting a comprehensive empirical evaluation of intent detection and named entity recognition (NER) tasks from biomedical text. We show that Supervised Fine Tuned approaches are still relevant and more effective than general-purpose LLMs. Biomedical transformer models such as PubMedBERT can surpass ChatGPT on NER task with only 5 supervised examples.

4/5/2024

Augmenting Biomedical Named Entity Recognition with General-domain Resources

Yu Yin, Hyunjae Kim, Xiao Xiao, Chih Hsuan Wei, Jaewoo Kang, Zhiyong Lu, Hua Xu, Meng Fang, Qingyu Chen

Training a neural network-based biomedical named entity recognition (BioNER) model usually requires extensive and costly human annotations. While several studies have employed multi-task learning with multiple BioNER datasets to reduce human effort, this approach does not consistently yield performance improvements and may introduce label ambiguity in different biomedical corpora. We aim to tackle those challenges through transfer learning from easily accessible resources with fewer concept overlaps with biomedical datasets. In this paper, we proposed GERBERA, a simple-yet-effective method that utilized a general-domain NER dataset for training. Specifically, we performed multi-task learning to train a pre-trained biomedical language model with both the target BioNER dataset and the general-domain dataset. Subsequently, we fine-tuned the models specifically for the BioNER dataset. We systematically evaluated GERBERA on five datasets of eight entity types, collectively consisting of 81,410 instances. Despite using fewer biomedical resources, our models demonstrated superior performance compared to baseline models trained with multiple additional BioNER datasets. Specifically, our models consistently outperformed the baselines in six out of eight entity types, achieving an average improvement of 0.9% over the best baseline performance across eight biomedical entity types sourced from five different corpora. Our method was especially effective in amplifying performance on BioNER datasets characterized by limited data, with a 4.7% improvement in F1 scores on the JNLPBA-RNA dataset.

6/21/2024

BioMNER: A Dataset for Biomedical Method Entity Recognition

Chen Tang, Bohao Yang, Kun Zhao, Bo Lv, Chenghao Xiao, Frank Guerin, Chenghua Lin

Named entity recognition (NER) stands as a fundamental and pivotal task within the realm of Natural Language Processing. Particularly within the domain of Biomedical Method NER, this task presents notable challenges, stemming from the continual influx of domain-specific terminologies in scholarly literature. Current research in Biomedical Method (BioMethod) NER suffers from a scarcity of resources, primarily attributed to the intricate nature of methodological concepts, which necessitate a profound understanding for precise delineation. In this study, we propose a novel dataset for biomedical method entity recognition, employing an automated BioMethod entity recognition and information retrieval system to assist human annotation. Furthermore, we comprehensively explore a range of conventional and contemporary open-domain NER methodologies, including the utilization of cutting-edge large-scale language models (LLMs) customised to our dataset. Our empirical findings reveal that the large parameter counts of language models surprisingly inhibit the effective assimilation of entity extraction patterns pertaining to biomedical methods. Remarkably, the approach, leveraging the modestly sized ALBERT model (only 11MB), in conjunction with conditional random fields (CRF), achieves state-of-the-art (SOTA) performance.

7/1/2024