Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline

Read original: arXiv:2408.03120 - Published 8/7/2024 by Tianqi Wei, Zhi Chen, Zi Huang, Xin Yu

Overview

Benchmarks in-the-wild multimodal plant disease recognition
Introduces a versatile baseline model

Plain English Explanation

This paper presents a benchmark for recognizing plant diseases from multimodal data, meaning images paired with text descriptions. It also proposes a versatile baseline model for the task. The goal is to advance the state of the art in real-world plant disease detection, which has applications in agriculture and food production.

Technical Explanation

The paper presents a new benchmark dataset for evaluating multimodal plant disease recognition models under in-the-wild conditions. The dataset consists of plant disease images collected from the web, together with a text description for each disease.

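The authors also introduce a baseline model that combines computer vision and natural language processing techniques to classify plant diseases from the multimodal data. According to the abstract, it represents each disease class with multiple prototypes built from text descriptions and visual data, and fuses their contributions at classification time. The model is designed to be versatile and generalizable to different plant species and disease types, and it can also recognize diseases in few-shot or training-free scenarios.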
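To make the idea of multimodal prototypes concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the max-over-prototypes scoring rule, the mixing weight `alpha`, the toy 3-dimensional embeddings, and the disease labels are all assumptions chosen for illustration. In practice the embeddings would come from a vision-language encoder, which is omitted here.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def class_score(query, prototypes):
    """Score a class by its best-matching prototype (an assumed rule),
    so one class can cover several distinct appearances."""
    return max(cosine(query, p) for p in prototypes)

def classify(query, text_protos, visual_protos, alpha=0.5):
    """Fuse text-prototype and visual-prototype scores for each class.

    alpha is a hypothetical mixing weight. A class with no visual
    prototypes falls back to its text score alone, which mimics a
    training-free setting with no labelled images.
    """
    scores = {}
    for label, t_protos in text_protos.items():
        text_score = class_score(query, t_protos)
        v_protos = visual_protos.get(label, [])
        if v_protos:
            visual_score = class_score(query, v_protos)
            scores[label] = alpha * text_score + (1 - alpha) * visual_score
        else:
            scores[label] = text_score
    best = max(scores, key=scores.get)
    return best, scores

# Toy embeddings (hypothetical values, not taken from the paper).
text_protos = {
    "apple scab": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "apple rust": [[0.0, 1.0, 0.0]],
}
visual_protos = {
    "apple scab": [[0.9, 0.1, 0.1]],
    "apple rust": [[0.1, 0.9, 0.2], [0.0, 0.7, 0.7]],
}
query = [0.1, 0.6, 0.7]

label, scores = classify(query, text_protos, visual_protos)
text_only_label, _ = classify(query, text_protos, {})
print(label, text_only_label)
```

Keeping several prototypes per class is one way to cope with large intra-class variance: the second visual prototype for "apple rust" matches the query far better than the first, and the class is scored by whichever prototype fits best. Passing an empty dictionary for the visual prototypes shows how the same function could still run with text descriptions only.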

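The paper reports extensive benchmarking experiments on the new dataset. The results indicate that the baseline is effective for real-world plant disease recognition and, according to the abstract, that the dataset poses new challenges for the task and leaves substantial room for improvement.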

Critical Analysis

The paper provides a valuable contribution to the field by addressing the need for more realistic, in-the-wild datasets and versatile models for multimodal plant disease recognition. However, the authors acknowledge some limitations of the dataset, such as potential biases in the web-crawled data.

Additionally, the baseline model, while effective, may not capture the full complexity of plant disease recognition, which can involve subtle visual patterns and contextual information. Further research could explore more advanced multimodal fusion techniques or incorporate domain-specific knowledge to improve performance.

Conclusion

This paper introduces a new benchmark and baseline model for multimodal plant disease recognition, advancing the state of the art in this important real-world application. The findings can inform the development of more robust and generalizable plant disease detection systems, with potential benefits for agriculture, food security, and environmental sustainability.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline

Tianqi Wei, Zhi Chen, Zi Huang, Xin Yu

Existing plant disease classification models have achieved remarkable performance in recognizing in-laboratory diseased images. However, their performance often significantly degrades in classifying in-the-wild images. Furthermore, we observed that in-the-wild plant images may exhibit similar appearances across various diseases (i.e., small inter-class discrepancy) while the same diseases may look quite different (i.e., large intra-class variance). Motivated by this observation, we propose an in-the-wild multimodal plant disease recognition dataset that contains not only the largest number of disease classes but also text-based descriptions for each disease. Particularly, the newly provided text descriptions are introduced to provide rich information in textual modality and facilitate in-the-wild disease classification with small inter-class discrepancy and large intra-class variance issues. Therefore, our proposed dataset can be regarded as an ideal testbed for evaluating disease recognition methods in the real world. In addition, we further present a strong yet versatile baseline that models text descriptions and visual data through multiple prototypes for a given class. By fusing the contributions of multimodal prototypes in classification, our baseline can effectively address the small inter-class discrepancy and large intra-class variance issues. Remarkably, our baseline model can not only classify diseases but also recognize diseases in few-shot or training-free scenarios. Extensive benchmarking results demonstrate that our proposed in-the-wild multimodal dataset sets many new challenges to the plant disease recognition task and there is a large space to improve for future works.

8/7/2024

Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild

Tianqi Wei, Zhi Chen, Xin Yu

Plant disease recognition is a critical task that ensures crop health and mitigates the damage caused by diseases. A handy tool that enables farmers to receive a diagnosis based on query pictures or the text description of suspicious plants is in high demand for initiating treatment before potential diseases spread further. In this paper, we develop a multimodal plant disease image retrieval system to support disease search based on either image or text prompts. Specifically, we utilize the largest in-the-wild plant disease dataset PlantWild, which includes over 18,000 images across 89 categories, to provide a comprehensive view of potential diseases relating to the query. Furthermore, cross-modal retrieval is achieved in the developed system, facilitated by a novel CLIP-based vision-language model that encodes both disease descriptions and disease images into the same latent space. Built on top of the retriever, our retrieval system allows users to upload either plant disease images or disease descriptions to retrieve the corresponding images with similar characteristics from the disease dataset to suggest candidate diseases for end users' consideration.

8/28/2024

PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology

Xiaomin Wu, Rui Xu, Pengchen Wei, Wenkang Qin, Peixiang Huang, Ziheng Li, Lin Luo

Pathological diagnosis remains the definitive standard for identifying tumors. The rise of multimodal large models has simplified the process of integrating image analysis with textual descriptions. Despite this advancement, the substantial costs associated with training and deploying these complex multimodal models, together with a scarcity of high-quality training datasets, create a significant divide between cutting-edge technology and its application in the clinical setting. We have meticulously compiled a dataset of approximately 45,000 cases, covering over 6 different tasks, including the classification of organ tissues, generating pathology report descriptions, and addressing pathology-related questions and answers. We have fine-tuned multimodal large models, specifically LLaVA, Qwen-VL, InternLM, with this dataset to enhance instruction-based performance. We conducted a qualitative assessment of the capabilities of the base model and the fine-tuned model in performing image captioning and classification tasks on the specific dataset. The evaluation results demonstrate that the fine-tuned model exhibits proficiency in addressing typical pathological questions. We hope that by making both our models and datasets publicly available, they can be valuable to the medical and research communities.

8/14/2024

Self-supervised transformer-based pre-training method with General Plant Infection dataset

Zhengle Wang, Ruifeng Wang, Minjuan Wang, Tianyun Lai, Man Zhang

Pest and disease classification is a challenging issue in agriculture. The performance of deep learning models is intricately linked to training data diversity and quantity, posing issues for plant pest and disease datasets that remain underdeveloped. This study addresses these challenges by constructing a comprehensive dataset and proposing an advanced network architecture that combines Contrastive Learning and Masked Image Modeling (MIM). The dataset comprises diverse plant species and pest categories, making it one of the largest and most varied in the field. The proposed network architecture demonstrates effectiveness in addressing plant pest and disease recognition tasks, achieving notable detection accuracy. This approach offers a viable solution for rapid, efficient, and cost-effective plant pest and disease detection, thereby reducing agricultural production costs. Our code and dataset will be publicly available to advance research in plant pest and disease recognition in the GitHub repository at https://github.com/WASSER2545/GPID-22

7/23/2024