PlantSeg: A Large-Scale In-the-wild Dataset for Plant Disease Segmentation

Read original: arXiv:2409.04038 - Published 9/9/2024 by Tianqi Wei, Zhi Chen, Xin Yu, Scott Chapman, Paul Melloy, Zi Huang

Overview

This paper introduces PlantSeg, a large-scale dataset for plant disease segmentation in the wild.
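According to the paper's abstract, the dataset contains 11,400 images with disease segmentation masks, plus an additional 8,000 healthy plant images categorized by plant type, covering a wide range of plant species and disease types.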
The images were collected from online sources and annotated with pixel-level segmentation masks, providing a comprehensive resource for training and evaluating plant disease segmentation models.

Plain English Explanation

The researchers who created this dataset wanted to help develop better computer vision systems for identifying and understanding plant diseases. Plant disease segmentation is the process of automatically detecting and outlining the areas of a plant image that show signs of disease.

To train these AI systems, they needed a large and diverse dataset of plant images with the diseased areas clearly marked. However, existing datasets were often small, focused on specific plant types, or had low-quality annotations.

So the researchers set out to create a new, much larger dataset called PlantSeg. They collected 11,400 images of diseased plants from the internet, mostly taken in real-world conditions rather than in a laboratory, covering a wide variety of plant species and disease types. They then annotated each of these images to highlight the precise areas affected by disease, and added 8,000 images of healthy plants categorized by plant type.

The resulting dataset is a valuable resource for researchers and developers working on plant disease recognition and automated plant health monitoring. By training AI models on this diverse, high-quality data, they can develop more robust and accurate systems to help farmers, gardeners, and botanists identify and manage plant diseases.

Technical Explanation

The PlantSeg dataset was created by crawling and curating a large number of plant disease images from online sources like social media, forums, and news articles. The images cover a diverse range of plant species, including crops, ornamentals, and weeds, and a wide variety of disease types, such as fungal infections, bacterial infections, and pest infestations.

To annotate the dataset, the researchers employed a team of expert annotators who manually outlined the diseased regions in each image using pixel-level segmentation masks. According to the paper's abstract, this process produced 11,400 images with disease segmentation masks, each associated with a plant type and a disease name, alongside 8,000 healthy plant images categorized by plant type, making PlantSeg a large-scale resource for plant disease segmentation.
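To make "pixel-level segmentation mask" concrete, the following minimal Python sketch treats a mask as a grid with one label per pixel and computes the share of the image marked as diseased. The `Mask` type, the 0/1 encoding, and the `diseased_fraction` helper are illustrative assumptions, not the file format or tooling used by PlantSeg.

```python
from typing import List

# Hypothetical encoding: 1 = diseased pixel, 0 = background or healthy tissue.
Mask = List[List[int]]


def diseased_fraction(mask: Mask) -> float:
    """Return the fraction of pixels in the mask that are labeled diseased."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0  # an empty mask has no diseased area
    diseased = sum(1 for row in mask for value in row if value == 1)
    return diseased / total


# A toy 3x4 mask standing in for a real annotated image.
toy_mask = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

fraction = diseased_fraction(toy_mask)
print(f"Diseased area: {fraction:.2%}")
```

A real mask would have the same height and width as its image and would typically be stored as an image file or array rather than nested lists.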

The dataset is split into training, validation, and test sets to support the development and evaluation of plant disease segmentation models. The abstract also reports extensive experiments that validate the quality of PlantSeg's annotations.
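Segmentation models evaluated on a held-out test set are commonly scored with intersection over union (IoU), which compares a predicted mask against the annotated one. The sketch below shows that calculation for a single binary mask pair. This summary does not state which metrics the paper reports, so the choice of IoU, the `iou` function, and the toy masks are assumptions made for illustration.

```python
from typing import List

Mask = List[List[int]]  # hypothetical encoding: 1 = diseased, 0 = everything else


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of the same shape."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == 1 and t == 1:
                intersection += 1
            if p == 1 or t == 1:
                union += 1
    # Convention assumed here: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


target_mask = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
predicted_mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

score = iou(predicted_mask, target_mask)
print(f"IoU: {score:.2f}")
```

Here the masks agree on three diseased pixels and together cover five, so the score is 3/5. Benchmarks usually average this value over classes or images.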

Critical Analysis

One potential limitation of the PlantSeg dataset is the reliance on online image sources, which may introduce biases or noise into the data. The researchers acknowledge this issue and note that future work could involve collecting more controlled, in-the-field images to supplement the dataset.

Additionally, while the dataset covers a wide range of plant species and diseases, the distribution of these categories may not be perfectly representative of real-world scenarios. This could lead to challenges in generalizing models trained on PlantSeg to certain plant types or disease conditions not well represented in the data.
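One simple way to check for the representation issue described above is to count how many images fall under each disease label and flag the rare ones. The sketch below does this with Python's standard library. The label names, the counts, and the 5% threshold are hypothetical and are not taken from PlantSeg.

```python
from collections import Counter
from typing import Iterable, List


def underrepresented(labels: Iterable[str], min_share: float = 0.05) -> List[str]:
    """Return the labels whose share of all samples is below min_share."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sorted(name for name, n in counts.items() if n / total < min_share)


# Hypothetical per-image disease labels, invented for this example.
labels = ["apple scab"] * 60 + ["wheat rust"] * 37 + ["tomato blight"] * 3

rare = underrepresented(labels)
print(rare)
```

Classes flagged this way are candidates for extra data collection, re-weighting during training, or separate per-class reporting at evaluation time.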

Further research could also explore the integration of additional modalities, such as multispectral or hyperspectral imaging, to provide more comprehensive information for plant disease diagnosis and monitoring.

Conclusion

The PlantSeg dataset represents a significant contribution to the field of plant disease recognition and segmentation. By providing a large-scale, diverse, and well-annotated collection of plant disease images, the researchers have created a valuable resource for the development and evaluation of advanced computer vision models in this domain.

The availability of this dataset has the potential to accelerate research and innovation in precision agriculture, smart farming, and automated plant health monitoring, ultimately benefiting farmers, gardeners, and the broader agricultural community.

Related Papers

PlantSeg: A Large-Scale In-the-wild Dataset for Plant Disease Segmentation

Tianqi Wei, Zhi Chen, Xin Yu, Scott Chapman, Paul Melloy, Zi Huang

Plant diseases pose significant threats to agriculture. It necessitates proper diagnosis and effective treatment to safeguard crop yields. To automate the diagnosis process, image segmentation is usually adopted for precisely identifying diseased regions, thereby advancing precision agriculture. Developing robust image segmentation models for plant diseases demands high-quality annotations across numerous images. However, existing plant disease datasets typically lack segmentation labels and are often confined to controlled laboratory settings, which do not adequately reflect the complexity of natural environments. Motivated by this fact, we established PlantSeg, a large-scale segmentation dataset for plant diseases. PlantSeg distinguishes itself from existing datasets in three key aspects. (1) Annotation type: Unlike the majority of existing datasets that only contain class labels or bounding boxes, each image in PlantSeg includes detailed and high-quality segmentation masks, associated with plant types and disease names. (2) Image source: Unlike typical datasets that contain images from laboratory settings, PlantSeg primarily comprises in-the-wild plant disease images. This choice enhances the practical applicability, as the trained models can be applied for integrated disease management. (3) Scale: PlantSeg is extensive, featuring 11,400 images with disease segmentation masks and an additional 8,000 healthy plant images categorized by plant type. Extensive technical experiments validate the high quality of PlantSeg's annotations. This dataset not only allows researchers to evaluate their image classification methods but also provides a critical foundation for developing and benchmarking advanced plant disease segmentation algorithms.

9/9/2024

Self-supervised transformer-based pre-training method with General Plant Infection dataset

Zhengle Wang, Ruifeng Wang, Minjuan Wang, Tianyun Lai, Man Zhang

Pest and disease classification is a challenging issue in agriculture. The performance of deep learning models is intricately linked to training data diversity and quantity, posing issues for plant pest and disease datasets that remain underdeveloped. This study addresses these challenges by constructing a comprehensive dataset and proposing an advanced network architecture that combines Contrastive Learning and Masked Image Modeling (MIM). The dataset comprises diverse plant species and pest categories, making it one of the largest and most varied in the field. The proposed network architecture demonstrates effectiveness in addressing plant pest and disease recognition tasks, achieving notable detection accuracy. This approach offers a viable solution for rapid, efficient, and cost-effective plant pest and disease detection, thereby reducing agricultural production costs. Our code and dataset will be publicly available to advance research in plant pest and disease recognition in the GitHub repository at https://github.com/WASSER2545/GPID-22

7/23/2024

From Seedling to Harvest: The GrowingSoy Dataset for Weed Detection in Soy Crops via Instance Segmentation

Raul Steinmetz, Victor A. Kich, Henrique Krever, Joao D. Rigo Mazzarolo, Ricardo B. Grando, Vinicius Marini, Celio Trois, Ard Nieuwenhuizen

Deep learning, particularly Convolutional Neural Networks (CNNs), has gained significant attention for its effectiveness in computer vision, especially in agricultural tasks. Recent advancements in instance segmentation have improved image classification accuracy. In this work, we introduce a comprehensive dataset for training neural networks to detect weeds and soy plants through instance segmentation. Our dataset covers various stages of soy growth, offering a chronological perspective on weed invasion's impact, with 1,000 meticulously annotated images. We also provide 6 state-of-the-art models, trained on this dataset, that can understand and detect soy and weed in every stage of the plantation process. By using this dataset for weed and soy segmentation, we achieved a segmentation average precision of 79.1% and an average recall of 69.2% across all plant classes, with the YOLOv8X model. Moreover, the YOLOv8M model attained 78.7% mean average precision (mAP-50) in caruru weed segmentation, 69.7% in grassy weed segmentation, and 90.1% in soy plant segmentation.

6/6/2024

Plant Doctor: A hybrid machine learning and image segmentation software to quantify plant damage in video footage

Marc Josep Montagut Marques, Liu Mingxin, Kuri Thomas Shiojiri, Tomika Hagiwara, Kayo Hirose, Kaori Shiojiri, Shinjiro Umezu

Artificial intelligence has significantly advanced the automation of diagnostic processes, benefiting various fields including agriculture. This study introduces an AI-based system for the automatic diagnosis of urban street plants using video footage obtained with accessible camera devices. The system aims to monitor plant health on a day-to-day basis, aiding in the control of disease spreading in urban areas. By combining two machine vision algorithms, YOLOv8 and DeepSORT, the system efficiently identifies and tracks individual leaves, extracting the optimal images for health analysis. YOLOv8, chosen for its speed and computational efficiency, locates leaves, while DeepSORT ensures robust tracking in complex environments. For detailed health assessment, DeepLabV3Plus, a convolutional neural network, is employed to segment and quantify leaf damage caused by bacteria, pests, and fungi. The hybrid system, named Plant Doctor, has been trained and validated using a diverse dataset including footage from Tokyo urban plants. The results demonstrate the robustness and accuracy of the system in diagnosing leaf damage, with potential applications in large scale urban flora illness monitoring. This approach provides a non-invasive, efficient, and scalable solution for urban tree health management, supporting sustainable urban ecosystems.

7/4/2024