Comparing YOLOv8 and Mask RCNN for object segmentation in complex orchard environments

Read original: arXiv:2312.07935 - Published 7/8/2024 by Ranjan Sapkota, Dawood Ahmed, Manoj Karkee


Overview

This study compares two machine learning models, YOLOv8 and Mask R-CNN, for instance segmentation in agricultural applications.
Instance segmentation is a key image processing technique for automating tasks like selective harvesting and precision pruning.
The models were trained and evaluated on two datasets: one with dormant apple tree images and another with apple tree canopy images during the early growing season.

Plain English Explanation

The research paper discusses the use of instance segmentation in agricultural automation. Instance segmentation is a technique that can precisely identify and outline individual objects within an image, which is crucial for tasks like selective harvesting and precision pruning.
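To make "identify and outline individual objects" concrete, here is a small sketch of how instance segmentation output differs from plain semantic segmentation. The toy 6x6 arrays and the helper name `to_instance_masks` are hypothetical and only illustrate the data structure. Two fruitlets of the same class share one label in a semantic map, but each gets its own binary mask in an instance representation.

```python
import numpy as np

# Hypothetical 6x6 crop with two fruitlets of the same class.
# Semantic map: one label per class (0 = background, 1 = fruitlet).
semantic = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
])

# Instance map: every object gets its own ID (0 = background).
instance_ids = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
])

def to_instance_masks(id_map: np.ndarray) -> np.ndarray:
    """Split an instance-ID map into a stack of per-object boolean masks."""
    ids = [i for i in np.unique(id_map) if i != 0]
    if not ids:
        return np.zeros((0, *id_map.shape), dtype=bool)
    return np.stack([id_map == i for i in ids])

masks = to_instance_masks(instance_ids)
print(masks.shape)             # (2, 6, 6): two separate fruitlets
print(masks.sum(axis=(1, 2)))  # [4 4]: pixel area of each fruitlet
print(np.unique(semantic))     # [0 1]: the semantic map cannot separate them
```

Per-object masks like these are what a harvesting or pruning robot would consume, since each one can be located and handled individually.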

The researchers compared two machine learning models, YOLOv8 and Mask R-CNN, for this task. YOLOv8 is a one-stage model, meaning it detects and segments objects in a single pass, while Mask R-CNN is a two-stage model: it first proposes candidate object regions and then classifies, refines, and segments each one.

The models were trained and tested on two datasets: one with images of dormant apple trees and another with images of apple tree canopies during the early growing season. The dormant tree dataset was used to train the models to segment tree branches and trunks, while the growing season dataset was used to train the models to segment immature green apples (fruitlets).

The results showed that YOLOv8 outperformed Mask R-CNN in both precision (the fraction of predicted objects that were correct) and recall (the fraction of actual objects that were found) across both datasets. YOLOv8 also had faster inference times, meaning it processed each image more quickly. These findings suggest that YOLOv8 may be a more suitable choice for real-time agricultural automation tasks that require accurate and efficient instance segmentation.

Technical Explanation

The researchers evaluated the performance of the YOLOv8 and Mask R-CNN models for instance segmentation in two agricultural datasets.

Dataset 1 consisted of images of dormant apple trees, which were used to train multi-object segmentation models to delineate tree branches and trunks. Dataset 2 included images of apple tree canopies with green foliage and immature (green) apples, which were used to train single-object segmentation models to delineate only the immature green apples.
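The summary does not spell out how predicted masks are scored, so the sketch that follows shows one common way precision and recall are computed for instance segmentation. It discards predictions below a confidence threshold (0.5, the value at which the abstract reports its figures), matches each remaining predicted mask to at most one annotated mask by intersection-over-union (IoU), and counts the matches. The IoU threshold of 0.5, the greedy matching rule, and the helper names `mask_iou`, `precision_recall`, and `box_mask` are assumptions for illustration, not the authors' evaluation code.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks of equal shape."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)

def precision_recall(pred_masks, pred_scores, gt_masks,
                     conf_thresh=0.5, iou_thresh=0.5):
    """Greedy one-to-one matching of predicted masks to ground-truth masks.

    Predictions scoring below conf_thresh are discarded. The rest are visited
    in descending score order; each is matched to the still-unmatched
    ground-truth mask with the highest IoU and counts as a true positive
    only if that IoU is at least iou_thresh (0.5 here is an assumed default).
    """
    kept = sorted(
        (pair for pair in zip(pred_scores, pred_masks) if pair[0] >= conf_thresh),
        key=lambda pair: pair[0],
        reverse=True,
    )
    matched = set()
    tp = 0
    for _, pm in kept:
        best_j, best_iou = None, 0.0
        for j, gm in enumerate(gt_masks):
            if j in matched:
                continue
            iou = mask_iou(pm, gm)
            if iou > best_iou:
                best_j, best_iou = j, iou
        if best_j is not None and best_iou >= iou_thresh:
            matched.add(best_j)
            tp += 1
    precision = tp / len(kept) if kept else 0.0
    recall = tp / len(gt_masks) if len(gt_masks) else 0.0
    return precision, recall

def box_mask(r0, r1, c0, c1, shape=(10, 10)):
    """Toy rectangular mask standing in for a real fruit or branch outline."""
    m = np.zeros(shape, dtype=bool)
    m[r0:r1, c0:c1] = True
    return m

gt = [box_mask(0, 4, 0, 4), box_mask(5, 9, 5, 9), box_mask(0, 3, 6, 9)]
preds = [
    box_mask(0, 4, 0, 3),  # overlaps gt[0] with IoU 12/16 = 0.75
    box_mask(5, 9, 6, 9),  # overlaps gt[1] with IoU 12/16 = 0.75
    box_mask(6, 8, 0, 2),  # overlaps nothing: a false positive
    box_mask(0, 3, 6, 9),  # exactly gt[2], but with a low score
]
scores = [0.9, 0.8, 0.6, 0.3]

p, r = precision_recall(preds, scores, gt)                            # 2/3, 2/3
p_low, r_low = precision_recall(preds, scores, gt, conf_thresh=0.25)  # 3/4, 1.0
```

Lowering `conf_thresh` to 0.25 admits the low-scoring but correct fourth prediction, which raises recall to 1.0 and lowers precision to 0.75 in this toy case. That is the trade-off a confidence threshold controls, and it is why reported precision and recall are only comparable at the same threshold.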

The results showed that YOLOv8 outperformed Mask R-CNN on both datasets, with all figures reported at a confidence threshold of 0.5. For Dataset 1, YOLOv8 achieved a precision of 0.90 and a recall of 0.95 across all classes, compared with Mask R-CNN's precision of 0.81 and recall of 0.81. For Dataset 2, YOLOv8 achieved a precision of 0.93 and a recall of 0.97, while Mask R-CNN achieved a precision of 0.85 and a recall of 0.88.
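The precision and recall pairs above can be folded into a single number per model by taking their harmonic mean (the F1 score). This is a small worked calculation on the reported figures; F1 is derived here for illustration and is not a metric the summary itself reports.

```python
# Precision/recall pairs as reported in the summary (confidence threshold 0.5).
reported = {
    ("YOLOv8", "Dataset 1"): (0.90, 0.95),
    ("Mask R-CNN", "Dataset 1"): (0.81, 0.81),
    ("YOLOv8", "Dataset 2"): (0.93, 0.97),
    ("Mask R-CNN", "Dataset 2"): (0.85, 0.88),
}

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

for (model, dataset), (p, r) in reported.items():
    print(f"{model:<10} {dataset}: F1 = {f1(p, r):.3f}")
```

This gives F1 of about 0.924 versus 0.810 on Dataset 1 and about 0.950 versus 0.865 on Dataset 2, so YOLOv8 leads on the combined measure as well, by roughly 0.11 and 0.09.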

Additionally, the inference times for YOLOv8 were faster than Mask R-CNN's, with YOLOv8 taking 10.9 ms for multi-class segmentation (Dataset 1) and 7.8 ms for single-class segmentation (Dataset 2), compared to 15.6 ms and 12.8 ms for Mask R-CNN, respectively.
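Per-image latencies are easier to compare against a camera frame rate when converted to images per second. The sketch below does that arithmetic on the reported times and computes YOLOv8's speedup over Mask R-CNN. It assumes images are processed one at a time; the summary does not state the hardware or batch size, so real throughput could differ.

```python
# Per-image inference times (ms) as reported in the summary.
latency_ms = {
    "Dataset 1 (multi-class)": {"YOLOv8": 10.9, "Mask R-CNN": 15.6},
    "Dataset 2 (single-class)": {"YOLOv8": 7.8, "Mask R-CNN": 12.8},
}

def fps(ms: float) -> float:
    """Images per second if each image takes `ms` milliseconds, processed serially."""
    return 1000.0 / ms

for dataset, t in latency_ms.items():
    speedup = t["Mask R-CNN"] / t["YOLOv8"]
    print(f"{dataset}: YOLOv8 {fps(t['YOLOv8']):.1f} fps, "
          f"Mask R-CNN {fps(t['Mask R-CNN']):.1f} fps, speedup {speedup:.2f}x")
```

Under that serial assumption, YOLOv8 handles about 92 versus 64 images per second on Dataset 1 and about 128 versus 78 on Dataset 2, making it roughly 1.4x and 1.6x faster, respectively.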

Critical Analysis

The paper provides a thorough evaluation of the two instance segmentation models, highlighting the strengths of the YOLOv8 model in terms of accuracy and inference speed. However, the researchers did not explore the potential limitations of their approach, such as the impact of varying lighting conditions, occlusion, or the scalability of the models to larger-scale agricultural settings.

Additionally, the paper could have benefited from a more in-depth discussion of the trade-offs between the one-stage YOLOv8 and the two-stage Mask R-CNN architectures, and how these design choices may affect the models' performance in different scenarios.

Further research could explore the performance of these models on a wider range of agricultural datasets, including more diverse crop types and environmental conditions, to better understand their generalizability and suitability for real-world agricultural automation applications.

Conclusion

This study demonstrates the effectiveness of the YOLOv8 model for instance segmentation in agricultural applications, particularly in terms of accuracy and inference speed. The findings suggest that YOLOv8 may be a more suitable choice than Mask R-CNN for real-time automated tasks like selective harvesting and precision pruning. The insights provided in this research can help inform the development of more advanced agricultural automation systems that leverage computer vision and machine learning techniques.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Comparing YOLOv8 and Mask RCNN for object segmentation in complex orchard environments

Ranjan Sapkota, Dawood Ahmed, Manoj Karkee

Instance segmentation, an important image processing operation for automation in agriculture, is used to precisely delineate individual objects of interest within images, which provides foundational information for various automated or robotic tasks such as selective harvesting and precision pruning. This study compares the one-stage YOLOv8 and the two-stage Mask R-CNN machine learning models for instance segmentation under varying orchard conditions across two datasets. Dataset 1, collected in the dormant season, includes images of dormant apple trees, which were used to train multi-object segmentation models delineating tree branches and trunks. Dataset 2, collected in the early growing season, includes images of apple tree canopies with green foliage and immature (green) apples (also called fruitlets), which were used to train single-object segmentation models delineating only immature green apples. The results showed that YOLOv8 performed better than Mask R-CNN, achieving good precision and near-perfect recall across both datasets at a confidence threshold of 0.5. Specifically, for Dataset 1, YOLOv8 achieved a precision of 0.90 and a recall of 0.95 for all classes. In comparison, Mask R-CNN demonstrated a precision of 0.81 and a recall of 0.81 for the same dataset. With Dataset 2, YOLOv8 achieved a precision of 0.93 and a recall of 0.97. Mask R-CNN, in this single-class scenario, achieved a precision of 0.85 and a recall of 0.88. Additionally, the inference times for YOLOv8 were 10.9 ms for multi-class segmentation (Dataset 1) and 7.8 ms for single-class segmentation (Dataset 2), compared to 15.6 ms and 12.8 ms for Mask R-CNN, respectively.

7/8/2024

Performance Evaluation of YOLOv8 Model Configurations, for Instance Segmentation of Strawberry Fruit Development Stages in an Open Field Environment

Abdul-Razak Alhassan Gamani, Ibrahim Arhin, Adrena Kyeremateng Asamoah

Accurate identification of strawberries during their maturing stages is crucial for optimizing yield management and pest control and for making informed decisions about harvest and post-harvest logistics. This study evaluates the performance of YOLOv8 model configurations for instance segmentation of strawberries into ripe and unripe stages in an open field environment. The YOLOv8n model demonstrated superior segmentation accuracy with a mean Average Precision (mAP) of 80.9%, outperforming other YOLOv8 configurations. In terms of inference speed, YOLOv8n processed images in 12.9 milliseconds, while YOLOv8s, the lowest-performing model, processed them in 22.2 milliseconds. Across 86 test images with 348 ground-truth labels, YOLOv8n detected 235 of the 251 ripe fruits and 51 of the 97 unripe fruits in the ground truth. In comparison, YOLOv8s detected 204 ripe fruits and 37 unripe fruits. Overall, YOLOv8n achieved the fastest inference speed of 24.2 milliseconds, outperforming YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x, which processed images in 33.0, 44.3, 53.6, and 62.5 milliseconds, respectively. These results underscore the potential of advanced object segmentation algorithms to address complex visual recognition tasks in open-field agriculture effectively.

8/14/2024


INSTA-YOLO: Real-Time Instance Segmentation

Eslam Mohamed, Abdelrahman Shaker, Ahmad El-Sallab, Mayada Hadhoud

Instance segmentation has recently gained significant attention in various computer vision applications. It aims to assign different IDs to different objects in a scene, even if they belong to the same class. This is useful in various scenarios, especially under occlusion. Instance segmentation is usually performed as a two-stage pipeline: first an object is detected, then semantic segmentation is performed within the detected box area. This process involves costly up-sampling, especially for the segmentation part. Moreover, some applications, such as LiDAR point clouds and aerial object detection, often require predicting oriented boxes, which adds extra complexity to the two-stage pipeline. In this paper, we propose Insta-YOLO, a novel one-stage end-to-end deep learning model for real-time instance segmentation. The proposed model is inspired by the YOLO one-shot object detector, with the box regression loss replaced by polynomial regression in the localization head. This modification enables us to skip the segmentation up-sampling decoder altogether and to produce the instance segmentation contour from the polynomial output coefficients. In addition, this architecture is a natural fit for oriented objects. We evaluate our model on three datasets, namely Carvana, Cityscapes, and Airbus. The results show that our model achieves competitive accuracy in terms of mAP with a 2x speed improvement on a GTX-1080 GPU.

9/4/2024

A Review and Implementation of Object Detection Models and Optimizations for Real-time Medical Mask Detection during the COVID-19 Pandemic

Ioanna Gogou, Dimitrios Koutsomitropoulos

Convolutional Neural Networks (CNNs) are commonly used for object detection thanks to their high accuracy. Nevertheless, the performance of CNN-based detection models is less clear-cut when detection speed is considered. To the best of our knowledge, the available methods have not been sufficiently evaluated in terms of the speed/accuracy trade-off in the related literature. This work assesses the most fundamental object detection models on the Common Objects in Context (COCO) dataset with respect to this trade-off, their memory consumption, and their computational and storage cost. Next, we select a highly efficient model called YOLOv5 to train on the topical and unexplored dataset of human faces with medical masks, the Properly-Wearing Masked Faces Dataset (PWMFD), and analyze the benefits of specific optimization techniques for real-time medical mask detection: transfer learning, data augmentations, and a Squeeze-and-Excitation attention mechanism. Using our findings in the context of the COVID-19 pandemic, we propose an optimized model based on YOLOv5s with transfer learning for detecting correctly and incorrectly worn medical masks; it runs more than twice as fast (69 frames per second) as the state-of-the-art SE-YOLOv3 model on the PWMFD dataset while maintaining the same level of mean Average Precision (67%).

5/29/2024