Comprehensive Performance Evaluation of YOLOv10, YOLOv9 and YOLOv8 on Detecting and Counting Fruitlet in Complex Orchard Environments

Read original: arXiv:2407.12040 - Published 8/28/2024 by Ranjan Sapkota, Zhichao Meng, Martin Churuvija, Xiaoqiang Du, Zenghong Ma, Manoj Karkee

Overview

This study evaluated the performance of different configurations of YOLOv8, YOLOv9, and YOLOv10 object detection algorithms for detecting fruitlets (small, immature fruit) in commercial orchards.
The research also validated in-field counting of fruitlets using an iPhone and machine vision sensors across 5 different apple varieties.
The study examined a total of 17 configurations (5 for YOLOv8, 6 for YOLOv9, and 6 for YOLOv10) to determine the best-performing models.

Plain English Explanation

The researchers in this study wanted to find the best object detection algorithm for counting the small, immature fruit (called fruitlets) on apple trees in commercial orchards. They tested a total of 17 different configurations of three popular object detection models: YOLOv8, YOLOv9, and YOLOv10.

They also validated a way to count the fruitlets in real-world orchards using an iPhone and specialized machine vision sensors. This was tested on 5 different varieties of apples: Scifresh, Scilate, Honeycrisp, Cosmic Crisp, and Golden Delicious.
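
The summary does not say how per-image counts were derived from detections or how they were scored against ground truth. As a minimal sketch, assume each image's count is the number of detections above a confidence threshold, and that counts are compared with manual tallies using mean absolute error (MAE) and mean absolute percentage error (MAPE). Under those assumptions, the bookkeeping could look like the code below. The `Detection` class, the 0.25 threshold, both helper names and the toy numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical container for one predicted box: confidence plus (x1, y1, x2, y2) in pixels."""
    score: float
    box: tuple[float, float, float, float]


def count_fruitlets(detections: list[Detection], conf_threshold: float = 0.25) -> int:
    """Count detections at or above an assumed confidence threshold (0.25 is illustrative)."""
    return sum(1 for d in detections if d.score >= conf_threshold)


def counting_errors(predicted: list[int], manual: list[int]) -> tuple[float, float]:
    """Return (MAE, MAPE in %) of predicted counts against manual counts, image by image."""
    if not manual or len(predicted) != len(manual):
        raise ValueError("need equal-length, non-empty count lists")
    abs_errors = [abs(p - m) for p, m in zip(predicted, manual)]
    mae = sum(abs_errors) / len(abs_errors)
    # MAPE skips images whose manual count is zero, to avoid dividing by zero
    ratios = [e / m for e, m in zip(abs_errors, manual) if m > 0]
    mape = 100.0 * sum(ratios) / len(ratios) if ratios else float("nan")
    return mae, mape


# Toy data, not from the paper: detections for three images
per_image = [
    [Detection(0.91, (10, 10, 30, 30)), Detection(0.40, (50, 50, 70, 70)), Detection(0.12, (5, 5, 9, 9))],
    [Detection(0.88, (12, 14, 28, 30))],
    [],
]
predicted = [count_fruitlets(dets) for dets in per_image]  # [2, 1, 0]
mae, mape = counting_errors(predicted, manual=[2, 2, 1])
print(predicted, round(mae, 3), round(mape, 1))
```

Counting every thresholded box in every image will double-count fruitlets that appear in more than one image of the same tree. An in-field pipeline therefore needs some way to de-duplicate across views, and this sketch does not attempt that.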

The key finding was that the YOLOv9 Gelan-e configuration achieved the highest accuracy of all 17 configurations, with a mean average precision (mAP@50) of 0.935. This beat the best YOLOv10 and YOLOv8 configurations. The YOLOv10x model had the highest precision (0.908), which means the smallest share of its detections were false alarms. The authors also credit YOLOv10x with the best recall, although the per-family recall figures they report are led by YOLOv9 Gelan-m (0.899).

Technical Explanation

The researchers conducted an extensive evaluation of 17 configurations across the YOLOv8, YOLOv9, and YOLOv10 object detection models: 5 for YOLOv8, 6 for YOLOv9, and 6 for YOLOv10.

The key performance metrics were mean average precision at an intersection-over-union (IoU) threshold of 0.5 (mAP@50), precision, recall, inference time, and post-processing time. The YOLOv9 Gelan-e configuration achieved the highest mAP@50 at 0.935, outperforming the best YOLOv10 and YOLOv8 models, YOLOv10n (0.921) and YOLOv8s (0.924).
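
mAP@50 counts a detection as correct when its best-overlapping ground-truth box has an IoU of at least 0.5 and has not already been claimed by a higher-scoring detection. It then averages precision over the resulting precision-recall curve. Assuming a single "fruitlet" class, mAP@50 equals AP@50. The code below is a minimal sketch of that calculation, using VOC-style greedy matching and all-point interpolation. This is one common variant and not necessarily the evaluation code the authors used; COCO-style tools, for example, interpolate at 101 recall points instead. The function names and toy boxes are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision_50(detections, ground_truth, iou_thr=0.5):
    """Single-class AP at IoU 0.5 (VOC-style matching, all-point interpolation).

    detections: list of (image_id, score, box); ground_truth: dict image_id -> list of boxes.
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    hits = []  # 1 = true positive, 0 = false positive, in descending score order
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_iou >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True
            hits.append(1)
        else:
            hits.append(0)  # too little overlap, or a duplicate of an already-matched box

    # Precision and recall after each ranked detection
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / n_gt)

    # Make precision non-increasing (running max from the right), then sum over recall steps
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Toy example: three labelled fruitlets across two images
gt = {"img1": [(0, 0, 10, 10), (20, 20, 30, 30)], "img2": [(5, 5, 15, 15)]}
dets = [
    ("img1", 0.95, (0, 0, 10, 10)),    # true positive
    ("img1", 0.90, (1, 1, 10, 10)),    # duplicate of the same fruitlet: false positive
    ("img2", 0.80, (5, 5, 15, 16)),    # true positive
    ("img1", 0.30, (40, 40, 50, 50)),  # background: false positive
]
print(round(average_precision_50(dets, gt), 4))  # 0.5556
```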

In terms of precision, YOLOv10x achieved the highest value of all configurations at 0.908, so the smallest share of its detections were false positives. It finished ahead of YOLOv9 Gelan-c (0.903) and YOLOv8m (0.897). For recall, the best configuration in each family was YOLOv10s (0.872), YOLOv9 Gelan-m (0.899), and YOLOv8n (0.883), which makes Gelan-m the highest recall figure reported.
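
At a fixed confidence threshold, precision and recall come directly from true-positive (TP), false-positive (FP), and false-negative (FN) counts: precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below uses made-up counts, not figures from the paper, to show how one model can lead on precision while another leads on recall, which is the same pattern the per-family figures above display. F1 is included only as a common single-number summary; the paper's results as summarized here do not report it.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from detection counts at one confidence threshold."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of detections that are real fruitlets
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of real fruitlets that were detected
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts for 100 labelled fruitlets (not from the paper)
model_a = precision_recall_f1(tp=85, fp=5, fn=15)  # conservative: few false alarms, more misses
model_b = precision_recall_f1(tp=92, fp=12, fn=8)  # permissive: finds more, but more false alarms
for name, (p, r, f1) in [("A", model_a), ("B", model_b)]:
    print(f"model {name}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Unlike mAP@50, which sweeps over all confidence thresholds, these numbers describe a single operating point. Lowering the threshold usually trades precision for recall.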

The researchers also measured post-processing time (the time to turn raw network outputs into final detections). Three YOLOv10 configurations (YOLOv10b, YOLOv10l, and YOLOv10x) were the fastest at 1.5 milliseconds, ahead of YOLOv9 Gelan-e at 1.9 ms and YOLOv8m at 2.1 ms. For inference time (the time for the network to process an input image), YOLOv8n was the fastest of all configurations at 4.1 ms, compared with 5.5 ms for YOLOv10n and 9.3 ms for YOLOv9 Gelan-t.
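
In YOLOv8 and YOLOv9, post-processing normally includes non-maximum suppression (NMS), which discards lower-scoring boxes that overlap a higher-scoring one. The YOLOv10 paper listed under Related Papers describes consistent dual assignments that allow NMS-free training. That is one plausible contributor to YOLOv10's shorter post-processing times here, though the summary does not attribute a cause, and measured times also depend on hardware. The code below is a minimal pure-Python sketch of greedy NMS, with the call wrapped in `time.perf_counter` to show where a post-processing measurement would sit. The 0.7 IoU threshold and the toy boxes are illustrative assumptions, and production code would use a vectorized routine such as `torchvision.ops.nms`.

```python
import time


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thr=0.7):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring box still in play
        keep.append(best)
        # Suppress remaining boxes that overlap the kept box at or above the threshold
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thr]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
t0 = time.perf_counter()
kept = nms(boxes, scores)  # [0, 2]: box 1 overlaps box 0 with IoU 0.81
elapsed_ms = (time.perf_counter() - t0) * 1000.0
print(kept, f"{elapsed_ms:.3f} ms")
```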

Critical Analysis

The researchers provided a comprehensive evaluation, testing a wide range of configurations across the three YOLO versions. Because the leading configuration differs by metric (mAP@50, precision, recall, and speed), the results make the trade-offs within and across the three families explicit.

One potential limitation is that the evaluation was conducted on a specific use case of fruitlet detection in orchards. While the findings may generalize to other object detection tasks, additional testing would be needed to confirm this. The researchers also acknowledged that their in-field validation using iPhones and machine vision sensors could be further improved to increase accuracy and efficiency.

Another area for further research could be exploring ensemble or hybrid approaches that combine the strengths of different YOLO configurations to achieve even higher performance. The researchers could also investigate the impact of different hardware platforms and real-world deployment scenarios on the models' performance.

Overall, this study provides valuable insights for researchers and practitioners working on object detection in agricultural or similar applications. The detailed comparison of YOLO models can help guide the selection of the most appropriate algorithm and configuration for their specific needs.
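
Each metric has a different leader, so one practical way to use the comparison is to tabulate the reported figures and rank by whichever metric matters for a given deployment. The sketch below encodes only the numbers quoted in this summary. It therefore ranks reported values rather than all 17 configurations, and it returns every configuration tied for the best value. The dictionary layout and the `leaders` helper are illustrative.

```python
# Figures quoted in this summary only; a missing key means the summary gives no value
REPORTED = {
    "YOLOv8n": {"recall": 0.883, "inference_ms": 4.1},
    "YOLOv8s": {"map50": 0.924},
    "YOLOv8m": {"precision": 0.897, "postprocess_ms": 2.1},
    "YOLOv9 Gelan-t": {"inference_ms": 9.3},
    "YOLOv9 Gelan-m": {"recall": 0.899},
    "YOLOv9 Gelan-c": {"precision": 0.903},
    "YOLOv9 Gelan-e": {"map50": 0.935, "postprocess_ms": 1.9},
    "YOLOv10n": {"map50": 0.921, "inference_ms": 5.5},
    "YOLOv10s": {"recall": 0.872},
    "YOLOv10b": {"postprocess_ms": 1.5},
    "YOLOv10l": {"postprocess_ms": 1.5},
    "YOLOv10x": {"precision": 0.908, "postprocess_ms": 1.5},
}

# Accuracy metrics are maximized, latencies minimized
HIGHER_IS_BETTER = {
    "map50": True, "precision": True, "recall": True,
    "inference_ms": False, "postprocess_ms": False,
}


def leaders(table, metric):
    """Return every configuration tied for the best reported value of `metric`."""
    values = {name: row[metric] for name, row in table.items() if metric in row}
    if not values:
        return []
    pick = max if HIGHER_IS_BETTER[metric] else min
    best = pick(values.values())
    return sorted(name for name, value in values.items() if value == best)


for metric in HIGHER_IS_BETTER:
    print(f"{metric:>15}: {', '.join(leaders(REPORTED, metric))}")
```

The recall ranking reflects only the per-family figures quoted above. The abstract also states that YOLOv10x led on recall, but it gives no number for that claim, so it cannot be entered in this table.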

Conclusion

This extensive study evaluated the performance of 17 different configurations of YOLOv8, YOLOv9, and YOLOv10 object detection models for fruitlet detection in commercial apple orchards. The YOLOv9 Gelan-e configuration achieved the highest mean average precision (mAP@50 of 0.935), while the YOLOv10x model had the highest precision (0.908). The researchers also validated an in-field fruitlet counting approach using iPhones and machine vision sensors across multiple apple varieties.

These findings can help guide the selection of the most appropriate object detection model and configuration for agricultural applications, such as monitoring crop health and yield. The study demonstrates the importance of comprehensive model evaluations and the potential for leveraging the latest advancements in deep learning for real-world, practical applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Comprehensive Performance Evaluation of YOLOv10, YOLOv9 and YOLOv8 on Detecting and Counting Fruitlet in Complex Orchard Environments

Ranjan Sapkota, Zhichao Meng, Martin Churuvija, Xiaoqiang Du, Zenghong Ma, Manoj Karkee

This study performed an extensive evaluation of the performance of all configurations of the YOLOv8, YOLOv9, and YOLOv10 object detection algorithms for fruitlet (green fruit) detection in commercial orchards. Additionally, this research performed and validated in-field counting of fruitlets using an iPhone and machine vision sensors in 5 different apple varieties (Scifresh, Scilate, Honeycrisp, Cosmic Crisp, and Golden Delicious). This comprehensive investigation of a total of 17 different configurations (5 for YOLOv8, 6 for YOLOv9, and 6 for YOLOv10) revealed that YOLOv9 outperforms YOLOv10 and YOLOv8 in terms of mAP@50, while YOLOv10x outperformed all 17 configurations tested in terms of precision and recall. Specifically, YOLOv9 Gelan-e achieved the highest mAP@50 of 0.935, outperforming YOLOv10n's 0.921 and YOLOv8s's 0.924. In terms of precision, YOLOv10x achieved the highest precision of 0.908, indicating superior object identification accuracy compared to other configurations tested (e.g., YOLOv9 Gelan-c with a precision of 0.903 and YOLOv8m with 0.897). In terms of recall, YOLOv10s achieved the highest in its series (0.872), while YOLOv9 Gelan-m performed the best among YOLOv9 configurations (0.899), and YOLOv8n performed the best among the YOLOv8 configurations (0.883). Meanwhile, three configurations of YOLOv10 (YOLOv10b, YOLOv10l, and YOLOv10x) achieved superior post-processing speeds of 1.5 milliseconds, outperforming all other configurations within the YOLOv9 and YOLOv8 families. Specifically, YOLOv9 Gelan-e recorded a post-processing time of 1.9 milliseconds, and YOLOv8m 2.1 milliseconds. Furthermore, YOLOv8n exhibited the highest inference speed among all configurations tested, with a processing time of 4.1 milliseconds, while YOLOv9 Gelan-t and YOLOv10n were comparatively slower at 9.3 ms and 5.5 ms, respectively.

8/28/2024

Performance Evaluation of YOLOv8 Model Configurations, for Instance Segmentation of Strawberry Fruit Development Stages in an Open Field Environment

Abdul-Razak Alhassan Gamani, Ibrahim Arhin, Adrena Kyeremateng Asamoah

Accurate identification of strawberries during their maturing stages is crucial for optimizing yield management and pest control and for making informed decisions on harvest and post-harvest logistics. This study evaluates the performance of YOLOv8 model configurations for instance segmentation of strawberries into ripe and unripe stages in an open-field environment. The YOLOv8n model demonstrated superior segmentation accuracy with a mean Average Precision (mAP) of 80.9%, outperforming other YOLOv8 configurations. In terms of inference speed, YOLOv8n processed images in 12.9 milliseconds, while YOLOv8s, the least-performing model, took 22.2 milliseconds. Over 86 test images with 348 ground-truth labels, YOLOv8n detected 235 ripe fruits and 51 unripe fruits out of 251 ripe and 97 unripe ground-truth labels, respectively. In comparison, YOLOv8s detected 204 ripe fruits and 37 unripe fruits. Overall, YOLOv8n achieved the fastest inference speed of 24.2 milliseconds, outperforming YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x, which processed images in 33.0, 44.3, 53.6, and 62.5 milliseconds, respectively. These results underscore the potential of advanced object segmentation algorithms to address complex visual recognition tasks in open-field agriculture effectively.

8/14/2024

YOLOv10: Real-Time End-to-End Object Detection

Ao Wang, Hui Chen, Lihao Liu, Kai Chen, Zijia Lin, Jungong Han, Guiguang Ding

Over the past years, YOLOs have emerged as the predominant paradigm in the field of real-time object detection owing to their effective balance between computational cost and detection performance. Researchers have explored the architectural designs, optimization objectives, data augmentation strategies, and others for YOLOs, achieving notable progress. However, the reliance on non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs and adversely impacts the inference latency. Besides, the design of various components in YOLOs lacks comprehensive and thorough inspection, resulting in noticeable computational redundancy and limiting the model's capability. This leads to suboptimal efficiency, along with considerable potential for performance improvements. In this work, we aim to further advance the performance-efficiency boundary of YOLOs from both the post-processing and the model architecture. To this end, we first present consistent dual assignments for NMS-free training of YOLOs, which bring competitive performance and low inference latency simultaneously. Moreover, we introduce a holistic efficiency-accuracy driven model design strategy for YOLOs. We comprehensively optimize various components of YOLOs from both efficiency and accuracy perspectives, which greatly reduces the computational overhead and enhances the capability. The outcome of our effort is a new generation of YOLO series for real-time end-to-end object detection, dubbed YOLOv10. Extensive experiments show that YOLOv10 achieves state-of-the-art performance and efficiency across various model scales. For example, our YOLOv10-S is 1.8× faster than RT-DETR-R18 at similar AP on COCO, while having 2.8× fewer parameters and FLOPs. Compared with YOLOv9-C, YOLOv10-B has 46% less latency and 25% fewer parameters for the same performance.

5/24/2024

YOLOv5, YOLOv8 and YOLOv10: The Go-To Detectors for Real-time Vision

Muhammad Hussain

This paper presents a comprehensive review of the evolution of the YOLO (You Only Look Once) object detection algorithm, focusing on YOLOv5, YOLOv8, and YOLOv10. We analyze the architectural advancements, performance improvements, and suitability for edge deployment across these versions. YOLOv5 introduced significant innovations such as the CSPDarknet backbone and Mosaic Augmentation, balancing speed and accuracy. YOLOv8 built upon this foundation with enhanced feature extraction and anchor-free detection, improving versatility and performance. YOLOv10 represents a leap forward with NMS-free training, spatial-channel decoupled downsampling, and large-kernel convolutions, achieving state-of-the-art performance with reduced computational overhead. Our findings highlight the progressive enhancements in accuracy, efficiency, and real-time performance, particularly emphasizing their applicability in resource-constrained environments. This review provides insights into the trade-offs between model complexity and detection accuracy, offering guidance for selecting the most appropriate YOLO version for specific edge computing applications.

7/4/2024