A Dataset and Benchmark for Shape Completion of Fruits for Agricultural Robotics

Read original: arXiv:2407.13304 - Published 9/18/2024 by Federico Magistri, Thomas Labe, Elias Marks, Sumanth Nagulavancha, Yue Pan, Claus Smitt, Lasse Klingbeil, Michael Halstead, Heiner Kuhlmann, Chris McCool, Jens Behley, Cyrill Stachniss

Overview

This paper presents a new dataset and benchmark for shape completion of fruits, an important task for agricultural robotics. The dataset contains RGB-D frames of sweet peppers recorded in lab conditions and in a commercial greenhouse, along with high-precision, occlusion-free point clouds that serve as ground truth. The authors also host a public challenge on a benchmark server so that shape completion approaches can be evaluated on a hidden test set. This research aims to advance 3D perception for robotic fruit harvesting and handling.

Plain English Explanation

The paper discusses a new dataset and evaluation framework for testing 3D shape completion algorithms on fruit models. 3D shape completion is the task of taking a partial or occluded 3D scan of an object and estimating the complete 3D shape. This is an important capability for agricultural robots that need to accurately perceive and interact with fruits.

The dataset contains almost 7,000 RGB-D frames of more than 100 sweet peppers, recorded both in the lab and in a commercial greenhouse. Each fruit is paired with a high-precision, occlusion-free point cloud captured with a laser scanner, which serves as the ground truth shape. This allows researchers to develop and benchmark algorithms that take the partial 3D data from a sensor and reconstruct the full 3D shape of the fruit.
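The paper's abstract notes that the segmented RGB-D frames come with camera intrinsics so that colored point clouds can be obtained easily. As a minimal sketch of that step, assuming a standard pinhole camera model and a depth image already converted to metres, the partial point cloud of a fruit could be computed as below. The function name, the list-of-lists image layout, and the parameter names (fx, fy, cx, cy) are illustrative assumptions, not the dataset's actual file format or tooling.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    mask: Optional[Sequence[Sequence[bool]]] = None,
) -> List[Point]:
    """Back-project a depth image into camera-frame 3D points (pinhole model).

    depth[v][u] is the depth in metres at pixel column u, row v.
    Pixels with non-positive depth, or outside the optional fruit mask,
    are skipped. All names here are hypothetical, for illustration only.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue  # no valid depth measurement at this pixel
            if mask is not None and not mask[v][u]:
                continue  # pixel does not belong to the segmented fruit
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth image with two valid pixels and made-up intrinsics.
toy_depth = [[0.0, 1.0], [2.0, 0.0]]
toy_points = backproject_depth(toy_depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

The resulting points cover only the side of the fruit that the camera saw, which is exactly the partial input that a shape completion method has to extend to a full shape.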

The paper also sets up a public benchmark: shape completion approaches are evaluated on a hidden test set through a challenge on a benchmark server, so that different methods can be compared on the same data. The goal is to advance the state of the art in 3D perception for agricultural robotics applications like fruit harvesting and handling.

Technical Explanation

The key technical contributions of the paper are:

A new RGB-D dataset of single sweet peppers, recorded in lab conditions and in a commercial greenhouse, with almost 7,000 frames belonging to more than 100 different fruits. The authors describe it as the first publicly available 3D shape completion dataset for agricultural vision systems.
High-precision, occlusion-free ground truth point clouds for each fruit, obtained with a laser scanner through a measuring process developed for real sweet pepper plants. The segmented RGB-D frames are released with camera intrinsics so that colored point clouds can be obtained easily.
A public benchmark in which shape completion approaches are evaluated on a hidden test set through a challenge hosted on a benchmark server. Metrics like reconstruction accuracy and completeness are used to evaluate the models.
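Reconstruction accuracy and completeness are not defined in this summary. A common way to score a completed point cloud against a ground truth cloud is the Chamfer distance together with a thresholded precision (accuracy), recall (completeness), and F-score. The sketch below assumes those definitions and a brute-force nearest-neighbour search, so it is only practical for small clouds; it is not the benchmark's official evaluation code, and the function names and the threshold value are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def nearest_distance(p: Point, cloud: Sequence[Point]) -> float:
    """Euclidean distance from p to its nearest neighbour in cloud (brute force)."""
    return min(math.dist(p, q) for q in cloud)


def chamfer_distance(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean of the two average nearest-neighbour distances."""
    pred_to_gt = sum(nearest_distance(p, gt) for p in pred) / len(pred)
    gt_to_pred = sum(nearest_distance(q, pred) for q in gt) / len(gt)
    return 0.5 * (pred_to_gt + gt_to_pred)


def precision_recall_f1(
    pred: Sequence[Point], gt: Sequence[Point], threshold: float
) -> Tuple[float, float, float]:
    """Precision: share of predicted points within threshold of the ground truth.
    Recall: share of ground truth points within threshold of the prediction.
    """
    precision = sum(nearest_distance(p, gt) <= threshold for p in pred) / len(pred)
    recall = sum(nearest_distance(q, pred) <= threshold for q in gt) / len(gt)
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2.0 * precision * recall / (precision + recall)


# Toy example: the prediction recovers only one of two ground truth points.
toy_pred = [(0.0, 0.0, 0.0)]
toy_gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_cd = chamfer_distance(toy_pred, toy_gt)
toy_p, toy_r, toy_f1 = precision_recall_f1(toy_pred, toy_gt, threshold=0.1)
```

In the toy example the prediction is accurate where it exists but misses half of the ground truth, so precision is high while recall is low. This is the typical failure pattern when a method returns the observed partial shape without completing the occluded side.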

The paper also discusses the potential applications of accurate 3D shape completion in agricultural robotics, such as improved fruit harvesting, handling, and manipulation. The dataset and benchmark provide a standardized way to measure progress in this important area of 3D perception for robotics.

Critical Analysis

The paper provides a well-designed dataset and benchmark for evaluating 3D shape completion algorithms on fruits, which is an important capability for agricultural robotics. Recording real plants in both lab and greenhouse conditions, with laser-scanned ground truth for every fruit, sets it apart from purely synthetic datasets.

One potential limitation of the dataset is that it covers a single crop, sweet peppers. While this allows for precise ground truth and controlled experiments, methods tuned on it may not transfer directly to fruits or vegetables with very different shapes. Evaluating the models on data from other crops, such as the potato tuber point clouds described in the related papers, could provide additional insights.

Additionally, the paper focuses on static 3D shape completion, but in a real agricultural setting, the robot would need to handle dynamic, moving fruits. Extending the dataset and models to handle temporal information and track fruits over time could be an important area for future research, as explored in the MetaFruit paper.

Overall, this paper makes a valuable contribution to the field of 3D perception for agricultural robotics by providing a high-quality dataset and a public benchmark with a hidden test set. The insights and resources presented here could help drive further advancements in this important area of research.

Conclusion

This paper introduces a new dataset and benchmark for 3D shape completion of fruits, which is a crucial capability for agricultural robots tasked with harvesting and handling produce. The dataset pairs RGB-D frames of sweet peppers with high-precision ground truth point clouds, and a benchmark server evaluates submissions on a hidden test set.

The dataset provides a standardized way to measure progress in 3D perception for agricultural robotics. Open directions remain, such as covering additional crops and extending the approach to handle dynamic fruit motion.

Overall, this work represents an important step forward in advancing 3D perception for agricultural robots, with the potential to enable more robust and capable fruit handling systems in the future.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

A Dataset and Benchmark for Shape Completion of Fruits for Agricultural Robotics

Federico Magistri, Thomas Labe, Elias Marks, Sumanth Nagulavancha, Yue Pan, Claus Smitt, Lasse Klingbeil, Michael Halstead, Heiner Kuhlmann, Chris McCool, Jens Behley, Cyrill Stachniss

As the world population is expected to reach 10 billion by 2050, our agricultural production system needs to double its productivity despite a decline of human workforce in the agricultural sector. Autonomous robotic systems are one promising pathway to increase productivity by taking over labor-intensive manual tasks like fruit picking. To be effective, such systems need to monitor and interact with plants and fruits precisely, which is challenging due to the cluttered nature of agricultural environments causing, for example, strong occlusions. Thus, being able to estimate the complete 3D shapes of objects in presence of occlusions is crucial for automating operations such as fruit harvesting. In this paper, we propose the first publicly available 3D shape completion dataset for agricultural vision systems. We provide an RGB-D dataset for estimating the 3D shape of fruits. Specifically, our dataset contains RGB-D frames of single sweet peppers in lab conditions but also in a commercial greenhouse. For each fruit, we additionally collected high-precision point clouds that we use as ground truth. For acquiring the ground truth shape, we developed a measuring process that allows us to record data of real sweet pepper plants, both in the lab and in the greenhouse with high precision, and determine the shape of the sensed fruits. We release our dataset, consisting of almost 7,000 RGB-D frames belonging to more than 100 different fruits. We provide segmented RGB-D frames, with camera intrinsics to easily obtain colored point clouds, together with the corresponding high-precision, occlusion-free point clouds obtained with a high-precision laser scanner. We additionally enable evaluation of shape completion approaches on a hidden test set through a public challenge on a benchmark server.

9/18/2024

High-throughput 3D shape completion of potato tubers on a harvester

Pieter M. Blok, Federico Magistri, Cyrill Stachniss, Haozhou Wang, James Burridge, Wei Guo

Potato yield is an important metric for farmers to further optimize their cultivation practices. Potato yield can be estimated on a harvester using an RGB-D camera that can estimate the three-dimensional (3D) volume of individual potato tubers. A challenge, however, is that the 3D shape derived from RGB-D images is only partially completed, underestimating the actual volume. To address this issue, we developed a 3D shape completion network, called CoRe++, which can complete the 3D shape from RGB-D images. CoRe++ is a deep learning network that consists of a convolutional encoder and a decoder. The encoder compresses RGB-D images into latent vectors that are used by the decoder to complete the 3D shape using the deep signed distance field network (DeepSDF). To evaluate our CoRe++ network, we collected partial and complete 3D point clouds of 339 potato tubers on an operational harvester in Japan. On the 1425 RGB-D images in the test set (representing 51 unique potato tubers), our network achieved a completion accuracy of 2.8 mm on average. For volumetric estimation, the root mean squared error (RMSE) was 22.6 ml, and this was better than the RMSE of the linear regression (31.1 ml) and the base model (36.9 ml). We found that the RMSE can be further reduced to 18.2 ml when performing the 3D shape completion in the center of the RGB-D image. With an average 3D shape completion time of 10 milliseconds per tuber, we can conclude that CoRe++ is both fast and accurate enough to be implemented on an operational harvester for high-throughput potato yield estimation. Our code, network weights and dataset are publicly available at https://github.com/UTokyo-FieldPhenomics-Lab/corepp.git.

8/1/2024

DAVIS-Ag: A Synthetic Plant Dataset for Prototyping Domain-Inspired Active Vision in Agricultural Robots

Taeyeong Choi, Dario Guevara, Zifei Cheng, Grisha Bandodkar, Chonghan Wang, Brian N. Bailey, Mason Earles, Xin Liu

In agricultural environments, viewpoint planning can be a critical functionality for a robot with visual sensors to obtain informative observations of objects of interest (e.g., fruits) from complex structures of plant with random occlusions. Although recent studies on active vision have shown some potential for agricultural tasks, each model has been designed and validated on a unique environment that would not easily be replicated for benchmarking novel methods being developed later. In this paper, we introduce a dataset, so-called DAVIS-Ag, for promoting more extensive research on Domain-inspired Active VISion in Agriculture. To be specific, we leveraged our open-source AgML framework and 3D plant simulator of Helios to produce 502K RGB images from 30K densely sampled spatial locations in 632 synthetic orchards. Moreover, plant environments of strawberries, tomatoes, and grapes are considered at two different scales (i.e., Single-Plant and Multi-Plant). Useful labels are also provided for each image, including (1) bounding boxes and (2) instance segmentation masks for all identifiable fruits, and also (3) pointers to other images of the viewpoints that are reachable by an execution of action so as to simulate active viewpoint selections at each time step. Using DAVIS-Ag, we visualize motivating examples where fruit visibility can dramatically change depending on the pose of the camera view primarily due to occlusions by other components, such as leaves. Furthermore, we present several baseline models with experiment results for benchmarking in the task of target visibility maximization. Transferability to real strawberry environments is also investigated to demonstrate the feasibility of using the dataset for prototyping real-world solutions. For future research, our dataset is made publicly available online: https://github.com/ctyeong/DAVIS-Ag.

7/2/2024

M18K: A Comprehensive RGB-D Dataset and Benchmark for Mushroom Detection and Instance Segmentation

Abdollah Zakeri, Mulham Fawakherji, Jiming Kang, Bikram Koirala, Venkatesh Balan, Weihang Zhu, Driss Benhaddou, Fatima A. Merchant

Automating agricultural processes holds significant promise for enhancing efficiency and sustainability in various farming practices. This paper contributes to the automation of agricultural processes by providing a dedicated mushroom detection dataset related to automated harvesting, growth monitoring, and quality control of the button mushroom produced using Agaricus Bisporus fungus. With over 18,000 mushroom instances in 423 RGB-D image pairs taken with an Intel RealSense D405 camera, it fills the gap in mushroom-specific datasets and serves as a benchmark for detection and instance segmentation algorithms in smart mushroom agriculture. The dataset, featuring realistic growth environment scenarios with comprehensive annotations, is assessed using advanced detection and instance segmentation algorithms. The paper details the dataset's characteristics, evaluates algorithmic performance, and for broader applicability, we have made all resources publicly available including images, codes, and trained models via our GitHub repository https://github.com/abdollahzakeri/m18k

7/17/2024