DAVIS-Ag: A Synthetic Plant Dataset for Prototyping Domain-Inspired Active Vision in Agricultural Robots

Read original: arXiv:2303.05764 - Published 7/2/2024 by Taeyeong Choi, Dario Guevara, Zifei Cheng, Grisha Bandodkar, Chonghan Wang, Brian N. Bailey, Mason Earles, Xin Liu


Overview

The paper introduces DAVIS-Ag, a dataset intended to promote research on active vision in agricultural environments.
The dataset includes 502K RGB images from 30K spatial locations in 632 synthetic orchards, covering three crops: strawberries, tomatoes, and grapes.
Each image is labeled with bounding boxes and instance segmentation masks for all identifiable fruits, plus pointers to the images of viewpoints reachable by executing one action.
The dataset aims to address the lack of a common benchmark for evaluating novel active vision methods in agricultural tasks.

Plain English Explanation

The researchers created a dataset called DAVIS-Ag to help advance the field of robotic vision in agricultural settings. Agricultural robots equipped with cameras often need to navigate complex plant structures to get a good view of fruits or other objects of interest. However, the random placement of leaves and branches can make it challenging for the robot to find the best viewpoint.

To address this problem, the researchers used a 3D plant simulator to generate over 500,000 images from 30,000 different camera positions within 632 synthetic orchards. The images include strawberries, tomatoes, and grapes, along with detailed labels identifying the location and outlines of the fruits. The dataset also provides information about which other viewpoints the robot could potentially move to from each camera position.

By making this dataset publicly available, the researchers hope to encourage other scientists to develop and test new active vision algorithms, which decide where a robot's camera should move next, for agricultural tasks. The ability to reliably maximize the visibility of fruits or other targets could be a crucial capability for robots working in farms and orchards.

Technical Explanation

The DAVIS-Ag dataset was created using the researchers' open-source AgML framework and the Helios 3D plant simulator. It consists of 502K RGB images captured from 30K densely sampled spatial locations within 632 synthetic orchards. The orchards contain three types of fruit-bearing plants (strawberries, tomatoes, and grapes) at two different scales: Single-Plant and Multi-Plant.

For each image, the dataset provides the following labels:

Bounding boxes around all identifiable fruits
Instance segmentation masks for the fruits
Pointers to the images of other viewpoints that are reachable by executing one action, which simulates active viewpoint selection at each time step
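The reachable-viewpoint pointers effectively turn each environment into a graph: images (camera poses) are nodes, and single actions are edges. The minimal sketch below assumes a hypothetical in-memory form of those pointers, a dict mapping each image ID to the IDs reachable in one action; the actual annotation format in the DAVIS-Ag repository may differ, so check its documentation before adapting this. It uses breadth-first search to find the fewest actions needed to move between two viewpoints.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical representation of the "reachable viewpoint" pointers:
# each image ID maps to the IDs of images reachable by executing one action.
# The real DAVIS-Ag annotation format may differ.
ViewGraph = Dict[str, List[str]]


def min_actions(graph: ViewGraph, start: str, goal: str) -> Optional[int]:
    """Return the fewest actions needed to move from `start` to `goal`,
    or None if `goal` is unreachable. Uses breadth-first search."""
    if start == goal:
        return 0
    visited = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nxt in graph.get(node, []):
            if nxt == goal:
                return dist + 1
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, dist + 1))
    return None


if __name__ == "__main__":
    # Toy environment with four viewpoints (illustrative IDs only).
    toy: ViewGraph = {
        "img_000": ["img_001", "img_002"],
        "img_001": ["img_000", "img_003"],
        "img_002": ["img_000"],
        "img_003": ["img_001"],
    }
    print(min_actions(toy, "img_002", "img_003"))  # -> 3
```

Breadth-first search fits here because every action costs the same, so the first time the goal is reached is along a shortest action sequence.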

The researchers visualize examples in which fruit visibility changes dramatically with the camera pose, primarily because of occlusions by leaves and other plant components. They also present several baseline models, with experimental results, for benchmarking the task of target visibility maximization, and they investigate how well these transfer to real strawberry environments to show that DAVIS-Ag is feasible for prototyping real-world solutions.
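To make the target visibility maximization task concrete, the sketch below scores a view by the fraction of pixels covered by any fruit instance mask, then runs a greedy one-step-lookahead policy over the reachable-viewpoint graph for a fixed action budget. Both the score and the policy are illustrative assumptions, not the paper's exact metric or any of its baseline models; masks are assumed to be 2D lists of instance IDs with 0 for background, and the graph is a hypothetical dict of image IDs.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]  # instance ID per pixel; 0 = background (assumed encoding)


def visibility(mask: Mask) -> float:
    """Fraction of image pixels covered by any fruit instance (illustrative score)."""
    total = sum(len(row) for row in mask)
    fruit = sum(1 for row in mask for px in row if px != 0)
    return fruit / total if total else 0.0


def greedy_episode(
    graph: Dict[str, List[str]],
    masks: Dict[str, Mask],
    start: str,
    budget: int,
) -> Tuple[List[str], float]:
    """Greedy one-step lookahead: at each step, move to the reachable, not yet
    visited viewpoint whose mask shows the most fruit. Returns the trajectory
    and the best visibility observed along it."""
    path = [start]
    visited = {start}
    best = visibility(masks[start])
    current = start
    for _ in range(budget):
        candidates = [v for v in graph.get(current, []) if v not in visited]
        if not candidates:
            break  # dead end: no unvisited neighbor to move to
        current = max(candidates, key=lambda v: visibility(masks[v]))
        visited.add(current)
        path.append(current)
        best = max(best, visibility(masks[current]))
    return path, best


if __name__ == "__main__":
    graph = {"v0": ["v1", "v2"], "v1": ["v0", "v3"], "v2": ["v0"], "v3": ["v1"]}
    masks = {
        "v0": [[0, 0], [0, 1]],  # 1 of 4 pixels is fruit
        "v1": [[1, 1], [0, 0]],  # 2 of 4
        "v2": [[0, 0], [0, 0]],  # fully occluded
        "v3": [[1, 2], [2, 0]],  # 3 of 4, two fruit instances
    }
    print(greedy_episode(graph, masks, "v0", budget=3))  # (['v0', 'v1', 'v3'], 0.75)
```

Because it only looks one step ahead, a policy like this can stall at a local maximum behind occluding leaves; it is a reference point for comparison, not a substitute for the baselines reported in the paper.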

The dataset is publicly available on GitHub (https://github.com/ctyeong/DAVIS-Ag) to promote more extensive research on domain-inspired active vision in agriculture.

Critical Analysis

The DAVIS-Ag dataset provides a valuable resource for researchers working on active vision problems in agricultural settings. By using a 3D plant simulator, the researchers were able to generate a large, diverse dataset that reproduces a key difficulty of real orchards: the random occlusion of fruits by leaves and other plant structures.

One potential limitation of the dataset is that it is based on synthetic data, which may not fully capture the nuances and variability of real-world agricultural environments. While the researchers have shown some success in transferring the baseline models to a real strawberry environment, further validation on a wider range of real-world settings would be beneficial.

Additionally, the dataset focuses on three specific fruit types (strawberries, tomatoes, and grapes). While these are important agricultural crops, expanding the dataset to include a broader range of fruits and vegetables could make it more widely applicable.

Overall, DAVIS-Ag is a meaningful step toward a shared benchmark for active vision in agricultural robotics. Previous active vision models were each designed and validated in their own environment, which made later methods hard to compare against them. A common dataset lets new viewpoint planning algorithms, which must see past the occlusions in complex plant structures, be evaluated on equal terms.

Conclusion

DAVIS-Ag gives researchers working on active vision in agriculture a common, simulator-generated benchmark. Because it is rendered from 3D plant models, it captures how strongly fruit visibility depends on camera pose once leaves and other plant structures occlude the view.

The dataset's detailed labels and information about reachable viewpoints make it a useful tool for developing and evaluating new active vision algorithms for agricultural tasks. While the dataset currently covers only three fruit types, its potential impact could extend to a wider range of crops and environments.

By making the DAVIS-Ag dataset publicly available, the researchers hope to encourage more extensive research on domain-inspired active vision in agriculture, ultimately leading to more robust and effective robotic systems for farmers and growers.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


DAVIS-Ag: A Synthetic Plant Dataset for Prototyping Domain-Inspired Active Vision in Agricultural Robots

Taeyeong Choi, Dario Guevara, Zifei Cheng, Grisha Bandodkar, Chonghan Wang, Brian N. Bailey, Mason Earles, Xin Liu

In agricultural environments, viewpoint planning can be a critical functionality for a robot with visual sensors to obtain informative observations of objects of interest (e.g., fruits) from complex plant structures with random occlusions. Although recent studies on active vision have shown some potential for agricultural tasks, each model has been designed and validated on a unique environment that would not easily be replicated for benchmarking novel methods being developed later. In this paper, we introduce a dataset, so-called DAVIS-Ag, for promoting more extensive research on Domain-inspired Active VISion in Agriculture. To be specific, we leveraged our open-source AgML framework and the Helios 3D plant simulator to produce 502K RGB images from 30K densely sampled spatial locations in 632 synthetic orchards. Moreover, plant environments of strawberries, tomatoes, and grapes are considered at two different scales (i.e., Single-Plant and Multi-Plant). Useful labels are also provided for each image, including (1) bounding boxes and (2) instance segmentation masks for all identifiable fruits, and also (3) pointers to other images of the viewpoints that are reachable by executing an action so as to simulate active viewpoint selections at each time step. Using DAVIS-Ag, we visualize motivating examples where fruit visibility can dramatically change depending on the pose of the camera view primarily due to occlusions by other components, such as leaves. Furthermore, we present several baseline models with experimental results for benchmarking in the task of target visibility maximization. Transferability to real strawberry environments is also investigated to demonstrate the feasibility of using the dataset for prototyping real-world solutions. For future research, our dataset is made publicly available online: https://github.com/ctyeong/DAVIS-Ag.

7/2/2024


PhenoBench -- A Large Dataset and Benchmarks for Semantic Image Interpretation in the Agricultural Domain

Jan Weyler, Federico Magistri, Elias Marks, Yue Linn Chong, Matteo Sodano, Gianmarco Roggiolani, Nived Chebrolu, Cyrill Stachniss, Jens Behley

The production of food, feed, fiber, and fuel is a key task of agriculture, which has to cope with many challenges in the upcoming decades, e.g., a higher demand, climate change, lack of workers, and the availability of arable land. Vision systems can support making better and more sustainable field management decisions, but also support the breeding of new crop varieties by allowing temporally dense and reproducible measurements. Recently, agricultural robotics has attracted increasing interest in the vision and robotics communities since it is a promising avenue for coping with the aforementioned lack of workers and enabling more sustainable production. While large datasets and benchmarks in other domains are readily available and enable significant progress, agricultural datasets and benchmarks are comparatively rare. We present an annotated dataset and benchmarks for the semantic interpretation of real agricultural fields. Our dataset, recorded with a UAV, provides high-quality, pixel-wise annotations of crops and weeds, but also crop leaf instances at the same time. Furthermore, we provide benchmarks for various tasks on a hidden test set comprising different fields: known fields covered by the training data and a completely unseen field. Our dataset, benchmarks, and code are available at https://www.phenobench.org.

7/25/2024

A Dataset and Benchmark for Shape Completion of Fruits for Agricultural Robotics

Federico Magistri, Thomas Labe, Elias Marks, Sumanth Nagulavancha, Yue Pan, Claus Smitt, Lasse Klingbeil, Michael Halstead, Heiner Kuhlmann, Chris McCool, Jens Behley, Cyrill Stachniss

As the population is expected to reach 10 billion by 2050, our agricultural production system needs to double its productivity despite a declining human workforce in the agricultural sector. Autonomous robotic systems are one promising pathway to increase productivity by taking over labor-intensive manual tasks like fruit picking. To be effective, such systems need to monitor and interact with plants and fruits precisely, which is challenging due to the cluttered nature of agricultural environments causing, for example, strong occlusions. Thus, being able to estimate the complete 3D shapes of objects in the presence of occlusions is crucial for automating operations such as fruit harvesting. In this paper, we propose the first publicly available 3D shape completion dataset for agricultural vision systems. We provide an RGB-D dataset for estimating the 3D shape of fruits. Specifically, our dataset contains RGB-D frames of single sweet peppers in lab conditions but also in a commercial greenhouse. For each fruit, we additionally collected high-precision point clouds that we use as ground truth. For acquiring the ground truth shape, we developed a measuring process that allows us to record data of real sweet pepper plants, both in the lab and in the greenhouse with high precision, and determine the shape of the sensed fruits. We release our dataset, consisting of almost 7000 RGB-D frames belonging to more than 100 different fruits. We provide segmented RGB-D frames, with camera intrinsics to easily obtain colored point clouds, together with the corresponding high-precision, occlusion-free point clouds obtained with a high-precision laser scanner. We additionally enable evaluation of shape completion approaches on a hidden test set through a public challenge on a benchmark server.

7/19/2024

Semantics-Aware Next-best-view Planning for Efficient Search and Detection of Task-relevant Plant Parts

Akshay K. Burusa, Joost Scholten, David Rapado Rincon, Xin Wang, Eldert J. van Henten, Gert Kootstra

To automate harvesting and de-leafing of tomato plants using robots, it is important to search and detect the task-relevant plant parts. This is challenging due to high levels of occlusion in tomato plants. Active vision is a promising approach to viewpoint planning, which helps robots to deliberately plan camera viewpoints to overcome occlusion and improve perception accuracy. However, current active-vision algorithms cannot differentiate between relevant and irrelevant plant parts and spend time on perceiving irrelevant plant parts, making them inefficient for targeted perception. We propose a semantics-aware active-vision strategy that uses semantic information to identify the relevant plant parts and prioritise them during view planning. We evaluated our strategy on the task of searching and detecting the relevant plant parts using simulation and real-world experiments. In simulation, using 3D models of tomato plants with varying structural complexity, our semantics-aware strategy could search and detect 81.8% of all the relevant plant parts using nine viewpoints. It was significantly faster and detected more plant parts than predefined, random, and volumetric active-vision strategies. Our strategy was also robust to uncertainty in plant and plant-part position, plant complexity, and different viewpoint-sampling strategies. Further, in real-world experiments, our strategy could search and detect 82.7% of all the relevant plant parts using seven viewpoints, under real-world conditions with natural variation and occlusion, natural illumination, sensor noise, and uncertainty in camera poses. Our results clearly indicate the advantage of using semantics-aware active vision for targeted perception of plant parts and its applicability in real-world setups. We believe that it can significantly improve the speed and robustness of automated harvesting and de-leafing in tomato crop production.

5/13/2024