PhenoBench -- A Large Dataset and Benchmarks for Semantic Image Interpretation in the Agricultural Domain

Read original: arXiv:2306.04557 - Published 7/25/2024 by Jan Weyler, Federico Magistri, Elias Marks, Yue Linn Chong, Matteo Sodano, Gianmarco Roggiolani, Nived Chebrolu, Cyrill Stachniss, Jens Behley

Overview

Agriculture must address challenges like higher demand, climate change, labor shortages, and limited arable land.
Vision systems can support better field management and crop breeding by enabling detailed, reproducible measurements.
Agricultural robotics is a promising way to address labor shortages and enable more sustainable production.
High-quality agricultural datasets and benchmarks are rare compared to other domains.

Plain English Explanation

Agricultural Production Challenges: Producing enough food, animal feed, natural fibers, and biofuels is a critical task for agriculture. However, the industry faces significant challenges in the coming decades. Demand for agricultural products will likely increase, while climate change, a lack of available workers, and limited arable land pose major obstacles.

Role of Vision Systems: Vision systems, which analyze images and videos, can help address these challenges. They can support improved field management decisions, leading to more sustainable production. Vision systems can also assist in breeding new crop varieties by enabling detailed, consistent measurements over time.

Agricultural Robotics: Another promising approach is the use of agricultural robots. These can help address the lack of available workers and enable more sustainable farming practices. The robotics and computer vision research communities have shown growing interest in this area.

Importance of Datasets: While large, high-quality datasets are readily available in many domains to drive progress, datasets and benchmarks for agricultural applications are relatively scarce. This makes it challenging to develop and evaluate new vision-based technologies for agriculture.

Technical Explanation

The researchers present an annotated dataset and benchmarks for the semantic interpretation of real agricultural fields. The dataset was captured using a drone (UAV) and provides high-quality, pixel-level annotations of crop plants and weeds, as well as individual crop leaf instances.
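To make "pixel-level annotations" concrete, here is a minimal sketch of how a predicted label mask could be compared against an annotated one using per-class intersection-over-union (IoU), a common segmentation metric. The class ids (0 for soil, 1 for crop, 2 for weed) and the function name are assumptions made for illustration; this summary does not state the dataset's actual label encoding or which metrics the benchmark reports.

```python
from typing import Dict, List

# Hypothetical class ids, chosen for illustration only.
SOIL, CROP, WEED = 0, 1, 2

Mask = List[List[int]]  # one class id per pixel, stored row by row


def per_class_iou(pred: Mask, target: Mask, class_ids: List[int]) -> Dict[int, float]:
    """Return the intersection-over-union per class for two equally sized masks."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    intersection = {c: 0 for c in class_ids}
    union = {c: 0 for c in class_ids}
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            for c in class_ids:
                in_pred, in_target = p == c, t == c
                if in_pred and in_target:
                    intersection[c] += 1
                if in_pred or in_target:
                    union[c] += 1
    # A class absent from both masks has no defined IoU; report NaN for it.
    return {
        c: (intersection[c] / union[c] if union[c] else float("nan"))
        for c in class_ids
    }


# Tiny 2x3 example: one crop pixel is mistaken for soil.
target = [
    [SOIL, CROP, CROP],
    [SOIL, CROP, WEED],
]
pred = [
    [SOIL, CROP, SOIL],
    [SOIL, CROP, WEED],
]
iou = per_class_iou(pred, target, [SOIL, CROP, WEED])
print(iou)
```

In this toy example the single misclassified pixel lowers the IoU of both the crop and the soil class, while the weed class is unaffected.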

The benchmarks cover various tasks and are evaluated on a hidden test set that comprises two kinds of fields:

Known fields that are also covered by the training data
A completely new, unseen field

This allows assessing both in-domain and out-of-domain generalization capabilities of computer vision models.
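Reporting in-domain and out-of-domain results separately can be sketched as follows. This is only an illustration under stated assumptions: the field identifiers, the per-image scores, and the function name are hypothetical, and the real test set is hidden, so in practice this split would be computed by the benchmark itself rather than by users.

```python
from statistics import mean
from typing import Dict, Iterable, List, Set, Tuple


def split_scores_by_domain(
    scores: Iterable[Tuple[str, float]], known_fields: Set[str]
) -> Dict[str, float]:
    """Average per-image scores separately for known and unseen fields."""
    grouped: Dict[str, List[float]] = {"in_domain": [], "out_of_domain": []}
    for field_id, score in scores:
        key = "in_domain" if field_id in known_fields else "out_of_domain"
        grouped[key].append(score)
    # Skip a group that received no images instead of averaging an empty list.
    return {key: mean(values) for key, values in grouped.items() if values}


# Hypothetical per-image scores, tagged with the field each image came from.
image_scores = [
    ("field_a", 0.8),
    ("field_b", 0.7),
    ("field_c", 0.5),
    ("field_c", 0.6),
]
summary = split_scores_by_domain(image_scores, known_fields={"field_a", "field_b"})
print(summary)
```

A gap between the two averages gives a rough indication of how much performance drops when a model is applied to a field it has never seen.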

The dataset, benchmarks, and associated code are publicly available at https://www.phenobench.org.

Critical Analysis

The researchers acknowledge that their dataset, while a valuable contribution, is still relatively small compared to datasets in other domains. Expanding the dataset with more diverse fields and environmental conditions could further improve the robustness of vision-based agricultural technologies.

Additionally, the paper does not address potential biases in the data collection or annotation process, which could impact the fairness and reliability of the benchmarks. Exploring and mitigating such biases would be an important area for future work.

Conclusion

This research presents a high-quality dataset and benchmark suite for advancing computer vision in agricultural applications. By enabling the development and evaluation of more robust and generalizable vision systems, this work can support efforts to improve the sustainability and efficiency of food, feed, fiber, and fuel production in the face of growing challenges.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

PhenoBench -- A Large Dataset and Benchmarks for Semantic Image Interpretation in the Agricultural Domain

Jan Weyler, Federico Magistri, Elias Marks, Yue Linn Chong, Matteo Sodano, Gianmarco Roggiolani, Nived Chebrolu, Cyrill Stachniss, Jens Behley

The production of food, feed, fiber, and fuel is a key task of agriculture, which has to cope with many challenges in the upcoming decades, e.g., a higher demand, climate change, lack of workers, and the availability of arable land. Vision systems can support making better and more sustainable field management decisions, but also support the breeding of new crop varieties by allowing temporally dense and reproducible measurements. Recently, agricultural robotics got an increasing interest in the vision and robotics communities since it is a promising avenue for coping with the aforementioned lack of workers and enabling more sustainable production. While large datasets and benchmarks in other domains are readily available and enable significant progress, agricultural datasets and benchmarks are comparably rare. We present an annotated dataset and benchmarks for the semantic interpretation of real agricultural fields. Our dataset recorded with a UAV provides high-quality, pixel-wise annotations of crops and weeds, but also crop leaf instances at the same time. Furthermore, we provide benchmarks for various tasks on a hidden test set comprised of different fields: known fields covered by the training data and a completely unseen field. Our dataset, benchmarks, and code are available at https://www.phenobench.org.

7/25/2024

DAVIS-Ag: A Synthetic Plant Dataset for Prototyping Domain-Inspired Active Vision in Agricultural Robots

Taeyeong Choi, Dario Guevara, Zifei Cheng, Grisha Bandodkar, Chonghan Wang, Brian N. Bailey, Mason Earles, Xin Liu

In agricultural environments, viewpoint planning can be a critical functionality for a robot with visual sensors to obtain informative observations of objects of interest (e.g., fruits) from complex structures of plant with random occlusions. Although recent studies on active vision have shown some potential for agricultural tasks, each model has been designed and validated on a unique environment that would not easily be replicated for benchmarking novel methods being developed later. In this paper, we introduce a dataset, so-called DAVIS-Ag, for promoting more extensive research on Domain-inspired Active VISion in Agriculture. To be specific, we leveraged our open-source AgML framework and 3D plant simulator of Helios to produce 502K RGB images from 30K densely sampled spatial locations in 632 synthetic orchards. Moreover, plant environments of strawberries, tomatoes, and grapes are considered at two different scales (i.e., Single-Plant and Multi-Plant). Useful labels are also provided for each image, including (1) bounding boxes and (2) instance segmentation masks for all identifiable fruits, and also (3) pointers to other images of the viewpoints that are reachable by an execution of action so as to simulate active viewpoint selections at each time step. Using DAVIS-Ag, we visualize motivating examples where fruit visibility can dramatically change depending on the pose of the camera view primarily due to occlusions by other components, such as leaves. Furthermore, we present several baseline models with experiment results for benchmarking in the task of target visibility maximization. Transferability to real strawberry environments is also investigated to demonstrate the feasibility of using the dataset for prototyping real-world solutions. For future research, our dataset is made publicly available online: https://github.com/ctyeong/DAVIS-Ag.

7/2/2024

VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding

Xiang Li, Jian Ding, Mohamed Elhoseiny

We introduce a new benchmark designed to advance the development of general-purpose, large-scale vision-language models for remote sensing images. Although several vision-language datasets in remote sensing have been proposed to pursue this goal, existing datasets are typically tailored to single tasks, lack detailed object information, or suffer from inadequate quality control. Exploring these improvement opportunities, we present a Versatile vision-language Benchmark for Remote Sensing image understanding, termed VRSBench. This benchmark comprises 29,614 images, with 29,614 human-verified detailed captions, 52,472 object references, and 123,221 question-answer pairs. It facilitates the training and evaluation of vision-language models across a broad spectrum of remote sensing image understanding tasks. We further evaluated state-of-the-art models on this benchmark for three vision-language tasks: image captioning, visual grounding, and visual question answering. Our work aims to significantly contribute to the development of advanced vision-language models in the field of remote sensing. The data and code can be accessed at https://github.com/lx709/VRSBench.

6/19/2024

A Dataset and Benchmark for Shape Completion of Fruits for Agricultural Robotics

Federico Magistri, Thomas Labe, Elias Marks, Sumanth Nagulavancha, Yue Pan, Claus Smitt, Lasse Klingbeil, Michael Halstead, Heiner Kuhlmann, Chris McCool, Jens Behley, Cyrill Stachniss

As the world population is expected to reach 10 billion by 2050, our agricultural production system needs to double its productivity despite a decline of human workforce in the agricultural sector. Autonomous robotic systems are one promising pathway to increase productivity by taking over labor-intensive manual tasks like fruit picking. To be effective, such systems need to monitor and interact with plants and fruits precisely, which is challenging due to the cluttered nature of agricultural environments causing, for example, strong occlusions. Thus, being able to estimate the complete 3D shapes of objects in presence of occlusions is crucial for automating operations such as fruit harvesting. In this paper, we propose the first publicly available 3D shape completion dataset for agricultural vision systems. We provide an RGB-D dataset for estimating the 3D shape of fruits. Specifically, our dataset contains RGB-D frames of single sweet peppers in lab conditions but also in a commercial greenhouse. For each fruit, we additionally collected high-precision point clouds that we use as ground truth. For acquiring the ground truth shape, we developed a measuring process that allows us to record data of real sweet pepper plants, both in the lab and in the greenhouse with high precision, and determine the shape of the sensed fruits. We release our dataset, consisting of almost 7,000 RGB-D frames belonging to more than 100 different fruits. We provide segmented RGB-D frames, with camera intrinsics to easily obtain colored point clouds, together with the corresponding high-precision, occlusion-free point clouds obtained with a high-precision laser scanner. We additionally enable evaluation of shape completion approaches on a hidden test set through a public challenge on a benchmark server.

9/18/2024