Semantics-Aware Next-best-view Planning for Efficient Search and Detection of Task-relevant Plant Parts

Read original: arXiv:2306.09801 - Published 5/13/2024 by Akshay K. Burusa, Joost Scholten, David Rapado Rincon, Xin Wang, Eldert J. van Henten, Gert Kootstra

Overview

The paper presents an efficient approach for searching and detecting relevant plant parts using semantics-aware active vision.
It explores the use of active vision, next-best-view planning, and semantics to improve the performance of object detection systems in greenhouse robotics applications.
The proposed method aims to optimize the search and detection process by leveraging semantic information about the plant parts of interest.

Plain English Explanation

The research paper discusses a new way to help robots better identify and locate specific parts of plants, such as leaves, flowers, or fruits, in a greenhouse environment. The key idea is to use "active vision" - where the robot actively decides where to look next based on what it has already observed - and combine this with "semantic" information about the different plant parts.

By understanding the meaning and context of the plant parts, the robot can more efficiently search for and detect the relevant ones it needs to focus on. This is particularly useful in agricultural settings like greenhouses, where robots need to perform tasks like monitoring plant health or harvesting produce.

The active-vision approach lets the robot choose where to point its camera next, rather than passively scanning the environment. By incorporating semantic knowledge about the different plant parts, the robot can make more informed decisions about where to look to find the specific items it is trying to identify.

This combination of active vision and semantic awareness can make the overall search and detection process more efficient than strategies that rely on predefined or random camera viewpoints. It could lead to improvements in areas like precision agriculture, where robots need to interact with and manipulate individual plants or plant parts.

Technical Explanation

The paper presents a framework for efficient search and detection of relevant plant parts using semantics-aware active vision. The approach combines next-best-view planning and object detection to optimize the search and localization of target plant parts.

The method first builds a semantic segmentation model to classify different plant parts based on visual features. It then uses this semantic awareness to guide the active vision component, which plans the next camera view that is most likely to observe the target plant part of interest.
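As a rough illustration of how semantic labels could flag task-relevant regions for the planner, the sketch below stores a label and confidence per voxel and checks it against a set of target classes. The class names, the `Voxel` structure, and the confidence threshold are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical target classes; the paper's actual label set is not given here.
RELEVANT_CLASSES = {"tomato", "peduncle", "leaf_node"}


@dataclass
class Voxel:
    occupancy: float = 0.5    # probability of being occupied (0.5 = unknown)
    label: str = "unknown"    # most confident semantic label seen so far
    label_conf: float = 0.0   # confidence of that label


def update_label(voxel: Voxel, label: str, conf: float) -> None:
    """Keep the most confident semantic label observed for this voxel."""
    if conf > voxel.label_conf:
        voxel.label = label
        voxel.label_conf = conf


def is_relevant(voxel: Voxel, min_conf: float = 0.5) -> bool:
    """A voxel is task-relevant if it carries a confident target-class label."""
    return voxel.label in RELEVANT_CLASSES and voxel.label_conf >= min_conf
```

A planner could then give voxels for which `is_relevant` is true a higher priority when choosing the next view.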

The next-best-view planning module evaluates candidate viewpoints based on factors like expected information gain and occlusion. It then selects the view that will most efficiently detect the relevant plant parts.
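The view-selection step can be sketched as follows. This is a minimal, assumed formulation rather than the authors' implementation: each candidate view is scored by the summed occupancy entropy of the voxels it is expected to see, with task-relevant voxels up-weighted. The names `view_utility`, `next_best_view`, and `relevance_weight` are hypothetical, and the set of voxels visible from each view is assumed to be precomputed (for example by ray casting, which is where occlusion would be accounted for).

```python
import math


def entropy(p: float) -> float:
    """Binary Shannon entropy (in bits) of an occupancy probability."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def view_utility(visible_voxels, occupancy, relevant, relevance_weight=5.0):
    """Expected gain of a view: entropy of visible voxels, relevant ones up-weighted."""
    total = 0.0
    for v in visible_voxels:
        weight = relevance_weight if v in relevant else 1.0
        total += weight * entropy(occupancy[v])
    return total


def next_best_view(candidates, occupancy, relevant):
    """Pick the candidate view with the highest utility.

    candidates: dict mapping view name -> iterable of voxel ids expected visible.
    occupancy:  dict mapping voxel id -> occupancy probability.
    relevant:   set of voxel ids labelled as task-relevant.
    """
    return max(
        candidates,
        key=lambda name: view_utility(candidates[name], occupancy, relevant),
    )


# Example: view "b" sees fewer voxels, but the one it sees is task-relevant.
occupancy = {0: 0.5, 1: 0.5, 2: 0.9, 3: 0.5}
candidates = {"a": [0, 1, 2], "b": [3]}
best = next_best_view(candidates, occupancy, relevant={3})
```

With the relevance weighting, a view covering a single uncertain but task-relevant voxel can outrank a view covering several irrelevant ones, which is the behaviour the semantics-aware strategy is after.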

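The strategy was evaluated in both simulation and real-world experiments. In simulation, using 3D models of tomato plants with varying structural complexity, it detected 81.8% of the relevant plant parts using nine viewpoints and outperformed predefined, random, and volumetric active-vision strategies. In real-world experiments, it detected 82.7% of the relevant plant parts using seven viewpoints.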

Critical Analysis

The paper presents a promising approach for improving the efficiency and accuracy of plant part detection using active vision and semantic understanding. However, the authors acknowledge several limitations and areas for future work.

One key limitation is the reliance on a pre-trained semantic segmentation model, which may not generalize well to new plant species or environments. The authors suggest incorporating active learning techniques to adaptively refine the semantic model based on the specific greenhouse setting.

Additionally, the next-best-view planning component could be further enhanced by incorporating more sophisticated reasoning about occlusions, lighting conditions, and other environmental factors that can impact visibility and detectability of plant parts.

While the paper demonstrates promising results, more extensive real-world testing and validation would be needed to fully assess the practical applicability and scalability of the approach in diverse greenhouse settings.

Conclusion

The presented research offers a novel framework for efficiently searching and detecting relevant plant parts in greenhouse environments by combining semantic awareness and active vision capabilities. This approach has the potential to significantly improve the performance of robotic systems tasked with precision agriculture and plant monitoring applications.

By leveraging semantic knowledge about the different plant parts and actively planning the next best camera viewpoint, the system can optimize the search and detection process, leading to more efficient and accurate identification of the target plant parts of interest.

The authors have laid the groundwork for further advancements in this area, and future research could explore ways to enhance the adaptability, robustness, and scalability of the proposed framework to make it more widely applicable in real-world greenhouse settings.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Semantics-Aware Next-best-view Planning for Efficient Search and Detection of Task-relevant Plant Parts

Akshay K. Burusa, Joost Scholten, David Rapado Rincon, Xin Wang, Eldert J. van Henten, Gert Kootstra

To automate harvesting and de-leafing of tomato plants using robots, it is important to search and detect the task-relevant plant parts. This is challenging due to high levels of occlusion in tomato plants. Active vision is a promising approach to viewpoint planning, which helps robots to deliberately plan camera viewpoints to overcome occlusion and improve perception accuracy. However, current active-vision algorithms cannot differentiate between relevant and irrelevant plant parts and spend time on perceiving irrelevant plant parts, making them inefficient for targeted perception. We propose a semantics-aware active-vision strategy that uses semantic information to identify the relevant plant parts and prioritise them during view planning. We evaluated our strategy on the task of searching and detecting the relevant plant parts using simulation and real-world experiments. In simulation, using 3D models of tomato plants with varying structural complexity, our semantics-aware strategy could search and detect 81.8% of all the relevant plant parts using nine viewpoints. It was significantly faster and detected more plant parts than predefined, random, and volumetric active-vision strategies. Our strategy was also robust to uncertainty in plant and plant-part position, plant complexity, and different viewpoint-sampling strategies. Further, in real-world experiments, our strategy could search and detect 82.7% of all the relevant plant parts using seven viewpoints, under real-world conditions with natural variation and occlusion, natural illumination, sensor noise, and uncertainty in camera poses. Our results clearly indicate the advantage of using semantics-aware active vision for targeted perception of plant parts and its applicability in real-world setups. We believe that it can significantly improve the speed and robustness of automated harvesting and de-leafing in tomato crop production.

5/13/2024

Attention-driven Next-best-view Planning for Efficient Reconstruction of Plants and Targeted Plant Parts

Akshay K. Burusa, Eldert J. van Henten, Gert Kootstra

Robots in tomato greenhouses need to perceive the plant and plant parts accurately to automate monitoring, harvesting, and de-leafing tasks. Existing perception systems struggle with the high levels of occlusion in plants and often result in poor perception accuracy. One reason for this is that they use fixed cameras or predefined camera movements. Next-best-view (NBV) planning presents an alternative approach, in which the camera viewpoints are reasoned and strategically planned such that the perception accuracy is improved. However, existing NBV-planning algorithms are agnostic to the task at hand and give equal importance to all the plant parts. This strategy is inefficient for greenhouse tasks that require targeted perception of specific plant parts, such as the perception of leaf nodes for de-leafing. To improve targeted perception in complex greenhouse environments, NBV planning algorithms need an attention mechanism to focus on the task-relevant plant parts. In this paper, we investigated the role of attention in improving targeted perception using an attention-driven NBV planning strategy. Through simulation experiments using plants with high levels of occlusion and structural complexity, we showed that focusing attention on task-relevant plant parts can significantly improve the speed and accuracy of 3D reconstruction. Further, with real-world experiments, we showed that these benefits extend to complex greenhouse conditions with natural variation and occlusion, natural illumination, sensor noise, and uncertainty in camera poses. Our results clearly indicate that using attention-driven NBV planning in greenhouses can significantly improve the efficiency of perception and enhance the performance of robotic systems in greenhouse crop production.

5/13/2024

Gradient-based Local Next-best-view Planning for Improved Perception of Targeted Plant Nodes

Akshay K. Burusa, Eldert J. van Henten, Gert Kootstra

Robots are increasingly used in tomato greenhouses to automate labour-intensive tasks such as selective harvesting and de-leafing. To perform these tasks, robots must be able to accurately and efficiently perceive the plant nodes that need to be cut, despite the high levels of occlusion from other plant parts. We formulate this problem as a local next-best-view (NBV) planning task where the robot has to plan an efficient set of camera viewpoints to overcome occlusion and improve the quality of perception. Our formulation focuses on quickly improving the perception accuracy of a single target node to maximise its chances of being cut. Previous methods of NBV planning mostly focused on global view planning and used random sampling of candidate viewpoints for exploration, which could suffer from high computational costs, ineffective view selection due to poor candidates, or non-smooth trajectories due to inefficient sampling. We propose a gradient-based NBV planner using differential ray sampling, which directly estimates the local gradient direction for viewpoint planning to overcome occlusion and improve perception. Through simulation experiments, we showed that our planner can handle occlusions and improve the 3D reconstruction and position estimation of nodes as well as a sampling-based NBV planner, while taking ten times less computation and generating 28% more efficient trajectories.

4/30/2024

DAVIS-Ag: A Synthetic Plant Dataset for Prototyping Domain-Inspired Active Vision in Agricultural Robots

Taeyeong Choi, Dario Guevara, Zifei Cheng, Grisha Bandodkar, Chonghan Wang, Brian N. Bailey, Mason Earles, Xin Liu

In agricultural environments, viewpoint planning can be a critical functionality for a robot with visual sensors to obtain informative observations of objects of interest (e.g., fruits) from complex plant structures with random occlusions. Although recent studies on active vision have shown some potential for agricultural tasks, each model has been designed and validated on a unique environment that would not easily be replicated for benchmarking novel methods being developed later. In this paper, we introduce a dataset, called DAVIS-Ag, for promoting more extensive research on Domain-inspired Active VISion in Agriculture. To be specific, we leveraged our open-source AgML framework and 3D plant simulator of Helios to produce 502K RGB images from 30K densely sampled spatial locations in 632 synthetic orchards. Moreover, plant environments of strawberries, tomatoes, and grapes are considered at two different scales (i.e., Single-Plant and Multi-Plant). Useful labels are also provided for each image, including (1) bounding boxes and (2) instance segmentation masks for all identifiable fruits, and also (3) pointers to other images of the viewpoints that are reachable by executing an action, so as to simulate active viewpoint selections at each time step. Using DAVIS-Ag, we visualize motivating examples where fruit visibility can dramatically change depending on the pose of the camera view, primarily due to occlusions by other components, such as leaves. Furthermore, we present several baseline models with experiment results for benchmarking in the task of target visibility maximization. Transferability to real strawberry environments is also investigated to demonstrate the feasibility of using the dataset for prototyping real-world solutions. For future research, our dataset is made publicly available online: https://github.com/ctyeong/DAVIS-Ag.

7/2/2024