A Benchmark Grocery Dataset of Realworld Point Clouds From Single View

Read original: arXiv:2402.07819 - Published 4/9/2024 by Shivanand Venkanna Sheshappanavar, Tejas Anvekar, Shivanand Kundargi, Yufan Wang, Chandra Kambhamettu

Overview

This paper introduces a large-scale 3D dataset called 3DGrocery100 for fine-grained grocery object recognition.
The dataset consists of 87,898 3D point clouds across 100 grocery categories, created from 10,755 RGB-D images.
The availability of mobile 3D sensors like LiDAR and TrueDepth has enabled the creation of this dataset, which can help advance research in areas like automatic checkout and assistive technologies.
The paper benchmarks six recent state-of-the-art 3D point cloud classification models on this dataset and also evaluates few-shot and continual learning classification tasks.

Plain English Explanation

Recognizing individual grocery items is an important problem in computer vision with many real-world applications. Existing datasets for this task are mostly 2D images, which limits the ability of models to learn from the full 3D structure of objects. The recent availability of 3D sensors in mobile devices has opened up new possibilities for creating large-scale 3D datasets of grocery items.

The 3DGrocery100 dataset introduced in this paper aims to fill this gap. It contains over 87,000 3D point cloud representations of 100 different grocery categories, such as various fruits, vegetables, canned goods, and packaged foods. This diverse dataset can be used to train and evaluate 3D object recognition models, which could then be deployed in applications like automated checkout systems or assistive technologies for the visually impaired.

The researchers benchmarked several state-of-the-art 3D classification models on this dataset, as well as more specialized tasks like few-shot and continual learning. This helps establish a baseline for 3D grocery object recognition and identifies areas for future improvement.

Technical Explanation

The paper presents the 3DGrocery100 dataset, a large-scale 3D point cloud dataset for fine-grained grocery object recognition. Existing grocery datasets are mainly 2D images, so models trained on them can only learn features from regular 2D grids rather than from the 3D structure of objects. The recent integration of 3D sensors such as LiDAR and TrueDepth into mobile phones has made it practical to collect this new dataset.

The 3DGrocery100 dataset consists of 87,898 3D point clouds across 100 grocery categories, created from 10,755 single-view RGB-D images. According to the authors, existing 3D datasets lack fine-grained grocery categories and offer limited training samples, and no dedicated large-scale real-world 3D benchmark for groceries existed before this work.
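To make the step from an RGB-D image to a point cloud concrete, here is a minimal sketch of standard pinhole back-projection. It is not the authors' pipeline, which this summary does not describe. The function name `depth_to_point_cloud`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy 2x2 frame are all assumptions chosen for illustration.

```python
import numpy as np

def depth_to_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Back-project a depth map (metres) into an (N, 6) array of XYZRGB points.

    Hypothetical helper: assumes a pinhole camera with focal lengths (fx, fy)
    and principal point (cx, cy), and that depth == 0 marks missing readings.
    """
    h, w = depth.shape
    # Pixel coordinate grids: u indexes columns, v indexes rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    xyz = np.stack([x, y, z], axis=1)
    # Scale 8-bit colours to [0, 1] so they sit alongside metric coordinates.
    colors = rgb[valid].astype(np.float64) / 255.0
    return np.hstack([xyz, colors])

# Toy 2x2 frame with one missing depth reading (made-up values).
depth = np.array([[1.0, 0.0], [2.0, 2.0]])
rgb = np.full((2, 2, 3), 255, dtype=np.uint8)
cloud = depth_to_point_cloud(depth, rgb, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(cloud.shape)
```

Because each image is captured from a single view, a cloud produced this way covers only the visible side of an object, which is one reason single-view data is harder to classify than full 3D scans.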

The paper evaluates the performance of six recent state-of-the-art 3D point cloud classification models on the 3DGrocery100 dataset. It also benchmarks the dataset on few-shot and continual learning tasks, which are important for real-world deployment of these models.
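Few-shot classification is commonly evaluated with N-way K-shot episodes: a handful of classes is drawn, each with K labeled support samples and a few held-out query samples. The sketch below shows that sampling step under assumed settings. The helper `sample_episode`, the 5-way 1-shot configuration, and the toy labels are illustrative and are not taken from the paper.

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Return (support, query) lists of sample indices for one episode.

    Hypothetical helper: `labels` holds one class label per sample.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Only classes with enough samples for both support and query are eligible.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + n_query)
        support.extend(picked[:k_shot])
        query.extend(picked[k_shot:])
    return support, query

# Toy labels: 10 classes with 20 samples each (made-up data).
labels = [i % 10 for i in range(200)]
support, query = sample_episode(labels, n_way=5, k_shot=1, n_query=3,
                                rng=random.Random(0))
print(len(support), len(query))
```

A model is then adapted or prompted with the support set and scored on the query set, and accuracy is averaged over many such episodes.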

Critical Analysis

The 3DGrocery100 dataset represents a significant contribution to the field of 3D object recognition, particularly for fine-grained grocery items. The availability of a large-scale, diverse 3D dataset in this domain can spur further advancements in areas like automated checkout and assistive technologies.

However, the paper does not discuss potential limitations or biases in the dataset, such as the distribution of object sizes, occlusion patterns, or lighting conditions. Additionally, the benchmarking experiments are limited to classification tasks, and the performance of these models on more complex tasks like segmentation or detection is not evaluated.

Future research could explore the use of this dataset for other 3D vision tasks, as well as investigate techniques to improve model robustness and generalization, particularly in few-shot and continual learning scenarios.

Conclusion

The 3DGrocery100 dataset represents an important step forward in fine-grained 3D object recognition for grocery items. By providing a large-scale, diverse dataset of 3D point clouds, this work can enable the development of more capable computer vision models for a variety of real-world applications, from automated checkout to assistive technologies for the visually impaired. The benchmarking results establish a solid baseline for future research in this area, and the dataset is a valuable resource for the broader computer vision community.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

A Benchmark Grocery Dataset of Realworld Point Clouds From Single View

Shivanand Venkanna Sheshappanavar, Tejas Anvekar, Shivanand Kundargi, Yufan Wang, Chandra Kambhamettu

Fine-grained grocery object recognition is an important computer vision problem with broad applications in automatic checkout, in-store robotic navigation, and assistive technologies for the visually impaired. Existing datasets on groceries are mainly 2D images. Models trained on these datasets are limited to learning features from the regular 2D grids. While portable 3D sensors such as Kinect were not commonly available for mobile phones, sensors such as LiDAR and TrueDepth have recently been integrated into mobile phones. Despite the availability of mobile 3D sensors, there are currently no dedicated real-world large-scale benchmark 3D datasets for grocery. In addition, existing 3D datasets lack fine-grained grocery categories and have limited training samples. Furthermore, collecting data by going around the object versus the traditional photo capture makes data collection cumbersome. Thus, we introduce a large-scale grocery dataset called 3DGrocery100. It constitutes 100 classes, with a total of 87,898 3D point clouds created from 10,755 RGB-D single-view images. We benchmark our dataset on six recent state-of-the-art 3D point cloud classification models. Additionally, we also benchmark the dataset on few-shot and continual learning point cloud classification tasks. Project Page: https://bigdatavision.org/3DGrocery100/.

4/9/2024

KITchen: A Real-World Benchmark and Dataset for 6D Object Pose Estimation in Kitchen Environments

Abdelrahman Younes, Tamim Asfour

Despite the recent progress on 6D object pose estimation methods for robotic grasping, a substantial performance gap persists between the capabilities of these methods on existing datasets and their efficacy in real-world grasping and mobile manipulation tasks, particularly when robots rely solely on their monocular egocentric field of view (FOV). Existing real-world datasets primarily focus on table-top grasping scenarios, where a robot arm is placed in a fixed position and the objects are centralized within the FOV of fixed external camera(s). Assessing performance on such datasets may not accurately reflect the challenges encountered in everyday grasping and mobile manipulation tasks within kitchen environments such as retrieving objects from higher shelves, sinks, dishwashers, ovens, refrigerators, or microwaves. To address this gap, we present KITchen, a novel benchmark designed specifically for estimating the 6D poses of objects located in diverse positions within kitchen settings. For this purpose, we recorded a comprehensive dataset comprising around 205k real-world RGBD images for 111 kitchen objects captured in two distinct kitchens, utilizing a humanoid robot with its egocentric perspectives. Subsequently, we developed a semi-automated annotation pipeline, to streamline the labeling process of such datasets, resulting in the generation of 2D object labels, 2D object segmentation masks, and 6D object poses with minimal human effort. The benchmark, the dataset, and the annotation pipeline will be publicly available at https://kitchen-dataset.github.io/KITchen.

7/30/2024

MetaFood3D: Large 3D Food Object Dataset with Nutrition Values

Yuhao Chen, Jiangpeng He, Chris Czarnecki, Gautham Vinod, Talha Ibn Mahmud, Siddeshwar Raghavan, Jinge Ma, Dayou Mao, Saeejith Nair, Pengcheng Xi, Alexander Wong, Edward Delp, Fengqing Zhu

Food computing is both important and challenging in computer vision (CV). It significantly contributes to the development of CV algorithms due to its frequent presence in datasets across various applications, ranging from classification and instance segmentation to 3D reconstruction. The polymorphic shapes and textures of food, coupled with high variation in forms and vast multimodal information, including language descriptions and nutritional data, make food computing a complex and demanding task for modern CV algorithms. 3D food modeling is a new frontier for addressing food-related problems, due to its inherent capability to deal with random camera views and its straightforward representation for calculating food portion size. However, the primary hurdle in the development of algorithms for food object analysis is the lack of nutrition values in existing 3D datasets. Moreover, in the broader field of 3D research, there is a critical need for domain-specific test datasets. To bridge the gap between general 3D vision and food computing research, we propose MetaFood3D. This dataset consists of 637 meticulously labeled 3D food objects across 108 categories, featuring detailed nutrition information, weight, and food codes linked to a comprehensive nutrition database. The dataset emphasizes intra-class diversity and includes rich modalities such as textured mesh files, RGB-D videos, and segmentation masks. Experimental results demonstrate our dataset's significant potential for improving algorithm performance, highlight the challenging gap between video captures and 3D scanned data, and show the strength of the MetaFood3D dataset in high-quality data generation, simulation, and augmentation.

9/4/2024

A Dataset and Benchmark for Shape Completion of Fruits for Agricultural Robotics

Federico Magistri, Thomas Labe, Elias Marks, Sumanth Nagulavancha, Yue Pan, Claus Smitt, Lasse Klingbeil, Michael Halstead, Heiner Kuhlmann, Chris McCool, Jens Behley, Cyrill Stachniss

As the population is expected to reach 10 billion by 2050, our agricultural production system needs to double its productivity despite a decline of human workforce in the agricultural sector. Autonomous robotic systems are one promising pathway to increase productivity by taking over labor-intensive manual tasks like fruit picking. To be effective, such systems need to monitor and interact with plants and fruits precisely, which is challenging due to the cluttered nature of agricultural environments causing, for example, strong occlusions. Thus, being able to estimate the complete 3D shapes of objects in presence of occlusions is crucial for automating operations such as fruit harvesting. In this paper, we propose the first publicly available 3D shape completion dataset for agricultural vision systems. We provide an RGB-D dataset for estimating the 3D shape of fruits. Specifically, our dataset contains RGB-D frames of single sweet peppers in lab conditions but also in a commercial greenhouse. For each fruit, we additionally collected high-precision point clouds that we use as ground truth. For acquiring the ground truth shape, we developed a measuring process that allows us to record data of real sweet pepper plants, both in the lab and in the greenhouse with high precision, and determine the shape of the sensed fruits. We release our dataset, consisting of almost 7000 RGB-D frames belonging to more than 100 different fruits. We provide segmented RGB-D frames, with camera intrinsics to easily obtain colored point clouds, together with the corresponding high-precision, occlusion-free point clouds obtained with a high-precision laser scanner. We additionally enable evaluation of shape completion approaches on a hidden test set through a public challenge on a benchmark server.

7/19/2024