Computer Vision in the Food Industry: Accurate, Real-time, and Automatic Food Recognition with Pretrained MobileNetV2

Read original: arXiv:2405.11621 - Published 5/21/2024 by Shayan Rokhva, Babak Teimourpour, Amir Hossein Soltani

Overview

This paper explores automatic food recognition with artificial intelligence (AI), which could support nutrition tracking, reduce food waste, and improve productivity in food production and consumption.
The study applies a pretrained MobileNetV2 model, chosen for its speed and efficiency, to the public Food11 dataset (16,643 images), and uses transfer learning, data augmentation, and hyperparameter tuning to improve performance and robustness.
Although MobileNetV2 has a simpler structure and fewer trainable parameters than many deeper models, the study reports commendable accuracy in a short time, which supports its case for practical use.

Plain English Explanation

The paper discusses how modern AI technologies, such as computer vision and deep learning, can be used to automatically recognize different types of food. This could be very useful for things like tracking people's nutrition, reducing food waste, and making food production and consumption more efficient.

The researchers used a deep learning model called MobileNetV2, which is designed to be fast and efficient. They trained this model on a public dataset of over 16,000 food images. To improve the model's performance, they used transfer learning (starting from a model that has already learned from a large, general image collection), data augmentation (creating altered copies of training images to artificially expand the dataset), and hyperparameter tuning (adjusting training settings to find the combination that works best).
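To make the idea of data augmentation concrete, here is a toy sketch in plain Python. It is not the authors' pipeline: the tiny nested-list "image", the function names, and the choice of flips and quarter-turn rotations are all assumptions made for illustration. The point is only that each transformed copy shows the same food, so it keeps the same label.

```python
import random

# Hypothetical stand-in for a real image: rows of grayscale pixel values.
Image = list[list[int]]


def flip_horizontal(image: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image: Image) -> Image:
    """Rotate the image a quarter turn clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image: Image, rng: random.Random) -> Image:
    """Return a randomly transformed copy; the class label is unchanged."""
    out = [row[:] for row in image]
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    for _ in range(rng.randrange(4)):  # zero to three quarter turns
        out = rotate_90_clockwise(out)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    original = [[1, 2], [3, 4]]
    # Five augmented variants of one training image.
    for _ in range(5):
        print(augment(original, rng))
```

In a real training setup an image library would apply such transformations on the fly, so the model rarely sees exactly the same pixels twice.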

Even though the MobileNetV2 model is relatively simple and lightweight compared to some other deep learning models, the researchers were able to achieve good accuracy in a short amount of time. This suggests that this approach could be practical for real-world applications, which is the main goal of the study.

Technical Explanation

The study applies MobileNetV2, a lightweight and efficient convolutional neural network, to automatic food recognition on the public Food11 dataset, which contains 16,643 images across 11 food categories.

To enhance the model's performance, the researchers employed various techniques, including:

Transfer learning: They used a pre-trained version of the MobileNetV2 model as a starting point, which had been trained on a large general-purpose image dataset.
Data augmentation: They applied transformations like rotation, flipping, and scaling to the input images to artificially expand the dataset and improve the model's generalization.
Regularization: They used techniques like L2 regularization to prevent overfitting and ensure the model performs well on new, unseen data.
Dynamic learning rate: They adjusted the learning rate (the step size for updating the model's parameters) during training to speed up convergence and reduce accuracy fluctuations.
Hyperparameter tuning: They experimented with different hyperparameter settings, such as the batch size and the number of training epochs, to find the optimal configuration.
Consideration of image sizes: They trained and evaluated the model on images of different sizes to assess its performance and robustness.
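Two items from the list above, regularization and a dynamic learning rate, can be sketched without any deep learning library. The code below is a minimal illustration under stated assumptions, not the configuration used in the paper: the reduce-on-plateau rule, the class name PlateauSchedule, the function sgd_step_l2, and every numeric default are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class PlateauSchedule:
    """Cut the learning rate when validation loss stops improving."""

    lr: float = 0.01
    factor: float = 0.5       # multiply lr by this value on a plateau
    patience: int = 2         # non-improving epochs tolerated before a cut
    min_lr: float = 1e-5      # never go below this learning rate
    best: float = float("inf")
    bad_epochs: int = 0

    def step(self, val_loss: float) -> float:
        """Record one epoch's validation loss and return the lr to use next."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


def sgd_step_l2(weights, grads, lr, weight_decay):
    """One SGD update on loss + (weight_decay / 2) * sum(w ** 2).

    The L2 term adds weight_decay * w to each gradient, which pulls
    weights toward zero and discourages overfitting.
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


if __name__ == "__main__":
    schedule = PlateauSchedule(lr=0.1, patience=1)
    for epoch, loss in enumerate([1.0, 0.9, 0.95, 0.96, 0.8]):
        print(epoch, schedule.step(loss))
```

Deep learning frameworks ship their own schedulers and weight-decay options; the hand-written versions here only show the logic such options implement.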

Despite the model's small size, this approach achieved commendable accuracy on the Food11 dataset, which supports the study's main aim: practical implementation of automatic food recognition with efficient deep learning models.

Critical Analysis

The researchers have demonstrated the feasibility of using the MobileNetV2 model for accurate and efficient automatic food recognition. However, the study does have some limitations:

Dataset size and diversity: The Food11 dataset, while publicly available, may not be representative of the full diversity of real-world food items. Expanding the dataset or evaluating the model on additional datasets could provide a more comprehensive assessment of its performance.
Real-world deployment: The model was evaluated on a curated public dataset rather than in deployment. Applying it to real-world scenarios, such as phone-based food recognition or in-environment monitoring, may present additional challenges that were not addressed in this study.
Computational and hardware requirements: While the MobileNetV2 model is efficient, the study does not provide a detailed analysis of the computational resources required for deployment, which could be an important consideration for practical applications.
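A first step toward the resource analysis mentioned in the last point is a back-of-the-envelope parameter count. The sketch below compares a standard convolution with a depthwise separable one, the building block that the MobileNet family is known for. The layer sizes are hypothetical, biases and normalization layers are ignored, and none of the figures come from the paper.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard convolution: one k x k filter per (in, out) pair."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = kernel * kernel * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out             # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def float32_megabytes(n_params: int) -> float:
    """Approximate storage for n_params weights at 4 bytes each."""
    return n_params * 4 / 1e6


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 32 input channels, 64 output channels.
    k, c_in, c_out = 3, 32, 64
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```

Parameter count is only one part of the picture; latency and memory on the target device would still need to be measured directly.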

Further research could explore the model's performance on more diverse and challenging food datasets, investigate its suitability for real-world deployment scenarios, and analyze the computational and hardware requirements for practical implementation.

Conclusion

This study demonstrates the potential of using efficient deep learning models, such as MobileNetV2, for automatic food recognition. The researchers' approach, which leverages various techniques to enhance performance and robustness, achieved commendable accuracy on a public dataset.

The findings of this study suggest that practical implementation of automatic food recognition is feasible, which could lead to significant benefits in areas like nutrition tracking, food waste reduction, and improved productivity in food-related industries. Further research is needed to address the limitations and explore the model's performance in real-world settings, but this work represents an important step towards the practical application of AI for food-related tasks.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Computer Vision in the Food Industry: Accurate, Real-time, and Automatic Food Recognition with Pretrained MobileNetV2

Shayan Rokhva, Babak Teimourpour, Amir Hossein Soltani

In contemporary society, the application of artificial intelligence for automatic food recognition offers substantial potential for nutrition tracking, reducing food waste, and enhancing productivity in food production and consumption scenarios. Modern technologies such as Computer Vision and Deep Learning are highly beneficial, enabling machines to learn automatically, thereby facilitating automatic visual recognition. Despite some research in this field, the challenge of achieving accurate automatic food recognition quickly remains a significant research gap. Some models have been developed and implemented, but maintaining high performance swiftly, with low computational cost and low access to expensive hardware accelerators, still needs further exploration and research. This study employs the pretrained MobileNetV2 model, which is efficient and fast, for food recognition on the public Food11 dataset, comprising 16643 images. It also utilizes various techniques such as dataset understanding, transfer learning, data augmentation, regularization, dynamic learning rate, hyperparameter tuning, and consideration of images in different sizes to enhance performance and robustness. These techniques aid in choosing appropriate metrics, achieving better performance, avoiding overfitting and accuracy fluctuations, speeding up the model, and increasing the generalization of findings, making the study and its results applicable to practical applications. Despite employing a light model with a simpler structure and fewer trainable parameters compared to some deep and dense models in the deep learning area, it achieved commendable accuracy in a short time. This underscores the potential for practical implementation, which is the main intention of this study.

5/21/2024

Vision-Based Approach for Food Weight Estimation from 2D Images

Chathura Wimalasiri, Prasan Kumar Sahoo

In response to the increasing demand for efficient and non-invasive methods to estimate food weight, this paper presents a vision-based approach utilizing 2D images. The study employs a dataset of 2380 images comprising fourteen different food types in various portions, orientations, and containers. The proposed methodology integrates deep learning and computer vision techniques, specifically employing Faster R-CNN for food detection and MobileNetV3 for weight estimation. The detection model achieved a mean average precision (mAP) of 83.41%, an average Intersection over Union (IoU) of 91.82%, and a classification accuracy of 100%. For weight estimation, the model demonstrated a root mean squared error (RMSE) of 6.3204, a mean absolute percentage error (MAPE) of 0.0640%, and an R-squared value of 98.65%. The study underscores the potential applications of this technology in healthcare for nutrition counseling, fitness and wellness for dietary intake assessment, and smart food storage solutions to reduce waste. The results indicate that the combination of Faster R-CNN and MobileNetV3 provides a robust framework for accurate food weight estimation from 2D images, showcasing the synergy of computer vision and deep learning in practical applications.

5/28/2024

NutritionVerse-Direct: Exploring Deep Neural Networks for Multitask Nutrition Prediction from Food Images

Matthew Keller, Chi-en Amy Tai, Yuhao Chen, Pengcheng Xi, Alexander Wong

Many aging individuals encounter challenges in effectively tracking their dietary intake, exacerbating their susceptibility to nutrition-related health complications. Self-reporting methods are often inaccurate and suffer from substantial bias; however, leveraging intelligent prediction methods can automate and enhance precision in this process. Recent work has explored using computer vision prediction systems to predict nutritional information from food images. Still, these methods are often tailored to specific situations, require other inputs in addition to a food image, or do not provide comprehensive nutritional information. This paper aims to enhance the efficacy of dietary intake estimation by leveraging various neural network architectures to directly predict a meal's nutritional content from its image. Through comprehensive experimentation and evaluation, we present NutritionVerse-Direct, a model utilizing a vision transformer base architecture with three fully connected layers that lead to five regression heads predicting calories (kcal), mass (g), protein (g), fat (g), and carbohydrates (g) present in a meal. NutritionVerse-Direct yields a combined mean average error score on the NutritionVerse-Real dataset of 412.6, an improvement of 25.5% over the Inception-ResNet model, demonstrating its potential for improving dietary intake estimation accuracy.

5/14/2024

Enhancing Fruit and Vegetable Detection in Unconstrained Environment with a Novel Dataset

Sandeep Khanna, Chiranjoy Chattopadhyay, Suman Kundu

Automating the detection of fruits and vegetables using computer vision is essential for modernizing agriculture, improving efficiency, ensuring food quality, and contributing to technologically advanced and sustainable farming practices. This paper presents an end-to-end pipeline for detecting and localizing fruits and vegetables in real-world scenarios. To achieve this, we have curated a dataset named FRUVEG67 that includes images of 67 classes of fruits and vegetables captured in unconstrained scenarios, with only a few manually annotated samples per class. We have developed a semi-supervised data annotation algorithm (SSDA) that generates bounding boxes for objects to label the remaining non-annotated images. For detection, we introduce the Fruit and Vegetable Detection Network (FVDNet), an ensemble version of YOLOv7 featuring three distinct grid configurations. We employ an averaging approach for bounding-box prediction and a voting mechanism for class prediction. We have integrated Jensen-Shannon divergence (JSD) in conjunction with focal loss to better detect smaller objects. Our experimental results highlight the superiority of FVDNet compared to previous versions of YOLO, showcasing remarkable improvements in detection and localization performance. We achieved an impressive mean average precision (mAP) score of 0.78 across all classes. Furthermore, we evaluated the efficacy of FVDNet using open-category refrigerator images, where it demonstrates promising results.

9/23/2024