Automatic Recognition of Food Ingestion Environment from the AIM-2 Wearable Sensor

2405.07827

Published 5/14/2024 by Yuning Huang, Mohamed Abul Hassan, Jiangpeng He, Janine Higgins, Megan McCrory, Heather Eicher-Miller, Graham Thomas, Edward O Sazonov, Fengqing Maggie Zhu

cs.MM cs.AI cs.CV

Abstract

Detecting an ingestion environment is an important aspect of monitoring dietary intake. It provides insightful information for dietary assessment. However, it is a challenging problem where human-based reviewing can be tedious, and algorithm-based review suffers from data imbalance and perceptual aliasing problems. To address these issues, we propose a neural network-based method with a two-stage training framework that tactfully combines fine-tuning and transfer learning techniques. Our method is evaluated on a newly collected dataset called "UA Free Living Study," which uses an egocentric wearable camera, AIM-2 sensor, to simulate food consumption in free-living conditions. The proposed training framework is applied to common neural network backbones, combined with approaches in the general imbalanced classification field. Experimental results on the collected dataset show that our proposed method for automatic ingestion environment recognition successfully addresses the challenging data imbalance problem in the dataset and achieves a promising overall classification accuracy of 96.63%.

Overview

Recognizing the environment in which food is consumed is an important part of monitoring and assessing dietary intake.
However, it is a challenging problem: human review is tedious, and algorithmic review suffers from data imbalance and perceptual aliasing.
The paper proposes a neural network-based method with a two-stage training framework to address these challenges.

Plain English Explanation

The paper focuses on the problem of automatically recognizing the environment in which a person eats, known as the ingestion environment. Knowing the setting of a meal adds useful context when assessing someone's dietary intake. However, this is a difficult task that has limitations with both manual (human-based) and automated (algorithm-based) approaches.

Manual review of the recorded data can be tedious and time-consuming for researchers. Automated algorithms, on the other hand, struggle with two problems: data imbalance, where some environments appear far more often in the data than others, and perceptual aliasing, where different environments look so similar that they are hard to tell apart.
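To make the data imbalance problem concrete, here is a minimal Python sketch of inverse-frequency class weighting, one standard technique from the general imbalanced classification literature. The class names and counts are hypothetical and are not taken from the paper, which may use a different re-balancing approach.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, proportional to 1 / class frequency.

    The scaling n_samples / (n_classes * count) keeps the total weighted
    sample count equal to the real sample count.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical, heavily skewed label distribution (not from the paper).
labels = ["home"] * 80 + ["restaurant"] * 15 + ["vehicle"] * 5
weights = inverse_frequency_weights(labels)

for cls in sorted(weights):
    print(cls, round(weights[cls], 3))
```

Rare classes receive larger weights, so a weighted loss penalizes mistakes on them more heavily than mistakes on the dominant class.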

To address these challenges, the researchers propose a new neural network-based method that uses a two-stage training approach. This combines fine-tuning and transfer learning to build a model that can recognize the ingestion environment, even in complex, real-world scenarios. The researchers evaluate their method on a new dataset they collected, called the "UA Free Living Study," which uses the AIM-2 sensor, an egocentric wearable camera, to simulate food consumption in free-living conditions.

Technical Explanation

The paper presents a neural network-based approach for automatically recognizing the "ingestion environment," meaning the setting in which a person eats or drinks. This is an important capability for dietary assessment and monitoring, but it is a challenging computer vision problem.

The researchers developed a two-stage training framework that combines fine-tuning and transfer learning techniques to build their ingestion environment recognition model. They evaluated this approach using a newly collected dataset called the "UA Free Living Study," which was captured with the AIM-2 sensor, an egocentric (first-person) wearable camera. This dataset aims to simulate food consumption in free-living conditions.

The two-stage training process first pre-trains the model on a large, general dataset to learn basic visual features. It then fine-tunes the model using the specialized UA Free Living dataset to optimize performance on the ingestion detection task. The researchers experimented with different common neural network backbones and techniques from the imbalanced classification literature to address the data skew issues in the dataset.
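As an illustration of how a two-stage schedule can combine transfer learning and fine-tuning, here is a minimal PyTorch sketch. It is a sketch under stated assumptions, not the authors' procedure: the tiny `backbone` stands in for a pretrained network, the data is random, and the stage order, learning rates, and step counts are all hypothetical.

```python
import torch
from torch import nn


def set_trainable(module, flag):
    """Enable or disable gradient updates for every parameter in a module."""
    for p in module.parameters():
        p.requires_grad = flag


def train(model, x, y, params, lr, steps):
    """Run a few SGD steps on the given parameters and return the last loss."""
    opt = torch.optim.SGD(params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()


torch.manual_seed(0)

# Stand-in for a pretrained backbone (assumption: a real one would be a CNN).
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
head = nn.Linear(32, 3)  # 3 hypothetical environment classes
model = nn.Sequential(backbone, head)

# Random stand-in data: 64 feature vectors with random class labels.
x = torch.randn(64, 16)
y = torch.randint(0, 3, (64,))

# Stage 1 (transfer learning): freeze the backbone, train only the new head.
set_trainable(backbone, False)
snapshot = [p.detach().clone() for p in backbone.parameters()]
train(model, x, y, head.parameters(), lr=0.1, steps=20)
backbone_unchanged = all(
    torch.equal(a, b) for a, b in zip(snapshot, backbone.parameters())
)

# Stage 2 (fine-tuning): unfreeze everything and train with a smaller step size.
set_trainable(backbone, True)
train(model, x, y, model.parameters(), lr=0.01, steps=20)
backbone_changed = any(
    not torch.equal(a, b) for a, b in zip(snapshot, backbone.parameters())
)

print(backbone_unchanged, backbone_changed)
```

Freezing first protects the pretrained features while the new classifier head settles. The later low-learning-rate pass then adapts the whole network to the target data.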

The results show that the proposed method achieves an overall classification accuracy of 96.63% on the UA Free Living dataset. According to the authors, it also successfully addresses the data imbalance problem in that dataset.
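Overall accuracy alone can hide poor performance on rare classes, which is why imbalanced problems are often also judged by per-class recall. The sketch below shows the difference on hypothetical labels and predictions. None of the numbers come from the paper.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of all samples that were classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall for each class: correct predictions / true samples of that class."""
    totals, hits = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + (1 if t == p else 0)
    return {cls: hits[cls] / totals[cls] for cls in totals}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical case: a classifier that always predicts the majority class.
y_true = ["home"] * 95 + ["vehicle"] * 5
y_pred = ["home"] * 100

acc = overall_accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
print(acc, bal)
```

Here the trivial classifier scores 0.95 overall but only 0.5 balanced accuracy, because it never recognizes the minority class.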

Critical Analysis

The paper presents a well-designed approach to the problem of automatic ingestion environment recognition, building on established deep learning and computer vision techniques. The two-stage training framework and the use of the specialized UA Free Living dataset are particularly noteworthy strengths of the research.

However, the paper does acknowledge some limitations of the work. For example, the dataset, while valuable, is still relatively small and may not fully capture the diverse range of real-world eating behaviors. Additionally, the paper does not deeply explore how the method would perform in more challenging scenarios, such as when the subject's face or hands are occluded, or when eating occurs in cluttered environments.

Further research could investigate the robustness of the ingestion environment recognition model in these more complex situations. It would also be interesting to see how the method compares to other recently proposed approaches, such as those using multi-modal sensor fusion or edge computing for real-time dietary monitoring.

Conclusion

This paper presents a promising neural network-based approach for automatically recognizing a person's ingestion environment, which is a valuable capability for dietary assessment and monitoring. By combining fine-tuning and transfer learning techniques, the researchers were able to build a model that achieves high accuracy on a new dataset designed to simulate food consumption in free-living conditions.

While the method has some limitations, the strong performance demonstrated in this work highlights the potential of advanced computer vision and deep learning techniques to transform the way we track and understand people's dietary intake. Further advancements in this area could lead to more effective and user-friendly nutrition monitoring systems, with valuable applications in healthcare, public health, and beyond.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

How Much You Ate? Food Portion Estimation on Spoons

Aaryam Sharma, Chris Czarnecki, Yuhao Chen, Pengcheng Xi, Linlin Xu, Alexander Wong

Monitoring dietary intake is a crucial aspect of promoting healthy living. In recent years, advances in computer vision technology have facilitated dietary intake monitoring through the use of images and depth cameras. However, the current state-of-the-art image-based food portion estimation algorithms assume that users take images of their meals one or two times, which can be inconvenient and fail to capture food items that are not visible from a top-down perspective, such as ingredients submerged in a stew. To address these limitations, we introduce an innovative solution that utilizes stationary user-facing cameras to track food items on utensils, not requiring any change of camera perspective after installation. The shallow depth of utensils provides a more favorable angle for capturing food items, and tracking them on the utensil's surface offers a significantly more accurate estimation of dietary intake without the need for post-meal image capture. The system is reliable for estimation of nutritional content of liquid-solid heterogeneous mixtures such as soups and stews. Through a series of experiments, we demonstrate the exceptional potential of our method as a non-invasive, user-friendly, and highly accurate dietary intake monitoring tool.

5/15/2024

cs.CV cs.AI

Computer Vision in the Food Industry: Accurate, Real-time, and Automatic Food Recognition with Pretrained MobileNetV2

Shayan Rokhva, Babak Teimourpour, Amir Hossein Soltani

In contemporary society, the application of artificial intelligence for automatic food recognition offers substantial potential for nutrition tracking, reducing food waste, and enhancing productivity in food production and consumption scenarios. Modern technologies such as Computer Vision and Deep Learning are highly beneficial, enabling machines to learn automatically, thereby facilitating automatic visual recognition. Despite some research in this field, the challenge of achieving accurate automatic food recognition quickly remains a significant research gap. Some models have been developed and implemented, but maintaining high performance swiftly, with low computational cost and low access to expensive hardware accelerators, still needs further exploration and research. This study employs the pretrained MobileNetV2 model, which is efficient and fast, for food recognition on the public Food11 dataset, comprising 16643 images. It also utilizes various techniques such as dataset understanding, transfer learning, data augmentation, regularization, dynamic learning rate, hyperparameter tuning, and consideration of images in different sizes to enhance performance and robustness. These techniques aid in choosing appropriate metrics, achieving better performance, avoiding overfitting and accuracy fluctuations, speeding up the model, and increasing the generalization of findings, making the study and its results applicable to practical applications. Despite employing a light model with a simpler structure and fewer trainable parameters compared to some deep and dense models in the deep learning area, it achieved commendable accuracy in a short time. This underscores the potential for practical implementation, which is the main intention of this study.

5/21/2024

cs.CV

MunchSonic: Tracking Fine-grained Dietary Actions through Active Acoustic Sensing on Eyeglasses

Saif Mahmud, Devansh Agarwal, Ashwin Ajit, Qikang Liang, Thalia Viranda, Francois Guimbretiere, Cheng Zhang

We introduce MunchSonic, an AI-powered active acoustic sensing system integrated into eyeglasses, designed to track fine-grained dietary actions like hand-to-mouth movements for food intake, chewing, and drinking. MunchSonic emits inaudible ultrasonic waves from a commodity eyeglass frame. The reflected signals contain rich information about the position and movements of various body parts, including the mouth, jaw, arms, and hands, all of which are involved in eating activities. These signals are then processed by a custom deep-learning pipeline to classify six actions: food intake, chewing, drinking, talking, face-hand touching, and other activities (null). In an unconstrained user study with 12 participants, MunchSonic achieves a 93.5% macro F1-score in a user-independent evaluation with a 2-second time resolution, demonstrating its effectiveness. Additionally, MunchSonic accurately tracks eating episodes and the frequency of food intake within those episodes.

6/3/2024

cs.HC cs.ET

Leveraging Automatic Personalised Nutrition: Food Image Recognition Benchmark and Dataset based on Nutrition Taxonomy

Sergio Romero-Tapiador, Ruben Tolosana, Aythami Morales, Julian Fierrez, Ruben Vera-Rodriguez, Isabel Espinosa-Salinas, Gala Freixer, Enrique Carrillo de Santa Pau, Ana Ramírez de Molina, Javier Ortega-Garcia

Maintaining a healthy lifestyle has become increasingly challenging in today's sedentary society marked by poor eating habits. To address this issue, both national and international organisations have made numerous efforts to promote healthier diets and increased physical activity. However, implementing these recommendations in daily life can be difficult, as they are often generic and not tailored to individuals. This study presents the AI4Food-NutritionDB database, the first nutrition database that incorporates food images and a nutrition taxonomy based on recommendations by national and international health authorities. The database offers a multi-level categorisation, comprising 6 nutritional levels, 19 main categories (e.g., Meat), 73 subcategories (e.g., White Meat), and 893 specific food products (e.g., Chicken). The AI4Food-NutritionDB opens the doors to new food computing approaches in terms of food intake frequency, quality, and categorisation. Also, we present a standardised experimental protocol and benchmark including three tasks based on the nutrition taxonomy (i.e., category, subcategory, and final product recognition). These resources are available to the research community, including our deep learning models trained on AI4Food-NutritionDB, which can serve as pre-trained models, achieving accurate recognition results for challenging food image databases.

4/22/2024

cs.CV cs.MM