TinyM$^2$Net-V3: Memory-Aware Compressed Multimodal Deep Neural Networks for Sustainable Edge Deployment

Original paper: arXiv:2405.12353, published 5/22/2024 by Hasib-Al Rashid and Tinoosh Mohsenin


Overview

The paper discusses the growing problem of energy usage and carbon emissions from advanced artificial intelligence (AI) algorithms.
It introduces a system called TinyM$^2$Net-V3 that aims to develop sustainable AI solutions for resource-constrained devices.
TinyM$^2$Net-V3 processes multimodal data, designs efficient deep neural network models, and employs compression techniques to fit the models on low-memory hardware.
The system is evaluated on two case studies: COVID-19 detection from cough, speech, and breathing audio, and pose classification from depth and thermal images.

Plain English Explanation

As artificial intelligence (AI) systems become more sophisticated, they are also consuming more energy and contributing to higher carbon emissions. This is a growing concern, especially as AI expands into various industries. To address this problem, researchers have developed a system called TinyM$^2$Net-V3 that aims to create sustainable AI solutions for devices with limited resources.

The key idea behind TinyM$^2$Net-V3 is to process different types of data (known as multimodal data) efficiently and design deep learning models that are small and energy-efficient. This is challenging because combining multiple data sources can increase the complexity, latency, and power consumption of the models.

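To overcome these challenges, TinyM$^2$Net-V3 compresses its deep learning models with knowledge distillation (training a small "student" model to imitate a larger "teacher" model) and low-bit quantization (storing each weight with only a few bits). The compressed models are sized to fit within the smaller, faster levels of a device's memory hierarchy, which reduces latency and improves energy efficiency on resource-constrained hardware.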
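Knowledge distillation trains a compact student to match the softened output distribution of a larger teacher. The summary does not say which distillation loss TinyM$^2$Net-V3 uses, so the sketch below assumes the common Hinton-style formulation (hard-label cross-entropy plus a temperature-scaled KL-divergence term). The function names, the temperature, and the weighting `alpha` are illustrative assumptions, not the authors' settings.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Hinton-style distillation loss for one example (an assumed formulation).

    alpha weights the hard-label term; (1 - alpha) weights the soft term.
    """
    # Hard-label term: ordinary cross-entropy of the student at T = 1
    hard = -math.log(softmax(student_logits)[label])

    # Soft-label term: KL(teacher || student) on temperature-softened outputs
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(pt * math.log(pt / ps)
               for pt, ps in zip(p_teacher, p_student) if pt > 0)

    # The T^2 factor keeps the soft term's gradient scale comparable across T
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


# Example: a 3-class problem where the teacher is confident in class 0
teacher = [4.0, 1.0, -2.0]
student = [2.5, 0.5, -1.0]
print(distillation_loss(student, teacher, label=0))
```

The soft term is what lets a tiny student learn from the teacher's relative confidences across all classes rather than from the hard labels alone.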

The researchers tested TinyM$^2$Net-V3 on two real-world applications: detecting COVID-19 from audio data (cough, speech, and breathing) and classifying human poses from depth and thermal images. Despite their tiny size (6 KB and 58 KB), the models achieved 92.95% and 90.7% accuracy, respectively, along with millisecond-scale latency and very high power efficiency.

Technical Explanation

The paper introduces TinyM$^2$Net-V3, a system designed to process multimodal data, create efficient deep neural network (DNN) models, and employ model compression techniques to fit the models on resource-constrained devices.

The key components of TinyM$^2$Net-V3 include:

Multimodal Data Processing: The system is capable of handling diverse data types, such as audio (cough, speech, breathing), depth images, and thermal images, to tackle various real-world problems.
DNN Model Design: The researchers designed DNN models that can effectively learn from the multimodal data, leveraging techniques like attention mechanisms to capture cross-modal relationships.
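Model Compression: To fit the models on low-memory hardware, TinyM$^2$Net-V3 employs knowledge distillation and low bit-width quantization with memory-aware considerations, so that the compressed models fit within lower levels of the memory hierarchy, reducing latency and improving energy efficiency.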
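Low bit-width quantization stores each weight as a small integer plus a shared scale factor. The summary does not give the paper's quantization scheme or bit widths, so the sketch below assumes simple symmetric, per-tensor uniform quantization; the function names are hypothetical.

```python
import math


def quantize_symmetric(weights, bits):
    """Symmetric per-tensor uniform quantization to signed `bits`-bit integers.

    Returns the integer codes and the float scale used to dequantize them.
    """
    qmax = 2 ** (bits - 1) - 1          # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest level (Python's round() breaks ties
    # to even) and clamp to the symmetric range [-qmax, qmax]
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def packed_size_bytes(num_weights, bits):
    """Bytes needed to store the codes alone, ignoring scales and metadata."""
    return math.ceil(num_weights * bits / 8)


weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_symmetric(weights, bits=4)
restored = dequantize(codes, scale)
print(codes, max(abs(w - r) for w, r in zip(weights, restored)))
```

Going from 32-bit floats to 4-bit codes shrinks weight storage by 8×, at the cost of a per-weight rounding error of at most half the scale.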

The researchers evaluated TinyM$^2$Net-V3 on two case studies:

COVID-19 Detection: The system processed cough, speech, and breathing audio data to detect COVID-19 infections. The tiny inference model (6 KB) achieved 92.95% accuracy.
Pose Classification: TinyM$^2$Net-V3 classified human poses from depth and thermal images. The tiny model (58 KB) reached 90.7% accuracy.

In both case studies, the tiny models deployed on resource-limited hardware achieved millisecond-scale latencies and very high power efficiency.
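The "memory-aware" part of the design is about sizing a model so that it fits within the lower, faster levels of the device's memory hierarchy. For scale, taking 1 KB as 1,024 bytes, a 6 KB budget holds at most 6,144 weights at 8 bits or 12,288 at 4 bits, and a 58 KB budget holds 59,392 or 118,784. The sketch below picks the fastest level that can hold a model's packed weights. The level names and capacities are hypothetical, and the sketch ignores activations and buffers, which a real deployment would also have to budget.

```python
import math

# Hypothetical memory hierarchy, fastest and smallest first. These capacities
# are illustrative assumptions, not figures from the paper.
MEMORY_LEVELS = [
    ("L1 on-chip SRAM", 64 * 1024),
    ("L2 on-chip SRAM", 512 * 1024),
    ("external RAM", 8 * 1024 * 1024),
]


def weight_bytes(num_weights, bits):
    """Packed storage for the weights alone (no activations or metadata)."""
    return math.ceil(num_weights * bits / 8)


def max_weights(budget_bytes, bits):
    """Largest weight count whose packed storage fits in budget_bytes."""
    return budget_bytes * 8 // bits


def fastest_fitting_level(num_weights, bits, levels=MEMORY_LEVELS):
    """Name of the fastest memory level that can hold the weights, or None."""
    size = weight_bytes(num_weights, bits)
    for name, capacity in levels:
        if size <= capacity:
            return name
    return None


# A hypothetical 100k-weight model
print(fastest_fitting_level(100_000, 32))  # -> L2 on-chip SRAM (400,000 bytes)
print(fastest_fitting_level(100_000, 4))   # -> L1 on-chip SRAM (50,000 bytes)
```

In this toy setting, quantizing from 32 to 4 bits is what moves the model from the larger L2 level into the fastest L1 level, which is the kind of shift that reduces memory-access latency and energy.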

Critical Analysis

The paper presents a promising approach to developing sustainable AI solutions for resource-constrained devices. The use of multimodal data and of efficient model design techniques, such as attention mechanisms, is well justified and aligned with the goal of creating compact, energy-efficient models.

However, the paper could have provided more details on the specific compression techniques used, such as the quantization schemes and the trade-offs between model size, accuracy, and efficiency. Additionally, the authors could have discussed the potential limitations of their approach, such as the ability to scale to more complex tasks or the generalizability of the techniques to other application domains.

Furthermore, the paper could have addressed the broader implications of sustainable AI development, such as the need for holistic system-level optimization beyond just the machine learning models, and the potential impact on the overall energy consumption and carbon footprint of AI systems.

Conclusion

This paper presents a compelling approach to addressing the growing energy and emissions concerns associated with advanced artificial intelligence algorithms. The TinyM$^2$Net-V3 system demonstrates that it is possible to create efficient, sustainable AI solutions for resource-constrained devices by leveraging multimodal data, optimized model design, and state-of-the-art compression techniques.

The successful application of TinyM$^2$Net-V3 to real-world problems, such as COVID-19 detection and pose classification, showcases the potential of this approach to drive technological progress while also considering environmental responsibility. As the field of AI continues to evolve, the insights and techniques presented in this paper can serve as a valuable framework for developing future sustainable AI systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


TinyM$^2$Net-V3: Memory-Aware Compressed Multimodal Deep Neural Networks for Sustainable Edge Deployment

Hasib-Al Rashid, Tinoosh Mohsenin

The advancement of sophisticated artificial intelligence (AI) algorithms has led to a notable increase in energy usage and carbon dioxide emissions, intensifying concerns about climate change. This growing problem has brought the environmental sustainability of AI technologies to the forefront, especially as they expand across various sectors. In response to these challenges, there is an urgent need for the development of sustainable AI solutions. These solutions must focus on energy-efficient embedded systems that are capable of handling diverse data types even in environments with limited resources, thereby ensuring both technological progress and environmental responsibility. Integrating complementary multimodal data into tiny machine learning models for edge devices is challenging due to increased complexity, latency, and power consumption. This work introduces TinyM$^2$Net-V3, a system that processes different modalities of complementary data, designs deep neural network (DNN) models, and employs model compression techniques including knowledge distillation and low bit-width quantization with memory-aware considerations to fit models within lower memory hierarchy levels, reducing latency and enhancing energy efficiency on resource-constrained devices. We evaluated TinyM$^2$Net-V3 in two multimodal case studies: COVID-19 detection using cough, speech, and breathing audios, and pose classification from depth and thermal images. With tiny inference models (6 KB and 58 KB), we achieved 92.95% and 90.7% accuracies, respectively. Our tiny machine learning models, deployed on resource limited hardware, demonstrated low latencies within milliseconds and very high power efficiency.

5/22/2024

TinyVQA: Compact Multimodal Deep Neural Network for Visual Question Answering on Resource-Constrained Devices

Hasib-Al Rashid, Argho Sarkar, Aryya Gangopadhyay, Maryam Rahnemoonfar, Tinoosh Mohsenin

Traditional machine learning models often require powerful hardware, making them unsuitable for deployment on resource-limited devices. Tiny Machine Learning (tinyML) has emerged as a promising approach for running machine learning models on these devices, but integrating multiple data modalities into tinyML models remains a challenge due to increased complexity, latency, and power consumption. This paper proposes TinyVQA, a novel multimodal deep neural network for visual question answering tasks that can be deployed on resource-constrained tinyML hardware. TinyVQA leverages a supervised attention-based model to learn how to answer questions about images using both vision and language modalities. Knowledge distilled from the supervised attention-based VQA model is used to train the memory-aware compact TinyVQA model, and low bit-width quantization is employed to further compress the model for deployment on tinyML devices. The TinyVQA model was evaluated on the FloodNet dataset, which is used for post-disaster damage assessment. The compact model achieved an accuracy of 79.5%, demonstrating the effectiveness of TinyVQA for real-world applications. Additionally, the model was deployed on a Crazyflie 2.0 drone equipped with an AI deck and a GAP8 microprocessor. Deployed on the tiny drone, the TinyVQA model achieved a latency of 56 ms and consumed 693 mW, showcasing its suitability for resource-constrained embedded systems.
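The reported latency and power can be combined into an approximate energy per inference. This assumes the 693 mW figure is the average power drawn over the whole 56 ms inference, which the abstract does not state explicitly:

```latex
E = P \cdot t \approx 0.693\,\mathrm{W} \times 0.056\,\mathrm{s} \approx 0.0388\,\mathrm{J} \approx 38.8\,\mathrm{mJ}
```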

4/5/2024

ExtremeMETA: High-speed Lightweight Image Segmentation Model by Remodeling Multi-channel Metamaterial Imagers

Quan Liu, Brandon T. Swartz, Ivan Kravchenko, Jason G. Valentine, Yuankai Huo

Deep neural networks (DNNs) have heavily relied on traditional computational units like CPUs and GPUs. However, this conventional approach brings significant computational burdens, latency issues, and high power consumption, limiting their effectiveness. This has sparked the need for lightweight networks like ExtremeC3Net. On the other hand, there have been notable advancements in optical computational units, particularly with metamaterials, offering the exciting prospect of energy-efficient neural networks operating at the speed of light. Yet, the digital design of metamaterial neural networks (MNNs) faces challenges such as precision, noise, and bandwidth, limiting their application to intuitive tasks and low-resolution images. In this paper, we propose a large-kernel lightweight segmentation model, ExtremeMETA. Based on ExtremeC3Net, ExtremeMETA maximizes the ability of the first convolution layer by exploring a larger convolution kernel and multiple processing paths. With the proposed large-kernel convolution model, we extend the application boundary of optical neural networks to the segmentation task. To further lighten the computational burden of the digital processing part, a set of model compression methods is applied to improve model efficiency in the inference stage. The experimental results on three publicly available datasets demonstrate that the optimized efficient design improved segmentation mIoU from 92.45 to 95.97 while reducing computation from 461.07 MMACs to 166.03 MMACs. The proposed large-kernel lightweight model ExtremeMETA showcases the hybrid design's ability to handle complex tasks.

5/29/2024

Memory-efficient Energy-adaptive Inference of Pre-Trained Models on Batteryless Embedded Systems

Pietro Farina, Subrata Biswas, Eren Yıldız, Khakim Akhunov, Saad Ahmed, Bashima Islam, Kasım Sinan Yıldırım

Batteryless systems frequently face power failures, requiring extra runtime buffers to maintain inference progress and leaving only a small memory space for storing ultra-tiny deep neural networks (DNNs). Moreover, making these models responsive to stochastic energy harvesting dynamics during inference requires a balance between inference accuracy, latency, and energy overhead. Recent works on compression mostly focus on time and memory, but often ignore energy dynamics or significantly reduce the accuracy of pre-trained DNNs. Existing energy-adaptive inference works modify the architecture of pre-trained models and have significant memory overhead. Thus, energy-adaptive and accurate inference of pre-trained DNNs on batteryless devices with extreme memory constraints is more challenging than on traditional microcontrollers. We combat these issues by proposing FreeML, a framework to optimize pre-trained DNN models for memory-efficient and energy-adaptive inference on batteryless systems. FreeML comprises (1) a novel compression technique to reduce the model footprint and runtime memory requirements simultaneously, making them executable on extremely memory-constrained batteryless platforms; and (2) the first early exit mechanism that uses a single exit branch for all exit points to terminate inference at any time, making models energy-adaptive with minimal memory overhead. Our experiments showed that FreeML reduces model sizes by up to 95×, supports adaptive inference with 2.03-19.65× less memory overhead, and provides significant time and energy benefits with only a negligible accuracy drop compared to the state-of-the-art.

5/20/2024