Memory-efficient Energy-adaptive Inference of Pre-Trained Models on Batteryless Embedded Systems

Read original: arXiv:2405.10426 - Published 5/20/2024 by Pietro Farina, Subrata Biswas, Eren Yıldız, Khakim Akhunov, Saad Ahmed, Bashima Islam, Kasım Sinan Yıldırım

Overview

This paper presents FreeML, a framework that optimizes pre-trained deep neural network (DNN) models for memory-efficient, energy-adaptive inference on batteryless embedded systems.
The approach targets devices with extremely limited memory and intermittent, harvested power, where frequent power failures force part of the memory to be set aside for buffers that preserve inference progress.
Its two key techniques are a compression method that reduces the model footprint and the runtime memory requirements simultaneously, and an early-exit mechanism that uses a single exit branch for all exit points so that inference can terminate at any time.

Plain English Explanation

The paper focuses on running advanced AI models, like deep neural networks, on small, battery-free devices. These types of devices are common in the "edge computing" field, where processing is done close to the source of data rather than in a centralized data center.

The challenge is that these edge devices often have very limited memory and can only run when they have enough power, which may be intermittent. This makes it difficult to use powerful AI models that typically require a lot of memory and consistent power.

The researchers developed a system to overcome these constraints. They compress the neural network so that it fits in the device's small memory. They also make the model adapt to the available power: when energy runs low, the model can stop early and still return an answer, so the device keeps producing results even when the power supply is interrupted.

By addressing the memory and power challenges, this work enables deploying advanced AI capabilities on a new class of small, battery-free devices. This could lead to exciting real-world applications in areas like environmental monitoring, healthcare, and smart infrastructure, where data needs to be processed locally rather than sending it to the cloud.

Technical Explanation

The paper proposes FreeML, a framework that optimizes pre-trained DNN models for memory-efficient and energy-adaptive inference on batteryless embedded systems. It has two components.

To address the memory constraints, the system uses model compression, with techniques like weight pruning and quantization, to shrink the neural network. According to the abstract, the compression reduces the stored model footprint and the runtime memory requirements at the same time, so the compressed model fits within the limited memory of extremely constrained batteryless platforms.
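Pruning and quantization are two widely used ways to shrink a model. The sketch below is a generic illustration of both, written under simple assumptions (magnitude-based pruning, symmetric 8-bit quantization, and a hypothetical sparse layout of one value byte plus a two-byte index per nonzero weight). It is not FreeML's own compression technique, which this summary does not describe in detail, and all function names are made up for the example.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the smallest-magnitude weights (a `sparsity` fraction of them)."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric 8-bit quantization: w is approximated by q * scale, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the 8-bit codes."""
    return [v * scale for v in q]


def dense_fp32_bytes(n_weights: int) -> int:
    """Storage for an uncompressed layer: 4 bytes per 32-bit float weight."""
    return 4 * n_weights


def sparse_int8_bytes(q: List[int]) -> int:
    """Assumed sparse layout: 1 byte value + 2 byte index per nonzero weight."""
    nonzero = sum(1 for v in q if v != 0)
    return 3 * nonzero


if __name__ == "__main__":
    weights = [0.5, -0.02, 0.01, -1.27, 0.3, 0.0, 0.04, -0.6]
    q, scale = quantize_int8(magnitude_prune(weights, sparsity=0.5))
    print("codes:", q, "scale:", scale)
    print("bytes before:", dense_fp32_bytes(len(weights)))
    print("bytes after:", sparse_int8_bytes(q))
```

The byte counts cover stored weights only. Runtime buffers for activations are a separate cost, which is why reducing footprint and runtime memory together matters on these devices.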

To handle the intermittent power supply, the system makes the model energy-adaptive through an early-exit mechanism. A single exit branch is shared by all exit points, so inference can terminate at any time and still produce a prediction. Sharing one branch keeps the memory overhead of adaptivity low, and it lets the device match the amount of computation to the energy that is currently available.
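The idea of matching computation to available energy can be sketched as a loop that runs layers while the energy budget allows and then classifies with one exit head shared by every exit point. This is a toy illustration under stated assumptions, not the paper's implementation: the per-layer energy costs, the fixed feature width that lets one head serve every exit point, and all names (`adaptive_inference`, `shared_exit`, `make_dense_relu`) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_dense_relu(weights: Sequence[Sequence[float]]) -> Layer:
    """Build a toy dense + ReLU layer from a weight matrix (one row per output)."""
    def layer(x: Vector) -> Vector:
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]
    return layer


def shared_exit(features: Vector, head: Sequence[Sequence[float]]) -> int:
    """Single exit branch reused at every exit point: linear scores, then argmax."""
    scores = [sum(w * v for w, v in zip(row, features)) for row in head]
    return max(range(len(scores)), key=lambda i: scores[i])


def adaptive_inference(
    x: Vector,
    layers: Sequence[Layer],
    layer_costs: Sequence[float],
    head: Sequence[Sequence[float]],
    exit_cost: float,
    energy_budget: float,
) -> Tuple[int, int]:
    """Run layers while energy allows, then classify with the shared exit.

    Returns (predicted_class, number_of_layers_executed).
    Assumes every layer keeps the same feature width as the input, so the
    same head can be applied after any layer (or to the raw input).
    """
    # Always reserve enough energy to run the exit branch itself.
    remaining = energy_budget - exit_cost
    if remaining < 0:
        raise ValueError("energy budget cannot cover even the exit branch")

    features = list(x)
    executed = 0
    for layer, cost in zip(layers, layer_costs):
        if cost > remaining:
            break  # not enough energy for this layer: exit early
        features = layer(features)
        remaining -= cost
        executed += 1
    return shared_exit(features, head), executed
```

Because only one head is stored, adding more exit points costs no extra parameters in this sketch. A design with a separate classifier per exit point would grow in memory with the number of exits.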

The researchers evaluate their system on several DNN models and edge hardware platforms. According to the abstract, the system reduces model sizes by up to 95 times, supports adaptive inference with 2.03 to 19.65 times less memory overhead, and provides significant time and energy benefits with only a negligible accuracy drop compared to the state of the art.

Critical Analysis

The paper offers a coherent solution for running DNN inference on resource-constrained, batteryless edge devices. Pairing model compression with energy-aware adaptation targets the two constraints that matter most on these platforms, limited memory and intermittent power, and the reported results are promising.

However, the paper does not address the potential challenges of model retraining or fine-tuning on the edge device, which may be necessary to maintain high accuracy in real-world, non-stationary environments. Additionally, the energy-aware adaptation mechanism could be further improved to handle more complex power fluctuations and ensure reliable operation under diverse environmental conditions.

Future research could explore techniques for data-efficient meta-learning to enable on-device model updates, as well as more advanced power management strategies to cope with the unpredictability of energy harvesting systems.

Conclusion

This paper presents a memory-efficient and energy-adaptive inference system that enables the deployment of pre-trained deep neural network models on batteryless embedded systems. By addressing the key challenges of limited memory and intermittent power, the proposed techniques unlock the potential for advanced AI capabilities on a new class of small, low-power edge devices.

The successful implementation of this system could lead to a wide range of innovative applications in areas such as environmental monitoring, healthcare, and smart infrastructure, where data processing needs to be performed locally rather than relying on cloud connectivity. The insights and methods from this work can serve as a foundation for further research and development in the field of resource-constrained edge computing.

Related Papers

Memory-efficient Energy-adaptive Inference of Pre-Trained Models on Batteryless Embedded Systems

Pietro Farina, Subrata Biswas, Eren Yıldız, Khakim Akhunov, Saad Ahmed, Bashima Islam, Kasım Sinan Yıldırım

Batteryless systems frequently face power failures, requiring extra runtime buffers to maintain inference progress and leaving only a memory space for storing ultra-tiny deep neural networks (DNNs). Besides, making these models responsive to stochastic energy harvesting dynamics during inference requires a balance between inference accuracy, latency, and energy overhead. Recent works on compression mostly focus on time and memory, but often ignore energy dynamics or significantly reduce the accuracy of pre-trained DNNs. Existing energy-adaptive inference works modify the architecture of pre-trained models and have significant memory overhead. Thus, energy-adaptive and accurate inference of pre-trained DNNs on batteryless devices with extreme memory constraints is more challenging than traditional microcontrollers. We combat these issues by proposing FreeML, a framework to optimize pre-trained DNN models for memory-efficient and energy-adaptive inference on batteryless systems. FreeML comprises (1) a novel compression technique to reduce the model footprint and runtime memory requirements simultaneously, making them executable on extremely memory-constrained batteryless platforms; and (2) the first early exit mechanism that uses a single exit branch for all exit points to terminate inference at any time, making models energy-adaptive with minimal memory overhead. Our experiments showed that FreeML reduces the model sizes by up to $95\times$, supports adaptive inference with $2.03$-$19.65\times$ less memory overhead, and provides significant time and energy benefits with only a negligible accuracy drop compared to the state-of-the-art.

5/20/2024

Resource-Efficient Neural Networks for Embedded Systems

Wolfgang Roth, Gunther Schindler, Bernhard Klein, Robert Peharz, Sebastian Tschiatschek, Holger Fröning, Franz Pernkopf, Zoubin Ghahramani

While machine learning is traditionally a resource intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges in current machine learning research and key to ensure a smooth transition of machine learning technology from a scientific environment with virtually unlimited computing resources into everyday applications. In this article, we provide an overview of the current state of the art of machine learning techniques facilitating these real-world requirements. In particular, we focus on resource-efficient inference based on deep neural networks (DNNs), the predominant machine learning models of the past decade. We give a comprehensive overview of the vast literature that can be mainly split into three non-mutually exclusive categories: (i) quantized neural networks, (ii) network pruning, and (iii) structural efficiency. These techniques can be applied during training or as post-processing, and they are widely used to reduce the computational demands in terms of memory footprint, inference speed, and energy efficiency. We also briefly discuss different concepts of embedded hardware for DNNs and their compatibility with machine learning techniques as well as potential for energy and latency reduction. We substantiate our discussion with experiments on well-known benchmark data sets using compression techniques (quantization, pruning) for a set of resource-constrained embedded systems, such as CPUs, GPUs and FPGAs. The obtained results highlight the difficulty of finding good trade-offs between resource efficiency and prediction quality.

4/9/2024

Revisiting DNN Training for Intermittently Powered Energy Harvesting Micro Computers

Cyan Subhra Mishra, Deeksha Chaudhary, Jack Sampson, Mahmut Taylan Kandemir, Chita Das

The deployment of Deep Neural Networks in energy-constrained environments, such as Energy Harvesting Wireless Sensor Networks, presents unique challenges, primarily due to the intermittent nature of power availability. To address these challenges, this study introduces and evaluates a novel training methodology tailored for DNNs operating within such contexts. In particular, we propose a dynamic dropout technique that adapts to both the architecture of the device and the variability in energy availability inherent in energy harvesting scenarios. Our proposed approach leverages a device model that incorporates specific parameters of the network architecture and the energy harvesting profile to optimize dropout rates dynamically during the training phase. By modulating the network's training process based on predicted energy availability, our method not only conserves energy but also ensures sustained learning and inference capabilities under power constraints. Our preliminary results demonstrate that this strategy provides 6 to 22 percent accuracy improvements compared to the state of the art with less than 5 percent additional compute. This paper details the development of the device model, describes the integration of energy profiles with intermittency aware dropout and quantization algorithms, and presents a comprehensive evaluation of the proposed approach using real-world energy harvesting data.

8/27/2024

On-Device Training Under 256KB Memory

Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han

On-device training enables the model to adapt to new data collected from the sensors by fine-tuning a pre-trained model. Users can benefit from customized AI models without having to transfer the data to the cloud, protecting their privacy. However, the training memory consumption is prohibitive for IoT devices that have tiny memory resources. We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory. On-device training faces two unique challenges: (1) the quantized graphs of neural networks are hard to optimize due to low bit-precision and the lack of normalization; (2) the limited hardware resource does not allow full back-propagation. To cope with the optimization difficulty, we propose Quantization-Aware Scaling to calibrate the gradient scales and stabilize 8-bit quantized training. To reduce the memory footprint, we propose Sparse Update to skip the gradient computation of less important layers and sub-tensors. The algorithm innovation is implemented by a lightweight training system, Tiny Training Engine, which prunes the backward computation graph to support sparse updates and offloads the runtime auto-differentiation to compile time. Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB SRAM and 1MB Flash without auxiliary memory, using less than 1/1000 of the memory of PyTorch and TensorFlow while matching the accuracy on tinyML application VWW. Our study enables IoT devices not only to perform inference but also to continuously adapt to new data for on-device lifelong learning. A video demo can be found here: https://youtu.be/0pUFZYdoMY8.

4/4/2024