Usability and Performance Analysis of Embedded Development Environment for On-device Learning

Read original: arXiv:2404.07948 - Published 4/12/2024 by Enzo Scaffi (DYNAMID), Antoine Bonneau (DYNAMID, EE), Frédéric Le Mouël (DYNAMID), Fabien Mieyeville (INL, EE)

Overview

This research examines embedded development tools for implementing TinyML on resource-constrained IoT devices.
It evaluates various tools with different abstraction levels, from basic hardware manipulation to deploying minimalistic ML training.
The analysis includes metrics like memory usage, energy consumption, performance during training and inference, and overall usability.

Plain English Explanation

The research paper explores the different software tools and frameworks that can be used to develop TinyML applications on low-power, memory-constrained IoT devices. TinyML refers to running machine learning models on small, embedded systems with very limited resources.

The researchers tested several development tools, ranging from basic hardware control libraries like Arduino to more advanced operating systems like RIOT OS. They measured the tools' performance in terms of memory usage, energy consumption, and the speed of model training and inference. They also looked at how easy each tool was to use.

The key findings are:

The Arduino framework is straightforward to use, but it tends to be less energy-efficient than working directly with the device's hardware.
RIOT OS is more memory-intensive but provides better energy efficiency, while still being relatively easy to use.
None of the tools fully integrated features like dynamic voltage and frequency scaling (DVFS), which allows fine-grained control over the device's hardware for optimal performance and energy usage.

Overall, the research highlights the trade-offs between ease of use, resource efficiency, and low-level hardware control when developing TinyML applications on resource-constrained devices.

Technical Explanation

The researchers evaluated several embedded development tools for their suitability in implementing TinyML on IoT devices with limited memory and processing power. They tested tools at different abstraction levels, from the Arduino framework for basic hardware manipulation to the RIOT OS operating system, which provides a higher-level programming interface.
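
To give a sense of what "minimalistic ML training" on a constrained device can look like, the sketch below runs online stochastic gradient descent on a single linear model, updating the weights one sample at a time so that no dataset or batch has to be held in memory. It is an illustrative assumption, not the model or workload used in the paper; the function name `sgd_step`, the learning rate, and the toy data are all made up for the example.

```python
def sgd_step(weights, bias, x, y, lr=0.1):
    """One online SGD update for a linear model with squared-error loss."""
    pred = sum(w * xi for w, xi in zip(weights, x)) + bias
    err = pred - y
    # Gradient of 0.5 * err^2 with respect to each weight is err * xi.
    new_weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    new_bias = bias - lr * err
    return new_weights, new_bias, err * err


# Toy data following y = 2*x + 1 (hypothetical, for illustration only).
samples = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0)]
weights, bias = [0.0], 0.0
for _ in range(500):
    for x, y in samples:
        weights, bias, _ = sgd_step(weights, bias, x, y)
print(weights, bias)
```

The state kept between updates is only the weight list and the bias, which is why this style of training fits devices with a few kilobytes of RAM. A real deployment would typically be written in C and use fixed-size arrays.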

The analysis focused on key metrics like memory usage, energy consumption, and performance during both model training and inference. The researchers also assessed the overall usability of the different solutions.
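
As a rough illustration of how an energy figure can be derived from a measurement, the sketch below integrates sampled current over time at a fixed supply voltage. The function name `energy_joules`, the 3.3 V rail, and the sample values are assumptions for the example; the paper's actual measurement setup is not described in this summary.

```python
from typing import Sequence


def energy_joules(current_a: Sequence[float], dt_s: float,
                  supply_v: float = 3.3) -> float:
    """Approximate energy (J) from evenly spaced current samples (A).

    Uses the trapezoidal rule: E = V * integral(I dt).
    supply_v = 3.3 is an assumed constant rail voltage.
    """
    if len(current_a) < 2:
        return 0.0
    area = 0.0
    for prev, curr in zip(current_a, current_a[1:]):
        area += 0.5 * (prev + curr) * dt_s
    return supply_v * area


# Hypothetical trace: 4 samples taken 1 ms apart (values are made up).
trace = [0.010, 0.020, 0.020, 0.010]
print(energy_joules(trace, dt_s=0.001))
```

Comparing two tools on this metric means running the same workload under each and integrating over the full run, since a slower but lower-power run can still cost more energy overall.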

Their results show that the Arduino framework is relatively easy to use but less energy-efficient than working directly with the device's hardware. In contrast, RIOT OS exhibits better energy efficiency despite higher memory utilization, while maintaining a similar level of usability.

However, the researchers noted that none of the tools they tested fully integrated critical functionalities like dynamic voltage and frequency scaling (DVFS), which would allow for fine-grained control of the device's hardware to optimize performance and energy usage.
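
Why DVFS matters for energy can be seen from the standard first-order CMOS model, in which dynamic power scales as C·V²·f. The sketch below applies that model to a fixed workload at two operating points. The effective capacitance, the static power default, and both voltage/frequency pairs are placeholder assumptions, not values from the paper or from any specific microcontroller.

```python
def task_energy(cycles: float, freq_hz: float, volt: float,
                c_eff: float = 1e-9, p_static_w: float = 0.0) -> float:
    """First-order energy (J) to run `cycles` clock cycles at one operating point.

    Dynamic power is modelled as c_eff * V^2 * f (standard CMOS approximation).
    c_eff and p_static_w are placeholder values, not measured ones.
    """
    runtime_s = cycles / freq_hz
    p_dynamic_w = c_eff * volt ** 2 * freq_hz
    return (p_dynamic_w + p_static_w) * runtime_s


# Two hypothetical operating points for the same 1e6-cycle workload.
fast = task_energy(1e6, freq_hz=64e6, volt=3.0)
slow = task_energy(1e6, freq_hz=16e6, volt=1.8)
print(fast, slow)
```

In this model the dynamic energy for a fixed cycle count depends on voltage squared and not on frequency, so the saving comes from the lower voltage that a lower frequency permits. A non-zero `p_static_w` pushes the other way, because a slower run leaks for longer; balancing the two is the kind of fine-grained control that needs OS-level DVFS support.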

Critical Analysis

The research provides a valuable assessment of the trade-offs involved in selecting embedded development tools for TinyML applications. However, the study covers a specific set of tools and devices, so its results may not generalize to other hardware and software configurations.

Additionally, the paper does not delve deeply into the underlying reasons for the observed differences in memory usage, energy consumption, and performance. Further research could explore the architectural and implementation details that contribute to these variations, which would provide more insight for developers choosing the right tools for their projects.

The lack of integrated DVFS functionality in the tested tools is an interesting limitation highlighted by the research. Exploring solutions that seamlessly combine high-level programming interfaces with low-level hardware control could be a fruitful direction for future work in this area.

Overall, this research offers a solid foundation for understanding the capabilities and trade-offs of different embedded development tools for TinyML implementation, but there is still room for further investigation and innovation in this rapidly evolving field.

Conclusion

This research paper provides a comprehensive evaluation of embedded development tools for implementing TinyML on resource-constrained IoT devices. The key findings highlight the trade-offs between ease of use, resource efficiency, and low-level hardware control when selecting the right tools for a given project.

The insights from this study can help developers make more informed choices when building TinyML applications, balancing factors like memory usage, energy consumption, and performance to meet the unique requirements of their IoT devices. Continued research and innovation in this area will be crucial as the demand for embedded machine learning solutions continues to grow.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Usability and Performance Analysis of Embedded Development Environment for On-device Learning

Enzo Scaffi (DYNAMID), Antoine Bonneau (DYNAMID, EE), Frédéric Le Mouël (DYNAMID), Fabien Mieyeville (INL, EE)

This research empirically examines embedded development tools viable for on-device TinyML implementation. The research evaluates various development tools with various abstraction levels on resource-constrained IoT devices, from basic hardware manipulation to deployment of minimalistic ML training. The analysis encompasses memory usage, energy consumption, and performance metrics during model training and inference and usability of the different solutions. Arduino Framework offers ease of implementation but with increased energy consumption compared to the native option, while RIOT OS exhibits efficient energy consumption despite higher memory utilization with equivalent ease of use. The absence of certain critical functionalities like DVFS directly integrated into the OS highlights limitations for fine hardware control.

4/12/2024

On-device Online Learning and Semantic Management of TinyML Systems

Haoyu Ren, Xue Li, Darko Anicic, Thomas A. Runkler

Recent advances in Tiny Machine Learning (TinyML) empower low-footprint embedded devices for real-time on-device Machine Learning. While many acknowledge the potential benefits of TinyML, its practical implementation presents unique challenges. This study aims to bridge the gap between prototyping single TinyML models and developing reliable TinyML systems in production: (1) Embedded devices operate in dynamically changing conditions. Existing TinyML solutions primarily focus on inference, with models trained offline on powerful machines and deployed as static objects. However, static models may underperform in the real world due to evolving input data distributions. We propose online learning to enable training on constrained devices, adapting local models towards the latest field conditions. (2) Nevertheless, current on-device learning methods struggle with heterogeneous deployment conditions and the scarcity of labeled data when applied across numerous devices. We introduce federated meta-learning incorporating online learning to enhance model generalization, facilitating rapid learning. This approach ensures optimal performance among distributed devices by knowledge sharing. (3) Moreover, TinyML's pivotal advantage is widespread adoption. Embedded devices and TinyML models prioritize extreme efficiency, leading to diverse characteristics ranging from memory and sensors to model architectures. Given their diversity and non-standardized representations, managing these resources becomes challenging as TinyML systems scale up. We present semantic management for the joint management of models and devices at scale. We demonstrate our methods through a basic regression example and then assess them in three real-world TinyML applications: handwritten character image classification, keyword audio classification, and smart building presence detection, confirming our approaches' effectiveness.

5/17/2024

On-Device Training Under 256KB Memory

Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han

On-device training enables the model to adapt to new data collected from the sensors by fine-tuning a pre-trained model. Users can benefit from customized AI models without having to transfer the data to the cloud, protecting the privacy. However, the training memory consumption is prohibitive for IoT devices that have tiny memory resources. We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory. On-device training faces two unique challenges: (1) the quantized graphs of neural networks are hard to optimize due to low bit-precision and the lack of normalization; (2) the limited hardware resource does not allow full back-propagation. To cope with the optimization difficulty, we propose Quantization-Aware Scaling to calibrate the gradient scales and stabilize 8-bit quantized training. To reduce the memory footprint, we propose Sparse Update to skip the gradient computation of less important layers and sub-tensors. The algorithm innovation is implemented by a lightweight training system, Tiny Training Engine, which prunes the backward computation graph to support sparse updates and offload the runtime auto-differentiation to compile time. Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB SRAM and 1MB Flash without auxiliary memory, using less than 1/1000 of the memory of PyTorch and TensorFlow while matching the accuracy on tinyML application VWW. Our study enables IoT devices not only to perform inference but also to continuously adapt to new data for on-device lifelong learning. A video demo can be found here: https://youtu.be/0pUFZYdoMY8.

4/4/2024

Resource-Efficient Neural Networks for Embedded Systems

Wolfgang Roth, Günther Schindler, Bernhard Klein, Robert Peharz, Sebastian Tschiatschek, Holger Fröning, Franz Pernkopf, Zoubin Ghahramani

While machine learning is traditionally a resource intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges in current machine learning research and key to ensure a smooth transition of machine learning technology from a scientific environment with virtually unlimited computing resources into everyday's applications. In this article, we provide an overview of the current state of the art of machine learning techniques facilitating these real-world requirements. In particular, we focus on resource-efficient inference based on deep neural networks (DNNs), the predominant machine learning models of the past decade. We give a comprehensive overview of the vast literature that can be mainly split into three non-mutually exclusive categories: (i) quantized neural networks, (ii) network pruning, and (iii) structural efficiency. These techniques can be applied during training or as post-processing, and they are widely used to reduce the computational demands in terms of memory footprint, inference speed, and energy efficiency. We also briefly discuss different concepts of embedded hardware for DNNs and their compatibility with machine learning techniques as well as potential for energy and latency reduction. We substantiate our discussion with experiments on well-known benchmark data sets using compression techniques (quantization, pruning) for a set of resource-constrained embedded systems, such as CPUs, GPUs and FPGAs. The obtained results highlight the difficulty of finding good trade-offs between resource efficiency and prediction quality.

4/9/2024