Micro-power spoken keyword spotting on Xylo Audio 2

Read original: arXiv:2406.15112 - Published 6/24/2024 by Hannah Bos, Dylan R. Muir

Overview

  • Presents a micro-power spoken keyword spotting system on the Xylo Audio 2 platform
  • Focuses on efficient hardware implementation for low-power applications
  • Explores spiking neural network architectures and techniques for rapid keyword detection

Plain English Explanation

This research paper describes a system that detects spoken keywords quickly while drawing very little power. The key innovation is the use of spiking neural networks, a class of neural network in which units communicate through discrete pulses (spikes), inspired by how biological neurons signal.
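To make the idea of a spiking neuron concrete, here is a minimal sketch of a leaky integrate-and-fire (LIF) neuron in plain Python. The function name `lif_spikes`, the `decay` and `threshold` values, and the reset-to-zero rule are illustrative assumptions, not details taken from the paper or from the Xylo hardware.

```python
def lif_spikes(inputs, decay=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    All parameter values are illustrative assumptions.
    """
    v = 0.0          # membrane potential
    spikes = []
    for x in inputs:
        v = decay * v + x        # leak, then integrate the new input
        if v >= threshold:
            spikes.append(1)     # emit a spike
            v = 0.0              # reset after spiking
        else:
            spikes.append(0)     # stay silent
    return spikes


quiet = lif_spikes([0.05] * 20)  # weak input never reaches the threshold
loud = lif_spikes([0.5] * 20)    # strong input produces periodic spikes
print(sum(quiet), sum(loud))
```

With weak input the neuron stays silent, so downstream units have nothing to process. That activity sparsity is the property that low-power spiking hardware is generally designed to exploit.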

By leveraging early decision spiking neural networks and sparse binarization, the system is able to rapidly detect keywords with very low power consumption. This makes it well-suited for deployment on low-power devices like smartphones or internet-of-things sensors.
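The summary does not define "early decision". One plausible reading, sketched below, is that the classifier stops listening as soon as one output class leads the others by a fixed margin of spikes. The function `early_decision`, the `margin` parameter, and the toy spike counts are all hypothetical and are not taken from the paper.

```python
def early_decision(spike_counts_per_step, margin=3):
    """Return (class_index, step) once one class leads by `margin` spikes.

    Hypothetical stopping rule for illustration only. Expects at least
    two output classes. Returns (None, n_steps) if no class ever leads
    by the required margin.
    """
    totals = [0] * len(spike_counts_per_step[0])
    for step, counts in enumerate(spike_counts_per_step, start=1):
        totals = [t + c for t, c in zip(totals, counts)]
        ranked = sorted(totals, reverse=True)
        if ranked[0] - ranked[1] >= margin:
            return totals.index(ranked[0]), step
    return None, len(spike_counts_per_step)


# Output spikes per time step for three hypothetical keyword classes.
stream = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 0, 0], [0, 0, 0], [1, 0, 0]]
decision = early_decision(stream)
print(decision)
```

In this toy stream the rule commits to class 0 after four of the six steps, which illustrates how stopping early could save both latency and energy.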

The researchers also explore global-local convolution in spiking neural networks and a neuromorphic cochlea implementation to further improve the efficiency of the system. Overall, the goal is to enable always-on keyword spotting capabilities on resource-constrained hardware.

Technical Explanation

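The paper presents a micro-power spoken keyword spotting system for the Xylo Audio 2 platform. The key innovations named in this summary are early-decision spiking neural networks and sparse binarization.

The researchers deployed and evaluated their system on the Xylo Audio 2 hardware platform. According to the paper's abstract, the deployed quantized network reached 95% task accuracy, with a measured dynamic inference power of 291 µW and an inference efficiency of 6.6 µJ per inference.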

Critical Analysis

The paper provides a comprehensive evaluation of the proposed micro-power spoken keyword spotting system, including comparisons to state-of-the-art baselines. However, the authors acknowledge several limitations and areas for further research:

  • The system was only evaluated on a limited set of keywords, and its performance on a larger, more diverse keyword set remains to be explored.
  • The integration of the neuromorphic cochlea implementation may introduce additional complexity and potential failure modes that were not fully addressed in the paper.
  • The power consumption and latency of the system could be further optimized through additional architectural and algorithmic innovations.

Additionally, while the use of spiking neural networks and sparse binarization techniques is promising, there may be concerns about their robustness and generalization capabilities compared to more traditional deep learning approaches. Further research is needed to fully understand the tradeoffs and ensure the reliability of the system in real-world applications.

Conclusion

This research paper presents a novel micro-power spoken keyword spotting system that leverages spiking neural networks and other efficient techniques to enable always-on keyword detection on resource-constrained hardware. The key innovations, including the use of early decision spiking neural networks and sparse binarization, demonstrate significant improvements in power efficiency and response time.

The integration of a neuromorphic cochlea implementation and exploration of global-local convolution further enhance the system's capabilities. While the paper highlights several promising results, additional research is needed to address the identified limitations and further optimize the system for real-world deployment.

Overall, this work represents an important step towards enabling low-power, always-on keyword spotting on a wide range of IoT and mobile devices, with potential applications in voice interfaces, smart home systems, and beyond.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Micro-power spoken keyword spotting on Xylo Audio 2

Hannah Bos, Dylan R. Muir

For many years, designs for Neuromorphic or brain-like processors have been motivated by achieving extreme energy efficiency, compared with von-Neumann and tensor processor devices. As part of their design language, Neuromorphic processors take advantage of weight, parameter, state and activity sparsity. In the extreme case, neural networks based on these principles mimic the sparse activity of biological nervous systems, in "Spiking Neural Networks" (SNNs). Few benchmarks are available for Neuromorphic processors that have been implemented for a range of Neuromorphic and non-Neuromorphic platforms, and which can therefore demonstrate the energy benefits of Neuromorphic processor designs. Here we describe the implementation of a spoken audio keyword-spotting (KWS) benchmark, "Aloha", on the Xylo Audio 2 (SYNS61210) Neuromorphic processor device. We obtained high deployed quantized task accuracy (95%), exceeding the benchmark task accuracy. We measured real continuous power of the deployed application on Xylo. We obtained best-in-class dynamic inference power ($291\,\mu$W) and best-in-class inference efficiency ($6.6\,\mu$J / Inf). Xylo sets a new minimum power for the Aloha KWS benchmark, and highlights the extreme energy efficiency achievable with Neuromorphic processor designs. Our results show that Neuromorphic designs are well-suited for real-time near- and in-sensor processing on edge devices.

6/24/2024
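The two headline figures in the Xylo abstract above can be related by simple arithmetic. The sketch below assumes that energy per inference equals dynamic power multiplied by the time spent per inference. The abstract does not state this definition, so the implied duration is an inference from that assumption and not a reported result.

```python
# Figures quoted in the abstract.
dynamic_power_w = 291e-6         # 291 microwatts
energy_per_inference_j = 6.6e-6  # 6.6 microjoules per inference

# Assumption: energy = power * time, so time = energy / power.
implied_time_s = energy_per_inference_j / dynamic_power_w
implied_rate_hz = 1.0 / implied_time_s

print(f"implied time per inference: {implied_time_s * 1e3:.2f} ms")
print(f"implied inference rate: {implied_rate_hz:.1f} per second")
```

Under that assumption the figures correspond to roughly 23 ms per inference, or about 44 inferences per second.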

Neuromorphic Keyword Spotting with Pulse Density Modulation MEMS Microphones

Sidi Yaya Arnaud Yarga, Sean U. N. Wood

The Keyword Spotting (KWS) task involves continuous audio stream monitoring to detect predefined words, requiring low energy devices for continuous processing. Neuromorphic devices effectively address this energy challenge. However, the general neuromorphic KWS pipeline, from microphone to Spiking Neural Network (SNN), entails multiple processing stages. Leveraging the popularity of Pulse Density Modulation (PDM) microphones in modern devices and their similarity to spiking neurons, we propose a direct microphone-to-SNN connection. This approach eliminates intermediate stages, notably reducing computational costs. The system achieved an accuracy of 91.54% on the Google Speech Command (GSC) dataset, surpassing the state-of-the-art for the Spiking Speech Command (SSC) dataset which is a bio-inspired encoded GSC. Furthermore, the observed sparsity in network activity and connectivity indicates potential for remarkably low energy consumption in a neuromorphic device implementation.

8/12/2024
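The abstract above notes the similarity between PDM microphone output and spiking neurons. To illustrate why, here is a minimal first-order sigma-delta modulator, the textbook way of producing a pulse-density bitstream. It is a generic sketch with a hypothetical function name `pdm_encode`, not the encoder or pipeline used in that paper.

```python
def pdm_encode(samples):
    """Encode samples in [-1, 1] as a 1-bit pulse-density stream.

    Generic first-order sigma-delta modulator, for illustration only.
    """
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback       # accumulate the quantization error
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0  # 1-bit DAC value fed back next step
        bits.append(bit)
    return bits


bits = pdm_encode([0.5] * 1000)  # constant input at half of full scale
density = sum(bits) / len(bits)  # fraction of ones tracks (x + 1) / 2
print(density)
```

The density of ones follows the signal amplitude, much as a neuron's firing rate follows its input. That resemblance is what makes a direct microphone-to-SNN connection plausible.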

A 65nm 36nJ/Decision Bio-inspired Temporal-Sparsity-Aware Digital Keyword Spotting IC with 0.6V Near-Threshold SRAM

Qinyu Chen, Kwantae Kim, Chang Gao, Sheng Zhou, Taekwang Jang, Tobi Delbruck, Shih-Chii Liu

This paper introduces, to the best of the authors' knowledge, the first fine-grained temporal sparsity-aware keyword spotting (KWS) IC leveraging temporal similarities between neighboring feature vectors extracted from input frames and network hidden states, eliminating unnecessary operations and memory accesses. This KWS IC, featuring a bio-inspired delta-gated recurrent neural network ($\Delta$RNN) classifier, achieves an 11-class Google Speech Command Dataset (GSCD) KWS accuracy of 90.5% and energy consumption of 36nJ/decision. At 87% temporal sparsity, computing latency and energy per inference are reduced by 2.4$\times$/3.4$\times$, respectively. The 65nm design occupies 0.78mm$^2$ and features two additional blocks, a compact 0.084mm$^2$ digital infinite-impulse-response (IIR)-based band-pass filter (BPF) audio feature extractor (FEx) and a 24kB 0.6V near-Vth weight SRAM with 6.6$\times$ lower read power compared to the standard SRAM.

5/8/2024
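The temporal sparsity described in the abstract above comes from skipping work when neighboring feature vectors barely change. The sketch below shows that general delta-thresholding idea on a toy feature sequence. The function names, the threshold value, and the data are illustrative assumptions and do not reproduce the chip's actual $\Delta$RNN datapath.

```python
def delta_encode(frames, threshold=0.1):
    """Replace each feature with its change since it was last transmitted.

    Changes smaller than `threshold` are sent as exact zeros, so that
    downstream multiply-accumulate work could be skipped for them.
    """
    last_sent = [0.0] * len(frames[0])
    deltas = []
    for frame in frames:
        row = []
        for i, x in enumerate(frame):
            d = x - last_sent[i]
            if abs(d) >= threshold:
                row.append(d)
                last_sent[i] = x  # remember the value actually transmitted
            else:
                row.append(0.0)   # change too small: send nothing
        deltas.append(row)
    return deltas


def temporal_sparsity(deltas):
    """Fraction of delta entries that are exactly zero."""
    total = sum(len(row) for row in deltas)
    zeros = sum(1 for row in deltas for d in row if d == 0.0)
    return zeros / total


frames = [[0.5, 0.2], [0.52, 0.2], [0.51, 0.6], [0.5, 0.61]]
sparsity = temporal_sparsity(delta_encode(frames))
print(sparsity)
```

Here five of the eight entries are suppressed. Comparing against the last transmitted value, not the previous frame, keeps small changes from accumulating unnoticed.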

A compact neuromorphic system for ultra energy-efficient, on-device robot localization

Adam D. Hines, Michael Milford, Tobias Fischer

Neuromorphic computing offers a transformative pathway to overcome the computational and energy challenges faced in deploying robotic localization and navigation systems at the edge. Visual place recognition, a critical component for navigation, is often hampered by the high resource demands of conventional systems, making them unsuitable for small-scale robotic platforms that still need to perform complex, long-range tasks. Although neuromorphic approaches offer potential for greater efficiency, real-time edge deployment remains constrained by the complexity and limited scalability of bio-realistic networks. Here, we demonstrate a neuromorphic localization system that performs accurate place recognition in up to 8km of traversal using models as small as 180 KB with 44k parameters, while consuming less than 1% of the energy required by conventional methods. Our Locational Encoding with Neuromorphic Systems (LENS) integrates spiking neural networks, an event-based dynamic vision sensor, and a neuromorphic processor within a single SPECK(TM) chip, enabling real-time, energy-efficient localization on a hexapod robot. LENS represents the first fully neuromorphic localization system capable of large-scale, on-device deployment, setting a new benchmark for energy-efficient robotic place recognition.

8/30/2024