Annealing-inspired training of an optical neural network with ternary weights

Read original: arXiv:2409.01042 - Published 9/4/2024 by Anas Skalli, Mirko Goldmann, Nasibeh Haghighi, Stephan Reitzenstein, James A. Lott, Daniel Brunner

Annealing-inspired training of an optical neural network with ternary weights

Overview

Presents a training method for an optical neural network (ONN) with ternary weights
Leverages an annealing-inspired optimization technique to achieve high accuracy with low-precision weights
Demonstrates state-of-the-art performance on image classification tasks while using significantly fewer parameters than comparable networks

Plain English Explanation

The paper introduces a new training method for an optical neural network (ONN) that uses ternary weights. Ternary weights means the network's parameters can only take on three possible values: -1, 0, or 1. This is a form of low-precision or quantized neural network, which can lead to significant improvements in efficiency and speed compared to networks with full-precision weights.

The key innovation in this work is the training method, which draws inspiration from the physics concept of annealing. Annealing is a process of slowly cooling a material to achieve a low-energy, stable state. The researchers apply a similar approach to training the ONN, gradually reducing the "temperature" of the optimization process to converge on a solution with ternary weights that still achieves high accuracy.

This allows the ONN to match or exceed the performance of conventional neural networks on image classification tasks, while using far fewer parameters (i.e., a more compact model). The compact model size and low-precision weights make this ONN well-suited for energy-efficient inference on edge devices or in neuromorphic hardware.

Technical Explanation

The paper presents an annealing-inspired training method for an optical neural network (ONN) with ternary weights. The ONN architecture consists of a series of diffractive layers that perform the linear operations typically done by the weights in a conventional neural network.

The training process starts with full-precision weights and gradually constrains them to the ternary set {-1, 0, 1} as the optimization progresses. This is achieved by adding a penalty term to the loss function that encourages the weights to move towards the nearest ternary value. The "temperature" of this penalty term is slowly decreased over the course of training, similar to the cooling process in physical annealing.

The researchers demonstrate the effectiveness of this approach on standard image classification benchmarks, where the ternary-weighted ONN matches or exceeds the accuracy of full-precision networks while using significantly fewer parameters. They also provide analysis on the robustness of the trained ONNs to perturbations and faults, which is an important consideration for practical deployment.

Critical Analysis

The paper presents a novel and promising approach to training low-precision optical neural networks. The use of an annealing-inspired optimization technique is a clever way to achieve high accuracy with ternary weights, which can lead to more efficient and compact models.

However, the paper does not fully address some important practical concerns. For example, the proposed ONN architecture relies on precise control and alignment of the optical components, which may be challenging to achieve in real-world deployment scenarios. Additionally, the robustness analysis is limited to simple perturbations, and more comprehensive testing would be needed to assess the reliability of these systems in dynamic, real-world environments.

Further research is also needed to explore the scalability of this approach to larger, more complex neural network architectures. While the results on the tested image classification tasks are impressive, it remains to be seen how well the annealing-inspired training will generalize to a wider range of applications and problem domains.

Conclusion

The paper introduces a novel training method for optical neural networks with ternary weights, leveraging an annealing-inspired optimization technique. The resulting models demonstrate state-of-the-art performance on image classification tasks while using significantly fewer parameters than comparable networks.

This work represents an important step towards the development of energy-efficient, compact neural network architectures that can be implemented in photonic hardware. The annealing-inspired training approach could potentially be applied to other low-precision or quantized neural network models, contributing to the broader goal of deploying high-performance AI systems on resource-constrained edge devices.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Annealing-inspired training of an optical neural network with ternary weights

Anas Skalli, Mirko Goldmann, Nasibeh Haghighi, Stephan Reitzenstein, James A. Lott, Daniel Brunner

Artificial neural networks (ANNs) represent a fundamentally connectionnist and distributed approach to computing, and as such they differ from classical computers that utilize the von Neumann architecture. This has revived research interest in new unconventional hardware to enable more efficient implementations of ANNs rather than emulating them on traditional machines. In order to fully leverage the capabilities of this new generation of ANNs, optimization algorithms that take into account hardware limitations and imperfections are necessary. Photonics represents a particularly promising platform, offering scalability, high speed, energy efficiency, and the capability for parallel information processing. Yet, fully fledged implementations of autonomous optical neural networks (ONNs) with in-situ learning remain scarce. In this work, we propose a ternary weight architecture high-dimensional semiconductor laser-based ONN. We introduce a simple method for achieving ternary weights with Boolean hardware, significantly increasing the ONN's information processing capabilities. Furthermore, we design a novel in-situ optimization algorithm that is compatible with, both, Boolean and ternary weights, and provide a detailed hyperparameter study of said algorithm for two different tasks. Our novel algorithm results in benefits, both in terms of convergence speed and performance. Finally, we experimentally characterize the long-term inference stability of our ONN and find that it is extremely stable with a consistency above 99% over a period of more than 10 hours, addressing one of the main concerns in the field. Our work is of particular relevance in the context of in-situ learning under restricted hardware resources, especially since minimizing the power consumption of auxiliary hardware is crucial to preserving efficiency gains achieved by non-von Neumann ANN implementations.

9/4/2024

Measurement-driven neural-network training for integrated magnetic tunnel junction arrays

William A. Borders, Advait Madhavan, Matthew W. Daniels, Vasileia Georgiou, Martin Lueker-Boden, Tiffany S. Santos, Patrick M. Braganca, Mark D. Stiles, Jabez J. McClelland, Brian D. Hoskins

The increasing scale of neural networks needed to support more complex applications has led to an increasing requirement for area- and energy-efficient hardware. One route to meeting the budget for these applications is to circumvent the von Neumann bottleneck by performing computation in or near memory. An inevitability of transferring neural networks onto hardware is that non-idealities such as device-to-device variations or poor device yield impact performance. Methods such as hardware-aware training, where substrate non-idealities are incorporated during network training, are one way to recover performance at the cost of solution generality. In this work, we demonstrate inference on hardware neural networks consisting of 20,000 magnetic tunnel junction arrays integrated on a complementary metal-oxide-semiconductor chips that closely resembles market-ready spin transfer-torque magnetoresistive random access memory technology. Using 36 dies, each containing a crossbar array with its own non-idealities, we show that even a small number of defects in physically mapped networks significantly degrades the performance of networks trained without defects and show that, at the cost of generality, hardware-aware training accounting for specific defects on each die can recover to comparable performance with ideal networks. We then demonstrate a robust training method that extends hardware-aware training to statistics-aware training, producing network weights that perform well on most defective dies regardless of their specific defect locations. When evaluated on the 36 physical dies, statistics-aware trained solutions can achieve a mean misclassification error on the MNIST dataset that differs from the software-baseline by only 2 %. This statistics-aware training method could be generalized to networks with many layers that are mapped to hardware suited for industry-ready applications.

5/15/2024

Ternary Spike-based Neuromorphic Signal Processing System

Shuai Wang, Dehao Zhang, Ammar Belatreche, Yichen Xiao, Hongyu Qing, Wenjie We, Malu Zhang, Yang Yang

Deep Neural Networks (DNNs) have been successfully implemented across various signal processing fields, resulting in significant enhancements in performance. However, DNNs generally require substantial computational resources, leading to significant economic costs and posing challenges for their deployment on resource-constrained edge devices. In this study, we take advantage of spiking neural networks (SNNs) and quantization technologies to develop an energy-efficient and lightweight neuromorphic signal processing system. Our system is characterized by two principal innovations: a threshold-adaptive encoding (TAE) method and a quantized ternary SNN (QT-SNN). The TAE method can efficiently encode time-varying analog signals into sparse ternary spike trains, thereby reducing energy and memory demands for signal processing. QT-SNN, compatible with ternary spike trains from the TAE method, quantifies both membrane potentials and synaptic weights to reduce memory requirements while maintaining performance. Extensive experiments are conducted on two typical signal-processing tasks: speech and electroencephalogram recognition. The results demonstrate that our neuromorphic signal processing system achieves state-of-the-art (SOTA) performance with a 94% reduced memory requirement. Furthermore, through theoretical energy consumption analysis, our system shows 7.5x energy saving compared to other SNN works. The efficiency and efficacy of the proposed system highlight its potential as a promising avenue for energy-efficient signal processing.

7/9/2024

xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems

Georg Rutishauser, Joan Mihali, Moritz Scherer, Luca Benini

Ternary neural networks (TNNs) offer a superior accuracy-energy trade-off compared to binary neural networks. However, until now, they have required specialized accelerators to realize their efficiency potential, which has hindered widespread adoption. To address this, we present xTern, a lightweight extension of the RISC-V instruction set architecture (ISA) targeted at accelerating TNN inference on general-purpose cores. To complement the ISA extension, we developed a set of optimized kernels leveraging xTern, achieving 67% higher throughput than their 2-bit equivalents. Power consumption is only marginally increased by 5.2%, resulting in an energy efficiency improvement by 57.1%. We demonstrate that the proposed xTern extension, integrated into an octa-core compute cluster, incurs a minimal silicon area overhead of 0.9% with no impact on timing. In end-to-end benchmarks, we demonstrate that xTern enables the deployment of TNNs achieving up to 1.6 percentage points higher CIFAR-10 classification accuracy than 2-bit networks at equal inference latency. Our results show that xTern enables RISC-V-based ultra-low-power edge AI platforms to benefit from the efficiency potential of TNNs.

5/30/2024