Asymmetrical estimator for training grey-box deep photonic neural networks

Read original: arXiv:2405.18458 - Published 8/16/2024 by Yizhi Wang, Minjia Chen, Chunhui Yao, Jie Ma, Ting Yan, Richard Penty, Qixiang Cheng

🏋️

Overview

Physical neural networks (PNNs) are a new approach to accelerating neural networks using high-bandwidth, analog processing.
While PNNs are effective for inference, training them is challenging because the physical transformations in the network are not well understood.
The paper presents a method called Asymmetrical Training (AT) that can train deep neural networks implemented on PNNs without full knowledge of the physical control-transformation mapping.

Plain English Explanation

Physical neural networks (PNNs) are a new way to speed up the processing of neural networks. They do this by using special hardware that can perform the computations in an analog, high-bandwidth way, rather than the digital, low-bandwidth approach used in traditional neural networks.

The key advantage of PNNs is that they can process information much faster than traditional neural networks. However, training these PNNs is really difficult because the physical transformations happening inside the network are not well understood. The Asymmetrical Training (AT) method, described in this paper, is a way to train deep neural networks implemented on PNNs without needing to know all the details about how the physical transformations work.

The researchers demonstrated that the AT method can improve the accuracy of PNNs on tasks like classifying iris flowers and handwritten digits, even when the PNN hardware is not perfectly calibrated. They also showed that AT consistently outperforms the standard training approach, called backpropagation, across different datasets like MNIST, fashion-MNIST, and Kuzushiji-MNIST. The key advantage of AT is that it can train PNNs effectively without needing a lot of extra hardware or computation, making it a practical way to take advantage of the speed benefits of physical neural networks.

Technical Explanation

The paper presents the Asymmetrical Training (AT) method, which allows for the training of deep neural networks implemented on physical neural networks (PNNs) without requiring full knowledge of the underlying physical control-transformation mapping.

PNNs are an emerging paradigm for accelerating neural network inference due to their ability to perform high-bandwidth, analog processing. However, training PNNs is challenging because the imperfect information about the physical transformations means that conventional gradient-based updates from backpropagation (BP) fail.

The AT method treats the PNN structure as a "grey box", only requiring knowledge of the last layer output and neuron topological connectivity of the deep neural network, not the details of the physical control-transformation mapping. The researchers experimentally demonstrated AT on deep grey-box PNNs implemented using uncalibrated photonic integrated circuits (PICs). They showed that AT can improve the classification accuracy of tasks like Iris flower and modified MNIST handwritten digits from random guessing to near the theoretical maximum.

The paper also showcased the consistently enhanced performance of AT over BP for different datasets, including MNIST, fashion-MNIST, and Kuzushiji-MNIST. The key advantages of the AT method are that it can train PNNs effectively with minimal hardware overhead and reduced computational requirements, making it a promising lightweight training alternative to fully explore the benefits of physical computation.

Critical Analysis

The paper presents a compelling approach to training physical neural networks (PNNs) using the Asymmetrical Training (AT) method. The main strength of AT is that it can train deep neural networks on PNN hardware without requiring full knowledge of the underlying physical control-transformation mapping. This is a significant advantage, as understanding the complex physical processes in PNNs is a major challenge.

However, the paper does not extensively discuss the limitations or potential issues with the AT method. For example, it is unclear how the performance of AT compares to other training approaches specifically designed for PNNs, such as those that attempt to model the physical transformations more explicitly. Additionally, the paper does not explore the scalability of AT to larger, more complex PNN architectures or its robustness to different types of hardware imperfections.

Further research would be needed to fully understand the strengths and weaknesses of the AT method and its broader applicability to the field of physical neural networks. Nonetheless, the paper presents a promising step forward in overcoming the training challenges of PNNs and unlocking their potential for high-performance, energy-efficient computing.

Conclusion

The paper introduces the Asymmetrical Training (AT) method, which enables the training of deep neural networks implemented on physical neural networks (PNNs) without requiring detailed knowledge of the underlying physical control-transformation mapping. The researchers demonstrated the effectiveness of AT on tasks like iris flower and handwritten digit classification, showing that it can improve performance compared to traditional backpropagation training.

The key advantage of the AT method is that it can train PNNs with minimal hardware and computational overhead, making it a practical approach to leveraging the speed and efficiency benefits of physical neural network hardware. While the paper does not extensively explore the limitations of AT, it represents an important step forward in addressing the training challenges of PNNs and paves the way for further research into this promising area of neural network acceleration.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🏋️

Asymmetrical estimator for training grey-box deep photonic neural networks

Yizhi Wang, Minjia Chen, Chunhui Yao, Jie Ma, Ting Yan, Richard Penty, Qixiang Cheng

Scalable isomorphic physical neural networks (PNNs) are emerging NN acceleration paradigms for their high-bandwidth, in-propagation computation. Despite backpropagation (BP)-based training is often the industry standard for its robustness and fast gradient convergences, existing BP-PNN training methods need to truncate the propagation of analogue signal at each layer and acquire accurate hidden neuron readouts for deep networks. This compromises the incentive of PNN for fast in-propagation processing. In addition, the required readouts introduce massive bottlenecks due to the conversions between the analogue-digital interfaces to shuttle information across. These factors limit both the time and energy efficiency during training. Here we introduce the asymmetrical training (AT) method, a BP-based method that can perform training on an encapsulated deep network, where the information propagation is maintained within the analogue domain until the output layer. AT's minimum information access bypass analogue-digital interface bottleneck wherever possible. For any deep network structure, AT offers significantly improved time and energy efficiency compared to existing BP-PNN methods, and scales well for large network sizes. We demonstrated AT's error-tolerant and calibration-free training for encapsulated integrated photonic deep networks to achieve near ideal BP performances. AT's well-behaved training is demonstrated repeatably across different datasets and network structures

8/16/2024

Training of Physical Neural Networks

Ali Momeni, Babak Rahmani, Benjamin Scellier, Logan G. Wright, Peter L. McMahon, Clara C. Wanjura, Yuhang Li, Anas Skalli, Natalia G. Berloff, Tatsuhiro Onodera, Ilker Oguz, Francesco Morichetti, Philipp del Hougne, Manuel Le Gallo, Abu Sebastian, Azalia Mirhoseini, Cheng Zhang, Danijela Markovi'c, Daniel Brunner, Christophe Moser, Sylvain Gigan, Florian Marquardt, Aydogan Ozcan, Julie Grollier, Andrea J. Liu, Demetri Psaltis, Andrea Al`u, Romain Fleury

Physical neural networks (PNNs) are a class of neural-like networks that leverage the properties of physical systems to perform computation. While PNNs are so far a niche research area with small-scale laboratory demonstrations, they are arguably one of the most underappreciated important opportunities in modern AI. Could we train AI models 1000x larger than current ones? Could we do this and also have them perform inference locally and privately on edge devices, such as smartphones or sensors? Research over the past few years has shown that the answer to all these questions is likely yes, with enough research: PNNs could one day radically change what is possible and practical for AI systems. To do this will however require rethinking both how AI models work, and how they are trained - primarily by considering the problems through the constraints of the underlying hardware physics. To train PNNs at large scale, many methods including backpropagation-based and backpropagation-free approaches are now being explored. These methods have various trade-offs, and so far no method has been shown to scale to the same scale and performance as the backpropagation algorithm widely used in deep learning today. However, this is rapidly changing, and a diverse ecosystem of training techniques provides clues for how PNNs may one day be utilized to create both more efficient realizations of current-scale AI models, and to enable unprecedented-scale models.

6/6/2024

Optical training of large-scale Transformers and deep neural networks with direct feedback alignment

Ziao Wang, Kilian Muller, Matthew Filipovich, Julien Launay, Ruben Ohana, Gustave Pariente, Safa Mokaadi, Charles Brossollet, Fabien Moreau, Alessandro Cappelli, Iacopo Poli, Igor Carron, Laurent Daudet, Florent Krzakala, Sylvain Gigan

Modern machine learning relies nearly exclusively on dedicated electronic hardware accelerators. Photonic approaches, with low consumption and high operation speed, are increasingly considered for inference but, to date, remain mostly limited to relatively basic tasks. Simultaneously, the problem of training deep and complex neural networks, overwhelmingly performed through backpropagation, remains a significant limitation to the size and, consequently, the performance of current architectures and a major compute and energy bottleneck. Here, we experimentally implement a versatile and scalable training algorithm, called direct feedback alignment, on a hybrid electronic-photonic platform. An optical processing unit performs large-scale random matrix multiplications, which is the central operation of this algorithm, at speeds up to 1500 TeraOps. We perform optical training of one of the most recent deep learning architectures, including Transformers, with more than 1B parameters, and obtain good performances on both language and vision tasks. We study the compute scaling of our hybrid optical approach, and demonstrate a potential advantage for ultra-deep and wide neural networks, thus opening a promising route to sustain the exponential growth of modern artificial intelligence beyond traditional von Neumann approaches.

9/23/2024

Robust Training of Neural Networks at Arbitrary Precision and Sparsity

Chengxi Ye, Grace Chu, Yanfeng Liu, Yichi Zhang, Lukasz Lew, Andrew Howard

The discontinuous operations inherent in quantization and sparsification introduce obstacles to backpropagation. This is particularly challenging when training deep neural networks in ultra-low precision and sparse regimes. We propose a novel, robust, and universal solution: a denoising affine transform that stabilizes training under these challenging conditions. By formulating quantization and sparsification as perturbations during training, we derive a perturbation-resilient approach based on ridge regression. Our solution employs a piecewise constant backbone model to ensure a performance lower bound and features an inherent noise reduction mechanism to mitigate perturbation-induced corruption. This formulation allows existing models to be trained at arbitrarily low precision and sparsity levels with off-the-shelf recipes. Furthermore, our method provides a novel perspective on training temporal binary neural networks, contributing to ongoing efforts to narrow the gap between artificial and biological neural networks.

9/17/2024