Progressive Feedforward Collapse of ResNet Training

2405.00985

Published 5/3/2024 by Sicong Wang, Kuo Gai, Shihua Zhang


Abstract

Neural collapse (NC) is a simple and symmetric phenomenon for deep neural networks (DNNs) at the terminal phase of training, where the last-layer features collapse to their class means and form a simplex equiangular tight frame aligning with the classifier vectors. However, the relationship of the last-layer features to the data and intermediate layers during training remains unexplored. To this end, we characterize the geometry of intermediate layers of ResNet and propose a novel conjecture, progressive feedforward collapse (PFC), claiming the degree of collapse increases during the forward propagation of DNNs. We derive a transparent model for the well-trained ResNet based on the observation that ResNet with weight decay approximates the geodesic curve in Wasserstein space at the terminal phase. The metrics of PFC indeed monotonically decrease across depth on various datasets. We propose a new surrogate model, multilayer unconstrained feature model (MUFM), connecting intermediate layers by an optimal transport regularizer. The optimal solution of MUFM is inconsistent with NC but is more concentrated relative to the input data. Overall, this study extends NC to PFC to model the collapse phenomenon of intermediate layers and its dependence on the input data, shedding light on the theoretical understanding of ResNet in classification problems.


Overview

This paper investigates progressive feedforward collapse (PFC): the conjecture that, in a well-trained ResNet, the degree of feature collapse increases layer by layer during forward propagation.
The authors analyze this phenomenon using tools from optimal transport theory and the geometry of Wasserstein space.
Their findings extend neural collapse (NC), which describes only the last-layer features at the terminal phase of training, to the intermediate layers and relate the collapse to the input data.

Plain English Explanation

Deep learning models such as the widely used ResNet architecture show a striking pattern once they are well trained: representations of inputs from the same class cluster more and more tightly as the data passes through the network. The authors call this "progressive feedforward collapse." Each successive hidden layer is more collapsed than the one before it, so the spread of representations within each class shrinks with depth.
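The abstract states that the degree of collapse increases during forward propagation and that the PFC metrics decrease monotonically across depth. The sketch below makes "degree of collapse" concrete with a simplified trace ratio, tr(S_W)/tr(S_B) (within-class scatter over between-class scatter). This ratio is an assumed stand-in, not the paper's exact PFC metrics. The "layers" are synthetic features whose noise shrinks with depth, and the function name `within_between_ratio` is hypothetical.

```python
from statistics import fmean


def within_between_ratio(features, labels):
    """Simplified collapse proxy tr(S_W) / tr(S_B): within-class scatter
    relative to between-class scatter. Smaller means more collapsed.
    (An illustrative assumption, not the paper's PFC metric.)"""
    dim = len(features[0])
    # Global mean of all feature vectors.
    global_mean = [fmean(f[d] for f in features) for d in range(dim)]
    within, between = 0.0, 0.0
    for c in sorted(set(labels)):
        members = [f for f, y in zip(features, labels) if y == c]
        mean_c = [fmean(m[d] for m in members) for d in range(dim)]
        # Squared distances of class members to their class mean.
        within += sum(sum((m[d] - mean_c[d]) ** 2 for d in range(dim)) for m in members)
        # Squared distance of the class mean to the global mean, weighted by class size.
        between += len(members) * sum((mean_c[d] - global_mean[d]) ** 2 for d in range(dim))
    return within / between


# Synthetic "layers": each sample is its class center plus a fixed noise vector
# scaled by a factor that shrinks with depth (a toy stand-in for ResNet features).
centers = {0: [1.0, 0.0], 1: [-1.0, 0.0]}
noise = [[0.5, 0.3], [-0.4, 0.2], [0.1, -0.6], [-0.2, -0.1]]
labels = [0, 0, 1, 1]
ratios = []
for scale in [1.0, 0.5, 0.25]:
    feats = [[centers[y][d] + scale * n[d] for d in range(2)] for y, n in zip(labels, noise)]
    ratios.append(within_between_ratio(feats, labels))
print(ratios)
```

In this toy, the ratio falls from layer to layer, which is the qualitative pattern PFC predicts for real ResNet features.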

The authors use mathematical tools from optimal transport theory to study this phenomenon. They model a trained ResNet as moving the distribution of data representations from layer to layer along a shortest path (a geodesic) in a geometric space of probability distributions called Wasserstein space.
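To make "a path in Wasserstein space" concrete, consider the simplest case: in one dimension, Wasserstein-2 geodesics have a closed form. For two point clouds of equal size with uniform weights, the optimal coupling pairs the sorted samples. The geodesic then moves each sample in a straight line to its partner at constant speed (McCann's displacement interpolation). This toy sketch illustrates the concept only. It assumes 1D data and equal sample sizes, and it is not the paper's construction for high-dimensional ResNet features.

```python
import math


def w2_1d(xs, ys):
    """Wasserstein-2 distance between two equal-size 1D empirical distributions
    with uniform weights. In 1D the optimal coupling matches sorted samples."""
    assert len(xs) == len(ys)
    a, b = sorted(xs), sorted(ys)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


def displacement_interpolation(xs, ys, t):
    """Point cloud at time t in [0, 1] on the W2 geodesic from xs to ys."""
    a, b = sorted(xs), sorted(ys)
    return [(1 - t) * u + t * v for u, v in zip(a, b)]


mu0 = [0.0, 1.0, 2.0, 3.0]
mu1 = [10.0, 10.5, 11.0, 11.5]
total = w2_1d(mu0, mu1)
for t in (0.25, 0.5, 0.75):
    mid = displacement_interpolation(mu0, mu1, t)
    # Along a geodesic, the distance from the start grows linearly in t.
    print(t, w2_1d(mu0, mid) / total)
```

The printed ratios track t, and the distances from the start and to the end always add up to the total. That additivity is what distinguishes a geodesic from any other path between the two distributions.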

This connects to neural collapse, a related phenomenon at the end of training. In neural collapse, the last-layer features of each class collapse to their class mean. These class means form a maximally separated, symmetric configuration (a simplex equiangular tight frame) that aligns with the classifier's weight vectors. By studying how collapse builds up through the intermediate layers, and how it depends on the input data, the researchers aim to improve the theoretical understanding of how ResNets solve classification problems.
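The simplex equiangular tight frame (ETF) named in the abstract can be written down explicitly. For K classes, the columns of sqrt(K/(K-1)) (I_K - (1/K) 1 1^T) are unit vectors that sum to zero and have pairwise cosine -1/(K-1), the most negative value K equal-angle vectors can share. This standard construction is included only to make the geometry concrete. The function names are our own.

```python
import math


def simplex_etf(k):
    """Return k vectors in R^k: the columns of sqrt(k/(k-1)) * (I - 11^T/k).
    Each has unit norm and every pair has cosine -1/(k-1)."""
    scale = math.sqrt(k / (k - 1))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / k) for i in range(k)] for j in range(k)]


def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


etf = simplex_etf(4)
print(cosine(etf[0], etf[1]))  # approximately -1/3 for K = 4
```

Under neural collapse, the centered class means (after rescaling) approach such a configuration, and the classifier vectors align with them.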

Technical Explanation

The authors use tools from optimal transport theory and the geometry of Wasserstein space to characterize the intermediate layers of ResNet. They start from the observation that a ResNet trained with weight decay approximates a geodesic curve in Wasserstein space at the terminal phase of training. From this they derive a transparent model of the well-trained network in which the feature distribution moves along this geodesic from the input to the last layer.
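The link between residual updates and geodesics can be illustrated with a simplified energy argument. This is our own illustration, not the paper's derivation. Treat each residual block as a step h_{l+1} = h_l + f_l(h_l), and take the total squared step length as a discrete kinetic energy. For fixed endpoints and L steps, Cauchy-Schwarz gives sum_l ||h_{l+1} - h_l||^2 >= ||h_L - h_0||^2 / L, with equality exactly when all steps are equal and point along a straight line, i.e. a constant-speed geodesic. The sketch checks this bound on a 2D toy path. The helper names are hypothetical.

```python
def path_energy(points):
    """Discrete kinetic energy: sum of squared step lengths along a path."""
    return sum(
        sum((b - a) ** 2 for a, b in zip(p, q))
        for p, q in zip(points, points[1:])
    )


def straight_path(start, end, steps):
    """Constant-speed straight line from start to end in `steps` equal residual updates."""
    return [[s + (e - s) * l / steps for s, e in zip(start, end)] for l in range(steps + 1)]


start, end, L = [0.0, 0.0], [3.0, 4.0], 5
straight = straight_path(start, end, L)

# A detour with the same endpoints and the same number of steps.
bent = [list(p) for p in straight]
bent[2][0] += 1.0

# Cauchy-Schwarz lower bound ||h_L - h_0||^2 / L on the energy of any L-step path.
lower_bound = sum((e - s) ** 2 for s, e in zip(start, end)) / L
print(path_energy(straight), lower_bound, path_energy(bent))
```

The straight path attains the lower bound, and the detour costs more. This is the intuition behind reading weight decay on residual blocks as a preference for geodesic feature trajectories.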

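Under this view, the degree of collapse should increase with depth, which is the progressive feedforward collapse (PFC) conjecture. Empirically, the PFC metrics decrease monotonically across layers on various datasets. The authors also propose a new surrogate model, the multilayer unconstrained feature model (MUFM), which links intermediate layers through an optimal transport regularizer. The optimal solution of MUFM is inconsistent with neural collapse but is more concentrated relative to the input data. PFC thus extends NC to model both the collapse of intermediate layers and its dependence on the data.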
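The formulas behind the surrogate models may help here. The standard unconstrained feature model (UFM) treats the last-layer features H = [h_1, ..., h_N] as free variables, optimized jointly with the classifier W. The abstract describes MUFM only as connecting intermediate layers by an optimal transport regularizer. The second objective below is therefore a schematic form we assume for illustration: an input anchor h_i^{(0)} = x_i, a matched-sample transport cost between consecutive layers, and a weight λ. It is not the paper's exact MUFM objective.

```latex
% Standard UFM (last-layer features only)
\min_{W,\,H}\;
\frac{1}{N}\sum_{i=1}^{N}\mathcal{L}\big(W h_i, y_i\big)
+ \frac{\lambda_W}{2}\|W\|_F^2
+ \frac{\lambda_H}{2}\|H\|_F^2

% Schematic multilayer variant (assumed form): free features for layers 1..L,
% anchored to the input data and linked by a transport-cost regularizer
\min_{W,\,H_1,\dots,H_L}\;
\frac{1}{N}\sum_{i=1}^{N}\mathcal{L}\big(W h_i^{(L)}, y_i\big)
+ \frac{\lambda_W}{2}\|W\|_F^2
+ \lambda\sum_{l=1}^{L}\frac{1}{N}\sum_{i=1}^{N}\big\|h_i^{(l)} - h_i^{(l-1)}\big\|_2^2,
\qquad h_i^{(0)} = x_i
```

The inner sum over i is the cost of the identity coupling between consecutive layers' empirical distributions, so it upper-bounds their squared Wasserstein-2 distance. Anchoring layer 0 to the data is what lets the optimal features depend on the input, in contrast to the data-agnostic UFM.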


Critical Analysis

The authors provide a compelling analysis of the progressive feedforward collapse phenomenon in ResNet models, using the tools of optimal transport theory and Wasserstein geometry. Several important questions remain open, however:

What are the implications of this collapse for the model's generalization performance? Does it help or hinder the model's ability to learn and transfer knowledge to new tasks?
Are there ways to control or manipulate the collapse process, for example, to prevent premature collapse or to encourage a more desirable trajectory in Wasserstein space?
How do these insights apply to other deep learning architectures beyond ResNet? Is the progressive feedforward collapse a universal phenomenon, or is it specific to certain model families?

Addressing these and other related questions could further strengthen the impact of this research and deepen our understanding of the inner workings of deep neural networks.

Conclusion

This paper offers a novel geometric perspective on progressive feedforward collapse in the intermediate layers of trained ResNet models. By framing the forward pass in terms of optimal transport and Wasserstein geodesics, the authors extend neural collapse from the last layer to the whole network and relate the degree of collapse to the input data.

The paper leaves interesting questions for future research, but it is a useful step toward understanding how deep neural networks organize their internal representations across layers. The PFC conjecture and the MUFM surrogate give future theoretical work on deep classifiers a concrete, testable starting point.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


Neural Collapse versus Low-rank Bias: Is Deep Neural Collapse Really Optimal?

Peter Súkeník, Marco Mondelli, Christoph Lampert

Deep neural networks (DNNs) exhibit a surprising structure in their final layer known as neural collapse (NC), and a growing body of work has investigated the propagation of neural collapse to earlier layers of DNNs, a phenomenon called deep neural collapse (DNC). However, existing theoretical results are restricted to special cases: linear models, only two layers or binary classification. In contrast, we focus on non-linear models of arbitrary depth in multi-class classification and reveal a surprising qualitative shift. As soon as we go beyond two layers or two classes, DNC stops being optimal for the deep unconstrained features model (DUFM), the standard theoretical framework for the analysis of collapse. The main culprit is a low-rank bias of multi-layer regularization schemes: this bias leads to optimal solutions of even lower rank than the neural collapse. We support our theoretical findings with experiments on both DUFM and real data, which show the emergence of the low-rank structure in the solution found by gradient descent.

5/24/2024

cs.LG stat.ML

Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Feature Model

Hien Dang, Tho Tran, Tan Nguyen, Nhat Ho

The current paradigm of training deep neural networks for classification tasks includes minimizing the empirical risk that pushes the training loss value towards zero, even after the training error has vanished. In this terminal phase of training, it has been observed that the last-layer features collapse to their class-means and these class-means converge to the vertices of a simplex Equiangular Tight Frame (ETF). This phenomenon is termed Neural Collapse (NC). To theoretically understand this phenomenon, recent works employ a simplified unconstrained feature model to prove that NC emerges at the global solutions of the training problem. However, when the training dataset is class-imbalanced, some NC properties will no longer be true. For example, the class-means geometry will skew away from the simplex ETF when the loss converges. In this paper, we generalize NC to the imbalanced regime for cross-entropy loss under the unconstrained ReLU feature model. We prove that, while the within-class features collapse property still holds in this setting, the class-means will converge to a structure consisting of orthogonal vectors with different lengths. Furthermore, we find that the classifier weights are aligned to the scaled and centered class-means with scaling factors depending on the number of training samples of each class, which generalizes NC in the class-balanced setting. We empirically validate our results through experiments on practical architectures and datasets.

6/7/2024

cs.LG stat.ML


Neural Collapse in Multi-label Learning with Pick-all-label Loss

Pengyu Li, Xiao Li, Yutong Wang, Qing Qu

We study deep neural networks for the multi-label classification (MLab) task through the lens of neural collapse (NC). Previous works have been restricted to the multi-class classification setting and discovered a prevalent NC phenomenon comprising the following properties for the last-layer features: (i) the variability of features within every class collapses to zero, (ii) the set of feature means form an equi-angular tight frame (ETF), and (iii) the last layer classifiers collapse to the feature mean upon some scaling. We generalize the study to multi-label learning, and prove for the first time that a generalized NC phenomenon holds with the pick-all-label formulation, which we term MLab NC. While the ETF geometry remains consistent for features with a single label, multi-label scenarios introduce a unique combinatorial aspect we term the tag-wise average property, where the means of features with multiple labels are the scaled averages of means for single-label instances. Theoretically, under proper assumptions on the features, we establish that the only global optimizer of the pick-all-label cross-entropy loss satisfies the multi-label NC. In practice, we demonstrate that our findings can lead to better test performance with more efficient training techniques for MLab learning.

6/21/2024

cs.LG

Navigate Beyond Shortcuts: Debiased Learning through the Lens of Neural Collapse

Yining Wang, Junjie Sun, Chenyue Wang, Mi Zhang, Min Yang

Recent studies have noted an intriguing phenomenon termed Neural Collapse, that is, when the neural networks establish the right correlation between feature spaces and the training targets, their last-layer features, together with the classifier weights, will collapse into a stable and symmetric structure. In this paper, we extend the investigation of Neural Collapse to the biased datasets with imbalanced attributes. We observe that models will easily fall into the pitfall of shortcut learning and form a biased, non-collapsed feature space at the early period of training, which is hard to reverse and limits the generalization capability. To tackle the root cause of biased classification, we follow the recent inspiration of prime training, and propose an avoid-shortcut learning framework without additional training complexity. With well-designed shortcut primes based on Neural Collapse structure, the models are encouraged to skip the pursuit of simple shortcuts and naturally capture the intrinsic correlations. Experimental results demonstrate that our method induces better convergence properties during training, and achieves state-of-the-art generalization performance on both synthetic and real-world biased datasets.

5/10/2024

cs.CV cs.LG