The Lie Derivative for Measuring Learned Equivariance

2210.02984

Published 6/19/2024 by Nate Gruver, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson

Abstract

Equivariance guarantees that a model's predictions capture key symmetries in data. When an image is translated or rotated, an equivariant model's representation of that image will translate or rotate accordingly. The success of convolutional neural networks has historically been tied to translation equivariance directly encoded in their architecture. The rising success of vision transformers, which have no explicit architectural bias towards equivariance, challenges this narrative and suggests that augmentations and training data might also play a significant role in their performance. In order to better understand the role of equivariance in recent vision models, we introduce the Lie derivative, a method for measuring equivariance with strong mathematical foundations and minimal hyperparameters. Using the Lie derivative, we study the equivariance properties of hundreds of pretrained models, spanning CNNs, transformers, and Mixer architectures. The scale of our analysis allows us to separate the impact of architecture from other factors like model size or training method. Surprisingly, we find that many violations of equivariance can be linked to spatial aliasing in ubiquitous network layers, such as pointwise non-linearities, and that as models get larger and more accurate they tend to display more equivariance, regardless of architecture. For example, transformers can be more equivariant than convolutional neural networks after training.

Overview

This paper examines the concept of equivariance in modern computer vision models, including convolutional neural networks (CNNs) and vision transformers.
Equivariance refers to the property of a model where its representations change in a predictable way when the input is transformed (e.g., translated or rotated).
The paper introduces a mathematical method called the Lie derivative to measure equivariance and uses it to analyze hundreds of pre-trained models.
The findings challenge the common assumption that equivariance is primarily determined by a model's architecture, suggesting that other factors like data augmentation and model size also play a significant role.

Plain English Explanation

Equivariance is an important property in computer vision: when an image is transformed (e.g., translated or rotated), an equivariant model's representation of that image changes in a correspondingly predictable way. This matters for tasks like object recognition, where the model should recognize the same object even when it appears at a different location or orientation. Recognition itself calls for a prediction that does not change at all (invariance), which is commonly obtained by pooling over equivariant features.
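The distinction between equivariant features and an invariant prediction can be checked numerically. The following is a minimal sketch, not code from the paper: it assumes a toy 8x8 "image", a random 3x3 kernel, and circular (wrap-around) boundaries so that integer translations are exact. The helper names `circular_conv2d` and `translate` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))    # toy single-channel image (assumption)
kernel = rng.normal(size=(3, 3))   # toy convolution kernel (assumption)

def circular_conv2d(img, ker):
    """Circular cross-correlation, written as a weighted sum of shifted copies."""
    out = np.zeros_like(img)
    for di in range(ker.shape[0]):
        for dj in range(ker.shape[1]):
            out += ker[di, dj] * np.roll(img, shift=(-di, -dj), axis=(0, 1))
    return out

def translate(img, dy, dx):
    """Translate by whole pixels with wrap-around."""
    return np.roll(img, shift=(dy, dx), axis=(0, 1))

features = circular_conv2d(image, kernel)
features_of_shifted = circular_conv2d(translate(image, 2, 3), kernel)

# Equivariance: shifting the input shifts the feature map by the same amount.
is_equivariant = np.allclose(features_of_shifted, translate(features, 2, 3))

# Invariance: global average pooling removes the dependence on position.
is_invariant_after_pooling = np.isclose(features_of_shifted.mean(), features.mean())

print(is_equivariant, is_invariant_after_pooling)
```

Whole-pixel shifts with wrap-around are the easy case. The sub-pixel and continuous transformations that the Lie derivative probes are where equivariance can fail in practice.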

Historically, the success of convolutional neural networks (CNNs) has been attributed to their built-in translation equivariance, which is directly encoded in their architectural design. However, the recent rise of vision transformers, which have no explicit bias towards equivariance, suggests that other factors like data augmentation and model size may also contribute to a model's equivariance properties.

To better understand this, the researchers introduce a mathematical method called the Lie derivative, which allows them to measure the equivariance of different vision models in a rigorous and consistent way. By applying this method to hundreds of pre-trained models, they're able to separate the impact of architecture from other factors and make some surprising discoveries.

For example, they find that many violations of equivariance can be traced back to common network layers, like pointwise non-linearities, which can introduce spatial aliasing. They also observe that as models get larger and more accurate, they tend to display more equivariance, regardless of their underlying architecture. As a result, trained transformers can be more equivariant than trained CNNs.

Technical Explanation

The researchers introduce a mathematical framework based on the Lie derivative to measure the equivariance of computer vision models. The Lie derivative is a rigorous and highly interpretable way to quantify how a model's representation changes when the input is transformed, with minimal hyperparameters.
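For reference, the quantity can be written down as follows. This is a sketch in notation assumed here rather than taken from the summary: $\rho_1$ and $\rho_2$ denote how a group element $g$ acts on the model's input and output, and $X$ is a generator of the transformation (for example, an infinitesimal translation along one axis).

```latex
% Transform the input by g, apply the model f, then undo g on the output.
\rho_{12}(g)[f](x) = \rho_2(g)^{-1}\, f\big(\rho_1(g)\, x\big)

% Lie derivative of f along the generator X: the rate of change of the
% transformed model at the identity.
\mathcal{L}_X f
  = \lim_{t \to 0} \frac{\rho_{12}(e^{tX})[f] - f}{t}
  = \frac{d}{dt}\bigg|_{t=0} \rho_{12}(e^{tX})[f]
```

If $f$ is exactly equivariant to the transformations generated by $X$, then $\rho_{12}(e^{tX})[f] = f$ for every $t$ and the Lie derivative vanishes. A nonzero value therefore measures a local violation of equivariance, and the only choice left to the user is which generator $X$ to probe.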

Using the Lie derivative, the researchers analyze hundreds of pre-trained models, spanning CNNs, transformers, and Mixer architectures. By separating the impact of architecture from other factors like model size and training method, they are able to make several key observations:

Many violations of equivariance can be linked to spatial aliasing in common network layers, such as pointwise non-linearities. This suggests that equivariance is not solely determined by architectural design.
As models get larger and more accurate, they tend to display more equivariance, regardless of their underlying architecture. In particular, trained transformers can be more equivariant than trained CNNs.
The role of data augmentation and other training techniques in promoting equivariance may be more significant than previously thought, challenging the narrative that equivariance is primarily encoded in a model's architecture.
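The first observation can be illustrated on a toy 1D example. The sketch below is not the authors' implementation: it assumes a periodic signal of odd length, uses a Fourier phase shift as the continuous translation, and approximates the Lie derivative with a central finite difference instead of automatic differentiation. All function names are hypothetical.

```python
import numpy as np

def fourier_shift(x, t):
    """Translate a periodic 1D signal by t samples; t may be fractional."""
    n = x.shape[-1]
    phase = np.exp(-2j * np.pi * np.fft.rfftfreq(n) * t)
    return np.fft.irfft(np.fft.rfft(x) * phase, n=n)

def lie_derivative_norm(f, x, eps=1e-3):
    """Finite-difference estimate of the norm of the translation Lie derivative of f at x."""
    def transported(t):
        # Shift the input by t, apply f, then shift the output back by t.
        return fourier_shift(f(fourier_shift(x, t)), -t)
    deriv = (transported(eps) - transported(-eps)) / (2 * eps)
    return float(np.linalg.norm(deriv))

def circular_conv(x, kernel=np.array([0.25, 0.5, 0.25])):
    """Linear layer: circular convolution computed in the Fourier domain."""
    n = x.shape[-1]
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(kernel, n=n), n=n)

def relu(x):
    """Pointwise non-linearity applied to the sampled signal."""
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
signal = rng.normal(size=65)  # odd length avoids the ambiguous Nyquist bin

conv_error = lie_derivative_norm(circular_conv, signal)
relu_error = lie_derivative_norm(relu, signal)
print(f"conv: {conv_error:.2e}  relu: {relu_error:.2e}")
```

The convolution commutes with the continuous shift, so its estimate stays at the level of floating-point noise. The ReLU creates frequencies above what the sampling grid can represent, these alias, and the estimate is clearly nonzero even though the layer has no learned parameters.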

These findings have important implications for the design and training of future computer vision models, as they suggest that equivariance can be achieved through a combination of architectural choices, data augmentation, and model scaling, rather than relying solely on hard-coded equivariance in the model structure.

Critical Analysis

The paper presents a robust and comprehensive analysis of equivariance in modern computer vision models, and the introduction of the Lie derivative as a tool for measuring equivariance is a valuable contribution to the field. However, there are a few potential limitations and areas for further research:

The analysis is primarily focused on pre-trained models, and the impact of the training process itself on equivariance properties is not explored in depth. Related methods, such as the probabilistic approach to learning the degree of equivariance in steerable CNNs or any-dimensional equivariant neural networks, could provide additional insights into how training shapes equivariance.
The study is limited to 2D computer vision tasks, and it's unclear how the findings would extend to other settings, such as equivariant quantum neural networks or models made equivariant through architecture-agnostic symmetrization.
While the Lie derivative is a powerful tool for measuring equivariance, it may not capture all nuances of the concept, and alternative approaches could be explored to provide a more comprehensive understanding.

Overall, this paper makes a significant contribution to our understanding of equivariance in modern computer vision models and highlights the need for a more holistic approach to achieving equivariance, beyond just architectural design.

Conclusion

This paper challenges the traditional narrative around equivariance in computer vision models, suggesting that factors like data augmentation and model size may play a more important role than previously thought. By introducing a rigorous mathematical framework based on the Lie derivative, the researchers are able to analyze the equivariance properties of hundreds of pre-trained models, spanning different architectures.

The key findings indicate that many violations of equivariance can be traced back to common network layers, and that larger, more accurate models tend to display more equivariance, regardless of their underlying architecture. These insights have important implications for the design and training of future computer vision models, as they suggest that equivariance can be achieved through a combination of architectural choices, data augmentation, and model scaling.

Overall, this paper represents a significant step forward in our understanding of equivariance and its role in the success of modern computer vision models. By challenging the traditional assumptions and introducing new analytical tools, the researchers have paved the way for further exploration and innovation in this important field of study.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Approximately Equivariant Neural Processes

Matthew Ashman, Cristiana Diaconu, Adrian Weller, Wessel Bruinsma, Richard E. Turner

Equivariant deep learning architectures exploit symmetries in learning problems to improve the sample efficiency of neural-network-based models and their ability to generalise. However, when modelling real-world data, learning problems are often not exactly equivariant, but only approximately. For example, when estimating the global temperature field from weather station observations, local topographical features like mountains break translation equivariance. In these scenarios, it is desirable to construct architectures that can flexibly depart from exact equivariance in a data-driven way. In this paper, we develop a general approach to achieving this using existing equivariant architectures. Our approach is agnostic to both the choice of symmetry group and model architecture, making it widely applicable. We consider the use of approximately equivariant architectures in neural processes (NPs), a popular family of meta-learning models. We demonstrate the effectiveness of our approach on a number of synthetic and real-world regression experiments, demonstrating that approximately equivariant NP models can outperform both their non-equivariant and strictly equivariant counterparts.

6/21/2024

stat.ML cs.LG

A Probabilistic Approach to Learning the Degree of Equivariance in Steerable CNNs

Lars Veefkind, Gabriele Cesa

Steerable convolutional neural networks (SCNNs) enhance task performance by modelling geometric symmetries through equivariance constraints on weights. Yet, unknown or varying symmetries can lead to overconstrained weights and decreased performance. To address this, this paper introduces a probabilistic method to learn the degree of equivariance in SCNNs. We parameterise the degree of equivariance as a likelihood distribution over the transformation group using Fourier coefficients, offering the option to model layer-wise and shared equivariance. These likelihood distributions are regularised to ensure an interpretable degree of equivariance across the network. Advantages include the applicability to many types of equivariant networks through the flexible framework of SCNNs and the ability to learn equivariance with respect to any subgroup of any compact group without requiring additional layers. Our experiments reveal competitive performance on datasets with mixed symmetries, with learnt likelihood distributions that are representative of the underlying degree of equivariance.

6/7/2024

cs.LG

Any-dimensional equivariant neural networks

Eitan Levin, Mateo Díaz

Traditional supervised learning aims to learn an unknown mapping by fitting a function to a set of input-output pairs with a fixed dimension. The fitted function is then defined on inputs of the same dimension. However, in many settings, the unknown mapping takes inputs in any dimension; examples include graph parameters defined on graphs of any size and physics quantities defined on an arbitrary number of particles. We leverage a newly-discovered phenomenon in algebraic topology, called representation stability, to define equivariant neural networks that can be trained with data in a fixed dimension and then extended to accept inputs in any dimension. Our approach is user-friendly, requiring only the network architecture and the groups for equivariance, and can be combined with any training procedure. We provide a simple open-source implementation of our methods and offer preliminary numerical experiments.

5/1/2024

cs.LG stat.ML

Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance

Jinwoo Kim, Tien Dat Nguyen, Ayhan Suleymanzade, Hyeokjun An, Seunghoon Hong

We present a novel framework to overcome the limitations of equivariant architectures in learning functions with group symmetries. In contrast to equivariant architectures, we use an arbitrary base model such as an MLP or a transformer and symmetrize it to be equivariant to the given group by employing a small equivariant network that parameterizes the probabilistic distribution underlying the symmetrization. The distribution is end-to-end trained with the base model which can maximize performance while reducing sample complexity of symmetrization. We show that this approach ensures not only equivariance to the given group but also universal approximation capability in expectation. We implement our method on various base models, including patch-based transformers that can be initialized from pretrained vision transformers, and test them for a wide range of symmetry groups including permutation and Euclidean groups and their combinations. Empirical tests show competitive results against tailored equivariant architectures, suggesting the potential for learning equivariant functions for diverse groups using a non-equivariant universal base architecture. We further show evidence of enhanced learning in symmetric modalities, like graphs, when pretrained from non-symmetric modalities, like vision. Code is available at https://github.com/jw9730/lps.

4/16/2024

cs.LG cs.AI