Retinotopic Mapping Enhances the Robustness of Convolutional Neural Networks

Read original: arXiv:2402.15480 - Published 8/12/2024 by Jean-Nicolas Jérémie, Emmanuel Daucé, Laurent U Perrinet

Overview

CNNs (Convolutional Neural Networks) are powerful tools for image recognition, but they can be vulnerable to geometric transformations such as rotations and zooms.
This paper investigates whether retinotopic mapping, a critical component of the foveated vision found in humans and many other animals, can make CNNs more robust to such transformations.
The researchers apply a logarithmic-polar mapping to the inputs of standard CNNs, retrain them on ImageNet, and report improved handling of zoomed and rotated images.

Plain English Explanation

The human visual system is highly adept at recognizing objects, even when they are rotated or transformed in various ways. This is thanks to the structure of the visual cortex, which uses a retinotopic mapping to preserve the spatial relationships between different parts of the visual field.

Convolutional Neural Networks (CNNs) are a type of machine learning model that has been highly successful at image recognition tasks. However, they can struggle with geometric transformations like rotations, which can cause significant performance degradation.

In this paper, the researchers investigated how incorporating a retinotopic mapping layer into a CNN could help improve its robustness to rotation. The key idea is that by mimicking the structure of the visual cortex, the model can better preserve the spatial relationships between different parts of the image, even when it is rotated.

The researchers retrained standard CNNs on ImageNet with the retinotopic mapping applied to their inputs. According to the abstract, the mapped networks handled zoomed and rotated images better than their standard counterparts, particularly for isolated objects, while reaching comparable classification performance. This suggests that incorporating biologically inspired principles like retinotopic mapping is a promising approach for enhancing the robustness of artificial vision systems.

Technical Explanation

The paper proposes a retinotopic mapping layer that can be integrated into a CNN architecture to improve its robustness to geometric transformations, specifically rotations.

The retinotopic mapping is inspired by the organization of the visual cortex, where neighboring points of the visual field project to neighboring points on the cortical surface. According to the paper's abstract, the mapping is logarithmic-polar and is applied to the inputs of standard off-the-shelf CNNs: the image is resampled around a fixation point along log-radius and angle axes, so resolution is highest at the center and falls off toward the periphery.
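
The paper's abstract describes the mapping as logarithmic-polar. A minimal sketch of such a resampling is given below, under several assumptions: the function name `log_polar_map`, its parameters, nearest-neighbour sampling, and a fixation point at the image centre are all illustrative choices, not the authors' implementation. Plain nested lists are used to keep the example dependency-free; a practical version would use an array library.

```python
import math


def log_polar_map(image, n_rho, n_theta, r_min=1.0):
    """Resample a grayscale image (list of rows) onto a log-polar grid.

    Output rows index log-radius, output columns index angle.
    Hypothetical helper for illustration: nearest-neighbour sampling,
    fixation point fixed at the image centre.
    """
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    r_max = min(cy, cx)  # largest radius that stays inside the image
    out = []
    for i in range(n_rho):
        # Radii are spaced geometrically between r_min and r_max,
        # i.e. uniformly in log-radius.
        frac = i / (n_rho - 1) if n_rho > 1 else 0.0
        r = r_min * (r_max / r_min) ** frac
        row = []
        for j in range(n_theta):
            theta = 2.0 * math.pi * j / n_theta
            # Nearest source pixel, clamped to the image bounds.
            y = min(max(int(round(cy + r * math.sin(theta))), 0), height - 1)
            x = min(max(int(round(cx + r * math.cos(theta))), 0), width - 1)
            row.append(image[y][x])
        out.append(row)
    return out


# Usage: a 9x9 image whose pixel value equals its column index.
ramp = [[x for x in range(9)] for _ in range(9)]
mapped = log_polar_map(ramp, n_rho=3, n_theta=8)
```

Because the radii are spaced geometrically, pixels near the centre are sampled densely and the periphery coarsely, which is the foveated behaviour the summary refers to.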

According to the abstract, the retinotopically mapped networks were retrained on the ImageNet task and compared with standard networks. The mapping improved the networks' ability to handle arbitrary image zooms and rotations, particularly for isolated objects, while classification performance remained comparable.
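
A rotation-robustness evaluation of this kind could be sketched as follows. Here `classify` is a hypothetical stand-in for any trained model, images are plain nested lists, and rotations are limited to quarter turns so that no interpolation is needed. The paper itself considers arbitrary rotations, and its actual protocol is not reproduced here.

```python
def rotate_quarter_turns(image, k):
    """Rotate a list-of-rows image counterclockwise by k * 90 degrees."""
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one counterclockwise turn.
        image = [list(row) for row in zip(*image)][::-1]
    return image


def accuracy_by_rotation(classify, images, labels, quarter_turns=(0, 1, 2, 3)):
    """Return {angle_in_degrees: accuracy} for each tested rotation.

    `classify` is any callable mapping an image to a predicted label
    (an assumption of this sketch, not an API from the paper).
    """
    results = {}
    for k in quarter_turns:
        correct = sum(
            classify(rotate_quarter_turns(img, k)) == label
            for img, label in zip(images, labels)
        )
        results[90 * k] = correct / len(images)
    return results


# Usage with two toy classifiers on 2x2 images.
images = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
labels = [1, 0]

# Reads only the top-left pixel, so it is sensitive to rotation.
fragile = accuracy_by_rotation(lambda img: img[0][0], images, labels)

# Sums all pixels, so its output does not change under rotation.
bright = [[[9, 9], [9, 9]], [[0, 0], [0, 0]]]
robust = accuracy_by_rotation(lambda img: int(sum(map(sum, img)) > 10), bright, labels)
```

Plotting accuracy against angle for a standard network and a retinotopically mapped one would show the robustness gap the summary describes, if one exists for the models being compared.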

The mechanism behind the improved performance follows from the geometry of the mapping. In log-polar coordinates, a rotation or zoom about the fixation point becomes a shift along the angle or log-radius axis, and convolutional layers handle shifts well. The subsequent CNN layers can therefore focus on recognizing the underlying object features rather than having to adapt to each rotation or scale.
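
One way to see why such a mapping helps is to write the transformations in log-polar coordinates. The derivation below assumes the logarithmic-polar form named in the paper's abstract and a fixation point at the origin; the symbols are introduced here for illustration and are not taken from the paper.

```latex
\begin{align*}
  (x, y) &\mapsto (\rho, \theta), \qquad
    \rho = \log\sqrt{x^2 + y^2}, \quad
    \theta = \operatorname{atan2}(y, x) \\
  \text{rotation by } \alpha: \quad
    (\rho, \theta) &\mapsto \bigl(\rho,\; (\theta + \alpha) \bmod 2\pi\bigr) \\
  \text{zoom by } s > 0: \quad
    (\rho, \theta) &\mapsto (\rho + \log s,\; \theta)
\end{align*}
```

Both transformations become translations in the $(\rho, \theta)$ plane, which is the kind of change that convolution and pooling tolerate well. This holds only for rotations and zooms centered on the fixation point.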

Critical Analysis

The paper presents a novel and promising approach for enhancing the robustness of CNNs to geometric transformations. The incorporation of a retinotopic mapping layer, inspired by the structure of the human visual system, is a compelling idea that aligns with the growing interest in biologically-inspired AI research.

One potential limitation is that the reported gains are strongest for isolated objects, and a log-polar mapping is tied to a single fixation point: rotations and zooms are handled cleanly only when they are centered on that point. It would be interesting to see how the approach performs on cluttered scenes and on other geometric transformations, such as shears or changes of perspective.

Additionally, the paper does not provide a comprehensive analysis of the computational cost and memory requirements of the retinotopic mapping layer, which could be an important consideration for real-world applications, especially on resource-constrained devices.

Overall, the research presented in this paper is a valuable contribution to the field of computer vision and highlights the potential benefits of incorporating biologically-inspired principles into the design of artificial vision systems.

Conclusion

This paper demonstrates that applying a retinotopic mapping to the inputs of a Convolutional Neural Network (CNN) can enhance the model's robustness to geometric transformations, particularly zooms and rotations.

By mimicking the retinotopic organization of the human visual system, the mapping turns rotations and zooms about the fixation point into shifts of the transformed image, allowing the subsequent CNN layers to focus on recognizing the underlying object features rather than having to adapt to the transformation.

The results presented in the paper suggest that this biologically inspired approach could be a promising direction for improving the robustness of artificial vision systems to geometric changes, which is an important consideration for real-world applications. Further research is needed to explore the generalization of this approach to other types of geometric transformations and to address potential computational and memory constraints.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Retinotopic Mapping Enhances the Robustness of Convolutional Neural Networks

Jean-Nicolas Jérémie, Emmanuel Daucé, Laurent U Perrinet

Foveated vision, a trait shared by many animals, including humans, has not been fully utilized in machine learning applications, despite its significant contributions to biological visual function. This study investigates whether retinotopic mapping, a critical component of foveated vision, can enhance image categorization and localization performance when integrated into deep convolutional neural networks (CNNs). Retinotopic mapping was integrated into the inputs of standard off-the-shelf convolutional neural networks (CNNs), which were then retrained on the ImageNet task. As expected, the logarithmic-polar mapping improved the network's ability to handle arbitrary image zooms and rotations, particularly for isolated objects. Surprisingly, the retinotopically mapped network achieved comparable performance in classification. Furthermore, the network demonstrated improved classification localization when the foveated center of the transform was shifted. This replicates a crucial ability of the human visual system that is absent in typical convolutional neural networks (CNNs). These findings suggest that retinotopic mapping may be fundamental to significant preattentive visual processes.

8/12/2024

Enhancing 3T Retinotopic Maps Using Diffeomorphic Registration

Negar Jalili-Mallak, Yanshuai Tu, Zhong-Lin Lu, Yalin Wang

Retinotopic mapping aims to uncover the relationship between visual stimuli on the retina and neural responses on the visual cortical surface. This study advances retinotopic mapping by applying diffeomorphic registration to the 3T NYU retinotopy dataset, encompassing analyze-PRF and mrVista data. Diffeomorphic Registration for Retinotopic Maps (DRRM) quantifies the diffeomorphic condition, ensuring accurate alignment of retinotopic maps without topological violations. Leveraging the Beltrami coefficient and topological condition, DRRM significantly enhances retinotopic map accuracy. Evaluation against existing methods demonstrates DRRM's superiority on various datasets, including 3T and 7T retinotopy data. The application of diffeomorphic registration improves the interpretability of low-quality retinotopic maps, holding promise for clinical applications.

5/6/2024

nnMobileNet: Rethinking CNN for Retinopathy Research

Wenhui Zhu, Peijie Qiu, Xiwen Chen, Xin Li, Natasha Lepore, Oana M. Dumitrascu, Yalin Wang

Over the past few decades, convolutional neural networks (CNNs) have been at the forefront of the detection and tracking of various retinal diseases (RD). Despite their success, the emergence of vision transformers (ViT) in the 2020s has shifted the trajectory of RD model development. The leading-edge performance of ViT-based models in RD can be largely credited to their scalability: their ability to improve as more parameters are added. As a result, ViT-based models tend to outshine traditional CNNs in RD applications, albeit at the cost of increased data and computational demands. ViTs also differ from CNNs in their approach to processing images, working with patches rather than local regions, which can complicate the precise localization of small, variably presented lesions in RD. In our study, we revisited and updated the architecture of a CNN model, specifically MobileNet, to enhance its utility in RD diagnostics. We found that an optimized MobileNet, through selective modifications, can surpass ViT-based models in various RD benchmarks, including diabetic retinopathy grading, detection of multiple fundus diseases, and classification of diabetic macular edema. The code is available at https://github.com/Retinal-Research/NN-MOBILENET

4/17/2024

Contextual fusion enhances robustness to image blurring

Shruti Joshi, Aiswarya Akumalla, Seth Haney, Maxim Bazhenov

Mammalian brains handle complex reasoning by integrating information across brain regions specialized for particular sensory modalities. This enables improved robustness and generalization versus deep neural networks, which typically process one modality and are vulnerable to perturbations. While defense methods exist, they do not generalize well across perturbations. We developed a fusion model combining background and foreground features from CNNs trained on ImageNet and Places365. We tested its robustness to human-perceivable perturbations on MS COCO. The fusion model improved robustness, especially for classes with greater context variability. Our proposed solution for integrating multiple modalities provides a new approach to enhance robustness and may be complementary to existing methods.

6/10/2024