Face processing emerges from object-trained convolutional neural networks

arXiv:2405.18800

Published 5/30/2024 by Zhenhua Zhao, Ji Chen, Zhicheng Lin, Haojiang Ying

Abstract

Whether face processing depends on unique, domain-specific neurocognitive mechanisms or domain-general object recognition mechanisms has long been debated. Directly testing these competing hypotheses in humans has proven challenging due to extensive exposure to both faces and objects. Here, we systematically test these hypotheses by capitalizing on recent progress in convolutional neural networks (CNNs) that can be trained without face exposure (i.e., pre-trained weights). Domain-general mechanism accounts posit that face processing can emerge from a neural network without specialized pre-training on faces. Consequently, we trained CNNs solely on objects and tested their ability to recognize and represent faces as well as objects that look like faces (face pareidolia stimuli).... Due to the character limits, for more details see in attached pdf

Overview

This paper investigates whether face processing relies on specialized, domain-specific mechanisms or can emerge from more general object recognition capabilities.
The researchers trained convolutional neural networks (CNNs) solely on objects, without any exposure to faces, and tested their ability to recognize and represent faces as well as face-like objects.
The findings have implications for understanding the neurocognitive mechanisms underlying face processing in humans.

Plain English Explanation

The human brain's ability to recognize and process faces has long been a topic of debate. Some researchers believe that face processing requires specialized brain regions dedicated to this task, while others argue that it can arise from more general object recognition capabilities.

To test these competing hypotheses, the researchers in this study took advantage of recent advancements in convolutional neural networks (CNNs). They trained CNNs solely on a dataset of objects, without exposing the models to any faces. Then, they assessed the CNNs' ability to recognize and represent both faces and "face pareidolia" stimuli - objects that appear to have face-like features, even though they are not actual faces.

If face processing relies on domain-specific mechanisms, the CNN models trained only on objects should struggle to recognize and process faces. However, if face processing can emerge from more general object recognition capabilities, the CNN models might still be able to handle face-related tasks effectively, despite their lack of face-specific training.

By taking this approach, the researchers aimed to shed light on the underlying cognitive and neural mechanisms responsible for our remarkable face processing abilities.

Technical Explanation

The researchers conducted a series of experiments using convolutional neural networks (CNNs) trained solely on object recognition tasks, without any exposure to faces. They then tested the trained models' performance on face recognition and face pareidolia (the perception of face-like features in non-face objects) tasks.

The researchers used several different CNN architectures, including VGG-16, ResNet-50, and EfficientNet-B0, and trained them on the ImageNet dataset, which is organized around a wide variety of object categories rather than faces. They then evaluated the models' face recognition abilities using the Labeled Faces in the Wild (LFW) dataset, as well as their ability to perceive face pareidolia using a custom-designed set of stimuli.
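The evaluation described above can be illustrated with a minimal sketch of same/different face verification scored on feature vectors. The summary does not specify the scoring protocol, so the cosine-similarity threshold, the helper names `cosine_similarity` and `verification_accuracy`, and the toy feature vectors are all assumptions made for illustration. In practice the vectors would be activations read from a late layer of the object-trained CNN.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as maximally uninformative
    return dot / (norm_a * norm_b)


def verification_accuracy(pairs, threshold):
    """Fraction of pairs classified correctly as same/different identity.

    pairs: list of (features_a, features_b, same_identity) tuples.
    A pair is predicted "same" when its cosine similarity >= threshold.
    """
    correct = 0
    for feat_a, feat_b, same_identity in pairs:
        predicted_same = cosine_similarity(feat_a, feat_b) >= threshold
        if predicted_same == same_identity:
            correct += 1
    return correct / len(pairs)


# Hypothetical 3-dimensional "features" standing in for CNN activations.
toy_pairs = [
    ([1.0, 0.0, 0.2], [0.9, 0.1, 0.3], True),   # same identity
    ([0.0, 1.0, 0.1], [0.1, 0.8, 0.0], True),   # same identity
    ([1.0, 0.0, 0.2], [0.0, 1.0, 0.1], False),  # different identities
    ([0.2, 0.1, 1.0], [1.0, 0.1, 0.0], False),  # different identities
]

accuracy = verification_accuracy(toy_pairs, threshold=0.5)
print(f"verification accuracy: {accuracy:.2f}")
```

The threshold of 0.5 is arbitrary here. A real evaluation would typically choose it on held-out pairs rather than fix it in advance.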

The results showed that the CNN models trained on objects alone were able to perform well on both face recognition and face pareidolia tasks, suggesting that face processing capabilities can emerge from general object recognition mechanisms, rather than requiring specialized, domain-specific neural mechanisms.

The researchers also analyzed the internal representations of the CNN models to gain insights into how they were able to process faces and face-like stimuli without explicit face training. Their findings provide evidence that the models were able to leverage more general visual features and object recognition capabilities to succeed on these tasks.
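One common way to inspect internal representations of this kind is a representational dissimilarity matrix (RDM), which records how differently a layer responds to each pair of stimuli. The summary does not say which analysis the authors used, so the sketch below is an assumption and not their method. The function names, the correlation-distance measure, the separation index, and the toy activation vectors are all hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length activation vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a constant vector carries no pattern to correlate
    return cov / math.sqrt(var_a * var_b)


def dissimilarity_matrix(activations):
    """RDM with entries 1 - Pearson r for every pair of stimuli."""
    n = len(activations)
    return [
        [1.0 - pearson(activations[i], activations[j]) for j in range(n)]
        for i in range(n)
    ]


def category_separation(rdm, labels):
    """Mean between-category minus mean within-category dissimilarity.

    Positive values mean stimuli from the same category are represented
    more similarly than stimuli from different categories.
    """
    within, between = [], []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            bucket = within if labels[i] == labels[j] else between
            bucket.append(rdm[i][j])
    return sum(between) / len(between) - sum(within) / len(within)


# Hypothetical layer activations for two face and two object stimuli.
activations = [
    [1.0, 0.9, 0.1, 0.0],  # face
    [0.9, 1.0, 0.0, 0.1],  # face
    [0.1, 0.0, 1.0, 0.9],  # object
    [0.0, 0.1, 0.9, 1.0],  # object
]
labels = ["face", "face", "object", "object"]

rdm = dissimilarity_matrix(activations)
separation = category_separation(rdm, labels)
print(f"face/object separation index: {separation:.3f}")
```

Running the same computation layer by layer would show where in the network face and object representations begin to diverge, if they do at all.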

Critical Analysis

The researchers acknowledge several limitations and caveats in their study. First, while the CNN models were able to perform well on face-related tasks, their performance was still lower than that of models specifically trained on face recognition. This suggests that face-specific experience still confers an advantage, even if it is not strictly necessary for basic face recognition.

Additionally, the researchers note that the object recognition dataset used for training (ImageNet) may have contained some incidental face-like features or objects, which could have aided the models' performance on face-related tasks. Further research using even more strictly object-focused training datasets would help to address this potential confound.

Another consideration is the ecological validity of the face pareidolia stimuli used in the study. While these stimuli were carefully designed to resemble faces, they may not fully capture the complexity and nuance of real-world face perception and recognition.

Finally, the study was conducted using artificial neural networks, which may not perfectly reflect the biological mechanisms underlying human face processing. Further research integrating insights from cognitive neuroscience and human psychology would be valuable to more fully understand the interplay between domain-general and domain-specific mechanisms in face perception.

Conclusion

This study provides important insights into the longstanding debate surrounding the neurocognitive mechanisms underlying face processing. By training convolutional neural networks on objects alone and testing their ability to recognize and represent faces, the researchers have demonstrated that face processing capabilities can emerge from more general object recognition mechanisms, rather than requiring specialized, domain-specific neural mechanisms.

These findings have implications for our understanding of the cognitive and neural underpinnings of face perception. They may also inform ongoing efforts to develop efficient facial recognition systems and to explain how facial expression recognition works. By shedding light on the fundamental mechanisms of face processing, this research can contribute to advancements in areas such as computer vision, cognitive neuroscience, and human-computer interaction.
