MCNet: A crowd density estimation network based on integrating multiscale attention module

Read original: arXiv:2403.20173 - Published 4/1/2024 by Qiang Guo, Rubo Zhang, Di Zhao

The paper proposes a Metro Crowd density estimation Network (MCNet) to automatically classify the crowd density level of passengers in metro video surveillance systems. The key innovations are:

An Integrating Multi-scale Attention (IMA) module that fuses dilated convolution, multi-scale feature extraction, and an attention mechanism. This enhances the classifier's ability to extract semantic crowd texture features.
A lightweight crowd texture feature extraction network that can directly process video frames and extract texture features for crowd density estimation. This network has faster image processing speed and fewer parameters, allowing deployment on embedded platforms with limited resources.
Integration of the IMA module and the lightweight network to construct MCNet, which is evaluated on one image classification dataset (Cifar10) and four crowd density datasets (PETS2009, Mall, QUT, and SH_METRO). The evaluation tests whether MCNet is a feasible solution for crowd density estimation in metro video surveillance, which faces challenges such as high density, heavy occlusion, perspective distortion, and limited hardware.
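
Dilated convolution, one ingredient of the IMA module, widens the area a kernel covers without adding weights. The sketch below shows only that standard arithmetic. The function names and the 64-channel example are illustrative assumptions, not values taken from the paper.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered along one axis by a kernel whose taps are `dilation` apart."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_weight_count(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a 2D convolution layer (bias ignored). Dilation does not change this."""
    return kernel_size * kernel_size * in_channels * out_channels


# Hypothetical layer with 64 input and 64 output channels.
for dilation in (1, 2, 3):
    span = effective_kernel_size(3, dilation)
    weights = conv_weight_count(3, 64, 64)
    print(f"3x3 kernel, dilation {dilation}: covers {span}x{span} with {weights} weights")

print(f"dense 7x7 kernel: {conv_weight_count(7, 64, 64)} weights")
```

A 3x3 kernel with dilation 3 spans the same 7x7 window as a dense 7x7 kernel while keeping the weight count of a 3x3 layer, which is the kind of saving that matters on embedded hardware.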

Related Papers

MCNet: A crowd density estimation network based on integrating multiscale attention module

Qiang Guo, Rubo Zhang, Di Zhao

Because existing metro video surveillance systems have not been able to solve the metro crowd density estimation problem effectively, a Metro Crowd density estimation Network (MCNet) is proposed to automatically classify the crowd density level of passengers. First, an Integrating Multi-scale Attention (IMA) module is proposed to enhance the ability of plain classifiers to extract semantic crowd texture features, adapting them to the characteristics of crowd texture. The innovation of the IMA module is to fuse dilated convolution, multi-scale feature extraction, and an attention mechanism to obtain multi-scale crowd feature activations from a larger receptive field at lower computational cost, and to strengthen the crowd activation state of convolutional features in the top layers. Second, a novel lightweight crowd texture feature extraction network is proposed, which can directly process video frames and automatically extract texture features for crowd density estimation; its faster image processing speed and fewer network parameters make it flexible to deploy on embedded platforms with limited hardware resources. Finally, the paper integrates the IMA module and the lightweight crowd texture feature extraction network to construct MCNet, and evaluates it on one image classification dataset (Cifar10) and four crowd density datasets (PETS2009, Mall, QUT, and SH_METRO) to test whether MCNet can be a suitable solution for crowd density estimation in metro video surveillance, where image processing faces challenges such as high density, heavy occlusion, perspective distortion, and limited hardware resources.
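
The abstract's combination of dilated convolution, multi-scale branches, and attention can be illustrated with a toy 1D example. This is a sketch under stated assumptions, not the authors' IMA module: the shared kernel, the dilation rates (1, 2, 3), and the attention score (softmax over each branch's mean absolute activation) are all hypothetical choices made for illustration.

```python
import math


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated cross-correlation (odd kernel length)."""
    center = (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + (j - center) * dilation  # taps are `dilation` samples apart
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        out.append(acc)
    return out


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def multiscale_attention(signal, kernel, dilations=(1, 2, 3)):
    """Run one branch per dilation rate, then fuse branches with attention weights."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    # Assumed scoring rule: mean absolute activation of each branch.
    scores = [sum(abs(v) for v in branch) / len(branch) for branch in branches]
    weights = softmax(scores)
    fused = [
        sum(w * branch[i] for w, branch in zip(weights, branches))
        for i in range(len(signal))
    ]
    return fused, weights


fused, weights = multiscale_attention([0, 0, 0, 1, 0, 0, 0], [1, 1, 1])
print(weights)
print(fused)
```

Each branch sees the same input at a different spacing, so the fused output mixes fine and coarse context. The attention weights decide how much each scale contributes.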

4/1/2024

Learning Discriminative Features for Crowd Counting

Yuehai Chen, Qingzhong Wang, Jing Yang, Badong Chen, Haoyi Xiong, Shaoyi Du

Crowd counting models in highly congested areas confront two main challenges: weak localization ability and difficulty in differentiating between foreground and background, leading to inaccurate estimations. The reason is that objects in highly congested areas are normally small, and the high-level features extracted by convolutional neural networks are less discriminative for representing small objects. To address these problems, we propose a framework for learning discriminative features for crowd counting, composed of a masked feature prediction module (MPM) and a supervised pixel-level contrastive learning module (CLM). The MPM randomly masks feature vectors in the feature map and then reconstructs them, allowing the model to learn what is present in the masked regions and improving its ability to localize objects in high-density regions. The CLM pulls targets close to each other and pushes them far away from the background in the feature space, enabling the model to discriminate foreground objects from background. Additionally, the proposed modules can be beneficial in various computer vision tasks, such as crowd counting and object detection, where dense scenes or cluttered environments pose challenges to accurate localization. The two proposed modules are plug-and-play: incorporating them into existing models can potentially boost performance in these scenarios.
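
The masking step of the MPM can be sketched as follows. This is a minimal illustration, assuming a feature map stored as nested lists of shape height x width x channels and assuming masked vectors are replaced with zeros; the paper may use a different mask value, and the reconstruction network is not shown. The function name and parameters are hypothetical.

```python
import random


def mask_feature_vectors(feature_map, mask_ratio, seed=None):
    """Zero out a random subset of spatial positions in an H x W x C feature map.

    Returns the masked copy and the set of (row, col) positions that were masked.
    """
    rng = random.Random(seed)
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    positions = [(r, c) for r in range(height) for c in range(width)]
    n_masked = round(mask_ratio * len(positions))
    masked = set(rng.sample(positions, n_masked))
    out = [
        [
            [0.0] * channels if (r, c) in masked else list(feature_map[r][c])
            for c in range(width)
        ]
        for r in range(height)
    ]
    return out, masked


# Hypothetical 4 x 4 feature map with 2 channels, all ones.
fmap = [[[1.0, 1.0] for _ in range(4)] for _ in range(4)]
masked_map, masked_positions = mask_feature_vectors(fmap, mask_ratio=0.25, seed=0)
print(sorted(masked_positions))
```

A reconstruction loss would then compare the model's predictions at the masked positions with the original vectors, which is what pushes the model to infer content from surrounding context.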

6/19/2024

CrowdMAC: Masked Crowd Density Completion for Robust Crowd Density Forecasting

Ryo Fujii, Ryo Hachiuma, Hideo Saito

A crowd density forecasting task aims to predict how the crowd density map will change in the future from observed past crowd density maps. However, the past crowd density maps are often incomplete due to the miss-detection of pedestrians, and it is crucial to develop a robust crowd density forecasting model against the miss-detection. This paper presents a MAsked crowd density Completion framework for crowd density forecasting (CrowdMAC), which is simultaneously trained to forecast future crowd density maps from partially masked past crowd density maps (i.e., forecasting maps from past maps with miss-detection) while reconstructing the masked observation maps (i.e., imputing past maps with miss-detection). Additionally, we propose Temporal-Density-aware Masking (TDM), which non-uniformly masks tokens in the observed crowd density map, considering the sparsity of the crowd density maps and the informativeness of the subsequent frames for the forecasting task. Moreover, we introduce multi-task masking to enhance training efficiency. In the experiments, CrowdMAC achieves state-of-the-art performance on seven large-scale datasets, including SDD, ETH-UCY, inD, JRDB, VSCrowd, FDST, and croHD. We also demonstrate the robustness of the proposed method against both synthetic and realistic miss-detections.

7/23/2024

Enhancing Wireless Networks with Attention Mechanisms: Insights from Mobile Crowdsensing

Yaoqi Yang, Hongyang Du, Zehui Xiong, Dusit Niyato, Abbas Jamalipour, Zhu Han

The increasing demand for sensing, collecting, transmitting, and processing vast amounts of data poses significant challenges for resource-constrained mobile users, thereby impacting the performance of wireless networks. In this regard, taking mobile crowdsensing (MCS) as a case, we aim to leverage attention mechanisms in machine learning approaches to provide solutions for building an effective, timely, and secure MCS. Specifically, we first evaluate potential combinations of attention mechanisms and MCS by introducing their preliminaries. Then, we present several emerging scenarios for integrating attention into MCS, including task allocation, incentive design, terminal recruitment, privacy preservation, data collection, and data transmission. Subsequently, we propose an attention-based framework to solve network optimization problems with multiple performance indicators in large-scale MCS. The designed case study evaluates the effectiveness of the proposed framework. Finally, we outline important research directions for advancing attention-enabled MCS.

7/23/2024