SLAM-based Joint Calibration of Multiple Asynchronous Microphone Arrays and Sound Source Localization

Read original: arXiv:2405.19813 - Published 5/31/2024 by Jiang Wang, Yuanzheng He, Daobilige Su, Katsutoshi Itoyama, Kazuhiro Nakadai, Junfeng Wu, Shoudong Huang, Youfu Li, He Kong

Overview

This paper presents a SLAM-based approach for joint calibration of multiple asynchronous microphone arrays and sound source localization.
The proposed method can handle multiple microphone arrays that are not synchronized, allowing for more flexible deployment in real-world scenarios.
The system uses a SLAM-based framework to jointly estimate the positions and orientations of the microphone arrays, as well as the location of sound sources.

Plain English Explanation

The research paper describes a new way to accurately locate and track sound sources, even when using multiple microphone arrays that are not perfectly synchronized with each other. This is an important challenge in "robot audition" - the field of enabling robots to hear and understand sounds in their environment.

The key insight is to use a SLAM (Simultaneous Localization and Mapping) approach, which is a technique commonly used in robotics to build a map of an environment while also locating the robot within that map. By applying SLAM principles, the researchers were able to jointly calibrate the positions and orientations of the microphone arrays, while also pinpointing the locations of sound sources.

This matters because arrays on separate devices rarely share a clock: each can start recording at a different moment and sample at a slightly different rate. The approach handles such asynchronous arrays directly, which makes the system more flexible and practical for real-world deployment.

Technical Explanation

The paper proposes a SLAM-based framework that jointly calibrates multiple asynchronous microphone arrays and localizes sound sources. According to the abstract, a batch SLAM optimization estimates two groups of unknowns together:

Microphone array calibration: The relative transforms (orientation and translation) between the arrays, along with the asynchronous factors between them (initial time offset and sampling clock difference).
Sound source localization: The positions of the sound sources, estimated in the same optimization as the array parameters rather than in a separate pass.
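
To make the role of the asynchronous factors concrete, here is a minimal 2D sketch of what one array might observe for one sound event. It is an illustrative toy model, not the paper's formulation: the function name `predict_measurement`, the choice of a direction-of-arrival (DOA) plus time-difference-of-arrival (TDOA) measurement, the linear clock model (offset plus drift times emission time), and the constant speed of sound are all assumptions made for this example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed constant for this sketch


def predict_measurement(array_pose, time_offset, clock_drift,
                        source_xy, emit_time, ref_xy=(0.0, 0.0)):
    """Predict (DOA, TDOA) for one array and one sound event (toy 2D model).

    array_pose  : (x, y, theta) of the array in the reference array's frame
    time_offset : initial time offset relative to the reference array [s]
    clock_drift : sampling clock difference, as seconds gained per second
    source_xy   : sound source position (x, y)
    emit_time   : emission time on the reference array's clock [s]
    ref_xy      : position of the reference array (hypothetical default: origin)
    """
    x, y, theta = array_pose
    src = np.asarray(source_xy, dtype=float)
    delta = src - np.array([x, y])

    # Rotate the source direction into the array's own frame (apply R(theta)^T).
    c, s = np.cos(theta), np.sin(theta)
    local = np.array([c * delta[0] + s * delta[1],
                      -s * delta[0] + c * delta[1]])
    doa = np.arctan2(local[1], local[0])

    # Geometric delay difference relative to the reference array, plus the
    # two asynchronous terms: a constant offset and a drift that grows in time.
    range_diff = np.linalg.norm(delta) - np.linalg.norm(src - np.asarray(ref_xy, dtype=float))
    tdoa = range_diff / SPEED_OF_SOUND + time_offset + clock_drift * emit_time
    return doa, tdoa


# Example: an array at (3, 0) rotated by 90 degrees hears a source at (3, 4).
doa, tdoa = predict_measurement((3.0, 0.0, np.pi / 2), 0.01, 1e-4, (3.0, 4.0), 10.0)
```

In a joint calibration, a nonlinear least-squares solver would adjust the array poses, the asynchronous factors, and the source positions together until predictions like these match the recorded measurements.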

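According to the abstract, two supporting pieces accompany the optimization. An observability analysis based on the Fisher information matrix (FIM) establishes conditions under which the unknown parameters are identifiable and reveals scenarios in which they are not. An initialization procedure then supplies the starting guess for batch SLAM, which the authors report improves accuracy and convergence. Because the asynchronous factors are estimated rather than assumed away, the microphone arrays do not need to be synchronized, making the system more flexible and practical for real-world applications.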
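
The abstract reproduced under Related Papers ties identifiability to the FIM and the Jacobian having full column rank. The sketch below illustrates that idea numerically on a deliberately simplified 2D problem. Everything specific here is an assumption made for illustration: one unknown array with parameters (x, y, theta, time offset, clock drift), sound source positions treated as known, a reference array at the origin, and a hypothetical helper named `jacobian`. For Gaussian noise with an invertible covariance, the FIM is J^T Σ^{-1} J and has the same rank as J, so checking the Jacobian is enough in this toy setting.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed constant for this sketch


def jacobian(array_xy, sources, emit_times):
    """Analytic Jacobian of stacked (DOA, TDOA) measurements with respect to
    the unknowns (x, y, theta, time_offset, clock_drift) of a single array.

    Toy model: DOA  = atan2(sy - y, sx - x) - theta
               TDOA = (r - ||source||) / c + time_offset + clock_drift * t
    where r is the distance from the array to the source.
    """
    rows = []
    for (sx, sy), t in zip(sources, emit_times):
        dx, dy = sx - array_xy[0], sy - array_xy[1]
        r = np.hypot(dx, dy)
        # DOA row: depends on the array pose only, not on the clock terms.
        rows.append([dy / r**2, -dx / r**2, -1.0, 0.0, 0.0])
        # TDOA row: depends on position and both clock terms, not on theta.
        rows.append([-dx / (r * SPEED_OF_SOUND), -dy / (r * SPEED_OF_SOUND),
                     0.0, 1.0, t])
    return np.array(rows)


array_xy = (3.0, 0.0)
sources = [(1.0, 4.0), (5.0, 3.0), (-2.0, 2.0), (4.0, -3.0)]

# Sound events spread over time: offset and drift can be told apart.
rank_spread = np.linalg.matrix_rank(
    jacobian(array_xy, sources, [0.0, 1.0, 2.0, 3.0]), tol=1e-9)

# All events at the same instant: only offset + drift * t is observable.
rank_same_time = np.linalg.matrix_rank(
    jacobian(array_xy, sources, [2.0, 2.0, 2.0, 2.0]), tol=1e-9)

print(rank_spread, rank_same_time)
```

With emission times spread out, the Jacobian has full column rank (5). When every event shares the same emission time, the offset and drift columns become proportional and the rank drops to 4. This degenerate case belongs to the toy model and is not claimed to be one of the scenarios identified in the paper.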

Critical Analysis

The authors evaluate the method in numerical simulations and real experiments. They report that it achieves higher accuracy with fast convergence, both against optimization that starts from the noise-corrupted ground truth of the unknown parameters and against other existing frameworks.

One potential limitation of the approach is that it assumes the sound sources are static during the calibration and localization process. While this may be a reasonable assumption in many scenarios, it could be challenging to apply the method to dynamic sound sources. Additionally, the paper does not address how noisy or reverberant environments affect the method, which could reduce the accuracy of the system.

Overall, the research represents an important step forward in robot audition and has the potential to enable more flexible and robust sound localization systems.

Conclusion

This paper presents a novel SLAM-based approach for jointly calibrating multiple asynchronous microphone arrays and localizing sound sources. The key innovation is the ability to handle arrays that are not perfectly synchronized, which significantly improves the flexibility and practicality of the system for real-world deployment.

The authors demonstrate the effectiveness of their method through extensive simulations and real experiments, reporting higher accuracy and fast convergence compared with existing frameworks. While the approach has some limitations, such as the assumption of static sound sources, it represents an important contribution to the field of robot audition and has the potential to enable more advanced sound localization capabilities.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

SLAM-based Joint Calibration of Multiple Asynchronous Microphone Arrays and Sound Source Localization

Jiang Wang, Yuanzheng He, Daobilige Su, Katsutoshi Itoyama, Kazuhiro Nakadai, Junfeng Wu, Shoudong Huang, Youfu Li, He Kong

Robot audition systems with multiple microphone arrays have many applications in practice. However, accurate calibration of multiple microphone arrays remains challenging because there are many unknown parameters to be identified, including the relative transforms (i.e., orientation, translation) and asynchronous factors (i.e., initial time offset and sampling clock difference) between microphone arrays. To tackle these challenges, in this paper, we adopt batch simultaneous localization and mapping (SLAM) for joint calibration of multiple asynchronous microphone arrays and sound source localization. Using the Fisher information matrix (FIM) approach, we first conduct the observability analysis (i.e., parameter identifiability) of the above-mentioned calibration problem and establish necessary/sufficient conditions under which the FIM and the Jacobian matrix have full column rank, which implies the identifiability of the unknown parameters. We also discover several scenarios where the unknown parameters are not uniquely identifiable. Subsequently, we propose an effective framework to initialize the unknown parameters, which is used as the initial guess in batch SLAM for multiple microphone arrays calibration, aiming to further enhance optimization accuracy and convergence. Extensive numerical simulations and real experiments have been conducted to verify the performance of the proposed method. The experiment results show that the proposed pipeline achieves higher accuracy with fast convergence in comparison to methods that use the noise-corrupted ground truth of the unknown parameters as the initial guess in the optimization and other existing frameworks.

5/31/2024

A Physics-Informed Neural Network-Based Approach for the Spatial Upsampling of Spherical Microphone Arrays

Federico Miotello, Ferdinando Terminiello, Mirco Pezzoli, Alberto Bernardini, Fabio Antonacci, Augusto Sarti

Spherical microphone arrays are convenient tools for capturing the spatial characteristics of a sound field. However, achieving superior spatial resolution requires arrays with numerous capsules, consequently leading to expensive devices. To address this issue, we present a method for spatially upsampling spherical microphone arrays with a limited number of capsules. Our approach exploits a physics-informed neural network with Rowdy activation functions, leveraging physical constraints to provide high-order microphone array signals, starting from low-order devices. Results show that, within its domain of application, our approach outperforms a state of the art method based on signal processing for spherical microphone arrays upsampling.

7/29/2024

Design and Analysis of Binaural Signal Matching with Arbitrary Microphone Arrays

Lior Madmoni, Zamir Ben-Hur, Jacob Donley, Vladimir Tourbabin, Boaz Rafaely

Binaural reproduction is rapidly becoming a topic of great interest in the research community, especially with the surge of new and popular devices, such as virtual reality headsets, smart glasses, and head-tracked headphones. In order to immerse the listener in a virtual or remote environment with such devices, it is essential to generate realistic and accurate binaural signals. This is challenging, especially since the microphone arrays mounted on these devices are typically composed of an arbitrarily-arranged small number of microphones, which impedes the use of standard audio formats like Ambisonics, and provides limited spatial resolution. The binaural signal matching (BSM) method was developed recently to overcome these challenges. While it produced binaural signals with low error using relatively simple arrays, its performance degraded significantly when head rotation was introduced. This paper aims to develop the BSM method further and overcome its limitations. For this purpose, the method is first analyzed in detail, and a design framework that guarantees accurate binaural reproduction for relatively complex acoustic environments is presented. Next, it is shown that the BSM accuracy may significantly degrade at high frequencies, and thus, a perceptually motivated extension to the method is proposed, based on a magnitude least-squares (MagLS) formulation. These insights and developments are then analyzed with the help of an extensive simulation study of a simple six-microphone semi-circular array. It is further shown that the BSM-MagLS method can be very useful in compensating for head rotations with this array. Finally, a listening experiment is conducted with a four-microphone array on a pair of glasses in a reverberant speech environment and including head rotations, where it is shown that BSM-MagLS can indeed produce binaural signals with a high perceived quality.

8/9/2024

Stabilized Adaptive Steering for 3D Sonar Microphone Arrays with IMU Sensor Fusion

Wouter Jansen, Dennis Laurijssen, Jan Steckel

This paper presents a novel software-based approach to stabilizing the acoustic images for in-air 3D sonars. Due to uneven terrain, traditional static beamforming techniques can be misaligned, causing inaccurate measurements and imaging artifacts. Furthermore, mechanical stabilization can be more costly and prone to failure. We propose using an adaptive conventional beamforming approach by fusing it with real-time IMU data to adjust the sonar array's steering matrix dynamically based on the elevation tilt angle caused by the uneven ground. Additionally, we propose gaining compensation to offset emission energy loss due to the transducer's directivity pattern and validate our approach through various experiments, which show significant improvements in temporal consistency in the acoustic images. We implemented a GPU-accelerated software system that operates in real-time with an average execution time of 210ms, meeting autonomous navigation requirements.

6/11/2024