ASASVIcomtech: The Vicomtech-UGR Speech Deepfake Detection and SASV Systems for the ASVspoof5 Challenge

Read original: arXiv:2408.10361 - Published 8/21/2024 by Juan M. Martín-Doñas, Eros Roselló, Angel M. Gomez, Aitor Álvarez, Iván López-Espejo, Antonio M. Peinado


Overview

The paper presents the Vicomtech-UGR speech deepfake detection and spoofing-aware speaker verification (SASV) systems for the ASVspoof5 challenge.
The ASVspoof5 dataset is a newly released dataset for evaluating speech deepfake detection and spoofing-robust speaker verification.
The paper analyzes the challenge data and describes the authors' systems for both challenge tracks: Track 1 (speech deepfake detection) and Track 2 (SASV).

Plain English Explanation

The paper discusses two systems developed by the Vicomtech-UGR team for the ASVspoof5 challenge. The ASVspoof5 dataset contains speech samples that are either genuine recordings or fakes generated with AI techniques such as speech synthesis and voice conversion (known as "deepfakes"). The first system is designed to detect whether a given speech sample is real or a deepfake. The second system performs speaker verification, which means determining whether a speech sample comes from a specific, claimed person. Because this second system is "spoofing-aware", it should reject not only genuine speech from other people but also deepfakes that imitate the claimed speaker.

Before building these systems, the authors analyze the ASVspoof5 data, which they regard as an essential step to avoid biases in the trained models. They then describe the technical details of their systems. The speech deepfake detection system uses machine learning models to analyze the audio and decide whether it is real or fake. The SASV system combines speaker verification with deepfake detection to verify the identity of the speaker while also checking whether the speech is authentic.
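To make the idea of combining the two checks concrete, here is a minimal Python sketch of one common, simple arrangement: a cascade in which a countermeasure (CM) score gates an automatic speaker verification (ASV) score. This is an illustration under stated assumptions, not the authors' SASV system (which the abstract describes as ensembles built on self-supervised models). The `Trial` type, the `sasv_accept` function, the zero thresholds, and the score values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    asv_score: float  # hypothetical: higher = more likely the claimed (target) speaker
    cm_score: float   # hypothetical: higher = more likely bona fide (genuine) speech


def sasv_accept(trial: Trial, asv_threshold: float, cm_threshold: float) -> bool:
    """Cascade decision: reject likely deepfakes first, then check speaker identity."""
    if trial.cm_score < cm_threshold:
        return False  # the CM flags the audio as a likely deepfake
    return trial.asv_score >= asv_threshold  # genuine audio: ordinary speaker check


# Hypothetical trials covering the three cases an SASV system must handle.
trials = {
    "target, bona fide": Trial(asv_score=2.1, cm_score=3.0),
    "impostor, bona fide": Trial(asv_score=-1.5, cm_score=2.5),
    "target-like deepfake": Trial(asv_score=1.8, cm_score=-2.0),
}
for name, t in trials.items():
    print(f"{name}: {'accept' if sasv_accept(t, 0.0, 0.0) else 'reject'}")
```

The third trial shows why a plain speaker verifier is not enough: its ASV score alone would pass, and only the CM gate rejects it. Many SASV systems instead fuse the two scores into a single one, which avoids picking two separate thresholds.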

Technical Explanation

The paper presents two systems developed by the Vicomtech-UGR team for the ASVspoof5 challenge:

Speech Deepfake Detection System (Track 1): For the closed condition, the authors developed a system based on a deep complex convolutional recurrent architecture, which, unfortunately, did not achieve noteworthy results. For the open condition, they explored systems that leverage self-supervised models, training data augmented with data from previous challenges, and novel vocoders; their final ensemble system achieved very competitive results.
Spoofing-Aware Speaker Verification (SASV) System (Track 2): This system combines speaker verification with deepfake detection to verify the identity of the speaker while also checking whether the speech is authentic. The same open-condition ideas (self-supervised models, augmented training data, and novel vocoders) were explored for this track, again reaching very competitive results with an ensemble system.
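The abstract credits the final results to an ensemble but does not say how the member systems' scores were combined. A common baseline is score-level fusion: put each detector's scores on a shared scale, then take a weighted average. The Python sketch below assumes z-normalization with statistics estimated on a development set and equal weights by default; the function names `znorm_params` and `fuse` and the example numbers are hypothetical, not the authors' configuration.

```python
from statistics import mean, pstdev


def znorm_params(dev_scores):
    """Mean and population standard deviation of one system's development scores."""
    sigma = pstdev(dev_scores)
    if sigma == 0:
        raise ValueError("development scores are constant; cannot z-normalize")
    return mean(dev_scores), sigma


def fuse(eval_scores_per_system, dev_scores_per_system, weights=None):
    """Score-level fusion: weighted average of z-normalized scores.

    eval_scores_per_system[k][i] is the score of system k for utterance i;
    higher scores are assumed to mean "more likely bona fide" for every system.
    """
    n_sys = len(eval_scores_per_system)
    if weights is None:
        weights = [1.0 / n_sys] * n_sys  # equal weights by default
    params = [znorm_params(dev) for dev in dev_scores_per_system]
    fused = []
    for per_utt in zip(*eval_scores_per_system):  # one tuple of scores per utterance
        fused.append(sum(w * (s - mu) / sigma
                         for w, s, (mu, sigma) in zip(weights, per_utt, params)))
    return fused


# Two hypothetical detectors whose raw scores live on very different scales.
dev = [[0.0, 2.0], [-10.0, 10.0]]
evaluation = [[3.0, -1.0], [20.0, -20.0]]
print(fuse(evaluation, dev))  # prints [2.0, -2.0]
```

Without normalization, the second detector's larger raw range would dominate a plain average. In practice the weights are usually tuned on held-out data (for example with logistic regression) rather than left equal.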

The paper also presents the main conclusions of the authors' analysis of the challenge data. They carried out this analysis before training, as an essential step to avoid potential biases in the trained models, and it informs the design of their systems.
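The abstract does not list the specific checks behind this data analysis. As one illustration of the general idea, a simple sanity check is to "score" utterances with a trivial cue that should carry no spoofing information, such as utterance duration, and compute the equal error rate (EER), the operating point where the miss rate equals the false-alarm rate. An EER far from 50% for such a cue suggests a shortcut that a trained model could exploit. The Python sketch below assumes higher scores mean "more likely bona fide"; the `compute_eer` function and the duration values are hypothetical.

```python
def compute_eer(bona_scores, spoof_scores):
    """Equal error rate, assuming higher scores mean "more likely bona fide".

    Tries every observed score as a threshold and returns the average of the
    miss rate (bona fide rejected) and false-alarm rate (spoof accepted) at the
    threshold where the two rates are closest.
    """
    thresholds = sorted(set(bona_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        miss = sum(s < t for s in bona_scores) / len(bona_scores)
        false_alarm = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer


# Hypothetical durations (seconds) for a small labelled subset of the data.
bona_durations = [3.1, 4.5, 2.8, 5.0, 3.9]
spoof_durations = [2.0, 2.2, 1.9, 2.5, 2.1]
eer = compute_eer(bona_durations, spoof_durations)
print(f"EER using duration alone: {eer:.1%}")  # prints "EER using duration alone: 0.0%"
```

In this made-up subset, duration alone separates the classes perfectly, so a detector trained on it might learn "short means fake" instead of genuine spoofing artifacts. Checks like this are one way to catch such biases before training.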

Critical Analysis

The paper covers both challenge tracks and explores several model architectures and training-data strategies. It is also candid about negative results: the closed-condition Track 1 system did not achieve noteworthy results, while the open-condition ensemble was very competitive.

One potential limitation of the research is its reliance on the ASVspoof5 dataset, which may not fully capture the diversity and complexity of real-world deepfake scenarios. Further validation on more diverse datasets would be valuable to assess how well the systems generalize.

Additionally, the paper does not provide a detailed discussion of the computational and resource requirements of the proposed systems, which could be an important consideration for practical deployment in real-world applications.

Overall, the research represents a significant contribution to the field of speech deepfake detection and speaker verification, and the insights and techniques presented could serve as a foundation for further advancements in this area.

Conclusion

The Vicomtech-UGR team has developed two systems for the ASVspoof5 challenge: a speech deepfake detection system and a spoofing-aware speaker verification (SASV) system. Their best-performing versions leverage self-supervised models, augmented training data, and ensembling to identify manipulated speech samples and verify the identity of speakers, addressing an important challenge in the field of speech anti-spoofing.

The detailed analysis of the ASVspoof5 dataset and the technical explanations provided in the paper offer valuable insights for researchers and practitioners working in the area of speech deepfake detection and speaker verification. The critical analysis highlights the need for further validation on diverse datasets and consideration of practical deployment factors, but the overall contribution of this work is significant and paves the way for continued advancements in this rapidly evolving field.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

ASASVIcomtech: The Vicomtech-UGR Speech Deepfake Detection and SASV Systems for the ASVspoof5 Challenge

Juan M. Martín-Doñas, Eros Roselló, Angel M. Gomez, Aitor Álvarez, Iván López-Espejo, Antonio M. Peinado

This paper presents the work carried out by the ASASVIcomtech team, made up of researchers from Vicomtech and University of Granada, for the ASVspoof5 Challenge. The team has participated in both Track 1 (speech deepfake detection) and Track 2 (spoofing-aware speaker verification). This work started with an analysis of the available challenge data, which was regarded as an essential step to avoid later potential biases of the trained models, and whose main conclusions are presented here. With respect to the proposed approaches, a closed-condition system employing a deep complex convolutional recurrent architecture was developed for Track 1, although, unfortunately, no noteworthy results were achieved. On the other hand, different possibilities of open-condition systems, based on leveraging self-supervised models, augmented training data from previous challenges, and novel vocoders, were explored for both tracks, finally achieving very competitive results with an ensemble system.

8/21/2024

BUT Systems and Analyses for the ASVspoof 5 Challenge

Johan Rohdin, Lin Zhang, Oldřich Plchot, Vojtěch Staněk, David Mihola, Junyi Peng, Themos Stafylakis, Dmitriy Beveraki, Anna Silnova, Jan Brukner, Lukáš Burget

This paper describes the BUT submitted systems for the ASVspoof 5 challenge, along with analyses. For the conventional deepfake detection task, we use ResNet18 and self-supervised models for the closed and open conditions, respectively. In addition, we analyze and visualize different combinations of speaker information and spoofing information as label schemes for training. For spoofing-robust automatic speaker verification (SASV), we introduce effective priors and propose using logistic regression to jointly train affine transformations of the countermeasure scores and the automatic speaker verification scores in such a way that the SASV LLR is optimized.

8/22/2024

USTC-KXDIGIT System Description for ASVspoof5 Challenge

Yihao Chen, Haochen Wu, Nan Jiang, Xiang Xia, Qing Gu, Yunqi Hao, Pengfei Cai, Yu Guan, Jialong Wang, Weilin Xie, Lei Fang, Sian Fang, Yan Song, Wu Guo, Lin Liu, Minqiang Xu

This paper describes the USTC-KXDIGIT system submitted to the ASVspoof5 Challenge for Track 1 (speech deepfake detection) and Track 2 (spoofing-robust automatic speaker verification, SASV). Track 1 showcases a diverse range of technical qualities from potential processing algorithms and includes both open and closed conditions. For these conditions, our system consists of a cascade of a frontend feature extractor and a back-end classifier. We focus on extensive embedding engineering and enhancing the generalization of the back-end classifier model. Specifically, the embedding engineering is based on hand-crafted features and speech representations from a self-supervised model, used for closed and open conditions, respectively. To detect spoof attacks under various adversarial conditions, we trained multiple systems on an augmented training set. Additionally, we used voice conversion technology to synthesize fake audio from genuine audio in the training set to enrich the synthesis algorithms. To leverage the complementary information learned by different model architectures, we employed activation ensemble and fused scores from different systems to obtain the final decision score for spoof detection. During the evaluation phase, the proposed methods achieved 0.3948 minDCF and 14.33% EER in the closed condition, and 0.0750 minDCF and 2.59% EER in the open condition, demonstrating the robustness of our submitted systems under adversarial conditions. In Track 2, we continued using the CM system from Track 1 and fused it with a CNN-based ASV system. This approach achieved 0.2814 min-aDCF in the closed condition and 0.0756 min-aDCF in the open condition, showcasing superior performance in the SASV system.

9/4/2024

ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale

Xin Wang, Hector Delgado, Hemlata Tak, Jee-weon Jung, Hye-jin Shim, Massimiliano Todisco, Ivan Kukanov, Xuechen Liu, Md Sahidullah, Tomi Kinnunen, Nicholas Evans, Kong Aik Lee, Junichi Yamagishi

ASVspoof 5 is the fifth edition in a series of challenges that promote the study of speech spoofing and deepfake attacks, and the design of detection solutions. Compared to previous challenges, the ASVspoof 5 database is built from crowdsourced data collected from a vastly greater number of speakers in diverse acoustic conditions. Attacks, also crowdsourced, are generated and tested using surrogate detection models, while adversarial attacks are incorporated for the first time. New metrics support the evaluation of spoofing-robust automatic speaker verification (SASV) as well as stand-alone detection solutions, i.e., countermeasures without ASV. We describe the two challenge tracks, the new database, the evaluation metrics, baselines, and the evaluation platform, and present a summary of the results. Attacks significantly compromise the baseline systems, while submissions bring substantial improvements.

8/19/2024