Voice activity detection (VAD) is an essential pre-processing step for tasks such as automatic speech recognition (ASR) and speaker recognition. A basic goal is to remove silent segments within an audio, while a more general VAD system could remove …
The robustness of an anti-spoofing system is progressively more important in order to develop a reliable speaker verification system. Previous challenges and datasets mainly focus on a specific type of spoofing attacks. The ASVspoof 2019 edition is …
This paper presents a simplified version of the previously proposed diarization algorithm based on Bayesian Hidden Markov Models, which uses Variational Bayesian inference for very fast and robust clustering of x-vector (neural network based speaker …
Replay spoofing attacks are a major threat for speaker verification systems. Although many anti-spoofing systems or countermeasures are proposed to detect dataset-specific replay attacks with promising performance, they generalize poorly when applied …
Domain or environment mismatch between training and testing, such as various noises and channels, is a major challenge for speaker verification. In this paper, a variational autoencoder (VAE) is designed to learn the patterns of speaker embeddings …
We proposed the segment-level representation for phonetic information and the corresponding segment-level multi-task/adversarial training framework, we revisited the usage the phonetic information for the text-independent embedding learning and designed experiments to verify the assumption: For TI-SV, it could be benificial to remove the phonetic variation in the final speaker embeddings
Recently, researchers propose to build deep learning based endto-end speaker verification (SV) systems and achieve competitive results compared with the standard i-vector approach. In addition to deep learning architectures, optimization metric such …
d-vector approach achieved impressive results in speaker verification. Representation is obtained at utterance level by calculating the mean of the frame level outputs of a hidden layer of the DNN. Although mean based speaker identity representation …
Linear Discriminant Analysis (LDA) has been used as a standard post-processing procedure in many state-of-the-art speaker recognition tasks. Through maximizing the inter-speaker difference and minimizing the intra-speaker variation, LDA projects …
Data augmentation is an effective method to increase the quantity of training data, which improves the model's robustness and generalization ability. In this paper, we propose a generative adversarial network (GAN) based data augmentation approach …