
End-to-End Speaker-Dependent Voice Activity Detection

Voice activity detection (VAD) is an essential pre-processing step for tasks such as automatic speech recognition (ASR) and speaker recognition. A basic goal is to remove silent segments from an audio recording, while a more general VAD system could remove …
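
The basic goal of dropping silent segments can be illustrated with a classic frame-energy threshold. This is a simple baseline, not the end-to-end speaker-dependent model the paper describes; the function name `energy_vad`, the 25 ms / 10 ms framing and the -40 dB threshold relative to the loudest frame are illustrative assumptions.

```python
import numpy as np

def energy_vad(signal, sample_rate, frame_ms=25, hop_ms=10, threshold_db=-40.0):
    """Hypothetical energy-based VAD baseline (not the paper's model).

    Returns a boolean mask with one entry per frame: True where the frame
    energy is within `threshold_db` of the loudest frame.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)

    # Mean-square energy of each frame.
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len].astype(np.float64)
        energies[i] = np.mean(frame ** 2)

    eps = 1e-12
    if energies.max() <= eps:
        # Entirely silent input: no frame counts as speech.
        return np.zeros(n_frames, dtype=bool)

    # Energies in dB; eps avoids log(0) on digital silence.
    energies_db = 10.0 * np.log10(energies + eps)
    return energies_db > energies_db.max() + threshold_db
```

For example, one second of silence, one second of a 440 Hz tone and another second of silence at 16 kHz yields a mask that is False at both ends and True in the middle; removing silent segments then amounts to keeping only the frames where the mask is True.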

The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge

The robustness of anti-spoofing systems is increasingly important for building reliable speaker verification systems. Previous challenges and datasets mainly focus on a specific type of spoofing attack. The ASVspoof 2019 edition is …

Bayesian HMM Based x-Vector Clustering for Speaker Diarization

This paper presents a simplified version of the previously proposed diarization algorithm based on Bayesian Hidden Markov Models, which uses Variational Bayesian inference for very fast and robust clustering of x-vector (neural network based speaker …

Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training

Replay spoofing attacks are a major threat to speaker verification systems. Although many anti-spoofing systems or countermeasures have been proposed to detect dataset-specific replay attacks with promising performance, they generalize poorly when applied …

Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification

Domain or environment mismatch between training and testing, such as various noises and channels, is a major challenge for speaker verification. In this paper, a variational autoencoder (VAE) is designed to learn the patterns of speaker embeddings …
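
As background for the VAE mentioned above, the two pieces every VAE shares are the reparameterized latent sample and the KL divergence to a standard normal prior. The sketch below shows only that generic machinery, assuming a diagonal-Gaussian encoder that outputs `mu` and `log_var`; it is not the paper's architecture, and the function names are hypothetical.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    Writing the sample this way moves the randomness into eps, which is what
    lets an autodiff framework backpropagate through mu and log_var.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)
```

In an augmentation setting, one would draw several `z` per embedding and pass them through a trained decoder (not shown) to obtain new embedding samples.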

On the Usage of Phonetic Information for Text-independent Speaker Embedding Extraction

We proposed a segment-level representation of phonetic information and a corresponding segment-level multi-task/adversarial training framework. We revisited the usage of phonetic information for text-independent embedding learning and designed experiments to verify the assumption that, for text-independent speaker verification (TI-SV), it could be beneficial to remove phonetic variation from the final speaker embeddings.

Angular Softmax for Short-Duration Text-independent Speaker Verification

Recently, researchers have proposed building deep learning based end-to-end speaker verification (SV) systems, which achieve competitive results compared with the standard i-vector approach. In addition to deep learning architectures, optimization metric such …
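
Angular softmax (A-Softmax, in its common SphereFace form) normalizes the class weights, drops the bias, and replaces the target-class cosine cos θ with the margin function ψ(θ) = (-1)^k cos(mθ) - 2k for θ in [kπ/m, (k+1)π/m]. Since the abstract is truncated, the exact variant used in the paper is not stated here; the sketch below assumes this common formulation, and `a_softmax_logits`, `cross_entropy` and the default m = 4 are illustrative choices.

```python
import numpy as np

def a_softmax_logits(x, W, labels, m=4):
    """Assumed SphereFace-style A-Softmax logits.

    x: (N, D) embeddings, W: (D, C) class weights, labels: (N,) int targets.
    Non-target logits are ||x|| cos(theta_j); the target logit is ||x|| psi(theta_y).
    """
    W_unit = W / np.linalg.norm(W, axis=0, keepdims=True)  # unit-norm columns, no bias
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)      # (N, 1)
    cos = np.clip((x @ W_unit) / x_norm, -1.0, 1.0)        # (N, C)
    logits = x_norm * cos

    idx = np.arange(x.shape[0])
    theta = np.arccos(cos[idx, labels])
    # Interval index k in [0, m-1]; clip handles theta == pi exactly.
    k = np.minimum(np.floor(m * theta / np.pi), m - 1)
    psi = (-1.0) ** k * np.cos(m * theta) - 2.0 * k
    logits[idx, labels] = x_norm[:, 0] * psi
    return logits

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed with a stable log-sum-exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

With m = 1 this reduces to a softmax over cosine logits; for m > 1, ψ(θ) ≤ cos θ, so the target logit can only shrink and the loss can only grow, which is what forces a larger angular margin between speakers.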

Covariance based deep feature for text-dependent speaker verification

The d-vector approach has achieved impressive results in speaker verification. The representation is obtained at the utterance level by calculating the mean of the frame-level outputs of a hidden layer of the DNN. Although the mean-based speaker identity representation …
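
The mean pooling described above, and one possible second-order alternative, can be contrasted in a few lines. The covariance feature here (upper triangle of the sample covariance of the same frame-level outputs) is only an illustrative assumption suggested by the title; the truncated abstract does not specify how the paper builds its covariance-based feature, and both function names are hypothetical.

```python
import numpy as np

def mean_pooled_dvector(frame_outputs):
    """d-vector: mean of hidden-layer outputs over frames. frame_outputs: (T, D)."""
    return frame_outputs.mean(axis=0)

def covariance_feature(frame_outputs):
    """Illustrative second-order statistic (assumption, not the paper's recipe).

    Computes the (D, D) sample covariance across frames and keeps its upper
    triangle, since the matrix is symmetric: D * (D + 1) / 2 values.
    """
    cov = np.atleast_2d(np.cov(frame_outputs, rowvar=False))  # rows are frames
    iu = np.triu_indices(cov.shape[0])
    return cov[iu]
```

The mean captures where the frame outputs sit on average, while the covariance captures how their dimensions vary together over the utterance, information that mean pooling discards.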

Deep discriminant analysis for i-vector based robust speaker recognition

Linear Discriminant Analysis (LDA) has been used as a standard post-processing procedure in many state-of-the-art speaker recognition tasks. Through maximizing the inter-speaker difference and minimizing the intra-speaker variation, LDA projects …
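
For reference, standard LDA finds directions v that maximize the ratio of between-speaker scatter S_b to within-speaker scatter S_w, i.e. it solves the generalized eigenproblem S_b v = λ S_w v and keeps the leading eigenvectors. The NumPy sketch below shows this classic linear baseline, not the deep discriminant analysis the paper proposes; `lda_projection` is a hypothetical name, and it assumes S_w is positive definite (enough samples per dimension).

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Classic LDA. X: (N, D) i-vectors, y: (N,) speaker labels.

    Returns a (D, n_components) projection matrix.
    """
    mu = X.mean(axis=0)
    D = X.shape[1]
    S_w = np.zeros((D, D))  # within-speaker scatter
    S_b = np.zeros((D, D))  # between-speaker scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        diff = Xc - mu_c
        S_w += diff.T @ diff
        d = (mu_c - mu)[:, None]
        S_b += Xc.shape[0] * (d @ d.T)

    # Whiten with S_w = L L^T: S_b v = lam S_w v becomes M u = lam u,
    # where M = L^-1 S_b L^-T is symmetric and v = L^-T u.
    L_inv = np.linalg.inv(np.linalg.cholesky(S_w))
    M = L_inv @ S_b @ L_inv.T
    eigvals, U = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1][:n_components]
    return L_inv.T @ U[:, order]
```

Projecting i-vectors with `X @ lda_projection(X, y, k)` keeps at most (number of speakers - 1) informative directions, because S_b has at most that rank.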

Generative Adversarial Networks based X-vector Augmentation for Robust Probabilistic Linear Discriminant Analysis in Speaker Verification

Data augmentation is an effective method to increase the quantity of training data, which improves the model's robustness and generalization ability. In this paper, we propose a generative adversarial network (GAN) based data augmentation approach …