SEL

BUT system description to voxceleb speaker recognition challenge 2019

This paper describes the winning systems developed by the BUT team for the two tracks of the First VoxSRC Speaker Recognition Challenge, we proposed r-vector in this paper. Update: I lanuched an open-source project wespeaker, where the implementation can be found

Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification

Short duration text-independent speaker verification remains a hot research topic in recent years, and deep neural network based embeddings have shown impressive results in such conditions. Good speaker embeddings require the property of both small …

Margin matters: Towards more discriminative deep neural network embeddings for speaker recognition

Recently, speaker embeddings extracted from a speaker discriminative deep neural network (DNN) yield better performance than the conventional methods such as i-vector. In most cases, the DNN speaker classifier is trained using cross entropy loss with …

Knowledge Distillation for Small Foot-print Deep Speaker Embedding

Deep speaker embedding learning is an effective method for speaker identity modelling. Very deep models such as ResNet can achieve remarkable results but are usually too computationally expensive for real applications with limited resources. On the …

Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification

Domain or environment mismatch between training and testing, such as various noises and channels, is a major challenge for speaker verification. In this paper, a variational autoencoder (VAE) is designed to learn the patterns of speaker embeddings …

On the Usage of Phonetic Information for Text-independent Speaker Embedding Extraction

We proposed the segment-level representation for phonetic information and the corresponding segment-level multi-task/adversarial training framework, we revisited the usage the phonetic information for the text-independent embedding learning and designed experiments to verify the assumption: For TI-SV, it could be benificial to remove the phonetic variation in the final speaker embeddings

Angular Softmax for Short-Duration Text-independent Speaker Verification.

Recently, researchers propose to build deep learning based endto-end speaker verification (SV) systems and achieve competitive results compared with the standard i-vector approach. In addition to deep learning architectures, optimization metric such …

Deep discriminant analysis for i-vector based robust speaker recognition

Linear Discriminant Analysis (LDA) has been used as a standard post-processing procedure in many state-of-the-art speaker recognition tasks. Through maximizing the inter-speaker difference and minimizing the intra-speaker variation, LDA projects …

What Does the Speaker Embedding Encode?

The first attempt to systematically analyze the information encoded in speaker embeddings (prior to x-vector), detailed analysis on x-vectors could be refered to the paper Probing the Information Encoded in X-vectors from JHU