Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification

Zhanghao Wu, Shuai Wang, Yanmin Qian, Kai Yu

June 2019 SEL


Abstract

Domain or environment mismatch between training and testing, such as differing noise conditions and channels, is a major challenge for speaker verification. In this paper, a variational autoencoder (VAE) is designed to learn the patterns of speaker embeddings (both i-vectors and x-vectors) extracted from noisy speech segments and to generate more diverse embeddings, improving the robustness of speaker verification systems with a probabilistic linear discriminant analysis (PLDA) back-end. The approach is evaluated on the standard NIST SRE 2016 dataset. Compared to manual and generative adversarial network (GAN) based augmentation approaches, the proposed VAE-based augmentation achieves slightly better performance for i-vectors on Tagalog and Cantonese, with equal error rates (EERs) of 15.54% and 7.84%, and a more significant improvement for x-vectors on those two languages, with EERs of 11.86% and 4.20%.
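As a rough illustration of the idea of generating embeddings with a VAE, the sketch below shows the two generic ingredients of a standard Gaussian VAE, the reparameterized sampling step and the closed-form KL term, followed by decoding latent samples into new embedding-sized vectors. It is a minimal sketch under stated assumptions and not the architecture used in the paper: the single linear decoder, its random untrained weights, the dimensions, and every function name are hypothetical and chosen only for illustration.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def linear(weights, bias, x):
    """Apply a dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def generate_embeddings(dec_w, dec_b, latent_dim, n, rng):
    """Sample z ~ N(0, I) n times and decode each sample."""
    samples = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        samples.append(linear(dec_w, dec_b, z))
    return samples


# Hypothetical toy sizes; real i-vectors and x-vectors are much larger.
EMBED_DIM, LATENT_DIM = 8, 2
rng = random.Random(0)

# Assumption: an untrained decoder with random weights, standing in
# for a decoder that would normally be learned from noisy embeddings.
dec_w = [[rng.gauss(0.0, 0.1) for _ in range(LATENT_DIM)]
         for _ in range(EMBED_DIM)]
dec_b = [0.0] * EMBED_DIM

augmented = generate_embeddings(dec_w, dec_b, LATENT_DIM, n=5, rng=rng)
```

In an augmentation setting of the kind the abstract describes, vectors such as `augmented` would be added to the data used to train the PLDA back-end; how the generated embeddings are labeled and combined with the original ones is not specified here and is left out of the sketch.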

Type

Conference paper

Publication

In 20th Annual Conference of the International Speech Communication Association (InterSpeech), Graz, Austria, 2019
