Generative Adversarial Networks based X-vector Augmentation for Robust Probabilistic Linear Discriminant Analysis in Speaker Verification


Data augmentation is an effective way to increase the quantity of training data, which improves a model’s robustness and generalization ability. In this paper, we propose a generative adversarial network (GAN) based data augmentation approach for probabilistic linear discriminant analysis (PLDA), which is a standard back-end for state-of-the-art x-vector based speaker verification systems. Instead of generating new spectral feature samples, a conditional Wasserstein GAN is adopted to directly generate x-vectors. Experiments are carried out on the standard NIST SRE 2016 evaluation dataset. Compared to manually adding noise, the GAN-augmented PLDA achieves better performance, and this performance can be further boosted when the generated x-vectors are combined with manually augmented data. Equal error rates (EERs) of 11.68% and 4.43% were obtained for the Tagalog and Cantonese evaluation conditions, respectively.
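
For readers unfamiliar with the reported metric, the EER is the operating point at which the false acceptance rate and the false rejection rate are equal. The following is only a minimal sketch of how it can be approximated from trial scores; the function name `equal_error_rate` and the toy scores are illustrative assumptions and are not taken from the paper or from the NIST scoring tools.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection: target trials scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: non-target trials scoring at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (hypothetical): higher means "more likely the same speaker".
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]
print(f"EER = {equal_error_rate(targets, nontargets):.2%}")
```

Production evaluations typically interpolate between thresholds on the detection error trade-off curve rather than picking the closest observed score, so this sketch may differ slightly from official scoring on small trial lists.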

In the 11th International Symposium on Chinese Spoken Language Processing (ISCSLP), Taipei, Taiwan, China, 2018.