Knowledge Distillation for Small Foot-print Deep Speaker Embedding


Deep speaker embedding learning is an effective method for speaker identity modelling. Very deep models such as ResNet can achieve remarkable results but are usually too computationally expensive for real applications with limited resources. On the other hand, simply reducing model size is likely to result in significant performance degradation. In this paper, label-level and embedding-level knowledge distillation are proposed to narrow down the performance gap between large and small models. Label-level distillation utilizes the posteriors obtained by a well-trained teacher model to guide the opti-mization of the student model, while embedding-level distillation directly constrains the similarity between embeddings learned by two models. Experiments were carried out on the VoxCeleb1 dataset. Results show that the proposed knowledge distillation methods can significantly boost the performance of highly compact student models

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019