
Voice activity detection in the wild: A data-driven approach using teacher-student training

Leveraging weak-labeled data for voice activity detection in a teacher-student manner

Audio-Visual Deep Neural Network for Robust Person Verification

An investigation of combining audio and visual information for person identity verification

Data Augmentation using Deep Generative Models for Embedding based Speaker Recognition

Data augmentation is an effective method to improve the robustness of embedding based speaker verification systems, which could be applied to either the front-end speaker embedding extractor or the back-end PLDA. Different from the conventional …

Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification

Short-duration text-independent speaker verification has remained a hot research topic in recent years, and deep neural network based embeddings have shown impressive results in such conditions. Good speaker embeddings require the property of both small …

Past review, current progress, and challenges ahead on the cocktail party problem

The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker when multiple speakers talk simultaneously, is one of the critical problems yet to be solved to enable the wide application of automatic speech recognition …