Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech

AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data

DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion

Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding

AAAI 2024

Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion

Interspeech 2023

Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor

Interspeech 2023

Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit

Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion

ICME 2023

Context-aware Multimodal Fusion for Emotion Recognition

Interspeech 2022