Publications

(2024). Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition. ICASSP 2024 (Accepted).

PDF Code

(2024). DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion. ICASSP 2024 (Accepted).

(2024). AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data. ICASSP 2024 (Accepted).

(2024). Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech. ICASSP 2024 (Accepted).

(2024). UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding. AAAI 2024 (Accepted).

PDF Code Project

(2023). Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor. Interspeech 2023.

(2023). Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion. Interspeech 2023.

(2023). Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit. ICASSP 2023.

PDF Code

(2022). Context-aware Multimodal Fusion for Emotion Recognition. Interspeech 2022.

(2022). DF-ResNet: Boosting Speaker Verification Performance with Depth-First Design. Interspeech 2022.

(2022). On the Importance of Different Frequency Bins for Speaker Verification. ICASSP 2022.

(2022). Self-Knowledge Distillation via Feature Enhancement for Speaker Verification. ICASSP 2022.

(2021). Non-Parallel Any-to-Many Voice Conversion by Replacing Speaker Statistics. Interspeech 2021.

PDF

(2021). Voice activity detection in the wild: A data-driven approach using teacher-student training. TASLP 2021.

PDF Code DOI

(2021). Speaker Embedding Augmentation with Noise Distribution Matching. ISCSLP 2021.

(2021). Revisiting the Statistics Pooling Layer in Deep Speaker Embedding Learning. ISCSLP 2021.

(2021). Self-Supervised Learning Based Domain Adaptation for Robust Speaker Verification. ICASSP 2021.

(2021). SynAug: Synthesis-Based Data Augmentation for Text-Dependent Speaker Verification. ICASSP 2021.

(2021). Unit Selection Synthesis based Data Augmentation for Fixed Phrase Speaker Verification. ICASSP 2021.

(2021). Audio-Visual Deep Neural Network for Robust Person Verification. TASLP 2021.

(2020). Data Augmentation using Deep Generative Models for Embedding based Speaker Recognition. TASLP 2020.

PDF DOI

(2020). Dual-adversarial domain adaptation for generalized replay attack detection. Interspeech 2020.

(2020). Multi-modality Matters: A Performance Leap on VoxCeleb. Interspeech 2020.

(2020). Adversarial Domain Adaptation for Speaker Verification using Partially Shared Network. Interspeech 2020.

(2020). Analysis of ABC Submission to NIST SRE 2019 CMN and VAST Challenge. Odyssey 2020.

PDF

(2020). BUT System for the Second DIHARD Speech Diarization Challenge. ICASSP 2020.

PDF Code DOI

(2020). Optimizing Bayesian HMM based x-vector clustering for the second DIHARD speech diarization challenge. ICASSP 2020.

PDF Code DOI

(2020). Text Adaptation for Speaker Verification with Speaker-Text Factorized Embeddings. ICASSP 2020.

PDF DOI

(2020). Channel Invariant Speaker Embedding Learning with Joint Multi-Task and Adversarial Training. ICASSP 2020.

PDF DOI

(2020). Investigation of SpecAugment for Deep Speaker Embedding Learning. ICASSP 2020.

PDF DOI

(2019). BUT System Description to VoxCeleb Speaker Recognition Challenge 2019.

Preprint Code

(2019). Margin matters: Towards more discriminative deep neural network embeddings for speaker recognition. APSIPA 2019.

PDF DOI

(2019). Knowledge Distillation for Small Foot-print Deep Speaker Embedding. ICASSP 2019.

PDF DOI

(2019). End-to-End Speaker-Dependent Voice Activity Detection. In The 15th National Conference on Man-Machine Speech Communication (NCMMSC2019), Xining, Qinghai, China, 2019.

PDF

(2019). The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge. Interspeech 2019.

PDF DOI

(2019). Bayesian HMM Based x-Vector Clustering for Speaker Diarization. Interspeech 2019.

PDF DOI

(2019). Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training. Interspeech 2019.

(2019). Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification. Interspeech 2019.

PDF DOI

(2019). On the Usage of Phonetic Information for Text-independent Speaker Embedding Extraction. Interspeech 2019.

PDF DOI

(2018). Past review, current progress, and challenges ahead on the cocktail party problem. FITEE 2018.

(2018). Angular Softmax for Short-Duration Text-independent Speaker Verification. Interspeech 2018.

PDF

(2018). Covariance based deep feature for text-dependent speaker verification. IScIDE 2018.

PDF

(2018). Deep discriminant analysis for i-vector based robust speaker recognition. ISCSLP 2018.

DOI

(2017). Integrating Online i-vector into GMM-UBM for Text-dependent Speaker Verification. APSIPA 2017.

PDF DOI

(2017). What Does the Speaker Embedding Encode? Interspeech 2017.

PDF DOI