I obtained my Ph.D. degree from Shanghai Jiao Tong University in September 2020, under the supervision of Kai Yu and Yanmin Qian. During my Ph.D., my research focused on deep learning based approaches for speaker recognition, speaker diarization, and voice activity detection. After graduation, I joined Tencent Games as a senior researcher, where I (informally) led a speech group and extended my research interests to speech synthesis, voice conversion, music generation, and audio retrieval. Currently, I am with the SpeechLab at the Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong (Shenzhen), led by Haizhou Li.
I am the creator of Wespeaker, a research- and production-oriented speaker representation learning toolkit. You can check my tutorial to see what speaker modeling can do and how to easily apply Wespeaker to your own tasks. You are welcome to use it and contribute!
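As a minimal sketch of getting started, assuming the pip-installable `wespeaker` Python package and its `load_model` / `extract_embedding` / `compute_similarity` interface (the exact model names and calls may differ across versions), embedding extraction and a simple verification score could look like this:

```python
# Minimal sketch (assumptions noted above): extract a speaker embedding and
# compare two utterances with Wespeaker's Python interface.
import wespeaker

# Load a pretrained speaker model (model name is an assumed example).
model = wespeaker.load_model('english')

# Extract a fixed-dimensional speaker embedding from a local wav file.
embedding = model.extract_embedding('speaker1_utt1.wav')
print(embedding.shape)

# Speaker verification: similarity score between two utterances.
score = model.compute_similarity('speaker1_utt1.wav', 'speaker2_utt1.wav')
print(f'similarity score: {score:.3f}')
```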
Services: I serve as a regular reviewer for speech and deep learning related conferences and journals, including Interspeech, ICASSP, ICME, SPL, TASLP, Neural Networks, and Pattern Recognition. I will serve as the publication chair for SLT 2024.
Openings: We are actively seeking self-motivated students to join our team as research assistants, visiting students, and prospective Ph.D. students. Multiple positions are immediately available in Shenzhen, with competitive salary and benefits. If you are interested, please drop me an email with your CV.
PhD in Computer Science and Technology, 2020
Shanghai Jiao Tong University
BSc in Software Engineering, 2014
Northwestern Polytechnical University
Work on several research papers and contribute to
AAAI 2024
We introduce a statistics-based speaker modeling method into voice conversion.
This paper describes the winning systems developed by the BUT team for the four tracks of the Second DIHARD Speech Diarization Challenge, with the source code publicly available.
We proposed the text-adaptation speaker verification task and an initial solution, the speaker-text factorization network, which can handle different text-mismatch conditions.
This paper describes the winning systems developed by the BUT team for the two tracks of the First VoxSRC Speaker Recognition Challenge; the r-vector was proposed in this paper. Update: I launched an open-source project, Wespeaker, where the implementation can be found.
We proposed a segment-level representation for phonetic information and the corresponding segment-level multi-task/adversarial training framework. We revisited the use of phonetic information for text-independent speaker embedding learning and designed experiments to verify the assumption that, for TI-SV, it can be beneficial to remove the phonetic variation from the final speaker embeddings.