Multi-modality Matters: A Performance Leap on VoxCeleb

INTERSPEECH 2020

Adversarial Domain Adaptation for Speaker Verification using Partially Shared Network

INTERSPEECH 2020

Analysis of ABC Submission to NIST SRE 2019 CMN and VAST Challenge

We present a condensed description and analysis of the joint submission of the ABC team for NIST SRE 2019, by BUT, CRIM, Phonexia, Omilia and UAM. We concentrate on challenges that arose during development and we analyze the results obtained on the …

BUT System for the Second DIHARD Speech Diarization Challenge

This paper describes the winning systems developed by the BUT team for the four tracks of the Second DIHARD Speech Diarization Challenge, with source code available.

Optimizing Bayesian HMM based x-vector clustering for the second DIHARD speech diarization challenge

This paper presents an analysis of our diarization system winning the second DIHARD speech diarization challenge, track 1. This system is based on clustering x-vector speaker embeddings extracted every 0.25s from short segments of the input …

Text Adaptation for Speaker Verification with Speaker-Text Factorized Embeddings

We propose the text-adaptation speaker verification task and an initial solution, called the speaker-text factorization network, which can deal with different text-mismatch conditions.

Channel Invariant Speaker Embedding Learning with Joint Multi-Task and Adversarial Training

Using deep neural networks to extract speaker embeddings has significantly improved speaker verification. However, such embeddings are still vulnerable to channel variability. Previous works have used adversarial training to suppress channel …

Investigation of SpecAugment for Deep Speaker Embedding Learning

SpecAugment is a newly proposed data augmentation method for speech recognition. By randomly masking bands in the log Mel spectrogram, this method leads to impressive performance improvements. In this paper, we investigate the usage of SpecAugment for …

Margin matters: Towards more discriminative deep neural network embeddings for speaker recognition

Recently, speaker embeddings extracted from a speaker discriminative deep neural network (DNN) yield better performance than the conventional methods such as i-vector. In most cases, the DNN speaker classifier is trained using cross entropy loss with …

Knowledge Distillation for Small Foot-print Deep Speaker Embedding

Deep speaker embedding learning is an effective method for speaker identity modelling. Very deep models such as ResNet can achieve remarkable results but are usually too computationally expensive for real applications with limited resources. On the …