tse

Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech