Joon Son Chung

Orcid: 0000-0001-7741-7275

According to our database, Joon Son Chung authored at least 80 papers between 2014 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning.
CoRR, 2024

FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder.
CoRR, 2024

From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers.
CoRR, 2024

Can CLIP Help Sound Source Localization?
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Let There Be Sound: Reconstructing High Quality Speech from Silent Videos.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model.
CoRR, 2023

Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification.
CoRR, 2023

VoiceLDM: Text-to-Speech with Environmental Context.
CoRR, 2023

TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning.
CoRR, 2023

SlowFast Network for Continuous Sign Language Recognition.
CoRR, 2023

FlexiAST: Flexibility is What AST Needs.
CoRR, 2023

VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge.
CoRR, 2023

That's What I Said: Fully-Controllable Talking Face Generation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Sound Source Localization is All about Cross-Modal Alignment.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples.
Proceedings of the IEEE International Conference on Acoustics, 2023

MarginNCE: Robust Sound Localization with a Negative Margin.
Proceedings of the IEEE International Conference on Acoustics, 2023

Imaginary Voice: Face-Styled Diffusion Model for Text-to-Speech.
Proceedings of the IEEE International Conference on Acoustics, 2023

Advancing the Dimensionality Reduction of Speaker Embeddings for Speaker Diarisation: Disentangling Noise and Informing Speech Activity.
Proceedings of the IEEE International Conference on Acoustics, 2023

Metric Learning for User-Defined Keyword Spotting.
Proceedings of the IEEE International Conference on Acoustics, 2023

In Search of Strong Embedding Extractors for Speaker Diarisation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Self-Sufficient Framework for Continuous Sign Language Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
Deep Audio-Visual Speech Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Augmentation Adversarial Training for Self-Supervised Speaker Representation Learning.
IEEE J. Sel. Top. Signal Process., 2022

Disentangled representation learning for multilingual speaker recognition.
CoRR, 2022

Large-scale learning of generalised representations for speaker recognition.
CoRR, 2022

VoxSRC 2021: The Third VoxCeleb Speaker Recognition Challenge.
CoRR, 2022

Baseline Systems for the First Spoofing-Aware Speaker Verification Challenge: Score and Embedding Fusion.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Pushing the limits of raw waveform speaker recognition.
Proceedings of the Interspeech 2022, 2022

Multi-Scale Speaker Embedding-Based Graph Attention Networks For Speaker Diarisation.
Proceedings of the IEEE International Conference on Acoustics, 2022

Spell My Name: Keyword Boosted Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

AASIST: Audio Anti-Spoofing Using Integrated Spectro-Temporal Graph Attention Networks.
Proceedings of the IEEE International Conference on Acoustics, 2022

Signing Outside the Studio: Benchmarking Background Robustness for Continuous Sign Language Recognition.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021
Disentangled dimensionality reduction for noise-robust speaker diarisation.
CoRR, 2021

Cross Attentive Pooling for Speaker Verification.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Supervised Attention for Speaker Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Look Who's Not Talking.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Metric Learning for Keyword Spotting.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Adapting Speaker Embeddings for Speaker Diarisation.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Look Who's Talking: Active Speaker Detection in the Wild.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Three-Class Overlapped Speech Detection Using a Convolutional Recurrent Neural Network.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

The ins and outs of speaker recognition: lessons from VoxSRC 2020.
Proceedings of the IEEE International Conference on Acoustics, 2021

Graph Attention Networks for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2021

Playing a Part: Speaker Verification at the movies.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Perfect Match: Self-Supervised Embeddings for Cross-Modal Retrieval.
IEEE J. Sel. Top. Signal Process., 2020

Voxceleb: Large-scale speaker verification in the wild.
Comput. Speech Lang., 2020

VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge.
CoRR, 2020

Clova Baseline System for the VoxCeleb Speaker Recognition Challenge 2020.
CoRR, 2020

Augmentation adversarial training for unsupervised speaker recognition.
CoRR, 2020

Delving into VoxCeleb: Environment Invariant Speaker Recognition.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Seeing Voices and Hearing Voices: Learning Discriminative Embeddings Using Cross-Modal Self-Supervision.
Proceedings of the Interspeech 2020, 2020

Spot the Conversation: Speaker Diarisation in the Wild.
Proceedings of the Interspeech 2020, 2020

In Defence of Metric Learning for Speaker Recognition.
Proceedings of the Interspeech 2020, 2020

FaceFilter: Audio-Visual Speech Separation Using Still Images.
Proceedings of the Interspeech 2020, 2020

Now You're Speaking My Language: Visual Language Identification.
Proceedings of the Interspeech 2020, 2020

Disentangled Speech Embeddings Using Cross-Modal Self-Supervision.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

The Sound of My Voice: Speaker Representation Loss for Target Voice Separation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

ASR is All You Need: Cross-Modal Distillation for Lip Reading.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

BSL-1K: Scaling Up Co-articulated Sign Language Recognition Using Mouthing Cues.
Proceedings of the Computer Vision - ECCV 2020, 2020

Self-supervised Learning of Audio-Visual Objects from Video.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
You Said That?: Synthesising Talking Faces from Audio.
Int. J. Comput. Vis., 2019

VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge.
CoRR, 2019

Naver at ActivityNet Challenge 2019 - Task B Active Speaker Detection (AVA).
CoRR, 2019

Who Said That?: Audio-Visual Speaker Diarisation of Real-World Meetings.
Proceedings of the Interspeech 2019, 2019

My Lips Are Concealed: Audio-Visual Speech Enhancement Through Obstructions.
Proceedings of the Interspeech 2019, 2019

Utterance-level Aggregation for Speaker Recognition in the Wild.
Proceedings of the IEEE International Conference on Acoustics, 2019

Perfect Match: Improved Cross-modal Embeddings for Audio-visual Synchronisation.
Proceedings of the IEEE International Conference on Acoustics, 2019

2018
Learning to lip read words by watching videos.
Comput. Vis. Image Underst., 2018

LRS3-TED: a large-scale dataset for visual speech recognition.
CoRR, 2018

VoxCeleb2: Deep Speaker Recognition.
Proceedings of the Interspeech 2018, 2018

Deep Lip Reading: A Comparison of Models and an Online Application.
Proceedings of the Interspeech 2018, 2018

The Conversation: Deep Audio-Visual Speech Enhancement.
Proceedings of the Interspeech 2018, 2018

2017
Visual recognition of human communication.
PhD thesis, 2017

VoxCeleb: A Large-Scale Speaker Identification Dataset.
Proceedings of the Interspeech 2017, 2017

Lip Reading Sentences in the Wild.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Lip Reading in Profile.
Proceedings of the British Machine Vision Conference 2017, 2017

You said that?
Proceedings of the British Machine Vision Conference 2017, 2017

2016
Signs in time: Encoding human motion as a temporal image.
CoRR, 2016

Out of Time: Automated Lip Sync in the Wild.
Proceedings of the Computer Vision - ACCV 2016 Workshops, 2016

Lip Reading in the Wild.
Proceedings of the Computer Vision - ACCV 2016, 2016

2014
Re-presentations of Art Collections.
Proceedings of the Computer Vision - ECCV 2014 Workshops, 2014

