Joon Son Chung

Int. J. Comput. Vis., May, 2026

SpeakerLLM: A Speaker-Specialized Audio-LLM for Speaker Understanding and Verification Reasoning.

CoRR, May, 2026

FiTS: Interpretable Spiking Neurons via Frequency Selectivity and Temporal Shaping.

Jongmin Choi

CoRR, May, 2026

Keep What Audio Cannot Say: Context-Preserving Token Pruning for Omni-LLMs.

Chaeyoung Jung

Kyeongha Rho

CoRR, May, 2026

Probing Cross-modal Information Hubs in Audio-Visual LLMs.

CoRR, May, 2026

Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization.

Int. J. Comput. Vis., April, 2026

LRS-VoxMM: A benchmark for in-the-wild audio-visual speech recognition.

CoRR, April, 2026

Seeing Through Touch: Tactile-Driven Visual Localization of Material Regions.

CoRR, April, 2026

Cinematic Audio Source Separation Using Visual Cues.

CoRR, March, 2026

Plug-and-Steer: Decoupling Separation and Selection in Audio-Visual Target Speaker Extraction.

Doyeop Kwak

Suyeon Lee

CoRR, March, 2026

On the Nature of Attention Sink that Shapes Decoding Strategy in MLLMs.

Suho Yoo

CoRR, March, 2026

FastAV: Efficient Token Pruning for Audio-Visual Large Language Model Inference.

CoRR, January, 2026

UNMIXX: Untangling Highly Correlated Singing Voices Mixtures.

CoRR, January, 2026

LAMB: LLM-based Audio Captioning with Modality Gap Bridging via Cauchy-Schwarz Divergence.

CoRR, January, 2026

MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

TAVID: Text-Driven Audio-Visual Interactive Dialogue Generation.

CoRR, December, 2025

Lost in Translation, Found in Embeddings: Sign Language Translation and Alignment.

CoRR, December, 2025

Video Diffusion Models Excel at Tracking Similar-Looking Objects Without Supervision.

CoRR, December, 2025

MoLT: Mixture of Layer-Wise Tokens for Efficient Audio-Visual Learning.

CoRR, December, 2025

Model-Guided Dual-Role Alignment for High-Fidelity Open-Domain Video-to-Audio Generation.

CoRR, October, 2025

Two Heads Are Better Than One: Audio-Visual Speech Error Correction with Dual Hypotheses.

CoRR, October, 2025

Diffusion-Link: Diffusion Probabilistic Model for Bridging the Audio-Text Modality Gap.

CoRR, October, 2025

Toward Interactive Sound Source Localization: Better Align Sight and Sound!

IEEE Trans. Pattern Anal. Mach. Intell., September, 2025

SPADE: Structured Pruning and Adaptive Distillation for Efficient LLM-TTS.

CoRR, September, 2025

MAGE: A Coarse-to-Fine Speech Enhancer with Masked Generative Model.

CoRR, September, 2025

MambaVideo for Discrete Video Tokenization with Channel-Split Quantization.

CoRR, July, 2025

EDNet: A Distortion-Agnostic Speech Enhancement Framework with Gating Mamba Mechanism and Phase Shift-Invariant Training.

CoRR, June, 2025

Fork-Merge Decoding: Enhancing Multimodal Understanding in Audio-Visual Large Language Models.

CoRR, May, 2025

AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding.

Chaeyoung Jung

CoRR, May, 2025

Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment.

CoRR, May, 2025

Seeing Speech and Sound: Distinguishing and Locating Audios in Visual Scenes.

CoRR, March, 2025

Deep Understanding of Sign Language for Sign to Subtitle Alignment.

CoRR, March, 2025

AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

SEED: Speaker Embedding Enhancement Diffusion Model.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

The Text-to-speech in the Wild (TITW) Database.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

InfiniteAudio: Infinite-Length Audio Generation with Consistency.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

High-Quality Joint Image and Video Tokenization with Causal VAE.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

AdaptVC: High Quality Voice Conversion with Adaptive Learning.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Test-Time Augmentation for Pose-invariant Face Recognition.

Jaemin Jung

Proceedings of the 19th IEEE International Conference on Automatic Face and Gesture Recognition, 2025

Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing.

Jeongsoo Choi

Jaehun Kim

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

The VoxCeleb Speaker Recognition Challenge: A Retrospective.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Bridging the Gap Between Audio and Text Using Parallel-Attention for User-Defined Keyword Spotting.

IEEE Signal Process. Lett., 2024

Audio Mamba: Bidirectional State Space Model for Audio Representation Learning.

IEEE Signal Process. Lett., 2024

CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation.

CoRR, 2024

SpoofCeleb: Speech Deepfake Detection and SASV In The Wild.

CoRR, 2024

Text-To-Speech Synthesis In The Wild.

CoRR, 2024

Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment.

CoRR, 2024

To what extent can ASV systems naturally defend against spoofing attacks?

CoRR, 2024

Can CLIP Help Sound Source Localization?

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Multimodal Learning of Speech and Speaker Representations.

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Disentangled Representation Learning for Environment-agnostic Speaker Recognition.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Lightweight Audio Segmentation for Long-form Speech Translation.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

To what extent can ASV systems naturally defend against spoofing attacks?

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

VoxSim: A perceptual voice similarity dataset.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Speech Guided Masked Image Modeling for Visually Grounded Speech.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Fregrad: Lightweight and Fast Frequency-Aware Diffusion Vocoder.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

VoiceLDM: Text-to-Speech with Environmental Context.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Seeing Through The Conversation: Audio-Visual Speech Separation Based on Diffusion Model.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

VoxMM: Rich Transcription of Conversations in the Wild.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Slowfast Network for Continuous Sign Language Recognition.

Junseok Ahn

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Faces that Speak: Jointly Synthesising Talking Face and Speech from Text.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Towards Automated Movie Trailer Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Scaling Up Video Summarization Pretraining with Large Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Let There Be Sound: Reconstructing High Quality Speech from Silent Videos.

Ji-Hoon Kim

Jaehun Kim

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge.

CoRR, 2023

That's What I Said: Fully-Controllable Talking Face Generation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Disentangled Representation Learning for Multilingual Speaker Recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Curriculum Learning for Self-supervised Speaker Verification.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

FlexiAST: Flexibility is What AST Needs.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Sound Source Localization is All about Cross-Modal Alignment.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

MarginNCE: Robust Sound Localization with a Negative Margin.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Imaginary Voice: Face-Styled Diffusion Model for Text-to-Speech.

Jiyoung Lee

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Advancing the Dimensionality Reduction of Speaker Embeddings for Speaker Diarisation: Disentangling Noise and Informing Speech Activity.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Metric Learning for User-Defined Keyword Spotting.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

In Search of Strong Embedding Extractors for Speaker Diarisation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Self-Sufficient Framework for Continuous Sign Language Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022

Deep Audio-Visual Speech Recognition.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

Augmentation Adversarial Training for Self-Supervised Speaker Representation Learning.

IEEE J. Sel. Top. Signal Process., 2022

Disentangled representation learning for multilingual speaker recognition.

CoRR, 2022

Large-scale learning of generalised representations for speaker recognition.

CoRR, 2022

VoxSRC 2021: The Third VoxCeleb Speaker Recognition Challenge.

CoRR, 2022

Baseline Systems for the First Spoofing-Aware Speaker Verification Challenge: Score and Embedding Fusion.

Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Pushing the limits of raw waveform speaker recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Multi-Scale Speaker Embedding-Based Graph Attention Networks For Speaker Diarisation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Spell My Name: Keyword Boosted Speech Recognition.

Namkyu Jung

Geonmin Kim

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

AASIST: Audio Anti-Spoofing Using Integrated Spectro-Temporal Graph Attention Networks.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Signing Outside the Studio: Benchmarking Background Robustness for Continuous Sign Language Recognition.

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021

Disentangled dimensionality reduction for noise-robust speaker diarisation.

CoRR, 2021

Cross Attentive Pooling for Speaker Verification.

Seong Min Kye

Yoohwan Kwon

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Supervised Attention for Speaker Recognition.

Seong Min Kye

Hoirin Kim

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Look Who's Not Talking.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Metric Learning for Keyword Spotting.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Adapting Speaker Embeddings for Speaker Diarisation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Look Who's Talking: Active Speaker Detection in the Wild.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Three-Class Overlapped Speech Detection Using a Convolutional Recurrent Neural Network.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

The ins and outs of speaker recognition: lessons from VoxSRC 2020.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Graph Attention Networks for Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Playing a Part: Speaker Verification at the movies.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

Perfect Match: Self-Supervised Embeddings for Cross-Modal Retrieval.

Hong-Goo Kang

IEEE J. Sel. Top. Signal Process., 2020

Voxceleb: Large-scale speaker verification in the wild.

Comput. Speech Lang., 2020

VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge.

CoRR, 2020

Clova Baseline System for the VoxCeleb Speaker Recognition Challenge 2020.

CoRR, 2020

Augmentation adversarial training for unsupervised speaker recognition.

CoRR, 2020

Delving into VoxCeleb: Environment Invariant Speaker Recognition.

Jaesung Huh

Seongkyu Mun

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Seeing Voices and Hearing Voices: Learning Discriminative Embeddings Using Cross-Modal Self-Supervision.

Hong-Goo Kang

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Spot the Conversation: Speaker Diarisation in the Wild.

Jaesung Huh

Arsha Nagrani

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

In Defence of Metric Learning for Speaker Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

FaceFilter: Audio-Visual Speech Separation Using Still Images.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Now You're Speaking My Language: Visual Language Identification.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Disentangled Speech Embeddings Using Cross-Modal Self-Supervision.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

The Sound of My Voice: Speaker Representation Loss for Target Voice Separation.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

ASR is All You Need: Cross-Modal Distillation for Lip Reading.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

BSL-1K: Scaling Up Co-articulated Sign Language Recognition Using Mouthing Cues.

Samuel Albanie

Gül Varol

Liliane Momeni

Neil Fox

Proceedings of the Computer Vision - ECCV 2020, 2020

Self-supervised Learning of Audio-Visual Objects from Video.

Andrew Owens

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

You Said That?: Synthesising Talking Faces from Audio.

Amir Jamaludin

Int. J. Comput. Vis., 2019

VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge.

CoRR, 2019

Naver at ActivityNet Challenge 2019 - Task B Active Speaker Detection (AVA).

CoRR, 2019

Who Said That?: Audio-Visual Speaker Diarisation of Real-World Meetings.

Bong-Jin Lee

Icksang Han

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

My Lips Are Concealed: Audio-Visual Speech Enhancement Through Obstructions.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Utterance-level Aggregation for Speaker Recognition in the Wild.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Perfect Match: Improved Cross-modal Embeddings for Audio-visual Synchronisation.

Hong-Goo Kang

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

2018

Learning to lip read words by watching videos.

Comput. Vis. Image Underst., 2018

LRS3-TED: a large-scale dataset for visual speech recognition.

CoRR, 2018

VoxCeleb2: Deep Speaker Recognition.

Arsha Nagrani

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Deep Lip Reading: A Comparison of Models and an Online Application.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

The Conversation: Deep Audio-Visual Speech Enhancement.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

Visual recognition of human communication.

PhD thesis, 2017

VoxCeleb: A Large-Scale Speaker Identification Dataset.

Arsha Nagrani

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Lip Reading Sentences in the Wild.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Lip Reading in Profile.

Proceedings of the British Machine Vision Conference 2017, 2017

You said that?

Amir Jamaludin

Proceedings of the British Machine Vision Conference 2017, 2017

2016

Signs in time: Encoding human motion as a temporal image.

CoRR, 2016

Out of Time: Automated Lip Sync in the Wild.

Proceedings of the Computer Vision - ACCV 2016 Workshops, 2016

Lip Reading in the Wild.
