Jeongsoo Choi

ORCID: 0009-0005-6817-604X

According to our database¹, Jeongsoo Choi authored at least 28 papers between 2022 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

LRS-VoxMM: A benchmark for in-the-wild audio-visual speech recognition.

CoRR, April, 2026

DiFlowDubber: Discrete Flow Matching for Automated Video Dubbing via Cross-Modal Alignment and Synchronization.

Ngoc Son Nguyen

Thanh V. T. Tran

Jeongsoo Choi

Hieu-Nghia Huynh-Nguyen

Truong-Son Hy

Van Nguyen

CoRR, March, 2026

2025

Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment.

CoRR, May, 2025

Deep Understanding of Sign Language for Sign to Subtitle Alignment.

CoRR, March, 2025

AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing.

Jeongsoo Choi

Jaehun Kim

Joon Son Chung

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model.

IEEE Trans. Multim., 2024

Textless Unit-to-Unit Training for Many-to-Many Multilingual Speech-to-Speech Translation.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units.

CoRR, 2024

Exploring Phonetic Context-Aware Lip-Sync for Talking Face Generation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Text-Driven Talking Face Synthesis by Reprogramming Audio-Driven Models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation.

CoRR, 2023

Reprogramming Audio-driven Talking Face Synthesis into Text-driven.

CoRR, 2023

Exploring Phonetic Context in Lip Movement for Authentic Talking Face Generation.

CoRR, 2023

Intelligible Lip-to-Speech Synthesis with Speech Units.

Jeongsoo Choi

Minsu Kim

Yong Man Ro

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding.

Jeongsoo Choi

Joanna Hong

Yong Man Ro

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
