Jeongsoo Choi

ORCID: 0009-0005-6817-604X

According to our database, Jeongsoo Choi authored at least 25 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.



Bibliography

2025
Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing.
CoRR, May, 2025

Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment.
CoRR, May, 2025

AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation.
CoRR, April, 2025

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models.
CoRR, April, 2025

MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation.
CoRR, March, 2025

Deep Understanding of Sign Language for Sign to Subtitle Alignment.
CoRR, March, 2025

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025

V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025

From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model.
IEEE Trans. Multim., 2024

Textless Unit-to-Unit Training for Many-to-Many Multilingual Speech-to-Speech Translation.
IEEE/ACM Trans. Audio Speech Lang. Process., 2024

Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units.
CoRR, 2024

Exploring Phonetic Context-Aware Lip-Sync for Talking Face Generation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

Text-Driven Talking Face Synthesis by Reprogramming Audio-Driven Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation.
CoRR, 2023

Reprogramming Audio-driven Talking Face Synthesis into Text-driven.
CoRR, 2023

Exploring Phonetic Context in Lip Movement for Authentic Talking Face Generation.
CoRR, 2023

Intelligible Lip-to-Speech Synthesis with Speech Units.
Proceedings of the 24th Annual Conference of the International Speech Communication Association (Interspeech), 2023

Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
