Kim Sung-Bin

According to our database, Kim Sung-Bin authored at least 17 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation.
CoRR, April, 2025

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models.
CoRR, April, 2025

AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SoundBrush: Sound as a Brush for Visual Scene Editing.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization.
Trans. Mach. Learn. Res., 2024

Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment.
CoRR, 2024

Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert.
CoRR, 2024

Revisiting Learning-based Video Motion Magnification for Real-time Processing.
CoRR, 2024

LaughTalk: Expressive 3D Talking Head Generation with Laughter.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

2023
The Devil in the Details: Simple and Effective Optical Flow Synthetic Data Generation.
CoRR, 2023

Prefix Tuning for Automated Audio Captioning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Lightweight Speaker Recognition in Poincaré Spaces.
IEEE Signal Process. Lett., 2022

