Dahun Kim

ORCID: 0000-0003-1776-6195

According to our database, Dahun Kim authored at least 47 papers between 2017 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
EmbeddingGemma: Powerful and Lightweight Text Representations.
CoRR, September, 2025

Zero-Shot Multi-Spectral Learning: Reimagining a Generalist Multimodal Gemini 2.5 Model for Remote Sensing Applications.
CoRR, September, 2025

Time-Scaling State-Space Models for Dense Video Captioning.
CoRR, September, 2025

Context-Adaptive Multi-Prompt Embedding with Large Language Models for Vision-Language Alignment.
CoRR, August, 2025

Learning Visual Grounding from Generative Vision and Language Model.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Uni-DVPS: Unified Model for Depth-Aware Video Panoptic Segmentation.
IEEE Robotics Autom. Lett., July, 2024

Textless Unit-to-Unit Training for Many-to-Many Multilingual Speech-to-Speech Translation.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Whats in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning.
CoRR, 2024

OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All.
CoRR, 2024

Video-kMaX: A Simple Unified Approach for Online and Near-Online Video Panoptic Segmentation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Region-Centric Image-Language Pretraining for Open-Vocabulary Detection.
Proceedings of the Computer Vision - ECCV 2024, 2024

Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
RECLIP: Resource-efficient CLIP by Training with Small Images.
Trans. Mach. Learn. Res., 2023

MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks.
Trans. Mach. Learn. Res., 2023

Detection-Oriented Image-Text Pretraining for Open-Vocabulary Detection.
CoRR, 2023

Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation.
CoRR, 2023

Memory-Aware DVFS Governing Policy for Improved Energy-Saving in the Linux Kernel.
Proceedings of the 29th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2023

Neural Image-based Avatars: Generalizable Radiance Fields for Human Avatar Modeling.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Contrastive Feature Masking Open-Vocabulary Vision Transformer.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Dense Pixel-Level Interpretation of Dynamic Scenes With Video Panoptic Segmentation.
IEEE Trans. Image Process., 2022

Learning Open-World Object Proposals Without Learning to Classify.
IEEE Robotics Autom. Lett., 2022

Tailor Me: An Editing Network for Fashion Attribute Shape Manipulation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

TubeFormer-DeepLab: Video Mask Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
DeepLab2: A TensorFlow Library for Deep Labeling.
CoRR, 2021

The Devil is in the Boundary: Exploiting Boundary Representation for Basis-based Instance Segmentation.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Learning To Associate Every Segment for Video Panoptic Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Global Context and Geometric Priors for Effective Non-Local Self-Attention.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020
Recurrent Temporal Aggregation Framework for Deep Video Inpainting.
IEEE Trans. Pattern Anal. Mach. Intell., 2020

Rotationally-Consistent Novel View Synthesis for Humans.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Rotationally-Temporally Consistent Novel View Synthesis of Human Performance Video.
Proceedings of the Computer Vision - ECCV 2020, 2020

Video Panoptic Segmentation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Align-and-Attend Network for Globally and Locally Coherent Video Inpainting.
Proceedings of the 31st British Machine Vision Conference 2020, 2020

Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Preserving Semantic and Temporal Consistency for Unpaired Video-to-Video Translation.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Video Retargeting: Trade-off between Content Preservation and Spatio-temporal Consistency.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Deep Video Inpainting.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Deep Blind Video Decaptioning by Temporal Aggregation and Recurrence.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Discriminative Feature Learning for Unsupervised Video Summarization.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Learning Image Representations by Completing Damaged Jigsaw Puzzles.
Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, 2018

LinkNet: Relational Embedding for Scene Graph.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017
Two-Phase Learning for Weakly Supervised Object Localization.
CoRR, 2017

Two-Phase Learning for Weakly Supervised Object Localization.
Proceedings of the IEEE International Conference on Computer Vision, 2017

