Chao Huang

ORCID: 0000-0002-1469-1020

Affiliations:
  • University of Rochester, Department of Computer Science, NY, USA


According to our database, Chao Huang authored at least 30 papers between 2022 and 2026.

Bibliography

2026
High-Quality Sound Separation Across Diverse Categories via Visually-Guided Generative Modeling.
Int. J. Comput. Vis., March, 2026

Video Understanding With Large Language Models: A Survey.
IEEE Trans. Circuits Syst. Video Technol., February, 2026

Omni-Judge: Can Omni-LLMs Serve as Human-Aligned Judges for Text-Conditioned Audio-Video Generation?
CoRR, February, 2026

Semantic Visually-Guided Acoustic Highlighting with Large Vision-Language Models.
CoRR, January, 2026

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination.
CoRR, November, 2025

When to Think and When to Look: Uncertainty-Guided Lookback.
CoRR, November, 2025

XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models.
CoRR, October, 2025

Directional Reasoning Injection for Fine-Tuning MLLMs.
CoRR, October, 2025

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models.
CoRR, October, 2025

ZeroSep: Separate Anything in Audio with Zero Training.
CoRR, May, 2025

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness.
CoRR, May, 2025

The Sword of Damocles in ViTs: Computational Redundancy Amplifies Adversarial Transferability.
CoRR, April, 2025

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting.
CoRR, April, 2025

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1).
CoRR, April, 2025

FreSca: Unveiling the Scaling Space in Diffusion Models.
CoRR, April, 2025

Generative AI for Cel-Animation: A Survey.
Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

$\pi$-AVAS: Can Physics-Integrated Audio-Visual Modeling Boost Neural Acoustic Synthesis?
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Learning to Highlight Audio by Watching Movies.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Scaling Concept With Text-Guided Diffusion Models.
CoRR, 2024

Modeling and Driving Human Body Soundfields Through Acoustic Primitives.
Proceedings of the Computer Vision - ECCV 2024, 2024

Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation.
Proceedings of the Computer Vision - ACCV 2024, 2024

High-Quality Visually-Guided Sound Separation from Diverse Categories.
Proceedings of the Computer Vision - ACCV 2024, 2024

2023
Video Understanding with Large Language Models: A Survey.
CoRR, 2023

Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields.
CoRR, 2023

DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models.
CoRR, 2023

AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Egocentric Audio-Visual Object Localization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
How to Prepare for the Next Pandemic - Investigation of Correlation Between Food Prices and COVID-19 From Global and Local Perspectives.
Proceedings of the IEEE International Conference on Big Data, 2022

