Chao Huang

ORCID: 0000-0002-1469-1020

Affiliations:

University of Rochester, Department of Computer Science, NY, USA

According to our database¹, Chao Huang authored at least 30 papers between 2022 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.


Bibliography

2026

High-Quality Sound Separation Across Diverse Categories via Visually-Guided Generative Modeling.

Int. J. Comput. Vis., March, 2026

Video Understanding With Large Language Models: A Survey.

IEEE Trans. Circuits Syst. Video Technol., February, 2026

Omni-Judge: Can Omni-LLMs Serve as Human-Aligned Judges for Text-Conditioned Audio-Video Generation?

CoRR, February, 2026

Semantic visually-guided acoustic highlighting with large vision-language models.

Junhua Huang

Chao Huang

Chenliang Xu

CoRR, January, 2026

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination.

CoRR, November, 2025

When to Think and When to Look: Uncertainty-Guided Lookback.

CoRR, November, 2025

XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models.

CoRR, October, 2025

Directional Reasoning Injection for Fine-Tuning MLLMs.

CoRR, October, 2025

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models.

CoRR, October, 2025

The Sword of Damocles in ViTs: Computational Redundancy Amplifies Adversarial Transferability.

CoRR, April, 2025

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1).

CoRR, April, 2025

FreSca: Unveiling the Scaling Space in Diffusion Models.

CoRR, April, 2025

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Harnessing the Computation Redundancy in ViTs to Boost Adversarial Transferability.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

ZeroSep: Separate Anything in Audio with Zero Training.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Generative AI for Cel-Animation: A Survey.

Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

$\pi$-AVAS: Can Physics-Integrated Audio-Visual Modeling Boost Neural Acoustic Synthesis?

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Learning to Highlight Audio by Watching Movies.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Scaling Concept With Text-Guided Diffusion Models.

CoRR, 2024

Modeling and Driving Human Body Soundfields Through Acoustic Primitives.

Proceedings of the Computer Vision - ECCV 2024, 2024

Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation.

Proceedings of the Computer Vision - ACCV 2024, 2024

High-Quality Visually-Guided Sound Separation from Diverse Categories.

Proceedings of the Computer Vision - ACCV 2024, 2024

2023

Video Understanding with Large Language Models: A Survey.

CoRR, 2023

Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields.

CoRR, 2023

DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models.

CoRR, 2023

AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Egocentric Audio-Visual Object Localization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

How to Prepare for the Next Pandemic - Investigation of Correlation Between Food Prices and COVID-19 From Global and Local Perspectives.

Yufei Zhao

Chao Huang

Jiebo Luo

Proceedings of the IEEE International Conference on Big Data, 2022
