Hang Hua

Orcid: 0000-0002-5441-5776

According to our database, Hang Hua authored at least 43 papers between 2018 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
JobBench: Aligning Agent Work With Human Will.
CoRR, May, 2026

GEM-4D: Geometry-Enhanced Video World Models for Robot Manipulation.
CoRR, May, 2026

Aurora: Unified Video Editing with a Tool-Using Agent.
CoRR, May, 2026

MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents.
CoRR, May, 2026

Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?
CoRR, May, 2026

Guardian-as-an-Advisor: Advancing Next-Generation Guardian Models for Trustworthy LLMs.
CoRR, April, 2026

ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding.
CoRR, March, 2026

SPARC: Separating Perception And Reasoning Circuits for Test-time Scaling of VLMs.
CoRR, February, 2026

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models.
Trans. Mach. Learn. Res., 2026

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
DAVE: A VLM Vision Encoder for Document Understanding and Web Agents.
CoRR, December, 2025

MIRA: Multimodal Iterative Reasoning Agent for Image Editing.
CoRR, November, 2025

Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination.
CoRR, November, 2025

Building a Foundational Guardrail for General Agentic Systems via Synthetic Data.
CoRR, October, 2025

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models.
CoRR, October, 2025

VQualA 2025 Challenge on Engagement Prediction for Short Videos: Methods and Results.
CoRR, September, 2025

The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report.
CoRR, April, 2025

Improving Pretrained Language Model Fine-Tuning With Noise Stability Regularization.
IEEE Trans. Neural Networks Learn. Syst., January, 2025

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Latent Chain-of-Thought for Visual Reasoning.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Generative AI for Cel-Animation: A Survey.
Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025


FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

GaussianStyle: Gaussian Head Avatar via StyleGAN.
Proceedings of the International Conference on 3D Vision, 2025

2024
VideoXum: Cross-Modal Visual and Textural Summarization of Videos.
IEEE Trans. Multim., 2024

FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity.
CoRR, 2024

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models.
CoRR, 2024

PromptFix: You Prompt and We Fix the Photo.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

FineMatch: Aspect-Based Fine-Grained Image and Text Mismatch Detection and Correction.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022
PromptCap: Prompt-Guided Task-Aware Image Captioning.
CoRR, 2022

Fine-tuning Pre-trained Language Models with Noise Stability Regularization.
CoRR, 2022

2021
Noise Stability Regularization for Improving BERT Fine-tuning.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

2020
Tracking Public Opinion in China through Various Stages of the COVID-19 Pandemic.
CoRR, 2020

Pilot-Assisted Channel Estimation and Signal Detection in Uplink Multi-User MIMO Systems With Deep Learning.
IEEE Access, 2020

2019
Controllable Unsupervised Text Attribute Transfer via Editing Entangled Latent Representation.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Signal Detection in Uplink Pilot-Assisted Multi-User MIMO Systems with Deep Learning.
Proceedings of the Computing, Communications and IoT Applications, ComComAp 2019, Shenzhen, 2019

2018
Attention Enhanced Chinese Word Embeddings.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2018, 2018

