Hang Hua

ORCID: 0000-0002-5441-5776

According to our database, Hang Hua authored at least 43 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

JobBench: Aligning Agent Work With Human Will.

CoRR, May, 2026

GEM-4D: Geometry-Enhanced Video World Models for Robot Manipulation.

CoRR, May, 2026

Aurora: Unified Video Editing with a Tool-Using Agent.

CoRR, May, 2026

MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents.

CoRR, May, 2026

Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?

Bhaskar Ramasubramanian

CoRR, May, 2026

Guardian-as-an-Advisor: Advancing Next-Generation Guardian Models for Trustworthy LLMs.

CoRR, April, 2026

ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding.

CoRR, March, 2026

SPARC: Separating Perception And Reasoning Circuits for Test-time Scaling of VLMs.

CoRR, February, 2026

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models.

Trans. Mach. Learn. Res., 2026

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

DAVE: A VLM Vision Encoder for Document Understanding and Web Agents.

CoRR, December, 2025

MIRA: Multimodal Iterative Reasoning Agent for Image Editing.

Ziyun Zeng

Hang Hua

Jiebo Luo

CoRR, November, 2025

Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination.

CoRR, November, 2025

Building a Foundational Guardrail for General Agentic Systems via Synthetic Data.

CoRR, October, 2025

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models.

CoRR, October, 2025

VQualA 2025 Challenge on Engagement Prediction for Short Videos: Methods and Results.

CoRR, September, 2025

The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report.

CoRR, April, 2025

Improving Pretrained Language Model Fine-Tuning With Noise Stability Regularization.

IEEE Trans. Neural Networks Learn. Syst., January, 2025

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Latent Chain-of-Thought for Visual Reasoning.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Generative AI for Cel-Animation: A Survey.

Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

VQualA 2025 Challenge on Engagement Prediction for Short Videos: Methods and Results.

Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

GaussianStyle: Gaussian Head Avatar via StyleGAN.

Proceedings of the International Conference on 3D Vision, 2025

2024

VideoXum: Cross-Modal Visual and Textural Summarization of Videos.

IEEE Trans. Multim., 2024

FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity.

CoRR, 2024

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models.

CoRR, 2024

PromptFix: You Prompt and We Fix the Photo.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

FineMatch: Aspect-Based Fine-Grained Image and Text Mismatch Detection and Correction.

Proceedings of the Computer Vision - ECCV 2024, 2024

2023

PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022

PromptCap: Prompt-Guided Task-Aware Image Captioning.

CoRR, 2022

Fine-tuning Pre-trained Language Models with Noise Stability Regularization.

CoRR, 2022

2021

Noise Stability Regularization for Improving BERT Fine-tuning.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

2020

Tracking Public Opinion in China through Various Stages of the COVID-19 Pandemic.

Yuqi Gao

Hang Hua

Jiebo Luo

CoRR, 2020

Pilot-Assisted Channel Estimation and Signal Detection in Uplink Multi-User MIMO Systems With Deep Learning.

Xiaoming Wang

Hang Hua

Youyun Xu

IEEE Access, 2020

2019

Controllable Unsupervised Text Attribute Transfer via Editing Entangled Latent Representation.

Ke Wang

Hang Hua

Xiaojun Wan

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Signal Detection in Uplink Pilot-Assisted Multi-User MIMO Systems with Deep Learning.

Hang Hua

Xiaoming Wang

Youyun Xu

Proceedings of the Computing, Communications and IoT Applications, ComComAp 2019, Shenzhen, 2019

2018

Attention Enhanced Chinese Word Embeddings.

Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2018, 2018
