Haozhe Zhao

Orcid: 0000-0003-0502-4426

According to our database, Haozhe Zhao authored at least 39 papers between 2021 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
SADGCN-GC: self-attention-based deep graph convolutional neural network with quantization and visualization for graph classification.
Vis. Comput., May, 2026

Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs.
CoRR, May, 2026

Step-wise Rubric Rewards for LLM Reasoning.
CoRR, May, 2026

From Context to Skills: Can Language Models Learn from Context Skillfully?
CoRR, April, 2026

Less Data, Faster Convergence: Goal-Driven Data Optimization for Multimodal Instruction Tuning.
CoRR, March, 2026

From Preferences to Prejudice: The Role of Alignment Tuning in Shaping Social Bias in Video Diffusion Models.
Trans. Mach. Learn. Res., 2026

A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Task.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
FaithLens: Detecting and Explaining Faithfulness Hallucination.
CoRR, December, 2025

MMGR: Multi-Modal Generative Reasoning.
CoRR, December, 2025

A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks.
CoRR, October, 2025

Sparse Training Scheme for Multimodal LLM.
CoRR, September, 2025

MENTOR: Efficient Multimodal-Conditioned Tuning for Autoregressive Vision Generation Models.
CoRR, July, 2025

LongViTU: Instruction Tuning for Long-Form Video Understanding.
CoRR, January, 2025

NEP: Autoregressive Image Editing via Next Editing Token Prediction.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control is Easier than You Think.
Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

Looking Beyond Text: Reducing Language Bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

GATEAU: Selecting Influential Samples for Long Context Alignment.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

CCAgent: Coordinating Collaborative Data Scaling for Operating System Agents via Web3.
Proceedings of the 34th ACM International Conference on Information and Knowledge Management, 2025

Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Rethinking Semantic Parsing for Large Language Models: Enhancing LLM Performance with Semantic Hints.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2025

2024
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey.
CoRR, 2024

Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance.
CoRR, 2024

Selecting Influential Samples for Long Context Alignment via Homologous Models' Guidance and Contextual Awareness Measurement.
CoRR, 2024

UltraEdit: Instruction-based Fine-Grained Image Editing at Scale.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

Improving the Robustness of Distantly-Supervised Named Entity Recognition via Uncertainty-Aware Teacher Learning and Student-Student Collaborative Learning.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks.
CoRR, 2023

Distantly-Supervised Named Entity Recognition with Uncertainty-aware Teacher Learning and Student-student Collaborative Learning.
CoRR, 2023

Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond.
CoRR, 2023

Removing Camouflage and Revealing Collusion: Leveraging Gang-crime Pattern in Fraudster Detection.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

Coarse-to-Fine Dual Encoders are Better Frame Identification Learners.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Empowering MultiModal Models' In-Context Learning Ability through Large Language Models.
Proceedings of the ACM Turing Award Celebration Conference - China 2023, 2023

2021
Traffic Accident Prediction Methods Based on Multi-factor Models.
Proceedings of the Knowledge Science, Engineering and Management, 2021

