We stand with Ukraine

Haozhe Zhao

Orcid: 0000-0003-0502-4426

According to our database, Haozhe Zhao authored at least 25 papers between 2021 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

NEP: Autoregressive Image Editing via Next Editing Token Prediction.

CoRR, August, 2025

MENTOR: Efficient Multimodal-Conditioned Tuning for Autoregressive Vision Generation Models.

CoRR, July, 2025

Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning.

CoRR, May, 2025

Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think.

CoRR, February, 2025

LongViTU: Instruction Tuning for Long-Form Video Understanding.

CoRR, January, 2025

MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation.

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey.

Andreas Vlachos

CoRR, 2024

Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance.

CoRR, 2024

Selecting Influential Samples for Long Context Alignment via Homologous Models' Guidance and Contextual Awareness Measurement.

CoRR, 2024

Rethinking Semantic Parsing for Large Language Models: Enhancing LLM Performance with Semantic Hints.

CoRR, 2024

UltraEdit: Instruction-based Fine-Grained Image Editing at Scale.

Xiaojian (Shawn) Ma

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation.

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

Improving the Robustness of Distantly-Supervised Named Entity Recognition via Uncertainty-Aware Teacher Learning and Student-Student Collaborative Learning.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks.

Wangchunshu Zhou

CoRR, 2023

Distantly-Supervised Named Entity Recognition with Uncertainty-aware Teacher Learning and Student-student Collaborative Learning.

CoRR, 2023

Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond.

CoRR, 2023

Removing Camouflage and Revealing Collusion: Leveraging Gang-crime Pattern in Fraudster Detection.

Manuel Cristofaro

Paola Jafrancesco

Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

Coarse-to-Fine Dual Encoders are Better Frame Identification Learners.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Empowering MultiModal Models' In-Context Learning Ability through Large Language Models.

Proceedings of the ACM Turing Award Celebration Conference - China 2023, 2023

2021

Traffic Accident Prediction Methods Based on Multi-factor Models.

Proceedings of the Knowledge Science, Engineering and Management, 2021
