Haiwen Diao

Orcid: 0000-0002-4156-5417

According to our database, Haiwen Diao authored at least 23 papers between 2021 and 2026.

Collaborative distances:

Dijkstra number of five.
Erdős number of four.

Bibliography

2026

From Pixels to Words - Towards Native One-Vision Models at Scale.

CoRR, May, 2026

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture.

CoRR, May, 2026

VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text?

CoRR, February, 2026

DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation.

CoRR, January, 2026

2025

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding.

CoRR, December, 2025

From Pixels to Words - Towards Native Vision-Language Primitives at Scale.

CoRR, October, 2025

Visual Jigsaw Post-Training Improves MLLMs.

CoRR, September, 2025

Exploring Dynamic Transformer for Efficient Object Tracking.

IEEE Trans. Neural Networks Learn. Syst., August, 2025

End-to-End Vision Tokenizer Tuning.

CoRR, May, 2025

Regularizing Subspace Redundancy of Low-Rank Adaptation.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Autoregressive Video Generation without Vector Quantization.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

EVEv2: Improved Baselines for Encoder-Free Vision-Language Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

KARST: Multi-Kernel Kronecker Adaptation with Re-Scaling Transmission for Visual Classification.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

GSSF: Generalized Structural Sparse Function for Deep Cross-Modal Metric Learning.

IEEE Trans. Image Process., 2024

Deep Boosting Learning: A Brand-New Cooperative Approach for Image-Text Matching.

IEEE Trans. Image Process., 2024

LLMs Can Evolve Continually on Modality for X-Modal Reasoning.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Unveiling Encoder-Free Vision-Language Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning.

Proceedings of the Computer Vision - ECCV 2024, 2024

UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Plug-and-Play Regulators for Image-Text Matching.

IEEE Trans. Image Process., 2023

2021

Similarity Reasoning and Filtration for Image-Text Matching.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
