Haiwen Diao

Orcid: 0000-0002-4156-5417

According to our database, Haiwen Diao authored at least 23 papers between 2021 and 2026.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2026
From Pixels to Words - Towards Native One-Vision Models at Scale.
CoRR, May, 2026

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture.
CoRR, May, 2026

VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text?
CoRR, February, 2026

DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation.
CoRR, January, 2026

2025
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding.
CoRR, December, 2025

From Pixels to Words - Towards Native Vision-Language Primitives at Scale.
CoRR, October, 2025

Visual Jigsaw Post-Training Improves MLLMs.
CoRR, September, 2025

Exploring Dynamic Transformer for Efficient Object Tracking.
IEEE Trans. Neural Networks Learn. Syst., August, 2025

End-to-End Vision Tokenizer Tuning.
CoRR, May, 2025

Regularizing Subspace Redundancy of Low-Rank Adaptation.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Autoregressive Video Generation without Vector Quantization.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

EVEv2: Improved Baselines for Encoder-Free Vision-Language Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

KARST: Multi-Kernel Kronecker Adaptation with Re-Scaling Transmission for Visual Classification.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024
GSSF: Generalized Structural Sparse Function for Deep Cross-Modal Metric Learning.
IEEE Trans. Image Process., 2024

Deep Boosting Learning: A Brand-New Cooperative Approach for Image-Text Matching.
IEEE Trans. Image Process., 2024

LLMs Can Evolve Continually on Modality for X-Modal Reasoning.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Unveiling Encoder-Free Vision-Language Models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning.
Proceedings of the Computer Vision - ECCV 2024, 2024

UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Plug-and-Play Regulators for Image-Text Matching.
IEEE Trans. Image Process., 2023

2021
Similarity Reasoning and Filtration for Image-Text Matching.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

