Haoyu Cao

Orcid: 0000-0002-3789-9705

Affiliations:
  • Tencent YouTu Lab, Hefei, China


According to our database1, Haoyu Cao authored at least 17 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number2 of four.
  • Erdős number3 of four.

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

Online presence:

On csauthors.net:

Bibliography

2025
DREAM: Document Reconstruction via End-to-end Autoregressive Model.
CoRR, July, 2025

TACO: Think-Answer Consistency for Optimized Long-Chain Reasoning and Efficient Data Learning via Reinforcement Learning in LVLMs.
CoRR, May, 2025

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model.
CoRR, May, 2025

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy.
CoRR, February, 2025

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction.
CoRR, January, 2025

2024
Turning a CLIP Model Into a Scene Text Spotter.
IEEE Trans. Pattern Anal. Mach. Intell., September, 2024

Communication-efficient clustered federated learning via model distance.
Mach. Learn., June, 2024

Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

HRVDA: High-Resolution Visual Document Assistant.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Few-shot Temporal Pruning Accelerates Diffusion Models for Text Generation.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images.
Proceedings of the Document Analysis and Recognition - ICDAR 2023, 2023

Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022
GMN: Generative Multi-modal Network for Practical Document Information Extraction.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Relational Representation Learning in Visually-Rich Documents.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Query-driven Generative Network for Document Information Extraction in the Wild.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022


  Loading...