Hao Liu

Orcid: 0000-0001-8087-1102

Affiliations:
  • Tencent YouTu Lab, Hefei, China
  • Hefei University of Technology, School of Computer and Information, China


According to our database, Hao Liu authored at least 49 papers between 2014 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?
CoRR, May, 2025

MoLoRec: A Generalizable and Efficient Framework for LLM-Based Recommendation.
CoRR, February, 2025

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning.
CoRR, January, 2025

Assessing the grid-connected capacity of county-level photovoltaic systems: A study to enhance flexibility and reliability.
J. Comput. Methods Sci. Eng., 2025

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
Asymmetric Deformable Spatio-temporal Framework for Infrared Object Tracking.
ACM Trans. Multim. Comput. Commun. Appl., October, 2024

SYRER: Synergistic Relational Reasoning for RGB-D Cross-Modal Re-Identification.
IEEE Trans. Multim., 2024

A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding.
CoRR, 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.
CoRR, 2024

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering.
CoRR, 2024

TextSquare: Scaling up Text-Centric Visual Instruction Tuning.
CoRR, 2024

DocPedia: unleashing the power of large multimodal model in the frequency domain for versatile document understanding.
Sci. China Inf. Sci., 2024

Harmonizing Visual Text Comprehension and Generation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Prompt-Enhanced Software Vulnerability Detection Using ChatGPT.
Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings, 2024

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding.
CoRR, 2023

RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring.
Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023

Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

TaCo: Textual Attribute Recognition via Contrastive Learning.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

The Devil Is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-training.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training.
CoRR, 2022

GMN: Generative Multi-modal Network for Practical Document Information Extraction.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Query-driven Generative Network for Document Information Extraction in the Wild.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Knowledge Mining with Scene Text for Fine-Grained Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Neural Collaborative Graph Machines for Table Structure Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Perceiving Stroke-Semantic Context: Hierarchical Contrastive Learning for Robust Scene Text Recognition.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Show, Read and Reason: Table Structure Recognition with Flexible Context Aggregator.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

RecycleNet: An Overlapped Text Instance Recovery Approach.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

2020
Person Attribute Recognition by Sequence Contextual Relation Learning.
IEEE Trans. Circuits Syst. Video Technol., 2020

PuzzleNet: Scene Text Detection by Segment Context Graph Learning.
CoRR, 2020

Accurate Structured-Text Spotting for Arithmetical Exercise Correction.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Independent metric learning with aligned multi-part features for video-based person re-identification.
Multim. Tools Appl., 2019

Deep feature representation and multiple metric ensembles for person re-identification in security surveillance system.
Multim. Tools Appl., 2019

Local region partition for person re-identification.
Multim. Tools Appl., 2019

2018
Video-Based Person Re-Identification With Accumulative Motion Context.
IEEE Trans. Circuits Syst. Video Technol., 2018

Sequence-based Person Attribute Recognition with Joint CTC-Attention Model.
CoRR, 2018

Video-Based Person Re-identification with Adaptive Multi-part Features Learning.
Proceedings of the Advances in Multimedia Information Processing - PCM 2018, 2018

Multi-View Image Generation from a Single-View.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

2017
End-to-End Comparative Attention Networks for Person Re-Identification.
IEEE Trans. Image Process., 2017

Multi-View Image Generation from a Single-View.
CoRR, 2017

Neural Person Search Machines.
Proceedings of the IEEE International Conference on Computer Vision, 2017

2016
Robust Face Recognition with Deep Multi-View Representation Learning.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

2015
Kernelized Relaxed Margin Components Analysis for Person Re-identification.
IEEE Signal Process. Lett., 2015

2014
Non-linear metric learning with multiple features for person re-identification.
Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014
