Hao Sun

Orcid: 0009-0007-7917-1628

Affiliations:
  • China Telecom, Beijing, China


According to our database, Hao Sun authored at least 25 papers between 2023 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Segment-Aligned Policy Optimization for Multi-Modal Reasoning.
CoRR, May, 2026

VCE: A zero-cost hallucination mitigation method of LVLMs via visual contrastive editing.
CoRR, April, 2026

Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning.
CoRR, February, 2026

Geometric Image Editing via Effects-Sensitive In-Context Inpainting with Diffusion Transformers.
CoRR, February, 2026

VAV-R1: Difficulty-Aware Multimodal Reasoning for Video Anomaly Validation.
Proceedings of the 2026 International Conference on Multimedia Retrieval, 2026

Exploring Efficient Open-Vocabulary Segmentation in the Remote Sensing.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Adaptive Evidential Learning for Temporal-Semantic Robustness in Moment Retrieval.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Structure-Aware Prototype Guided Trusted Multi-View Classification.
CoRR, November, 2025

TeleEgo: Benchmarking Egocentric AI Assistants in the Wild.
CoRR, October, 2025

Infinite Video Understanding.
CoRR, July, 2025

SVGen: Interpretable Vector Graphics Generation with Large Language Models.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

ViCo: A Multitask Video-enhanced and Cognition-preserving Modality Alignment Training Framework.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

FASTER: Face Attribute Sliders with Semantic Rewards.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Trusted Unified Feature-Neighborhood Dynamics for Multi-View Classification.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
BoViLA: Bootstrapping Video-Language Alignment via LLM-Based Self-Questioning and Answering.
CoRR, 2024

Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding.
CoRR, 2024

Disentangle and denoise: Tackling context misalignment for video moment retrieval.
CoRR, 2024

GOAL: Grounded text-to-image Synthesis with Joint Layout Alignment Tuning.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024

ProTA: Probabilistic Token Aggregation for Text-Video Retrieval.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

2023
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval.
CoRR, 2023

A Baseline Investigation: Transformer-based Cross-view Baseline for Text-based Person Search.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Mask to Reconstruct: Cooperative Semantics Completion for Video-text Retrieval.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

LLaViLo: Boosting Video Moment Retrieval via Adapter-Based Multimodal Modeling.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Alignment and Generation Adapter for Efficient Video-text Understanding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
