Hanrong Ye

ORCID: 0000-0002-7986-6143

According to our database, Hanrong Ye authored at least 33 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
JetViT: Efficient High-Resolution Vision Transformer with Post-Training Attention Search.
CoRR, May, 2026

Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing.
CoRR, March, 2026

Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception.
CoRR, January, 2026

Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

2025
GSPN-2: Efficient Parallel Sequence Modeling.
CoRR, December, 2025

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration.
CoRR, November, 2025

Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models.
CoRR, November, 2025

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM.
CoRR, October, 2025

UALM: Unified Audio Language Model for Understanding, Generation and Reasoning.
CoRR, October, 2025

QeRL: Beyond Efficiency - Quantization-enhanced Reinforcement Learning for LLMs.
CoRR, October, 2025

Test-Time Scaling Strategies for Generative Retrieval in Multimodal Conversational Recommendations.
CoRR, August, 2025

Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Scaling RL to Long Videos.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Multi-Task Label Discovery via Hierarchical Task Tokens for Partially Annotated Dense Predictions.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

MM-Ego: Towards Building Egocentric Multimodal LLMs.
CoRR, 2024

X-VILA: Cross-Modality Alignment for Large Language Model.
CoRR, 2024

SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis.
Proceedings of the Computer Vision - ECCV 2024, 2024

DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Joint 2D-3D Multi-Task Learning on Cityscapes-3D: 3D Detection, Segmentation, and Depth Estimation.
CoRR, 2023

TaskPrompter: Spatial-Channel Multi-Task Prompting for Dense Scene Understanding.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Contrastive Multi-Task Dense Prediction.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Improving Model Training with Multi-fidelity Hyperparameter Evaluation.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

Inverted Pyramid Multi-task Transformer for Dense Scene Understanding.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
Bi-Directional Exponential Angular Triplet Loss for RGB-Infrared Person Re-Identification.
IEEE Trans. Image Process., 2021

Modality-aware Style Adaptation for RGB-Infrared Person Re-Identification.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

2020
Bi-directional Exponential Angular Triplet Loss for RGB-Infrared Person Re-Identification.
CoRR, 2020

Video Logo Retrieval Based on Local Features.
Proceedings of the IEEE International Conference on Image Processing, 2020

2019
Self-Refining Deep Symmetry Enhanced Network for Rain Removal.
Proceedings of the 2019 IEEE International Conference on Image Processing, 2019
