Yao Hu

ORCID: 0009-0006-1274-7111

Affiliations:
  • Xiaohongshu Inc., Beijing, China


According to our database, Yao Hu authored at least 119 papers between 2012 and 2025.

Bibliography

2025
Cross-Scenario Unified Modeling of User Interests at Billion Scale.
CoRR, October, 2025

HyMiRec: A Hybrid Multi-interest Learning Framework for LLM-based Sequential Recommendation.
CoRR, October, 2025

GIR-Bench: Versatile Benchmark for Generating Images with Reasoning.
CoRR, October, 2025

DecEx-RAG: Boosting Agentic Retrieval-Augmented Generation with Decision and Execution Optimization via Process Supervision.
CoRR, October, 2025

RealBench: A Chinese Multi-image Understanding Benchmark Close to Real-world Scenarios.
CoRR, September, 2025

InstanceAssemble: Layout-Aware Image Generation via Instance Assembling Attention.
CoRR, September, 2025

Interleaving Reasoning for Better Text-to-Image Generation.
CoRR, September, 2025

FireRedChat: A Pluggable, Full-Duplex Voice Interaction System with Cascaded and Semi-Cascaded Implementations.
CoRR, September, 2025

SelfAug: Mitigating Catastrophic Forgetting in Retrieval-Augmented Generation via Distribution Self-Alignment.
CoRR, September, 2025

FireRedTTS-2: Towards Long Conversational Speech Generation for Podcast and Chatbot.
CoRR, September, 2025

Decomposed Reasoning with Reinforcement Learning for Relevance Assessment in UGC Platforms.
CoRR, August, 2025

Object-centric Video Question Answering with Visual Grounding and Referring.
CoRR, July, 2025

SelfRACG: Enabling LLMs to Self-Express and Retrieve for Code Generation.
CoRR, July, 2025

RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services.
CoRR, July, 2025

Flux-Sculptor: Text-Driven Rich-Attribute Portrait Editing through Decomposed Spatial Flow Control.
CoRR, July, 2025

AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need.
CoRR, June, 2025

Plan Your Travel and Travel with Your Plan: Wide-Horizon Planning and Evaluation via LLM.
CoRR, June, 2025

Progressive Scaling Visual Object Tracking.
CoRR, May, 2025

MT³: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning.
CoRR, May, 2025

Redefining Machine Translation on Social Network Services with Large Language Models.
CoRR, April, 2025

Hierarchical Self-Distilled Feature Learning for Fine-Grained Visual Categorization.
IEEE Trans. Neural Networks Learn. Syst., March, 2025

FireRedTTS-1S: An Upgraded Streamable Foundation Text-to-Speech System.
CoRR, March, 2025

CQ-DINO: Mitigating Gradient Dilution via Category Queries for Vast Vocabulary Object Detection.
CoRR, March, 2025

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models.
CoRR, March, 2025

WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs.
CoRR, February, 2025

DynaPrompt: Dynamic Test-Time Prompt Tuning.
CoRR, January, 2025

FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration.
CoRR, January, 2025

DynamicFace: High-Quality and Consistent Video Face Swapping using Composable 3D Facial Priors.
CoRR, January, 2025

Scalable Overload-Aware Graph-Based Index Construction for 10-Billion-Scale Vector Similarity Search.
Companion Proceedings of the ACM on Web Conference 2025, 2025

PaRT: Enhancing Proactive Social Chatbots with Personalized Real-Time Retrieval.
Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

Qilin: A Multimodal Information Retrieval Dataset with APP-level User Sessions.
Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

Multi-Granularity Distribution Modeling for Video Watch Time Prediction via Exponential-Gaussian Mixture Network.
Proceedings of the Nineteenth ACM Conference on Recommender Systems, 2025

MoDification: Mixture of Depths Made Easy.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

NoteLLM-2: Multimodal Large Representation Models for Recommendation.
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.1, 2025

Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective.
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.1, 2025

A Sanity Check for AI-generated Image Detection.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

DynaPrompt: Dynamic Test-Time Prompt Tuning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

ZigZagKV: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

Towards the Law of Capacity Gap in Distilling Language Models.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
OV-VIS: Open-Vocabulary Video Instance Segmentation.
Int. J. Comput. Vis., November, 2024

OV-DAR: Open-Vocabulary Object Detection and Attributes Recognition.
Int. J. Comput. Vis., November, 2024

TOMGPT: Reliable Text-Only Training Approach for Cost-Effective Multi-modal Large Language Model.
ACM Trans. Knowl. Discov. Data, August, 2024

PiClick: Picking the desired mask from multiple candidates in click-based interactive segmentation.
Neurocomputing, 2024

Single Trajectory Distillation for Accelerating Image and Video Style Transfer.
CoRR, 2024

LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant.
CoRR, 2024

ScalingNote: Scaling up Retrievers with Large Language Models for Real-World Dense Retrieval.
CoRR, 2024

GPRec: Bi-level User Modeling for Deep Recommenders.
CoRR, 2024

Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents.
CoRR, 2024

P4Q: Learning to Prompt for Quantization in Visual-language Models.
CoRR, 2024

Target-Driven Distillation: Consistency Distillation with Target Timestep Selection and Decoupled Guidance.
CoRR, 2024

Mining Open Semantics from CLIP: A Relation Transition Perspective for Few-Shot Learning.
CoRR, 2024

From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition.
CoRR, 2024

NoteLLM-2: Multimodal Large Representation Models for Recommendation.
CoRR, 2024

From Image to Video, what do we need in multimodal LLMs?
CoRR, 2024

Agent Group Chat: An Interactive Group Chat Simulacra For Better Eliciting Collective Emergent Behavior.
CoRR, 2024

StableGarment: Garment-Centric Generation via Stable Diffusion.
CoRR, 2024

Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model.
CoRR, 2024

NoteLLM: A Retrievable Large Language Model for Note Recommendation.
CoRR, 2024

NoteLLM: A Retrievable Large Language Model for Note Recommendation.
Companion Proceedings of the ACM on Web Conference 2024, 2024

Vript: A Video Is Worth Thousands of Words.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Small-loss Adaptive Regret for Online Convex Optimization.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Efficient Stochastic Approximation of Minimax Excess Risk Optimization.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Knowledge-Enhanced Multi-perspective Incongruity Perception Network for Multimodal Sarcasm Detection.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Caseg: Clip-Based Action Segmentation With Learnable Text Prompt.
Proceedings of the IEEE International Conference on Image Processing, 2024

Bi-Level User Modeling for Deep Recommenders.
Proceedings of the IEEE International Conference on Data Mining, 2024

VISA: Reasoning Video Object Segmentation via Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ZONE: Zero-Shot Instruction-Guided Local Editing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Controllable Mind Visual Diffusion Model.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

AQ-DETR: Low-Bit Quantized Detection Transformer with Auxiliary Queries.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Optimizing traffic efficiency via a reinforcement learning approach based on time allocation.
Int. J. Mach. Learn. Cybern., October, 2023

ZONE: Zero-Shot Instruction-Guided Local Editing.
CoRR, 2023

PiClick: Picking the desired mask in click-based interactive segmentation.
CoRR, 2023

MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation.
CoRR, 2023

Towards Open-Vocabulary Video Instance Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2INER: Instructive and In-Context Learning on Few-Shot Named Entity Recognition.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

OvarNet: Towards Open-Vocabulary Object Attribute Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Horizontal-to-Vertical Video Conversion.
IEEE Trans. Multim., 2022

End-to-End Temporal Action Detection With Transformer.
IEEE Trans. Image Process., 2022

Occluded Video Instance Segmentation: A Benchmark.
Int. J. Comput. Vis., 2022

Non-stationary Dueling Bandits for Online Learning to Rank.
Proceedings of the Web and Big Data - 6th International Joint Conference, 2022

2021
Socializing the Videos: A Multimodal Approach for Social Relation Recognition.
ACM Trans. Multim. Comput. Commun. Appl., 2021

End-to-end Temporal Action Detection with Transformer.
CoRR, 2021

Occluded Video Instance Segmentation.
CoRR, 2021

Pyramid Self-attention for Semantic Segmentation.
Proceedings of the Pattern Recognition and Computer Vision - 4th Chinese Conference, 2021

Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Linking the Characters: Video-oriented Social Graph Generation via Hierarchical-cumulative GCN.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Decoupled IoU Regression for Object Detection.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Deep Interactive Video Inpainting: An Invisibility Cloak for Harry Potter.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Salient Object Ranking with Position-Preserved Attention.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

SwiftNet: Real-Time Video Object Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Multi-Shot Temporal Event Localization: A Benchmark.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Stochastic Bandits with Graph Feedback in Non-Stationary Environments.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
LAMP: Label Augmented Multimodal Pretraining.
CoRR, 2020

Spherical Knowledge Distillation.
CoRR, 2020

Adapting to Smoothness: A More Universal Algorithm for Online Convex Optimization.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Correlation Maximized Structural Similarity Loss for Semantic Segmentation.
CoRR, 2019

Multi-Objective Generalized Linear Bandits.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Optimal Algorithms for Lipschitz Bandits with Heavy-tailed Rewards.
Proceedings of the 36th International Conference on Machine Learning, 2019

2016
Atom Decomposition with Adaptive Basis Selection Strategy for Matrix Completion.
ACM Trans. Multim. Comput. Commun. Appl., 2016

Online robust principal component analysis via truncated nuclear norm regularization.
Neurocomputing, 2016

Atom Decomposition Based Subgradient Descent for matrix classification.
Neurocomputing, 2016

2015
Large scale multi-class classification with truncated nuclear norm regularization.
Neurocomputing, 2015

Event Recovery by Faster Truncated Nuclear Norm Minimization.
Proceedings of the Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques, 2015

2014
Fast and Accurate Hashing Via Iterative Nearest Neighbors Expansion.
IEEE Trans. Cybern., 2014

Matrix Completion for Cross-view Pairwise Constraint Propagation.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Iterative Multi-View Hashing for Cross Media Indexing.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Sparse Learning for Stochastic Composite Optimization.
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

2013
Fast and Accurate Matrix Completion via Truncated Nuclear Norm Regularization.
IEEE Trans. Pattern Anal. Mach. Intell., 2013

Salient Object Detection via Fast Iterative Truncated Nuclear Norm Recovery.
Proceedings of the Intelligence Science and Big Data Engineering, 2013

A Unified Approximate Nearest Neighbor Search Scheme by Combining Data Structure and Hashing.
Proceedings of the IJCAI 2013, 2013

Active Learning Based on Local Representation.
Proceedings of the IJCAI 2013, 2013

Complementary Projection Hashing.
Proceedings of the IEEE International Conference on Computer Vision, 2013

2012
Accelerated singular value thresholding for matrix completion.
Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012

Matrix completion by Truncated Nuclear Norm Regularization.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

