Yao Hu

ORCID: 0009-0006-1274-7111

Affiliations:
  • Xiaohongshu Inc., Beijing, China
  • Zhejiang University of Technology, Hangzhou, China (2021 - 2024)
  • Zhejiang University, Hangzhou, China (PhD 2015)


According to our database, Yao Hu authored at least 211 papers between 2012 and 2026.

Bibliography

2026
Preference-Aware Rubric Learning for Personalized Evaluation.
CoRR, May, 2026

AgentCVR: Active Multi-Agent Cross-Video Reasoning via Script-Simulated Reinforcement Learning.
CoRR, May, 2026

Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling.
CoRR, May, 2026

Tournament-GRPO: Group-Wise Tournament Rewards for Reinforcement Learning in Open-Ended Long-Form Generation.
CoRR, May, 2026

UnityMAS-O: A General RL Optimization Framework for LLM-Based Multi-Agent Systems.
CoRR, May, 2026

VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation.
CoRR, May, 2026

CCD-Level and Load-Aware Thread Orchestration for In-Memory Vector ANNS on Multi-Core CPUs.
CoRR, May, 2026

HTPO: Towards Exploration-Exploitation Balanced Policy Optimization via Hierarchical Token-level Objective Control.
CoRR, May, 2026

Knowledge-Graph Paths as Intermediate Supervision for Self-Evolving Search Agents.
CoRR, May, 2026

MUSE: Resolving Manifold Misalignment in Visual Tokenization via Topological Orthogonality.
CoRR, May, 2026

Focus on the Core: Empowering Diffusion Large Language Models by Self-Contrast.
CoRR, May, 2026

From a Social Cognitive Perspective: Context-Aware Visual Social Relationship Recognition.
IEEE Trans. Neural Networks Learn. Syst., April, 2026

Edit Where You Mean: Region-Aware Adapter Injection for Mask-Free Local Image Editing.
CoRR, April, 2026

EditCaption: Human-Aligned Instruction Synthesis for Image Editing via Supervised Fine-Tuning and Direct Preference Optimization.
CoRR, April, 2026

SPARD: Self-Paced Curriculum for RL Alignment via Integrating Reward Dynamics and Data Utility.
CoRR, April, 2026

PRAISE: Prefix-Based Rollout Reuse in Agentic Search Training.
CoRR, April, 2026

PROMO: Promptable Outfitting for Efficient High-Fidelity Virtual Try-On.
CoRR, March, 2026

Aligning Large Language Models with Searcher Preferences.
CoRR, March, 2026

FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System.
CoRR, March, 2026

GCAgent: Enhancing Group Chat Communication through Dialogue Agents System.
CoRR, March, 2026

FireRed-OCR Technical Report.
CoRR, March, 2026

IdGlow: Dynamic Identity Modulation for Multi-Subject Generation.
CoRR, March, 2026

FireRed-Image-Edit-1.0 Technical Report.
CoRR, February, 2026

LASER: An Efficient Target-Aware Segmented Attention Framework for End-to-End Long Sequence Modeling.
CoRR, February, 2026

QP-OneModel: A Unified Generative LLM for Multi-Task Query Understanding in Xiaohongshu Search.
CoRR, February, 2026

CLIP-Map: Structured Matrix Mapping for Parameter-Efficient CLIP Compression.
CoRR, February, 2026

Weaver: End-to-End Agentic System Training for Video Interleaved Reasoning.
CoRR, February, 2026

IVC-Prune: Revealing the Implicit Visual Coordinates in LVLMs for Vision Token Pruning.
CoRR, February, 2026

Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models.
CoRR, February, 2026

Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training.
CoRR, February, 2026

Learning More from Less: Unlocking Internal Representations for Benchmark Compression.
CoRR, February, 2026

Benchmarking Machine Translation on Chinese Social Media Texts.
CoRR, January, 2026

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models.
CoRR, January, 2026

Self-Compression of Chain-of-Thought via Multi-Agent Reinforcement Learning.
CoRR, January, 2026

JADE: Bridging the Strategic-Operational Gap in Dynamic Agentic RAG.
CoRR, January, 2026

Do Not Waste Your Rollouts: Recycling Search Experience for Efficient Test-Time Scaling.
CoRR, January, 2026

Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors.
CoRR, January, 2026

Benchmark^2: Systematic Evaluation of LLM Benchmarks.
CoRR, January, 2026

EComStage: Stage-wise and Orientation-specific Benchmarking for Large Language Models in E-commerce.
CoRR, January, 2026

HyMiRec: A Hybrid Multi-interest Learning Framework for LLM-based Sequential Recommendation.
Proceedings of the ACM Web Conference 2026, 2026

Causality Enhancement for Cross-Domain Recommendation.
Proceedings of the ACM Web Conference 2026, 2026

A Creator-Aware Recommendation System for Content Platforms.
Proceedings of the ACM Web Conference 2026, 2026

Guiding the Recommender: Information-Aware Auto-Bidding for Content Promotion.
Proceedings of the Abstracts of the 2026 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2026

Optimizing Generative Ranking Relevance via Reinforcement Learning in Xiaohongshu Search.
Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1, 2026

SPARD: Self-Paced Curriculum for RL Alignment via Integrating Reward Dynamics and Data Utility.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

LLM-Powered Benchmark Factory: Reliable, Generic, and Efficient.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

Anti-Length Shift: Dynamic Outlier Truncation for Training Efficient Reasoning Models.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

CrossVid: A Comprehensive Benchmark for Evaluating Cross-Video Reasoning in Multimodal Large Language Models.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services.
CoRR, November, 2025

TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework.
CoRR, November, 2025

Cross-Scenario Unified Modeling of User Interests at Billion Scale.
CoRR, October, 2025

GIR-Bench: Versatile Benchmark for Generating Images with Reasoning.
CoRR, October, 2025

Diagnosing and Mitigating System Bias in Self-Rewarding RL.
CoRR, October, 2025

PatternKV: Flattening KV Representation Expands Quantization Headroom.
CoRR, October, 2025

RealBench: A Chinese Multi-image Understanding Benchmark Close to Real-world Scenarios.
CoRR, September, 2025

InstanceAssemble: Layout-Aware Image Generation via Instance Assembling Attention.
CoRR, September, 2025

Interleaving Reasoning for Better Text-to-Image Generation.
CoRR, September, 2025

FireRedChat: A Pluggable, Full-Duplex Voice Interaction System with Cascaded and Semi-Cascaded Implementations.
CoRR, September, 2025

FireRedTTS-2: Towards Long Conversational Speech Generation for Podcast and Chatbot.
CoRR, September, 2025

Decomposed Reasoning with Reinforcement Learning for Relevance Assessment in UGC Platforms.
CoRR, August, 2025

RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services.
CoRR, July, 2025

Flux-Sculptor: Text-Driven Rich-Attribute Portrait Editing through Decomposed Spatial Flow Control.
CoRR, July, 2025

AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need.
CoRR, June, 2025

Plan Your Travel and Travel with Your Plan: Wide-Horizon Planning and Evaluation via LLM.
CoRR, June, 2025

Progressive Scaling Visual Object Tracking.
CoRR, May, 2025

MT³: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning.
CoRR, May, 2025

MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning.
CoRR, April, 2025

SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users.
CoRR, April, 2025

Redefining Machine Translation on Social Network Services with Large Language Models.
CoRR, April, 2025

Hierarchical Self-Distilled Feature Learning for Fine-Grained Visual Categorization.
IEEE Trans. Neural Networks Learn. Syst., March, 2025

FireRedTTS-1S: An Upgraded Streamable Foundation Text-to-Speech System.
CoRR, March, 2025

CQ-DINO: Mitigating Gradient Dilution via Category Queries for Vast Vocabulary Object Detection.
CoRR, March, 2025

VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models.
CoRR, March, 2025

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models.
CoRR, March, 2025

WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs.
CoRR, February, 2025

FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration.
CoRR, January, 2025

DynamicFace: High-Quality and Consistent Video Face Swapping using Composable 3D Facial Priors.
CoRR, January, 2025

Scenario-Aware Multimodal Chain-of-Thought Prompting for Rationales of Video Social Relations.
IEEE Trans. Circuits Syst. Video Technol., 2025

Scalable Overload-Aware Graph-Based Index Construction for 10-Billion-Scale Vector Similarity Search.
Companion Proceedings of the ACM on Web Conference 2025, 2025

PaRT: Enhancing Proactive Social Chatbots with Personalized Real-Time Retrieval.
Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

Qilin: A Multimodal Information Retrieval Dataset with APP-level User Sessions.
Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

Multi-Granularity Distribution Modeling for Video Watch Time Prediction via Exponential-Gaussian Mixture Network.
Proceedings of the Nineteenth ACM Conference on Recommender Systems, 2025

Mind the Quote: Enabling Quotation-Aware Dialogue in LLMs via Plug-and-Play Modules.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

RAG-IGBench: Innovative Evaluation for RAG-based Interleaved Generation in Open-domain Question Answering.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Silencer: From Discovery to Mitigation of Self-Bias in LLM-as-Benchmark-Generator.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Wide-Horizon Thinking and Simulation-Based Evaluation for Real-World LLM Planning with Multifaceted Constraints.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

InstanceAssemble: Layout-Aware Image Generation via Instance Assembling Attention.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

MoDification: Mixture of Depths Made Easy.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

CogLM: Tracking Cognitive Development of Large Language Models.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Single Trajectory Distillation for Accelerating Image and Video Style Transfer.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

NoteLLM-2: Multimodal Large Representation Models for Recommendation.
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.1, 2025

Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective.
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.1, 2025

SNS-Bench: Defining, Building, and Assessing Capabilities of Large Language Models in Social Networking Services.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

UniCBE: An Uniformity-driven Comparing Based Evaluation Framework with Unified Multi-Objective Optimization.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

A Sanity Check for AI-generated Image Detection.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

DynaPrompt: Dynamic Test-Time Prompt Tuning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Object-Centric Video Question Answering with Visual Grounding and Referring.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

DynamicFace: High-Quality and Consistent Face Swapping for Image and Video Using Composable 3D Facial Priors.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

RealBench: A Chinese Multi-image Understanding Benchmark Close to Real-world Scenarios.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Think-Search-Patch: A Retrieval-Augmented Reasoning Framework for Repository-Level Code Repair.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

InsBank: Evolving Instruction Subset for Ongoing Alignment.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

EcoLANG: Efficient and Effective Agent Communication Language Induction for Social Simulation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Speculative Decoding for Multi-Sample Inference.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

DecEx-RAG: Boosting Agentic Retrieval-Augmented Generation with Decision and Execution Optimization via Process Supervision.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

SelfAug: Mitigating Catastrophic Forgetting in Retrieval-Augmented Generation via Distribution Self-Alignment.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

SelfRACG: Enabling LLMs to Self-Express and Retrieve for Code Generation.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

ZigZagKV: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

Towards the Law of Capacity Gap in Distilling Language Models.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Beyond One-Size-Fits-All: Tailored Benchmarks for Efficient Evaluation.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

From Sub-Ability Diagnosis to Human-Aligned Generation: Bridging the Gap for Text Length Control via MarkerGen.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

iPET: An Interactive Emotional Companion Dialogue System with LLM-Powered Virtual Pet World Simulation.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 2025

Revisiting Self-Consistency from Dynamic Distributional Alignment Perspective on Answer Aggregation.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
OV-VIS: Open-Vocabulary Video Instance Segmentation.
Int. J. Comput. Vis., November, 2024

OV-DAR: Open-Vocabulary Object Detection and Attributes Recognition.
Int. J. Comput. Vis., November, 2024

TOMGPT: Reliable Text-Only Training Approach for Cost-Effective Multi-modal Large Language Model.
ACM Trans. Knowl. Discov. Data, August, 2024

PiClick: Picking the desired mask from multiple candidates in click-based interactive segmentation.
Neurocomputing, 2024

LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant.
CoRR, 2024

ScalingNote: Scaling up Retrievers with Large Language Models for Real-World Dense Retrieval.
CoRR, 2024

GPRec: Bi-level User Modeling for Deep Recommenders.
CoRR, 2024

Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents.
CoRR, 2024

P4Q: Learning to Prompt for Quantization in Visual-language Models.
CoRR, 2024

Target-Driven Distillation: Consistency Distillation with Target Timestep Selection and Decoupled Guidance.
CoRR, 2024

Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning.
CoRR, 2024

Mining Open Semantics from CLIP: A Relation Transition Perspective for Few-Shot Learning.
CoRR, 2024

From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition.
CoRR, 2024

NoteLLM-2: Multimodal Large Representation Models for Recommendation.
CoRR, 2024

From Image to Video, what do we need in multimodal LLMs?
CoRR, 2024

Agent Group Chat: An Interactive Group Chat Simulacra For Better Eliciting Collective Emergent Behavior.
CoRR, 2024

StableGarment: Garment-Centric Generation via Stable Diffusion.
CoRR, 2024

Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model.
CoRR, 2024

NoteLLM: A Retrievable Large Language Model for Note Recommendation.
CoRR, 2024

NoteLLM: A Retrievable Large Language Model for Note Recommendation.
Companion Proceedings of the ACM on Web Conference 2024, 2024

Vript: A Video Is Worth Thousands of Words.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Instruction Embedding: Latent Representations of Instructions Towards Task Identification.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Small-loss Adaptive Regret for Online Convex Optimization.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Efficient Stochastic Approximation of Minimax Excess Risk Optimization.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Knowledge-Enhanced Multi-perspective Incongruity Perception Network for Multimodal Sarcasm Detection.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Caseg: Clip-Based Action Segmentation With Learnable Text Prompt.
Proceedings of the IEEE International Conference on Image Processing, 2024

Bi-Level User Modeling for Deep Recommenders.
Proceedings of the IEEE International Conference on Data Mining, 2024

Focused Large Language Models are Stable Many-Shot Learners.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

VISA: Reasoning Video Object Segmentation via Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ZONE: Zero-Shot Instruction-Guided Local Editing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

BatchEval: Towards Human-like Text Evaluation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Poor-Supervised Evaluation for SuperLLM via Mutual Consistency.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Controllable Mind Visual Diffusion Model.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

AQ-DETR: Low-Bit Quantized Detection Transformer with Auxiliary Queries.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Optimizing traffic efficiency via a reinforcement learning approach based on time allocation.
Int. J. Mach. Learn. Cybern., October, 2023

ZONE: Zero-Shot Instruction-Guided Local Editing.
CoRR, 2023

PiClick: Picking the desired mask in click-based interactive segmentation.
CoRR, 2023

MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation.
CoRR, 2023

Towards Open-Vocabulary Video Instance Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2INER: Instructive and In-Context Learning on Few-Shot Named Entity Recognition.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

OvarNet: Towards Open-Vocabulary Object Attribute Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Horizontal-to-Vertical Video Conversion.
IEEE Trans. Multim., 2022

End-to-End Temporal Action Detection With Transformer.
IEEE Trans. Image Process., 2022

Occluded Video Instance Segmentation: A Benchmark.
Int. J. Comput. Vis., 2022

Non-stationary Dueling Bandits for Online Learning to Rank.
Proceedings of the Web and Big Data - 6th International Joint Conference, 2022

2021
Socializing the Videos: A Multimodal Approach for Social Relation Recognition.
ACM Trans. Multim. Comput. Commun. Appl., 2021

Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph.
CoRR, 2021

End-to-end Temporal Action Detection with Transformer.
CoRR, 2021

Occluded Video Instance Segmentation.
CoRR, 2021

Pyramid Self-attention for Semantic Segmentation.
Proceedings of the Pattern Recognition and Computer Vision - 4th Chinese Conference, 2021

Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Linking the Characters: Video-oriented Social Graph Generation via Hierarchical-cumulative GCN.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Decoupled IoU Regression for Object Detection.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Deep Interactive Video Inpainting: An Invisibility Cloak for Harry Potter.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Salient Object Ranking with Position-Preserved Attention.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

SwiftNet: Real-Time Video Object Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Multi-Shot Temporal Event Localization: A Benchmark.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Stochastic Bandits with Graph Feedback in Non-Stationary Environments.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Spatial-temporal Causal Inference for Partial Image-to-video Adaptation.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
LAMP: Label Augmented Multimodal Pretraining.
CoRR, 2020

Spherical Knowledge Distillation.
CoRR, 2020

Multi-label Zero-shot Classification by Learning to Transfer from External Knowledge.
CoRR, 2020

Modeling Heterogeneous Statistical Patterns in High-dimensional Data by Adversarial Distributions: An Unsupervised Generative Framework.
Proceedings of the WWW '20: The Web Conference 2020, Taipei, Taiwan, April 20-24, 2020

Feature-Induced Manifold Disambiguation for Multi-View Partial Multi-label Learning.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

Adapting to Smoothness: A More Universal Algorithm for Online Convex Optimization.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Deep Time-Stream Framework for Click-through Rate Prediction by Tracking Interest Evolution.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Uncertainty Aware Graph Gaussian Process for Semi-Supervised Learning.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Multi-View Partial Multi-Label Learning with Graph-Based Disambiguation.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Correlation Maximized Structural Similarity Loss for Semantic Segmentation.
CoRR, 2019

Multi-View Multi-Label Learning with View-Specific Information Extraction.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Multi-Objective Generalized Linear Bandits.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Multi-View Active Learning for Video Recommendation.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Optimal Algorithms for Lipschitz Bandits with Heavy-tailed Rewards.
Proceedings of the 36th International Conference on Machine Learning, 2019

2016
Atom Decomposition with Adaptive Basis Selection Strategy for Matrix Completion.
ACM Trans. Multim. Comput. Commun. Appl., 2016

Online robust principal component analysis via truncated nuclear norm regularization.
Neurocomputing, 2016

Atom Decomposition Based Subgradient Descent for matrix classification.
Neurocomputing, 2016

2015
Large scale multi-class classification with truncated nuclear norm regularization.
Neurocomputing, 2015

Event Recovery by Faster Truncated Nuclear Norm Minimization.
Proceedings of the Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques, 2015

2014
Fast and Accurate Hashing Via Iterative Nearest Neighbors Expansion.
IEEE Trans. Cybern., 2014

Matrix Completion for Cross-view Pairwise Constraint Propagation.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Iterative Multi-View Hashing for Cross Media Indexing.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Sparse Learning for Stochastic Composite Optimization.
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

2013
Fast and Accurate Matrix Completion via Truncated Nuclear Norm Regularization.
IEEE Trans. Pattern Anal. Mach. Intell., 2013

Salient Object Detection via Fast Iterative Truncated Nuclear Norm Recovery.
Proceedings of the Intelligence Science and Big Data Engineering, 2013

A Unified Approximate Nearest Neighbor Search Scheme by Combining Data Structure and Hashing.
Proceedings of the IJCAI 2013, 2013

Active Learning Based on Local Representation.
Proceedings of the IJCAI 2013, 2013

Complementary Projection Hashing.
Proceedings of the IEEE International Conference on Computer Vision, 2013

2012
Accelerated singular value thresholding for matrix completion.
Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012

Matrix completion by Truncated Nuclear Norm Regularization.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012
