We stand with Ukraine

Qingpeng Cai

ORCID: 0000-0001-6451-9299

Affiliations:

Kuaishou Technology, Beijing, China
Alibaba Group (former)
Tsinghua University, China (former)

According to our database, Qingpeng Cai authored at least 69 papers between 2016 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Reinforced Preference Optimization for Reasoning-Augmented Recommendations.

CoRR, May, 2026

Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation.

CoRR, April, 2026

LBM: Hierarchical Large Auto-Bidding Model via Reasoning and Acting.

CoRR, March, 2026

Phase-Aware Mixture of Experts for Agentic Reinforcement Learning.

CoRR, February, 2026

Hierarchical Semantic RL: Tackling the Problem of Dynamic Action Space for RL-based Recommendations.

Proceedings of the ACM Web Conference 2026, 2026

LBM: Hierarchical Large Auto-Bidding Model via Reasoning and Acting.

Proceedings of the ACM Web Conference 2026, 2026

TemporalExpertNet: Cross-Temporal Knowledge Reuse for Promotion-Aware CVR Prediction.

Proceedings of the Nineteenth ACM International Conference on Web Search and Data Mining, 2026

TrackRec: Iterative Alternating Feedback with Chain-of-Thought via Preference Alignment for Recommendation.

Proceedings of Database Systems for Advanced Applications (DASFAA), 2026

2025

MindRec: A Diffusion-driven Coarse-to-Fine Paradigm for Generative Recommendation.

CoRR, November, 2025

Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards.

CoRR, September, 2025

Generative Auto-Bidding in Large-Scale Competitive Auctions via Diffusion Completer-Aligner.

CoRR, September, 2025

Navigate the Unknown: Enhancing LLM Reasoning with Intrinsic Motivation Guided Exploration.

CoRR, May, 2025

Generative Auto-Bidding with Value-Guided Explorations.

CoRR, April, 2025

From Principles to Applications: A Comprehensive Survey of Discrete Tokenizers in Generation, Comprehension, Recommendation, and Information Retrieval.

CoRR, February, 2025

Future-Conditioned Recommendations with Multi-Objective Controllable Decision Transformer.

CoRR, January, 2025

AURO: Reinforcement Learning for Adaptive User Retention Optimization in Recommender Systems.

Proceedings of the ACM on Web Conference 2025, 2025

Value Function Decomposition in Markov Recommendation Process.

Proceedings of the ACM on Web Conference 2025, 2025

GAS: Generative Auto-bidding with Post-training Search.

Companion Proceedings of the ACM on Web Conference 2025, 2025

DLCRec: A Novel Approach for Managing Diversity in LLM-Based Recommender Systems.

Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining, 2025

AgentIR: 2nd Workshop on Agent-based Information Retrieval.

Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

Generative Auto-Bidding with Value-Guided Explorations.

Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

Random Policy Evaluation Uncovers Policies of Generative Flow Networks.

Co-authors include Emmanuel Bengio.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Flow Factorization for Efficient Generative Flow Networks.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

LLM-Powered User Simulator for Recommender System.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

ACQ: A Unified Framework for Automated Programmatic Creativity in Online Advertising.

CoRR, 2024

LDACP: Long-Delayed Ad Conversions Prediction Model for Bidding Strategy.

CoRR, 2024

Rectifying Reinforcement Learning for Reward Matching.

Co-authors include Emmanuel Bengio.

CoRR, 2024

Bifurcated Generative Flow Networks.

CoRR, 2024

M3oE: Multi-Domain Multi-Task Mixture-of Experts Recommendation Framework.

CoRR, 2024

Future Impact Decomposition in Request-level Recommendations.

CoRR, 2024

M³oE: Multi-Domain Multi-Task Mixture-of Experts Recommendation Framework.

Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention.

Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

AgentIR: 1st Workshop on Agent-based Information Retrieval.

Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

Future Impact Decomposition in Request-level Recommendations.

Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

Modeling User Retention through Generative Flow Networks.

Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

2023

AdaRec: Adaptive Sequential Recommendation for Reinforcing Long-term User Engagement.

CoRR, 2023

A Large Language Model Enhanced Conversational Recommender System.

CoRR, 2023

Multi-Task Recommendations with Reinforcement Learning.

Proceedings of the ACM Web Conference 2023, 2023

Exploration and Regularization of the Latent Action Space in Recommendation.

Proceedings of the ACM Web Conference 2023, 2023

Two-Stage Constrained Actor-Critic for Short Video Recommendation.

Proceedings of the ACM Web Conference 2023, 2023

Reinforcing User Retention in a Billion Scale Short Video Recommender System.

Companion Proceedings of the ACM Web Conference 2023, 2023

KuaiSim: A Comprehensive Simulator for Recommender Systems.

Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

State Regularized Policy Optimization on Data with Dynamics Shift.

Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

PrefRec: Recommender Systems with Human Preferences for Reinforcing Long-term User Engagement.

Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

Generative Flow Network for Listwise Recommendation.

Co-authors include Julian J. McAuley.

Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022

PrefRec: Preference-based Recommender Systems for Reinforcing Long-term User Engagement.

CoRR, 2022

ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor.

CoRR, 2022

Constrained Reinforcement Learning for Short Video Recommendation.

CoRR, 2022

2021

Exploration in policy optimization through multiple paths.

Autonomous Agents and Multi-Agent Systems, 2021

2020

Generator and Critic: A Deep Reinforcement Learning Approach for Slate Re-ranking in E-commerce.

CoRR, 2020

Softmax Deep Double Deterministic Policy Gradients.

Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Reinforcement Learning with Dynamic Boltzmann Softmax Updates.

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Multi-Path Policy Optimization.

Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, 2020

Deterministic Value-Policy Gradients.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Reinforcement Learning Driven Heuristic Optimization.

Co-authors include Azalia Mirhoseini.

CoRR, 2019

Reinforcement Learning with Dynamic Boltzmann Softmax Updates.

CoRR, 2019

Policy Gradients for Contextual Recommendations.

Proceedings of the World Wide Web Conference, 2019

Policy Optimization with Model-Based Explorations.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

A Deep Reinforcement Learning Framework for Rebalancing Dockless Bike Sharing Systems.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Generalized deterministic policy gradient algorithms.

CoRR, 2018

Rebalancing Dockless Bike Sharing Systems.

CoRR, 2018

Policy Gradients for Contextual Bandits.

CoRR, 2018

Reinforcement Mechanism Design for e-commerce.

Co-authors include Aris Filos-Ratsikas.

Proceedings of the 2018 World Wide Web Conference on World Wide Web, 2018

Ranking Mechanism Design for Price-setting Agents in E-commerce.

Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2018

Reinforcement Mechanism Design for Fraudulent Behaviour in e-Commerce.

Co-authors include Aris Filos-Ratsikas.

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

Multi-armed Bandit Mechanism with Private Histories.

Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, 2017

2016

Mechanism Design for Personalized Recommender Systems.

Co-authors include Aris Filos-Ratsikas.

Proceedings of the 10th ACM Conference on Recommender Systems, 2016

Facility Location with Minimax Envy.

Co-authors include Aris Filos-Ratsikas.

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016
