Beidi Chen

According to our database, Beidi Chen authored at least 85 papers between 2016 and 2025.

Bibliography

2025
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation.
CoRR, June 2025

Kinetics: Rethinking Test-Time Scaling Laws.
CoRR, June 2025

Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts.
CoRR, June 2025

Scalable LLM Math Reasoning Acceleration with Low-rank Distillation.
CoRR, May 2025

HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading.
CoRR, February 2025

GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?
CoRR, February 2025

Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation.
CoRR, February 2025

Memory Mosaics.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MagicPIG: LSH Sampling for Efficient LLM Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference.
CoRR, 2024

Sirius: Contextual Sparsity with Correction for Efficient LLMs.
CoRR, 2024

MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding.
CoRR, 2024

Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training.
CoRR, 2024

VcLLM: Video Codecs are Secretly Tensor Codecs.
CoRR, 2024

It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF.
CoRR, 2024

Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity.
CoRR, 2024

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding.
CoRR, 2024

Prompt-prompted Mixture of Experts for Efficient LLM Generation.
CoRR, 2024

LLM Inference Unveiled: Survey and Roofline Model Insights.
CoRR, 2024

Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding.
CoRR, 2024

Learn To be Efficient: Build Structured Sparsity in Large Language Models.
CoRR, 2024

Sirius: Contextual Sparsity with Correction for Efficient LLMs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Learn To be Efficient: Build Structured Sparsity in Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

S²FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

SpecExec: Massively Parallel Speculative Decoding For Interactive LLM Inference on Consumer Devices.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Mini-Sequence Transformers: Optimizing Intermediate Memory for Long Sequences Training.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

On the Surprising Effectiveness of Attention Transfer for Vision Transformers.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Sequoia: Scalable and Robust Speculative Decoding.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Nearest Neighbor Speculative Decoding for LLM Generation and Attribution.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Federated Black-box Prompt Tuning System for Large Language Models on the Edge.
Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 2024

Q-Hitter: A Better Token Oracle for Efficient LLM Inference via Sparse-Quantized KV Cache.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Soft Prompt Recovers Compressed LLMs, Transferably.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

HexGen: Generative Inference of Large Language Model over Heterogeneous Environment.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

LoCoCo: Dropping In Convolutions for Long Context Compression.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Efficient Streaming Language Models with Attention Sinks.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
HexGen: Generative Inference of Foundation Model over Heterogeneous Decentralized Environment.
CoRR, 2023

H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
CoRR, 2023

InRank: Incremental Low-Rank Learning.
CoRR, 2023

Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt.
CoRR, 2023

High-throughput Generative Inference of Large Language Models with a Single GPU.
CoRR, 2023

Modeling Scattering Coefficients using Self-Attentive Complex Polynomials with Image-based Representation.
CoRR, 2023

H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks.
Proceedings of the International Conference on Machine Learning, 2023

Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time.
Proceedings of the International Conference on Machine Learning, 2023

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU.
Proceedings of the International Conference on Machine Learning, 2023

Fast Algorithms for a New Relaxation of Optimal Transport.
Proceedings of the Thirty-Sixth Annual Conference on Learning Theory, 2023

2022
Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees.
CoRR, 2022

Decentralized Training of Foundation Models in Heterogeneous Environments.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

HALOS: Hashing Large Output Space for Cheap Inference.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

Monarch: Expressive Structured Matrices for Efficient and Accurate Training.
Proceedings of the International Conference on Machine Learning, 2022

Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Satellite Images and Deep Learning to Identify Discrepancy in Mailing Addresses with Applications to Census 2020 in Houston.
CoRR, 2021

Scatterbrain: Unifying Sparse and Low-rank Attention Approximation.
CoRR, 2021

Locality Sensitive Teaching.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Scatterbrain: Unifying Sparse and Low-rank Attention.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

A Tale of Two Efficient and Informative Negative Sampling Distributions.
Proceedings of the 38th International Conference on Machine Learning, 2021

SOLAR: Sparse Orthogonal Learned and Random Embeddings.
Proceedings of the 9th International Conference on Learning Representations, 2021

MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training.
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
A Constant-time Adaptive Negative Sampling.
CoRR, 2020

Climbing the WOL: Training for Cheaper Inference.
CoRR, 2020

SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems.
Proceedings of the Third Conference on Machine Learning and Systems, 2020

Angular Visual Hardness.
Proceedings of the 37th International Conference on Machine Learning, 2020

2019
Sub-Linear Privacy-Preserving Near-Neighbor Search.
IACR Cryptol. ePrint Arch., 2019

LSH-Sampling Breaks the Computation Chicken-and-Egg Loop in Adaptive Stochastic Gradient Estimation.
CoRR, 2019

SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems.
CoRR, 2019

Fast and Accurate Stochastic Gradient Estimation.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2018
Densified Winner Take All (WTA) Hashing for Sparse Datasets.
Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, 2018

LSH-Sampling Breaks the Computational Chicken-and-Egg Loop in Adaptive Stochastic Gradient Estimation.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
Unique Entity Estimation with Application to the Syrian Conflict.
CoRR, 2017

2016
Sub-linear Privacy-preserving Search with Untrusted Server and Semi-honest Parties.
CoRR, 2016

Revisiting Winner Take All (WTA) Hashing for Sparse Datasets.
CoRR, 2016

