Beidi Chen

According to our database, Beidi Chen authored at least 85 papers between 2016 and 2025.

Bibliography

2025
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation.
CoRR, June 2025

Kinetics: Rethinking Test-Time Scaling Laws.
CoRR, June 2025

Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts.
CoRR, June 2025

Scalable LLM Math Reasoning Acceleration with Low-rank Distillation.
CoRR, May 2025

HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading.
CoRR, February 2025

GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?
CoRR, February 2025

Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation.
CoRR, February 2025

Memory Mosaics.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MagicPIG: LSH Sampling for Efficient LLM Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference.
CoRR, 2024

Sirius: Contextual Sparsity with Correction for Efficient LLMs.
CoRR, 2024

MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding.
CoRR, 2024

Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training.
CoRR, 2024

VcLLM: Video Codecs are Secretly Tensor Codecs.
CoRR, 2024

It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF.
CoRR, 2024

Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity.
CoRR, 2024

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding.
CoRR, 2024

Prompt-prompted Mixture of Experts for Efficient LLM Generation.
CoRR, 2024

LLM Inference Unveiled: Survey and Roofline Model Insights.
CoRR, 2024

Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding.
CoRR, 2024

Learn To be Efficient: Build Structured Sparsity in Large Language Models.
CoRR, 2024

Sirius: Contextual Sparsity with Correction for Efficient LLMs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Learn To be Efficient: Build Structured Sparsity in Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

S²FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

SpecExec: Massively Parallel Speculative Decoding For Interactive LLM Inference on Consumer Devices.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Mini-Sequence Transformers: Optimizing Intermediate Memory for Long Sequences Training.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

On the Surprising Effectiveness of Attention Transfer for Vision Transformers.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Sequoia: Scalable and Robust Speculative Decoding.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Nearest Neighbor Speculative Decoding for LLM Generation and Attribution.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Federated Black-box Prompt Tuning System for Large Language Models on the Edge.
Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 2024

Q-Hitter: A Better Token Oracle for Efficient LLM Inference via Sparse-Quantized KV Cache.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Soft Prompt Recovers Compressed LLMs, Transferably.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

HexGen: Generative Inference of Large Language Model over Heterogeneous Environment.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

LoCoCo: Dropping In Convolutions for Long Context Compression.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Efficient Streaming Language Models with Attention Sinks.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
HexGen: Generative Inference of Foundation Model over Heterogeneous Decentralized Environment.
CoRR, 2023

H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
CoRR, 2023

InRank: Incremental Low-Rank Learning.
CoRR, 2023

Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt.
CoRR, 2023

High-throughput Generative Inference of Large Language Models with a Single GPU.
CoRR, 2023

Modeling Scattering Coefficients using Self-Attentive Complex Polynomials with Image-based Representation.
CoRR, 2023

H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks.
Proceedings of the International Conference on Machine Learning, 2023

Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time.
Proceedings of the International Conference on Machine Learning, 2023

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU.
Proceedings of the International Conference on Machine Learning, 2023

Fast Algorithms for a New Relaxation of Optimal Transport.
Proceedings of the Thirty-Sixth Annual Conference on Learning Theory, 2023

2022
Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees.
CoRR, 2022

Decentralized Training of Foundation Models in Heterogeneous Environments.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

HALOS: Hashing Large Output Space for Cheap Inference.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

Monarch: Expressive Structured Matrices for Efficient and Accurate Training.
Proceedings of the International Conference on Machine Learning, 2022

Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Satellite Images and Deep Learning to Identify Discrepancy in Mailing Addresses with Applications to Census 2020 in Houston.
CoRR, 2021

Scatterbrain: Unifying Sparse and Low-rank Attention Approximation.
CoRR, 2021

Locality Sensitive Teaching.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Scatterbrain: Unifying Sparse and Low-rank Attention.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

A Tale of Two Efficient and Informative Negative Sampling Distributions.
Proceedings of the 38th International Conference on Machine Learning, 2021

SOLAR: Sparse Orthogonal Learned and Random Embeddings.
Proceedings of the 9th International Conference on Learning Representations, 2021

MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training.
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
A Constant-time Adaptive Negative Sampling.
CoRR, 2020

Climbing the WOL: Training for Cheaper Inference.
CoRR, 2020

SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems.
Proceedings of the Third Conference on Machine Learning and Systems, 2020

Angular Visual Hardness.
Proceedings of the 37th International Conference on Machine Learning, 2020

2019
Sub-Linear Privacy-Preserving Near-Neighbor Search.
IACR Cryptol. ePrint Arch., 2019

LSH-Sampling Breaks the Computation Chicken-and-Egg Loop in Adaptive Stochastic Gradient Estimation.
CoRR, 2019

SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems.
CoRR, 2019

Fast and Accurate Stochastic Gradient Estimation.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2018
Densified Winner Take All (WTA) Hashing for Sparse Datasets.
Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, 2018

LSH-Sampling Breaks the Computational Chicken-and-Egg Loop in Adaptive Stochastic Gradient Estimation.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
Unique Entity Estimation with Application to the Syrian Conflict.
CoRR, 2017

2016
Sub-linear Privacy-preserving Search with Untrusted Server and Semi-honest Parties.
CoRR, 2016

Revisiting Winner Take All (WTA) Hashing for Sparse Datasets.
CoRR, 2016

