Tri Dao

According to our database, Tri Dao authored at least 69 papers between 2017 and 2026.

Collaborative distances:

Dijkstra number of three.
Erdős number of three.

Bibliography

2026

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs.

CoRR, May, 2026

Search Your Block Floating Point Scales!

CoRR, May, 2026

SAW-INT4: System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving.

CoRR, April, 2026

Introspective Diffusion Language Models.

CoRR, April, 2026

Squeeze Evolve: Unified Multi-Model Orchestration for Verifier-Free Evolution.

Monishwaran Maheswaran

CoRR, April, 2026

Mamba-3: Improved Sequence Modeling using State Space Principles.

CoRR, March, 2026

M²RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling.

CoRR, March, 2026

FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling.

CoRR, March, 2026

AI+HW 2035: Shaping the Next Decade.

CoRR, March, 2026

Speculative Speculative Decoding.

Tanishq Kumar

Tri Dao

Avner May

CoRR, March, 2026

When RL Meets Adaptive Speculative Training: A Unified Training-Serving System.

CoRR, February, 2026

2025

SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations.

CoRR, December, 2025

Beat the long tail: Distribution-Aware Speculative Decoding for RL Training.

CoRR, November, 2025

Opportunistic Expert Activation: Batch-Aware Expert Routing for Faster Decode Without Retraining.

Costin-Andrei Oncescu

CoRR, November, 2025

Log-Linear Attention.

CoRR, June, 2025

Hardware-Efficient Attention for Fast Decoding.

Ted Zadouri

Hubert Strauss

Tri Dao

CoRR, May, 2025

M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models.

CoRR, April, 2025

Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners.

CoRR, February, 2025

HybriDNA: A Hybrid Transformer-Mamba2 Long-Range DNA Language Model.

CoRR, February, 2025

Marconi: Prefix Caching for the Era of Hybrid LLMs.

Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping.

Jonathan Ragan-Kelley

Shuaiwen Leon Song

Ben Athiwaratkun

Tri Dao

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Long-Context State-Space Video World Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

2024

Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers.

CoRR, 2024

An Empirical Study of Mamba-based Language Models.

CoRR, 2024

StarCoder 2 and The Stack v2: The Next Generation.

Evgenii Zheltonozhskii

Carolyn Jane Anderson

Nicolas Chapados

et al.

CoRR, 2024

RedPajama: an Open Dataset for Training Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

The Mamba in the Llama: Distilling and Accelerating Hybrid Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

BitDelta: Your Fine-Tune May Only Be Worth One Bit.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality.

Tri Dao

Albert Gu

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning.

Tri Dao

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

StarCoder: may the source be with you!

Evgenii Zheltonozhskii

Logesh Kumar Umapathi

Urvashi Bhattacharyya

Carolyn Jane Anderson

Carlos Muñoz Ferrandis

Trans. Mach. Learn. Res., 2023

Mamba: Linear-Time Sequence Modeling with Selective State Spaces.

Albert Gu

Tri Dao

CoRR, 2023

Hyena Hierarchy: Towards Larger Convolutional Language Models.

Proceedings of the International Conference on Machine Learning, 2023

Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time.

Anshumali Shrivastava

Proceedings of the International Conference on Machine Learning, 2023

Simple Hardware-Efficient Long Convolutions for Sequence Modeling.

Proceedings of the International Conference on Machine Learning, 2023

Effectively Modeling Time Series with Simple Discrete State Spaces.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Hungry Hungry Hippos: Towards Language Modeling with State Space Models.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022

S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces.

CoRR, 2022

Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees.

CoRR, 2022

Decentralized Training of Foundation Models in Heterogeneous Environments.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Transform Once: Efficient Operator Learning in Frequency Domain.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

S4ND: Modeling Images and Videos as Multidimensional Signals with State Spaces.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

ButterflyFlow: Building Invertible Layers with Butterfly Matrices.

Proceedings of the International Conference on Machine Learning, 2022

Monarch: Expressive Structured Matrices for Efficient and Accurate Training.

Proceedings of the International Conference on Machine Learning, 2022

Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models.

Proceedings of the Tenth International Conference on Learning Representations, 2022

2021

Scatterbrain: Unifying Sparse and Low-rank Attention Approximation.

CoRR, 2021

Rethinking Neural Operations for Diverse Tasks.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Combining Recurrent, Convolutional, and Continuous-time Models with Linear State Space Layers.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Scatterbrain: Unifying Sparse and Low-rank Attention.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Catformer: Designing Stable Transformers via Sensitivity Analysis.

Jared Quincy Davis

Albert Gu

Krzysztof Choromanski

Proceedings of the 38th International Conference on Machine Learning, 2021

Knowledge Distillation as Semiparametric Inference.

Proceedings of the 9th International Conference on Learning Representations, 2021

MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training.

Anshumali Shrivastava

Christopher Ré

Proceedings of the 9th International Conference on Learning Representations, 2021

2020

HiPPO: Recurrent Memory with Optimal Polynomial Projections.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps.

Proceedings of the 8th International Conference on Learning Representations, 2020

2019

Adaptive Hashing for Model Counting.

Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, 2019

On the Downstream Performance of Compressed Word Embeddings.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Approximating the Permanent by Sampling from Adaptive Partitions.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

A Kernel Theory of Modern Data Augmentation.

Proceedings of the 36th International Conference on Machine Learning, 2019

Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations.

Proceedings of the 36th International Conference on Machine Learning, 2019

Low-Precision Random Fourier Features for Memory-constrained Kernel Approximation.

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018

Learning Compressed Transforms with Low Displacement Rank.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Learning Invariance with Compact Transforms.

Proceedings of the 6th International Conference on Learning Representations, 2018

2017

Gaussian Quadrature for Kernel Features.

Tri Dao

Christopher De Sa

Christopher Ré

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017
