Suvinay Subramanian

Midhilesh Elavazhagan

William Won

Amir Yazdanbakhsh

Tushar Krishna

CoRR, May, 2026

Characterizing VLA Models: Identifying the Action Generation Bottleneck for Edge AI Architectures.

Manoj Vishwanathan

Suvinay Subramanian

Anand Raghunathan

CoRR, March, 2026

Demystifying the Cost Versus Benefits of Sparse Large Language Model Acceleration.

IEEE Micro, 2026

2025

Planned Diffusion.

CoRR, October, 2025

FG-Attn: Leveraging Fine-Grained Sparsity In Diffusion Transformers.

CoRR, September, 2025

Spark Transformer: Reactivating Sparsity in FFN and Attention.

CoRR, June, 2025

Understanding and Optimizing Multi-Stage AI Inference Pipelines.

Gintare Karolina Dziugaite

Midhilesh Elavazhagan

Madhu Kumar

Tushar Krishna

CoRR, April, 2025

RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving.

Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding.

Jonathan Ragan-Kelley

Suvinay Subramanian

Michael Carbin

Proceedings of the Forty-second International Conference on Machine Learning, 2025

The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Effective Interplay between Sparsity and Quantization: From Theory to Practice.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers.

Proceedings of the Conference on Parsimony and Learning, 2025

2024

Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers.

CoRR, 2024

JaxPruner: A Concise Library for Sparsity Research.

Proceedings of the Conference on Parsimony and Learning, 2024

2023

TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings.

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition.

Proceedings of the International Conference on Machine Learning, 2023

FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks.

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022

Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask.

CoRR, 2022

2021

ATTACC the Quadratic Bottleneck of Attention Layers.

CoRR, 2021

2018

Architectural techniques to unlock ordered and nested speculative parallelism.

Suvinay Subramanian

PhD thesis, 2018

Harmonizing Speculative and Non-Speculative Execution in Architectures for Ordered Parallelism.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

2017

Fractal: An Execution Model for Fine-Grain Nested Speculative Parallelism.

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

SAM: Optimizing Multithreaded Cores for Speculative Parallelism.

Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016

Unlocking Ordered Parallelism with the Swarm Architecture.

IEEE Micro, 2016

Programmable Packet Scheduling.

CoRR, 2016

Programmable Packet Scheduling at Line Rate.

Proceedings of the ACM SIGCOMM 2016 Conference, Florianopolis, Brazil, August 22-26, 2016

Data-centric execution of speculative parallel programs.

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

2015

A scalable architecture for ordered parallelism.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

Towards Programmable Packet Scheduling.

Proceedings of the 14th ACM Workshop on Hot Topics in Networks, Philadelphia, PA, USA, November 16, 2015

2014

SCORPIO: A 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering.

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

SCORPIO: 36-core shared memory processor demonstrating snoopy coherence on a mesh interconnect.

Proceedings of the 2014 IEEE Hot Chips 26 Symposium (HCS), 2014

2013

Single-Cycle Multihop Asynchronous Repeated Traversal: A SMART Future for Reconfigurable On-Chip Networks.

Computer, 2013

No silver bullet: extending SDN to the data plane.

Proceedings of the Twelfth ACM Workshop on Hot Topics in Networks, 2013

SMART: a single-cycle reconfigurable NoC for SoC applications.
