Esha Choukse

ORCID: 0000-0003-0371-5522

Affiliations:

Microsoft, Redmond, WA, USA

According to our database, Esha Choukse authored at least 44 papers between 2016 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

Natural Language Query to Configuration for Retrieval Agents.

CoRR, May, 2026

StreamWise: Serving Multi-Modal Generation in Real-Time at Scale.

CoRR, March, 2026

DroidSpeak: KV Cache Sharing Across Fine-tuned Model Variants.

Proceedings of the 23rd USENIX Symposium on Networked Systems Design and Implementation, 2026

Harvesting Spare CPU Resources in Container Systems.

Adam Hall

Anirudh Sarma

Esha Choukse

Umakishore Ramachandran

Sameh Elnikety

Proceedings of the 23rd USENIX Symposium on Networked Systems Design and Implementation, 2026

2025

Sherlock: Reliable and Efficient Agentic Workflow Execution.

CoRR, November, 2025

Energy Use of AI Inference: Efficiency Pathways and Test-Time Compute.

Juan M. Lavista Ferres

CoRR, September, 2025

Murakkab: Resource-Efficient Agentic Workflow Orchestration in Cloud Platforms.

CoRR, August, 2025

Power Stabilization for AI Training Datacenters.

Caroline Lichtenberger

Praneeth Gottumukkala

CoRR, August, 2025

EcoServe: Designing Carbon-Aware AI Inference Systems.

CoRR, February, 2025

Towards Efficient Large Multimodal Model Serving.

CoRR, February, 2025

Enabling Sustainable Cloud Computing With Low-Carbon Server Design.

IEEE Micro, 2025

Splitwise: Efficient Generative LLM Inference Using Phase Splitting.

IEEE Micro, 2025

DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

Towards Resource-Efficient Compound AI Systems.

Proceedings of the 2025 Workshop on Hot Topics in Operating Systems, 2025

Performance Aware LLM Load Balancer for Mixed Workloads.

Proceedings of the 5th Workshop on Machine Learning and Systems, 2025

Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

ModServe: Modality- and Stage-Aware Resource Disaggregation for Scalable Multimodal Model Serving.

Proceedings of the 2025 ACM Symposium on Cloud Computing, 2025

TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms.

Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025

2024

DroidSpeak: Enhancing Cross-LLM Communication.

CoRR, 2024

Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations.

CoRR, 2024

Intelligent Router for LLM Workloads: Improving Performance Through Workload-Aware Scheduling.

CoRR, 2024

Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference.

CoRR, 2024

Junctiond: Extending FaaS Runtimes with Kernel-Bypass.

CoRR, 2024

Input-Dependent Power Usage in GPUs.

Theo Gregersen

Pratyush Patel

Esha Choukse

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Making Kernel Bypass Practical for the Cloud with Junction.

Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

Mosaic: Harnessing the Micro-Architectural Resources of Servers in Serverless Environments.

Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Memory Allocation Under Hardware Compression.

Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Designing Cloud Servers for Lower Carbon.

Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

SmartOClock: Workload- and Risk-Aware Overclocking in the Cloud.

Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Splitwise: Efficient Generative LLM Inference Using Phase Splitting.

Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

DyLeCT: Achieving Huge-page-like Translation Performance for Hardware-compressed Memory.

Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Characterizing Power Management Opportunities for LLMs in the Cloud.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Optimizing GPU Data Center Power.

Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2024

2023

POLCA: Power Oversubscription in LLM Cloud Providers.

CoRR, 2023

Towards Improved Power Management in Cloud GPUs.

IEEE Comput. Archit. Lett., 2023

Myths and Misconceptions Around Reducing Carbon Embedded in Cloud Platforms.

Proceedings of the 2nd Workshop on Sustainable Computer Systems, 2023

2022

Overclocking in Immersion-Cooled Datacenters.

IEEE Micro, 2022

Translation-optimized Memory Compression for Capacity.

Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

2020

Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs.

Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

2019

PruneTrain: Gradual Structured Pruning from Scratch for Faster Neural Network Training.

CoRR, 2019

PruneTrain: fast neural network training by dynamic sparse model reconfiguration.

Proceedings of the International Conference for High Performance Computing, 2019

2018

CompressPoints: An Evaluation Methodology for Compressed Memory Systems.

Esha Choukse

Mattan Erez

Alaa R. Alameldeen

IEEE Comput. Archit. Lett., 2018

Compresso: Pragmatic Main Memory Compression.

Esha Choukse

Mattan Erez

Alaa R. Alameldeen

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

2016

Bit-Plane Compression: Transforming Data for Better Compression in Many-Core Architectures.

Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016
