Sehoon Kim

ORCID: 0000-0002-9339-5480

Affiliations:

University of California, Berkeley, CA, USA (PhD 2024)

According to our database¹, Sehoon Kim authored at least 35 papers between 2021 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.


Bibliography

2025

Multipole Attention for Efficient Long Context Reasoning.

CoRR, June, 2025

Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks.

Gopala Anumanchipalli

Kurt Keutzer

Amir Gholami

CoRR, March, 2025

ETS: Efficient Tree Search for Inference-Time Scaling.

Monishwaran Maheswaran

CoRR, February, 2025

QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache.

CoRR, February, 2025

Squeezed Attention: Accelerating Long Context Length LLM Inference.

Coleman Richard Charles Hooper

Sehoon Kim

Hiva Mohammadzadeh

Monishwaran Maheswaran

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Full Stack Approach for Efficient Deep Learning Inference.

Sehoon Kim

PhD thesis, 2024

AI and Memory Wall.

IEEE Micro, 2024

Corrigendum: Applications and techniques for fast machine learning in science.

Frontiers Big Data, 2024

Squeezed Attention: Accelerating Long Context Length LLM Inference.

Coleman Hooper

Sehoon Kim

Hiva Mohammadzadeh

Monishwaran Maheswaran

CoRR, 2024

Efficient and Scalable Estimation of Tool Representations in Vector Space.

CoRR, 2024

TinyAgent: Function Calling at the Edge.

Gopala Anumanchipalli

Kurt Keutzer

Amir Gholami

CoRR, 2024

Characterizing Prompt Compression Methods for Long Context Inference.

CoRR, 2024

Learned Best-Effort LLM Serving.

CoRR, 2024

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

An LLM Compiler for Parallel Function Calling.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

SqueezeLLM: Dense-and-Sparse Quantization.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement.

Gopala Anumanchipalli

Michael W. Mahoney

Kurt Keutzer

Amir Gholami

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

SPEED: Speculative Pipelined Execution for Efficient Decoding.

CoRR, 2023

Full Stack Optimization of Transformer Inference: a Survey.

CoRR, 2023

Big Little Transformer Decoder.

CoRR, 2023

Speculative Decoding with Big Little Decoder.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022

Applications and Techniques for Fast Machine Learning in Science.

Frontiers Big Data, 2022

Hessian-Aware Pruning and Optimal Neural Implant.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

A Fast Post-Training Pruning Framework for Transformers.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learned Token Pruning for Transformers.

Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

Integer-Only Zero-Shot Quantization for Efficient Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

WindTunnel: Towards Differentiable ML Pipelines Beyond a Single Model.

Proc. VLDB Endow., 2021

Applications and Techniques for Fast Machine Learning in Science.

Allison McCarn Deiana

Nhan Tran

Joshua Agar

Michaela Blott

Giuseppe Di Guglielmo

Tomás E. Müller-Bravo

Seyedramin Rasoulinezhad

Maria Domenica Galati

Mohammed Attia Mohammed

Subramanian Ramamoorthy

CoRR, 2021

Learned Token Pruning for Transformers.

CoRR, 2021

Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition.

CoRR, 2021

A Survey of Quantization Methods for Efficient Neural Network Inference.

CoRR, 2021

Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Memory-Efficient Hardware Performance Counters with Approximate-Counting Algorithms.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

I-BERT: Integer-only BERT Quantization.

Proceedings of the 38th International Conference on Machine Learning, 2021
