Souvik Kundu

ORCID: 0000-0002-3533-9405

Affiliations:
  • Intel Labs, San Diego, CA, USA


According to our database, Souvik Kundu authored at least 35 papers between 2023 and 2025.

Bibliography

2025
On Evaluating Performance of LLM Inference Serving Systems.
CoRR, July, 2025

On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention.
CoRR, June, 2025

Assortment of Attention Heads: Accelerating Federated PEFT with Head Pruning and Strategic Client Selection.
CoRR, June, 2025

Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator.
CoRR, April, 2025

Understanding and Optimizing Multi-Stage AI Inference Pipelines.
CoRR, April, 2025

SEAL: Steerable Reasoning Calibration of Large Language Models for Free.
CoRR, April, 2025

OuroMamba: A Data-Free Quantization Framework for Vision Mamba Models.
CoRR, March, 2025

Enhancing Large Language Models for Hardware Verification: A Novel SystemVerilog Assertion Dataset.
CoRR, March, 2025

LANTERN++: Enhanced Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models.
CoRR, February, 2025

CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing.
CoRR, February, 2025

Unraveling Zeroth-Order Optimization through the Lens of Low-Dimensional Structured Perturbations.
CoRR, January, 2025

Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits.
Trans. Mach. Learn. Res., 2025

MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Scaling Long Context Training Data by Long-Distance Referrals.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

LLM-NPU: Towards Efficient Foundation Model Inference on Low-Power Neural Processing Units.
Proceedings of the IEEE International Conference on Omni-layer Intelligent Systems, 2025

2024
Bit-by-Bit: Investigating the Vulnerabilities of Binary Neural Networks to Adversarial Bit Flipping.
Trans. Mach. Learn. Res., 2024

Unveiling Adversarially Robust Graph Lottery Tickets.
Trans. Mach. Learn. Res., 2024

AttentionBreaker: Adaptive Evolutionary Optimization for Unmasking Vulnerabilities in LLMs through Bit-Flip Attacks.
CoRR, 2024

Metron: Holistic Performance Evaluation Framework for LLM Inference Systems.
CoRR, 2024

CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs.
CoRR, 2024

Demystifying Platform Requirements for Diverse LLM Inference Use Cases.
CoRR, 2024

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM.
CoRR, 2024

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization.
Advances in Neural Information Processing Systems 38 (NeurIPS 2024), 2024

Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Fusing Models with Complementary Expertise.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Analyzing Adversarial Vulnerabilities of Graph Lottery Tickets.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

Towards Real-Time LLM Inference on Heterogeneous Edge Platforms.
Proceedings of the 31st IEEE International Conference on High Performance Computing, Data and Analytics (HiPC), 2024

GEAR: An Efficient Error Reduction Framework for KV Cache Compression in LLM Inference.
Proceedings of the NeurIPS Efficient Natural Language and Speech Processing Workshop, 2024

GenQ: Quantization in Low Data Regimes with Generative Synthetic Data.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
Sparse but Strong: Crafting Adversarially Robust Graph Lottery Tickets.
CoRR, 2023

Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity.
CoRR, 2023

Don't just prune by magnitude! Your mask topology is a secret weapon.
Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023

NeRFool: Uncovering the Vulnerability of Generalizable Neural Radiance Fields against Adversarial Perturbations.
Proceedings of the International Conference on Machine Learning, 2023

Vision HGNN: An Image is More than a Graph of Nodes.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
