Aravind Sukumaran-Rajam

Orcid: 0000-0002-4062-0293

According to our database¹, Aravind Sukumaran-Rajam authored at least 51 papers between 2013 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

cuTeSpMM: Accelerating Sparse-Dense Matrix Multiplication using GPU Tensor Cores.

,

,

,

Aravind Sukumaran-Rajam

,

CoRR, April, 2025

ScaWL: Scaling k-WL (Weisfeiler-Lehman) Algorithms in Memory and Performance on Shared and Distributed-Memory Systems.

,

Aravind Sukumaran-Rajam

,

,

,

Mahantesh Halappanavar

,

Assefaw H. Gebremedhin

ACM Trans. Archit. Code Optim., March, 2025

A Comparative Analysis of Loosely and Tightly Coupled Accelerator Architectures for Machine Learning.

Amin Firoozshahian

,

,

,

Aravind Sukumaran-Rajam

,

,

,

,

,

Sujith Srinivasan

,

Harshitha Pilla

,

,

Surendra Rajupalem

,

K. Rajesh Jagannath

,

,

Harikrishna Reddy

,

,

Charlie Hong-Men Su

,

IEEE Micro, 2025

Meta's Second Generation AI Chip: Model-Chip Co-Design and Productionization Experiences.

,

,

Sameer Abu Asal

,

,

Raviteja Chinta

,

Harish Dattatraya Dixit

,

,

Saritha Dwarakapuram

,

Amin Firoozshahian

,

,

Kaustubh Gondkar

,

,

,

,

Sterling Hughes

,

,

,

Guoqiang Jerry Chen

,

Indu Kalyanaraman

,

,

,

,

Roman Levenstein

,

,

,

,

,

Jack Montgomery

,

Nadathur Satish

,

,

Ashwin Narasimha

,

,

,

,

Poorvaja Ramani

,

Harikrishna Reddy

,

,

,

,

,

,

Aravind Sukumaran-Rajam

,

,

,

Shreya Varshini

,

Richard Wareing

,

,

,

,

,

,

,

,

,

,

,

Emmanuel Menage

,

Truls Edvard Stokke

,

Mohammed Sourouri

Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

2023

Accelerating Graph Computations on 3D NoC-Enabled PIM Architectures.

Dwaipayan Choudhury

,

,

Aravind Sukumaran-Rajam

,

Anantharaman Kalyanaraman

,

Partha Pratim Pande

ACM Trans. Design Autom. Electr. Syst., 2023

cuAlign: Scalable Network Alignment on GPU Accelerators.

,

,

,

Aravind Sukumaran-Rajam

,

Mahantesh Halappanavar

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition.

,

,

Chengming Zhang

,

Aravind Sukumaran-Rajam

,

,

,

Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

A Performance Portability Study Using Tensor Contraction Benchmarks.

,

,

,

,

Aravind Sukumaran-Rajam

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Communication Optimization for Distributed Execution of Graph Neural Networks.

Süreyya Emre Kurt

,

,

Aravind Sukumaran-Rajam

,

Prashant Pandey

,

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

2022

Software/Hardware Co-design of 3D NoC-based GPU Architectures for Accelerated Graph Computations.

Dwaipayan Choudhury

,

,

Aravind Sukumaran-Rajam

,

Ananth Kalyanaraman

,

Partha Pratim Pande

ACM Trans. Design Autom. Electr. Syst., 2022

High-Performance and Energy-Efficient 3D Manycore GPU Architecture for Accelerating Graph Analytics.

Dwaipayan Choudhury

,

Aravind Sukumaran-Rajam

,

Ananth Kalyanaraman

,

Partha Pratim Pande

ACM J. Emerg. Technol. Comput. Syst., 2022

Sparsity-Aware Tensor Decomposition.

Süreyya Emre Kurt

,

,

Aravind Sukumaran-Rajam

,

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Comprehensive Accelerator-Dataflow Co-design Optimization for Convolutional Neural Networks.

,

Aravind Sukumaran-Rajam

,

,

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

Training of deep learning pipelines on memory-constrained GPUs via segmented fused-tiled execution.

,

,

,

,

Aravind Sukumaran-Rajam

,

Proceedings of the CC '22: 31st ACM SIGPLAN International Conference on Compiler Construction, Seoul, South Korea, April 2, 2022

Effective Performance Modeling and Domain-Specific Compiler Optimization of CNNs for GPUs.

,

,

Erik Curtis Barton

,

,

,

Aravind Sukumaran-Rajam

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

High-Performance Architecture Aware Sparse Convolutional Neural Networks for GPUs.

,

,

Aravind Sukumaran-Rajam

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021

Efficient Distributed Algorithms for Convolutional Neural Networks.

,

,

Aravind Sukumaran-Rajam

,

,

Proceedings of the SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021

cuTS: scaling subgraph isomorphism on distributed multi-GPU systems using trie based data structure.

,

,

,

Mahantesh Halappanavar

,

Aravind Sukumaran-Rajam

Proceedings of the International Conference for High Performance Computing, 2021

Analytical characterization and design space exploration for optimization of CNNs.

,

,

Aravind Sukumaran-Rajam

,

,

Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

2020

Efficient tiled sparse matrix multiplication through matrix signatures.

Süreyya Emre Kurt

,

Aravind Sukumaran-Rajam

,

Fabrice Rastello

,

Proceedings of the International Conference for High Performance Computing, 2020

ALO-NMF: Accelerated Locality-Optimized Non-negative Matrix Factorization.

Gordon Euhyun Moon

,

J. Austin Ellis

,

Aravind Sukumaran-Rajam

,

Srinivasan Parthasarathy

,

Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

2019

PL-NMF: Parallel Locality-Optimized Non-negative Matrix Factorization.

Gordon Euhyun Moon

,

Aravind Sukumaran-Rajam

,

Srinivasan Parthasarathy

,

CoRR, 2019

An efficient mixed-mode representation of sparse tensors.

,

,

Aravind Sukumaran-Rajam

,

Prashant Singh Rawat

,

Sriram Krishnamoorthy

,

Proceedings of the International Conference for High Performance Computing, 2019

Parallel Data-Local Training for Optimizing Word2Vec Embeddings for Word and Graph Embeddings.

Gordon Euhyun Moon

,

Denis Newman-Griffis

,

,

Aravind Sukumaran-Rajam

,

Eric Fosler-Lussier

,

Proceedings of the 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2019

Analytical cache modeling and tilesize optimization for tensor contractions.

,

Aravind Sukumaran-Rajam

,

,

,

Fabrice Rastello

,

,

Proceedings of the International Conference for High Performance Computing, 2019

Adaptive sparse tiling for sparse matrix multiplication.

,

Aravind Sukumaran-Rajam

,

,

,

Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

On Optimizing Complex Stencils on GPUs.

Prashant Singh Rawat

,

,

Aravind Sukumaran-Rajam

,

,

Louis-Noël Pouchet

,

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Load-Balanced Sparse MTTKRP on GPUs.

,

,

Aravind Sukumaran-Rajam

,

Richard W. Vuduc

,

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

A Code Generator for High-Performance Tensor Contractions on GPUs.

,

Aravind Sukumaran-Rajam

,

,

Sriram Krishnamoorthy

,

,

Louis-Noël Pouchet

,

,

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

2018

Domain-Specific Optimization and Generation of High-Performance GPU Code for Stencil Computations.

Prashant Singh Rawat

,

,

Aravind Sukumaran-Rajam

,

Mahesh Ravishankar

,

,

,

Louis-Noël Pouchet

,

Proc. IEEE, 2018

Associative instruction reordering to alleviate register pressure.

Prashant Singh Rawat

,

Aravind Sukumaran-Rajam

,

,

Fabrice Rastello

,

Louis-Noël Pouchet

,

Proceedings of the International Conference for High Performance Computing, 2018

Register optimizations for stencils on GPUs.

Prashant Singh Rawat

,

Fabrice Rastello

,

Aravind Sukumaran-Rajam

,

Louis-Noël Pouchet

,

,

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Performance modeling for GPUs using abstract kernel emulation.

,

Aravind Sukumaran-Rajam

,

,

Prashant Singh Rawat

,

Sriram Krishnamoorthy

,

Louis-Noël Pouchet

,

Fabrice Rastello

,

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

GPU code optimization using abstract kernel emulation and sensitivity analysis.

,

Aravind Sukumaran-Rajam

,

,

Prashant Singh Rawat

,

Sriram Krishnamoorthy

,

Louis-Noël Pouchet

,

Fabrice Rastello

,

Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

TTLG - An Efficient Tensor Transposition Library for GPUs.

Jyothi Vedurada

,

,

Aravind Sukumaran-Rajam

,

,

,

,

Sriram Krishnamoorthy

,

V. Krishna Nandivada

,

Rohit Kumar Srivastava

,

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Effective Machine Learning Based Format Selection and Performance Modeling for SpMV on GPUs.

,

,

Aravind Sukumaran-Rajam

,

,

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Optimizing Tensor Contractions in CCSD(T) for Efficient Execution on GPUs.

,

Aravind Sukumaran-Rajam

,

,

,

Rohit Kumar Srivastava

,

Sriram Krishnamoorthy

,

Proceedings of the 32nd International Conference on Supercomputing, 2018

Parallel Latent Dirichlet Allocation on GPUs.

Gordon Euhyun Moon

,

,

Aravind Sukumaran-Rajam

,

Bortik Bandyopadhyay

,

Srinivasan Parthasarathy

,

Proceedings of the Computational Science - ICCS 2018, 2018

Efficient sparse-matrix multi-vector product on GPUs.

,

Aravind Sukumaran-Rajam

,

Bortik Bandyopadhyay

,

,

Süreyya Emre Kurt

,

,

Shivani Sabhlok

,

Ümit V. Çatalyürek

,

Srinivasan Parthasarathy

,

Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018

Sampled Dense Matrix Multiplication for High-Performance Machine Learning.

,

Aravind Sukumaran-Rajam

,

Süreyya Emre Kurt

,

,

Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

2017

Parallel CCD++ on GPU for Matrix Factorization.

,

Aravind Sukumaran-Rajam

,

Rakshith Kunchum

,

Proceedings of the General Purpose GPUs, 2017

On improving performance of sparse matrix-matrix multiplication on GPUs.

Rakshith Kunchum

,

,

Aravind Sukumaran-Rajam

,

,

,

Proceedings of the International Conference on Supercomputing, 2017

Parallel LDA with Over-Decomposition.

Gordon Euhyun Moon

,

Aravind Sukumaran-Rajam

,

Proceedings of the 24th IEEE International Conference on High Performance Computing Workshops, 2017

Characterization of Data Movement Requirements for Sparse Matrix Computations on GPUs.

Süreyya Emre Kurt

,

,

,

Aravind Sukumaran-Rajam

,

Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

POSTER: Statement Reordering to Alleviate Register Pressure for Stencils on GPUs.

Prashant Singh Rawat

,

Aravind Sukumaran-Rajam

,

,

Fabrice Rastello

,

Louis-Noël Pouchet

,

Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

MultiGraph: Efficient Graph Processing on GPUs.

,

Aravind Sukumaran-Rajam

,

,

Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016

The Polyhedral Model of Nonlinear Loops.

Aravind Sukumaran-Rajam

,

Philippe Clauss

ACM Trans. Archit. Code Optim., 2016

2015

Beyond the Realm of the Polyhedral Model: Combining Speculative Program Parallelization with Polyhedral Compilation. (Au delà des limites du modèle polyédrique: en combinant la parallélisation spéculative de programmes et la compilation polyédrique).

Aravind Sukumaran-Rajam

PhD thesis, 2015

Speculative Runtime Parallelization of Loop Nests: Towards Greater Scope and Efficiency.

Aravind Sukumaran-Rajam

,

Luis Esteban Campostrini

,

Juan Manuel Martinez Caamaño

,

Philippe Clauss

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

2014

Speculative Program Parallelization with Scalable and Decentralized Runtime Verification.

Aravind Sukumaran-Rajam

,

Juan Manuel Martinez Caamaño

,

,

Alexandra Jimborean

,

Philippe Clauss

Proceedings of the Runtime Verification - 5th International Conference, 2014

2013

Online Dynamic Dependence Analysis for Speculative Polyhedral Parallelization.

Alexandra Jimborean

,

Philippe Clauss

,

Juan Manuel Martinez Caamaño

,

Aravind Sukumaran-Rajam

Proceedings of the Euro-Par 2013 Parallel Processing, 2013
