Aravind Sukumaran-Rajam

Orcid: 0000-0002-4062-0293

According to our database, Aravind Sukumaran-Rajam authored at least 47 papers between 2013 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Accelerating Graph Computations on 3D NoC-Enabled PIM Architectures.
ACM Trans. Design Autom. Electr. Syst., 2023

cuAlign: Scalable Network Alignment on GPU Accelerators.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

A Performance Portability Study Using Tensor Contraction Benchmarks.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Communication Optimization for Distributed Execution of Graph Neural Networks.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

2022
Software/Hardware Co-design of 3D NoC-based GPU Architectures for Accelerated Graph Computations.
ACM Trans. Design Autom. Electr. Syst., 2022

High-Performance and Energy-Efficient 3D Manycore GPU Architecture for Accelerating Graph Analytics.
ACM J. Emerg. Technol. Comput. Syst., 2022

Sparsity-Aware Tensor Decomposition.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Comprehensive Accelerator-Dataflow Co-design Optimization for Convolutional Neural Networks.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

Training of deep learning pipelines on memory-constrained GPUs via segmented fused-tiled execution.
Proceedings of the CC '22: 31st ACM SIGPLAN International Conference on Compiler Construction, Seoul, South Korea, April 2, 2022

Effective Performance Modeling and Domain-Specific Compiler Optimization of CNNs for GPUs.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

High-Performance Architecture Aware Sparse Convolutional Neural Networks for GPUs.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
Efficient Distributed Algorithms for Convolutional Neural Networks.
Proceedings of the SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021

cuTS: scaling subgraph isomorphism on distributed multi-GPU systems using trie based data structure.
Proceedings of the International Conference for High Performance Computing, 2021

Analytical characterization and design space exploration for optimization of CNNs.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

2020
Efficient tiled sparse matrix multiplication through matrix signatures.
Proceedings of the International Conference for High Performance Computing, 2020

ALO-NMF: Accelerated Locality-Optimized Non-negative Matrix Factorization.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

2019
PL-NMF: Parallel Locality-Optimized Non-negative Matrix Factorization.
CoRR, 2019

An efficient mixed-mode representation of sparse tensors.
Proceedings of the International Conference for High Performance Computing, 2019

Parallel Data-Local Training for Optimizing Word2Vec Embeddings for Word and Graph Embeddings.
Proceedings of the 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2019

Analytical cache modeling and tilesize optimization for tensor contractions.
Proceedings of the International Conference for High Performance Computing, 2019

Adaptive sparse tiling for sparse matrix multiplication.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

On Optimizing Complex Stencils on GPUs.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Load-Balanced Sparse MTTKRP on GPUs.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

A Code Generator for High-Performance Tensor Contractions on GPUs.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

2018
Domain-Specific Optimization and Generation of High-Performance GPU Code for Stencil Computations.
Proc. IEEE, 2018

Associative instruction reordering to alleviate register pressure.
Proceedings of the International Conference for High Performance Computing, 2018

Register optimizations for stencils on GPUs.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Performance modeling for GPUs using abstract kernel emulation.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

GPU code optimization using abstract kernel emulation and sensitivity analysis.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

TTLG - An Efficient Tensor Transposition Library for GPUs.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Effective Machine Learning Based Format Selection and Performance Modeling for SpMV on GPUs.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Optimizing Tensor Contractions in CCSD(T) for Efficient Execution on GPUs.
Proceedings of the 32nd International Conference on Supercomputing, 2018

Parallel Latent Dirichlet Allocation on GPUs.
Proceedings of the Computational Science - ICCS 2018, 2018

Efficient sparse-matrix multi-vector product on GPUs.
Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018

Sampled Dense Matrix Multiplication for High-Performance Machine Learning.
Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

2017
Parallel CCD++ on GPU for Matrix Factorization.
Proceedings of the General Purpose GPUs, 2017

On improving performance of sparse matrix-matrix multiplication on GPUs.
Proceedings of the International Conference on Supercomputing, 2017

Parallel LDA with Over-Decomposition.
Proceedings of the 24th IEEE International Conference on High Performance Computing Workshops, 2017

Characterization of Data Movement Requirements for Sparse Matrix Computations on GPUs.
Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

POSTER: Statement Reordering to Alleviate Register Pressure for Stencils on GPUs.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

MultiGraph: Efficient Graph Processing on GPUs.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
The Polyhedral Model of Nonlinear Loops.
ACM Trans. Archit. Code Optim., 2016

2015
Beyond the Realm of the Polyhedral Model: Combining Speculative Program Parallelization with Polyhedral Compilation. (Au delà des limites du modèle polyédrique: en combinant la parallélisation spéculative de programmes et la compilation polyédrique).
PhD thesis, 2015

Speculative Runtime Parallelization of Loop Nests: Towards Greater Scope and Efficiency.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

2014
Speculative Program Parallelization with Scalable and Decentralized Runtime Verification.
Proceedings of the Runtime Verification - 5th International Conference, 2014

2013
Online Dynamic Dependence Analysis for Speculative Polyhedral Parallelization.
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

