Kiran Kumar Matam

According to our database, Kiran Kumar Matam authored at least 20 papers between 2010 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
QuickUpdate: a Real-Time Personalization System for Large-Scale Recommendation Models.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

2022
Check-N-Run: a Checkpointing System for Training Deep Learning Recommendation Models.
Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, 2022


2021
High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models.
CoRR, 2021

MultiLogVC: Efficient Out-of-Core Graph Processing Framework for Flash Storage.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

2020
Check-N-Run: A Checkpointing System for Training Recommendation Models.
CoRR, 2020

2019
Efficient automatic parallelization of a single GPU program for a multiple GPU system.
Integr., 2019

PartitionedVC: Partitioned External Memory Graph Analytics Framework for SSDs.
CoRR, 2019

GraphSSD: graph semantics aware SSD.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

2017
Summarizer: trading communication with computing near storage.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

2013
CPU and/or GPU: Revisiting the GPU Vs. CPU Myth.
CoRR, 2013

Energy-efficient large-scale matrix multiplication on FPGAs.
Proceedings of the 2012 International Conference on Reconfigurable Computing and FPGAs, 2013

Evaluating energy efficiency of floating point matrix multiplication on FPGAs.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2013

Energy efficient architecture for matrix multiplication on FPGAs.
Proceedings of the 23rd International Conference on Field programmable Logic and Applications, 2013

High throughput and programmable online traffic classifier on FPGA.
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Algorithm Design Methodology for Embedded Architectures.
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2013

2012
Sparse matrix-matrix multiplication on modern architectures.
Proceedings of the 19th International Conference on High Performance Computing, 2012

2011
Accelerating Sparse Matrix Vector Multiplication in Iterative Methods Using GPU.
Proceedings of the International Conference on Parallel Processing, 2011

GPU Accelerated Lanczos Algorithm with Applications.
Proceedings of the 25th IEEE International Conference on Advanced Information Networking and Applications Workshops, 2011

2010
Efficient Discrete Range Searching primitives on the GPU with applications.
Proceedings of the 2010 International Conference on High Performance Computing, 2010

