Sreenivas Subramoney

ORCID: 0000-0001-5372-0173

According to our database, Sreenivas Subramoney authored at least 60 papers between 2000 and 2024.

Bibliography

2024
ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-efficient Genome Analysis.
ACM Trans. Archit. Code Optim., March, 2024

CiMNet: Towards Joint Optimization for DNN Architecture and Configuration for Compute-In-Memory Hardware.
CoRR, 2024

2023
Enhanced regularization for on-chip training using analog and temporary memory weights.
Neural Networks, August, 2023

Telescope: Telemetry at Terabyte Scale.
CoRR, 2023

Motivating Next-Generation OS Physical Memory Management for Terabyte-Scale NVMMs.
CoRR, 2023

Reclaimer: A Reinforcement Learning Approach to Dynamic Resource Allocation for Cloud Microservices.
CoRR, 2023

VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

2022
A Unified Programmable Edge Matrix Processor for Deep Neural Networks and Matrix Algebra.
ACM Trans. Embed. Comput. Syst., September, 2022

A Survey of Deep Learning on CPUs: Opportunities and Co-Optimizations.
IEEE Trans. Neural Networks Learn. Syst., 2022

Unsupervised Learning of Depth, Camera Pose and Optical Flow from Monocular Video.
CoRR, 2022

Disrupting Low-Write-Energy vs. Fast-Read Dilemma in RRAM to Enable L1 Instruction Cache.
Proceedings of the VLSI Design and Test - 26th International Symposium, 2022

Speculative Code Compaction: Eliminating Dead Code via Speculative Microcode Transformations.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Compute-In-Memory Using 6T SRAM for a Wide Variety of Workloads.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2022

Thermometer: profile-guided BTB replacement for data center applications.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Register file prefetching.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

SeGraM: a universal hardware accelerator for genomic sequence-to-graph and sequence-to-sequence mapping.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Segment-Fusion: Hierarchical Context Fusion for Robust 3D Semantic Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Robust 3D Scene Segmentation through Hierarchical and Learnable Part-Fusion.
CoRR, 2021

Page Table Management for Heterogeneous Memory Systems.
CoRR, 2021

PDede: Partitioned, Deduplicated, Delta Branch Target Buffer.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Cryptographic Capability Computing.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Twig: Profile-Guided BTB Prefetching for Data Center Applications.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Radiant: efficient page table management for tiered memory systems.
Proceedings of the ISMM '21: 2021 ACM SIGPLAN International Symposium on Memory Management, 2021

REDUCT: Keep it Close, Keep it Cool! : Efficient Scaling of DNN Inference on Multi-core CPUs with Near-Cache Compute.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

ONT-X: An FPGA Approach to Real-time Portable Genomic Analysis.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

RASA: Efficient Register-Aware Systolic Array Matrix Engine for CPU.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

2020
REAL: REquest Arbitration in Last Level Caches.
ACM Trans. Embed. Comput. Syst., 2020

AccSS3D: Accelerator for Spatially Sparse 3D DNNs.
CoRR, 2020

Proximu: Efficiently Scaling DNN Inference in Multi-core CPUs through Near-Cache Compute.
CoRR, 2020

Look-Up Table based Energy Efficient Processing in Cache Support for Neural Network Acceleration.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Characterization of Data Generating Neural Network Applications on x86 CPU Architecture.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

Auto-Predication of Critical Branches.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

Focused Value Prediction.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

Towards Noise Resilient SLAM.
Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020

Descriptor Scoring for Feature Selection in Real-Time Visual SLAM.
Proceedings of the IEEE International Conference on Image Processing, 2020

PSB-RNN: A Processing-in-Memory Systolic Array Architecture using Block Circulant Matrices for Recurrent Neural Networks.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

Opportunistic Early Pipeline Re-steering for Data-dependent Branches.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
Towards the adoption of Local Branch Predictors in Modern Out-of-Order Superscalar Processors.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

DSPatch: Dual Spatial Pattern Prefetcher.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Bandwidth-Aware Last-Level Caching: Efficiently Coordinating Off-Chip Read and Write Bandwidth.
Proceedings of the 37th IEEE International Conference on Computer Design, 2019

Visual Inertial Odometry At the Edge: A Hardware-Software Co-design Approach for Ultra-low Latency and Power.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

2018
MARS: Memory Aware Reordered Source.
CoRR, 2018

Tackling memory access latency through DRAM row management.
Proceedings of the International Symposium on Memory Systems, 2018

Criticality Aware Tiered Cache Hierarchy: A Fundamental Relook at Multi-Level Cache Hierarchies.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Density Tradeoffs of Non-Volatile Memory as a Replacement for SRAM Based Last Level Cache.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Closed yet open DRAM: achieving low latency and high performance in DRAM memory systems.
Proceedings of the 55th Annual Design Automation Conference, 2018

2017
Cooperative Multi-Agent Reinforcement Learning-Based Co-optimization of Cores, Caches, and On-chip Network.
ACM Trans. Archit. Code Optim., 2017

Micro-Sector Cache: Improving Space Utilization in Sectored DRAM Caches.
ACM Trans. Archit. Code Optim., 2017

Near-Optimal Access Partitioning for Memory Hierarchies with Multiple Heterogeneous Bandwidth Sources.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

A coordinated multi-agent reinforcement learning approach to multi-level cache co-partitioning.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

2016
Base-Victim Compression: An Opportunistic Cache Compression Architecture.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Machine Learned Machines: Adaptive co-optimization of caches, cores, and On-chip Network.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

2014
Array scalarization in high level synthesis.
Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014

2013
Efficient management of last-level caches in graphics processors for 3D scene rendering workloads.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

2012
Introducing hierarchy-awareness in replacement and bypass algorithms for last-level caches.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Bypass and insertion algorithms for exclusive last-level caches.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

2004
Prefetch injection based on hardware monitoring and object metadata.
Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation 2004, 2004

2000
Cycles to Recycle: Garbage Collection on the IA-64.
Proceedings of the ISMM 2000, 2000

