Murali Annavaram

Orcid: 0000-0002-4633-6867

Affiliations:
  • University of Southern California, Los Angeles, USA


According to our database, Murali Annavaram authored at least 132 papers between 1996 and 2024.

Bibliography

2024
An efficient sequential consistency implementation with dynamic race detection for GPUs.
J. Parallel Distributed Comput., May, 2024

Differentially Private Next-Token Prediction of Large Language Models.
CoRR, 2024

Edge Private Graph Neural Networks with Singular Value Perturbation.
CoRR, 2024

Ethos: Rectifying Language Models in Orthogonal Parameter Space.
CoRR, 2024

Differentially Private Knowledge Distillation via Synthetic Text Generation.
CoRR, 2024

2023
FLIXR: Embedding Index Into Flash Translation Layer in SSDs.
IEEE Trans. Computers, 2023

CompactTag: Minimizing Computation Overheads in Actively-Secure MPC for Deep Neural Networks.
IACR Cryptol. ePrint Arch., 2023

SuperBP: Design Space Exploration of Perceptron-Based Branch Predictors for Superconducting CPUs.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

LAORAM: A Look Ahead ORAM Architecture for Training Large Embedding Tables.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

2022
Data Leakage via Access Patterns of Sparse Features in Deep Learning-based Recommendation Systems.
CoRR, 2022

MPC-Pipe: an Efficient Pipeline Scheme for Secure Multi-party Machine Learning Inference.
CoRR, 2022

Check-N-Run: a Checkpointing System for Training Deep Learning Recommendation Models.
Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, 2022

StATIK: Structure and Text for Inductive Knowledge Graph Completion.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022, 2022

PageORAM: An Efficient DRAM Page Aware ORAM Strategy.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Characterization of MPC-based Private Inference for Transformer-based Models.
Proceedings of the International IEEE Symposium on Performance Analysis of Systems and Software, 2022

Adaptive Verifiable Coded Computing: Towards Fast, Secure and Private Distributed Machine Learning.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Enhancing Privacy Through Domain Adaptive Noise Injection For Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

HiPerRF: A Dual-Bit Dense Storage SFQ Register File.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

SpreadGNN: Decentralized Multi-Task Federated Learning for Graph Neural Networks on Molecular Data.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Attribute Inference Attack of Speech Emotion Recognition in Federated Learning Settings.
CoRR, 2021

Verifiable Coded Computing: Towards Fast, Secure and Private Distributed Machine Learning.
CoRR, 2021

Look Ahead ORAM: Obfuscating Addresses in Recommendation Model Training.
CoRR, 2021

SpreadGNN: Serverless Multi-task Federated Learning for Graph Neural Networks.
CoRR, 2021

Byzantine-Robust and Privacy-Preserving Framework for FedML.
CoRR, 2021

Privacy and Integrity Preserving Training Using Trusted Hardware.
CoRR, 2021

FedGraphNN: A Federated Learning System and Benchmark for Graph Neural Networks.
CoRR, 2021

Jupiter: a networked computing architecture.
Proceedings of the UCC '21: 2021 IEEE/ACM 14th International Conference on Utility and Cloud Computing, Leicester, United Kingdom, December 6 - 9, 2021, 2021

cDLRM: Look Ahead Caching for Scalable Training of Recommendation Models.
Proceedings of the RecSys '21: Fifteenth ACM Conference on Recommender Systems, Amsterdam, The Netherlands, 27 September 2021, 2021

Tactical Jupiter: Dynamic Scheduling of Dispersed Computations in Tactical MANETs.
Proceedings of the 2021 IEEE Military Communications Conference, 2021

DarKnight: An Accelerated Framework for Privacy and Integrity Preserving Deep Learning Using Trusted Hardware.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

MultiLogVC: Efficient Out-of-Core Graph Processing Framework for Flash Storage.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Origami Inference: Private Inference Using Hardware Enclaves.
Proceedings of the 14th IEEE International Conference on Cloud Computing, 2021

2020
Distributed Training of Graph Convolutional Networks using Subgraph Approximation.
CoRR, 2020

Check-N-Run: A Checkpointing System for Training Recommendation Models.
CoRR, 2020

Group Knowledge Transfer: Collaborative Training of Large CNNs on the Edge.
CoRR, 2020

FedML: A Research Library and Benchmark for Federated Machine Learning.
CoRR, 2020

DarKnight: A Data Privacy Scheme for Training and Inference of Deep Neural Networks.
CoRR, 2020

FedNAS: Federated Deep Learning via Neural Architecture Search.
CoRR, 2020

Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Collage Inference: Using Coded Redundancy for Lowering Latency Variation in Distributed Image Classification Systems.
Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020

2019
Adaptive Cooperation of Prefetching and Warp Scheduling on GPUs.
IEEE Trans. Computers, 2019

An Efficient GPU Cache Architecture for Applications with Irregular Memory Access Patterns.
ACM Trans. Archit. Code Optim., 2019

Efficient automatic parallelization of a single GPU program for a multiple GPU system.
Integr., 2019

Privacy-Preserving Inference in Machine Learning Services Using Trusted Execution Environments.
CoRR, 2019

Train Where the Data is: A Case for Bandwidth Efficient Coded Training.
CoRR, 2019

Collage Inference: Achieving low tail latency during distributed image classification using coded redundancy models.
CoRR, 2019

PartitionedVC: Partitioned External Memory Graph Analytics Framework for SSDs.
CoRR, 2019

Collage Inference: Tolerating Stragglers in Distributed Neural Network Inference using Coding.
CoRR, 2019

Distributed Matrix Multiplication Using Speed Adaptive Coding.
CoRR, 2019

Slack squeeze coded computing for adaptive straggler mitigation.
Proceedings of the International Conference for High Performance Computing, 2019

Linebacker: preserving victim cache lines in idle register files of GPUs.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

GraphSSD: graph semantics aware SSD.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

GPUGuard: mitigating contention based side and covert channel attacks on GPUs.
Proceedings of the ACM International Conference on Supercomputing, 2019

2018
GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

CTA-Aware Prefetching and Scheduling for GPU.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

G-TSC: Timestamp Based Coherence for GPUs.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017
Improving Energy Efficiency of GPUs through Data Compression and Compressed Execution.
IEEE Trans. Computers, 2017

Summarizer: trading communication with computing near storage.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Docker characterization on high performance SSDs.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

Access Pattern-Aware Cache Management for Improving Data Utilization in GPU.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Power Efficient Sharing-Aware GPU Data Management.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Elastic-Cache: GPU Cache Architecture for Efficient Fine- and Coarse-Grained Cache-Line Management.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

G-Scalar: Cost-Effective Generalized Scalar Execution Architecture for Power-Efficient GPUs.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Pilot Register File: Energy Efficient Partitioned Register File for GPUs.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

CIRCE - a runtime scheduler for DAG-based dispersed computing: demo.
Proceedings of the Second ACM/IEEE Symposium on Edge Computing, San Jose / Silicon Valley, 2017

2016
Virtual Thread: Maximizing Thread-Level Parallelism beyond GPU Scheduling Limit.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Warped-Slicer: Efficient Intra-SM Slicing through Dynamic Resource Partitioning for GPU Multiprogramming.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPUs.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Origami: Folding Warps for Energy Efficient GPUs.
Proceedings of the 2016 International Conference on Supercomputing, 2016

Warped-preexecution: A GPU pre-execution approach for improving latency hiding.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Approximating warps with intra-warp operand value similarity.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

2015
A privacy mechanism for mobile-based urban traffic monitoring.
Pervasive Mob. Comput., 2015

GPU register file virtualization.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Warped-compression: enabling power efficient GPUs through register compression.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Revealing Critical Loads and Hidden Data Locality in GPGPU Applications.
Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015

A Retrospective Look Back on the Road Towards Energy Proportionality.
Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015

Warped-RE: Low-Cost Error Detection and Correction in GPUs.
Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2015

2014
Extremely Low Cost Error Protection with Correctable Parity Protected Cache.
IEEE Trans. Computers, 2014

Efficient RAS support for die-stacked DRAM.
Proceedings of the 2014 International Test Conference, 2014

Graph processing on GPUs: Where are the bottlenecks?
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

Implications of high energy proportional servers on cluster-wide energy proportionality.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Warped-Shield: Tolerating Hard Faults in GPGPUs.
Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2014

Reliability-Aware Exceptions: Tolerating intermittent faults in microprocessor array structures.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

PATS: pattern aware scheduling and power gating for GPGPUs.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
NFRA: Generalized Network Flow-Based Resource Allocation for Hosting Centers.
IEEE Trans. Computers, 2013

Scaling the Energy Proportionality Wall with KnightShift.
IEEE Micro, 2013

Warped gates: gating aware scheduling and power gating for GPGPUs.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Warped register file: A power efficient register file for GPGPUs.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

PHYS: Profiled-HYbrid Sampling for soft error reliability benchmarking.
Proceedings of the 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2013

2012
Enhancing Privacy and Accuracy in Probe Vehicle-Based Traffic Monitoring via Virtual Trip Lines.
IEEE Trans. Mob. Comput., 2012

KNOWME: An Energy-Efficient Multimodal Body Area Network for Physical Activity Monitoring.
ACM Trans. Embed. Comput. Syst., 2012

KNOWME: a case study in wireless body area sensor network design.
IEEE Commun. Mag., 2012

Semi-Markov state estimation and policy optimization for energy efficient mobile sensing.
Proceedings of the 9th Annual IEEE Communications Society Conference on Sensor, 2012

Warped-DMR: Light-weight Error Detection for GPGPU.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

KnightShift: Scaling the Energy Proportionality Wall through Server-Level Heterogeneity.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

A case for 3D stacked analog circuits in high-speed sensing systems.
Proceedings of the Thirteenth International Symposium on Quality Electronic Design, 2012

Wireless Body Area Networks: Where does energy go?
Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

Benchmarking ISA reliability to intermittent errors.
Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

MACAU: A Markov model for reliability evaluations of caches under Single-bit and Multi-bit Upsets.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

2011
Optimal Time-Resource Allocation for Energy-Efficient Physical Activity Detection.
IEEE Trans. Signal Process., 2011

Soft error benchmarking of L2 caches with PARMA.
Proceedings of the SIGMETRICS 2011, 2011

CPPC: correctable parity protected cache.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Analyzing the effects of compiler optimizations on application reliability.
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

A case for guarded power gating for multi-core processors.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Modeling high-level descriptions of real-life physical activities using latent topic modeling of multimodal sensor signals.
Proceedings of the 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2011

Cross-layer resilience using wearout aware design flow.
Proceedings of the 2011 IEEE/IFIP International Conference on Dependable Systems and Networks, 2011

2010
Adaptive and Speculative Slack Simulations of CMPs on CMPs.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Guarded Power Gating in a Multi-core Setting.
Proceedings of the Computer Architecture, 2010

KnightShift: Shifting the I/O Burden in Datacenters to Management Processor for Energy Efficiency.
Proceedings of the Computer Architecture, 2010

Markov-optimal sensing policy for user state estimation in mobile devices.
Proceedings of the 9th International Conference on Information Processing in Sensor Networks, 2010

WearMon: Reliability monitoring using adaptive critical path testing.
Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems and Networks, 2010

2009
SlackSim: a platform for parallel simulations of CMPs on CMPs.
SIGARCH Comput. Archit. News, 2009

A framework of energy efficient mobile sensing for automatic user state recognition.
Proceedings of the 7th International Conference on Mobile Systems, 2009

The Tradeoff between Energy Efficiency and User State Estimation Accuracy in Mobile Sensing.
Proceedings of the Mobile Computing, Applications, and Services, 2009

Tolerance of performance degrading faults for effective yield improvement.
Proceedings of the 2009 IEEE International Test Conference, 2009

Exploiting Simulation Slack to Improve Parallel Simulation Speed.
Proceedings of the ICPP 2009, 2009

Optimal Allocation of Time-Resources for Multihypothesis Activity-Level Detection.
Proceedings of the Distributed Computing in Sensor Systems, 2009

Optimal time-resource allocation for activity-detection via multimodal sensing.
Proceedings of the 4th International ICST Conference on Body Area Networks, 2009

2008
Game theoretic approach to location sharing with privacy in a community-based mobile safety application.
Proceedings of the 11th International Symposium on Modeling Analysis and Simulation of Wireless and Mobile Systems, 2008

Virtual trip lines for distributed privacy-preserving traffic monitoring.
Proceedings of the 6th International Conference on Mobile Systems, 2008

2007
Implications of Device Timing Variability on Full Chip Timing.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

2006
Die Stacking (3D) Microarchitecture.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

2005
Mitigating Amdahl's Law through EPI Throttling.
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

2004
A case for shared instruction cache on chip multiprocessors running OLTP.
SIGARCH Comput. Archit. News, 2004

The Fuzzy Correlation between Code and Performance Predictability.
Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004

2003
Call graph prefetching for database applications.
ACM Trans. Comput. Syst., 2003

Scaling and Characterizing Database Workloads: Bridging the Gap between Research and Practice.
Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

2002
Branch Behavior of a Commercial OLTP Workload on Intel IA32 Processors.
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

2001
Prefetch mechanisms that acquire and exploit application specific knowledge.
PhD thesis, 2001

Data prefetching by dependence graph precomputation.
Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001

2000
Instruction overhead and data locality effects in superscalar processors.
Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software, 2000

1996
Comparison of two storage models in data-driven multithreaded architectures.
Proceedings of the Eighth IEEE Symposium on Parallel and Distributed Processing, 1996

