Nuwan Jayasena

According to our database, Nuwan Jayasena authored at least 39 papers between 2000 and 2024.

Bibliography

2024
T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives.
CoRR, 2024

2023
Inclusive-PIM: Hardware-Software Co-design for Broad Acceleration on Commercial PIM Architectures.
CoRR, 2023

Computation vs. Communication Scaling for Future Transformers on Future Hardware.
CoRR, 2023

Tale of Two Cs: Computation vs. Communication Scaling for Future Transformers on Future Hardware.
Proceedings of the IEEE International Symposium on Workload Characterization, 2023

2022
Demystifying BERT: System Design Implications.
Proceedings of the IEEE International Symposium on Workload Characterization, 2022

2021
Demystifying BERT: Implications for Accelerator Design.
CoRR, 2021

2020
Morton filters: fast, compressed sparse cuckoo filters.
VLDB J., 2020

Memory Performance Optimization.
Proceedings of the 10th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2020

SeqPoint: Identifying Representative Iterations of Sequence-Based Neural Networks.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

2019
Co-ML: a case for collaborative ML acceleration using near-data processing.
Proceedings of the International Symposium on Memory Systems, 2019

2018
CODA: Enabling Co-location of Computation and Data for Multiple GPU Systems.
ACM Trans. Archit. Code Optim., 2018

Morton Filters: Faster, Space-Efficient Cuckoo Filters via Biasing, Compression, and Decoupled Logical Sparsity.
Proc. VLDB Endow., 2018

RegMutex: Inter-Warp GPU Register Time-Sharing.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

2017
Exploring the Processing-in-Memory design space.
J. Syst. Archit., 2017

CODA: Enabling Co-location of Computation and Data for Near-Data Processing.
CoRR, 2017

MemPod: A Clustered Architecture for Efficient and Scalable Migration in Flat Address Space Multi-level Memories.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

DVFS Space Exploration in Power Constrained Processing-in-Memory Systems.
Proceedings of the Architecture of Computing Systems - ARCS 2017, 2017

HBM-Resident Prefetching for Heterogeneous Memory System.
Proceedings of the Architecture of Computing Systems - ARCS 2017, 2017

2016
Near-Memory Data Services.
IEEE Micro, 2016

Horton Tables: Fast Hash Tables for In-Memory Data-Intensive Computing.
Proceedings of the 2016 USENIX Annual Technical Conference, 2016

Analytical Study on Bandwidth Efficiency of Heterogeneous Memory Systems.
Proceedings of the Second International Symposium on Memory Systems, 2016

Fine-Grained Task Migration for Graph Algorithms Using Processing in Memory.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

HADM: Hybrid Analysis for Detection of Malware.
Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016, 2016

Prefetching Techniques for Near-memory Throughput Processors.
Proceedings of the 2016 International Conference on Supercomputing, 2016

2015
Achieving Exascale Capabilities through Heterogeneous Computing.
IEEE Micro, 2015

GPGPU performance and power estimation using machine learning.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Understanding idle behavior and power gating mechanisms in the context of modern benchmarks on CPU-GPU Integrated systems.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Processing-in-Memory: Exploring the Design Space.
Proceedings of the Architecture of Computing Systems - ARCS 2015, 2015

2014
A comparison of core power gating strategies implemented in modern hardware.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2014

Managing DRAM Latency Divergence in Irregular GPGPU Applications.
Proceedings of the International Conference for High Performance Computing, 2014

TOP-PIM: throughput-oriented programmable processing in memory.
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

Improving Node-Level MapReduce Performance Using Processing-in-Memory Technologies.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

2013
A new perspective on processing-in-memory architecture design.
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, 2013

Load balancing in a changing world: dealing with heterogeneity and performance variability.
Proceedings of the Computing Frontiers Conference, 2013

2005
Fault Tolerance Techniques for the Merrimac Streaming Supercomputer.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

2004
Stream Register Files with Indexed Access.
Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

2003
Merrimac: Supercomputing with Streams.
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

2000
Smart Memories: a modular reconfigurable architecture.
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000
