Tor M. Aamodt

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Energy-Efficient Realtime Motion Planning.

Ningfeng Yang

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

LumiBench: A Benchmark Suite for Hardware Ray Tracing.

Proceedings of the IEEE International Symposium on Workload Characterization, 2023

Learning Label Encodings for Deep Regression.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022

Vulkan-Sim: A GPU Architecture Simulator for Ray Tracing.

Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Anticipating and eliminating redundant computations in accelerated sparse training.

Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Label Encoding for Regression Networks.

Zi Yu Xue

Proceedings of the Tenth International Conference on Learning Representations, 2022

2021

Characterizing and Improving the Resilience of Accelerators in Autonomous Robots.

CoRR, 2021

AC-GC: Lossy Activation Compression with Guaranteed Convergence.

R. David Evans

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Intersection Prediction for Accelerated GPU Ray Tracing.

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

AccelWattch: A Power Modeling Framework for Modern GPUs.

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

2020

Energy Efficient On-Demand Dynamic Branch Prediction Models.

IEEE Trans. Computers, 2020

Sparse Weight Activation Training.

Md Aamir Raihan

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Deterministic Atomic Buffering.

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling.

Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

JPEG-ACT: Accelerating Deep Learning via Transform-based Lossy Compression.

R. David Evans

Lufei Liu

Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

ReSprop: Reuse Sparsified Backpropagation.

Negar Goli

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Bandwidth Bottleneck in Network-on-Chip for High-Throughput Processors.

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

Hash-Based Ray Path Prediction: Skipping BVH Traversal Computation by Exploiting Ray Locality.

Francois Demoullin

Ayub A. Gubran

CoRR, 2019

Surface Compression Using Dynamic Color Palettes.

Ayub A. Gubran

Felix Huang

CoRR, 2019

Modeling Deep Learning Accelerator Enabled GPUs.

Md Aamir Raihan

Negar Goli

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

Analyzing Machine Learning Workloads Using a Detailed GPU Simulator.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

A Detailed Model for Contemporary GPU Memory Systems.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

Emerald: graphics modeling for SoC systems.

Ayub A. Gubran

Tayler Hicklin Hetherington

Proceedings of the 46th International Symposium on Computer Architecture, 2019

EDGE: Event-Driven GPU Execution.

Maria Lubeznov

Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018

General-Purpose Graphics Processor Architectures

Wilson Wai Lun Fung

Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01759-9, 2018

Proteus: Exploiting precision variability in deep neural networks.

Raquel Urtasun

Parallel Comput., 2018

Value-Based Deep-Learning Acceleration.

Alberto Delmas Lascorz

Sayeh Sharify

IEEE Micro, 2018

Exploring Modern GPU Memory System Design Challenges through Accurate Modeling.

CoRR, 2018

Exploiting Typical Values to Accelerate Deep Learning.

Alberto Delmas Lascorz

Sayeh Sharify

Zissis Poulos

Computer, 2018

Identifying and Exploiting Ineffectual Computations to Enable Hardware Acceleration of Deep Learning.

Proceedings of the 16th IEEE International New Circuits and Systems Conference, 2018

Warp Scheduling for Fine-Grained Synchronization.

Ahmed ElTantawy

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017

CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution Near In-Order Energy with Near Out-of-Order Performance.

Milad Mohammadi

ACM Trans. Archit. Code Optim., 2017

HoLiSwap: Reducing Wire Energy in L1 Caches.

CoRR, 2017

A state machine block for high-level synthesis.

Shadi Assadikhomami

Jennifer Ongko

Proceedings of the International Conference on Field Programmable Technology, 2017

2016

Reuse Distance-Based Probabilistic Cache Replacement.

Subhasis Das

ACM Trans. Archit. Code Optim., 2016

CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution.

Milad Mohammadi

CoRR, 2016

Inter-Core Locality Aware Memory Scheduling.

Dongdong Li

IEEE Comput. Archit. Lett., 2016

Stripes: Bit-serial deep neural network computing.

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

MIMD synchronization on SIMT architectures.

Ahmed ElTantawy

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing.

Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks.

Proceedings of the 2016 International Conference on Supercomputing, 2016

2015

Reduced-Precision Strategies for Bounded Memory in Deep Neural Nets.

Raquel Urtasun

CoRR, 2015

On-Demand Dynamic Branch Prediction.

IEEE Comput. Archit. Lett., 2015

SLIP: reducing wire energy in the memory hierarchy.

Subhasis Das

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

MemcachedGPU: scaling-up scale-out key-value stores.

Proceedings of the Sixth ACM Symposium on Cloud Computing, 2015

2014

Cache Coherence for GPU Architectures.

IEEE Micro, 2014

Learning your limit: managing massively multithreaded caches through scheduling.

Commun. ACM, 2014

Scaling usable computing capability.

Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, 2014

A scalable multi-path microarchitecture for efficient GPU control flow.

Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

2013

Designing on-chip networks for throughput accelerators.

John Kim

ACM Trans. Archit. Code Optim., 2013

Cache-Conscious Thread Scheduling for Massively Multithreaded Processors.

IEEE Micro, 2013

Divergence-aware warp scheduling.

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Energy efficient GPU transactional memory via space-time optimizations.

Wilson W. L. Fung

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

GPUWattch: enabling energy optimizations in GPGPUs.

Jingwen Leng

Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Characterizing the performance benefits of fused CPU/GPU systems using FusionSim.

Vitaly Zakharenko

Proceedings of the Design, Automation and Test in Europe, 2013

GPUDet: a deterministic GPU architecture.

Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013

2012

Formal-Analysis-Based Trace Computation for Post-Silicon Debug.

IEEE Trans. Very Large Scale Integr. Syst., 2012

Modeling Cache Contention and Throughput of Multiprogrammed Manycore Processors.

Xi E. Chen

IEEE Trans. Computers, 2012

Kilo TM: Hardware Transactional Memory for GPU Architectures.

IEEE Micro, 2012

Progressive-BackSpace: Efficient Predecessor Computation for Post-Silicon Debug.

Johnny J. W. Kuan

Proceedings of the 13th International Workshop on Microprocessor Test and Verification, 2012

Cache-Conscious Wavefront Scheduling.

Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Characterizing and evaluating a key-value store application on heterogeneous CPU-GPU systems.

Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

Small virtual channel routers on FPGAs through block RAM sharing.

Jimmy Kwa

Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

2011

Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs.

Xi E. Chen

ACM Trans. Archit. Code Optim., 2011

Hardware transactional memory for GPU architectures.

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Thread block compaction for efficient SIMT control flow.

Wilson W. L. Fung

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

2010

Throughput-Effective On-Chip Networks for Manycore Accelerators.

John Kim

Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Accelerating trace computation in post-silicon debug.

Johnny J. W. Kuan

Steven J. E. Wilton

Proceedings of the 11th International Symposium on Quality of Electronic Design (ISQED 2010), 2010

Visualizing complex dynamics in many-core accelerator architectures.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010

On-chip network design considerations for compute accelerators.

John Kim

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009

Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware.

ACM Trans. Archit. Code Optim., 2009

Complexity effective memory access scheduling for many-core accelerator architectures.

George L. Yuan

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Analyzing CUDA workloads using a detailed GPU simulator.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

A first-order fine-grained multithreaded throughput model.

Xi E. Chen

Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

2008

Compile-time and instruction-set methods for improving floating- to fixed-point conversion accuracy.

Paul Chow

ACM Trans. Embed. Comput. Syst., 2008

Pangaea: a tightly-coupled IA32 heterogeneous chip multiprocessor.

Ankur Khandelwal Groen

Hong Jiang

Hong Wang

Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007

Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow.

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

Optimization of data prefetch helper threads with path-expression based statistical modeling.

Paul Chow

Proceedings of the 21st Annual International Conference on Supercomputing, 2007

2004

Hardware Support for Prescient Instruction Prefetch.

Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

2003

A framework for modeling and optimization of prescient instruction prefetch.

Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2003

2000

Embedded ISA support for enhanced floating-point to fixed-point ANSI-C compilation.
