Neal Clayton Crago

Orcid: 0000-0001-7774-0531

According to our database, Neal Clayton Crago authored at least 20 papers between 2008 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
WASP: Exploiting GPU Pipeline Parallelism with Hardware-Accelerated Automatic Warp Specialization.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

2023
Symphony: Orchestrating Sparse and Dense Tensors with Hierarchical Heterogeneous Processing.
ACM Trans. Comput. Syst., 2023

Community-based Matrix Reordering for Sparse Linear Algebra Optimization.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling (Extended Abstract).
Proceedings of the 2023 ACM Workshop on Highlights of Parallel Computing, 2023

Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2021
P-OPT: Practical Optimal Cache Replacement for Graph Analytics.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2019
Exposing Memory Access Patterns to Improve Instruction and Memory Efficiency in GPUs.
ACM Trans. Archit. Code Optim., 2019

ExTensor: An Accelerator for Sparse Tensor Algebra.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2015
Efficient Control and Communication Paradigms for Coarse-Grained Spatial Architectures.
ACM Trans. Comput. Syst., 2015

2014
Efficient Spatial Processing Element Control via Triggered Instructions.
IEEE Micro, 2014

Exploiting spatial architectures for edit distance algorithms.
Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

2013
Triggered instructions: a control paradigm for spatially-programmed architectures.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Hybrid latency tolerance for robust energy-efficiency on 1000-core data parallel processors.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

2012
Energy-efficient latency tolerance for 1000-core data parallel processors with decoupled strands.
PhD thesis, 2012

Developing a parallel computational implementation of AMOEBA.
Int. J. Geogr. Inf. Sci., 2012

2011
OUTRIDER: efficient memory latency tolerance with decoupled strands.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Decoupled Architectures as a Low-Complexity Alternative to Out-of-order Execution.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2009
Rigel: an architecture and scalable programming interface for a 1000-core accelerator.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

2008
Tradeoffs in designing accelerator architectures for visual computing.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008
