Matthew D. Sinclair

ORCID: 0000-0003-0189-7895

Affiliations:
  • University of Wisconsin-Madison, WI, USA


According to our database, Matthew D. Sinclair authored at least 35 papers between 2011 and 2024.

Bibliography

2024
T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives.
CoRR, 2024

2023
Improving the Scalability of GPU Synchronization Primitives.
IEEE Trans. Parallel Distributed Syst., 2023

Fifty Years of the International Symposium on Computer Architecture: A Data-Driven Retrospective.
IEEE Micro, 2023

Fifty Years of ISCA: A data-driven retrospective on key trends.
CoRR, 2023

Integrating Per-Stream Stat Tracking into Accel-Sim.
CoRR, 2023

Computation vs. Communication Scaling for Future Transformers on Future Hardware.
CoRR, 2023

Tale of Two Cs: Computation vs. Communication Scaling for Future Transformers on Future Hardware.
Proceedings of the IEEE International Symposium on Workload Characterization, 2023

2022
A Case for Fine-grain Coherence Specialization in Heterogeneous Systems.
ACM Trans. Archit. Code Optim., 2022

Not All GPUs Are Created Equal: Characterizing Variability in Large-Scale, Accelerator-Rich Systems.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Demystifying BERT: System Design Implications.
Proceedings of the IEEE International Symposium on Workload Characterization, 2022

Only Buffer When You Need To: Reducing On-chip GPU Traffic with Reconfigurable Local Atomic Buffers.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

2021
Demystifying BERT: Implications for Accelerator Design.
CoRR, 2021

Enabling Reproducible and Agile Full-System Simulation.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

DENNI: Distributed Neural Network Inference on Severely Resource Constrained Edge Devices.
Proceedings of the IEEE International Performance, 2021

Deadline-Aware Offloading for High-Throughput Accelerators.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020
Inter-kernel Reuse-aware Thread Block Scheduling.
ACM Trans. Archit. Code Optim., 2020

The gem5 Simulator: Version 20.0+.
CoRR, 2020

Deterministic Atomic Buffering.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Specializing Coherence, Consistency, and Push/Pull for GPU Graph Analytics.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

SeqPoint: Identifying Representative Iterations of Sequence-Based Neural Networks.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

Independent Forward Progress of Work-groups.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

2019
Optimizing GPU Cache Policies for MI Workloads.
CoRR, 2019

Analyzing Machine Learning Workloads Using a Detailed GPU Simulator.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

Optimizing GPU Cache Policies for MI Workloads.
Proceedings of the IEEE International Symposium on Workload Characterization, 2019

2018
HPVM: heterogeneous parallel virtual machine.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Spandex: A Flexible Interface for Efficient Heterogeneous Coherence.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Lost in Abstraction: Pitfalls of Analyzing GPUs at the Intermediate Language Level.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017
Efficient coherence and consistency for specialized memory hierarchies.
PhD thesis, 2017

Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics on Heterogeneous Systems.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

HeteroSync: A benchmark suite for fine-grained synchronization on tightly coupled GPUs.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

2016
GSI: A GPU Stall Inspector to characterize the sources of memory stalls for tightly coupled GPUs.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

POSTER: hVISC: A Portable Abstraction for Heterogeneous Parallel Systems.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Efficient GPU synchronization without scopes: saying no to complex consistency models.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Stash: have your scratchpad and cache it too.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

2011
Sampling + DMR: practical and low-overhead permanent fault detection.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

