Daniel Wong

ORCID: 0000-0002-5376-7868

Affiliations:
  • University of California, Riverside, CA, USA


According to our database, Daniel Wong authored at least 45 papers between 2010 and 2024.

Bibliography

2024
Unleashing the Power of Preemptive Priority-based Scheduling for Real-Time GPU Tasks.
CoRR, 2024

Characterizing In-Kernel Observability of Latency-Sensitive Request-Level Metrics with eBPF.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2024

GCAPS: GPU Context-Aware Preemptive Priority-Based Scheduling for Real-Time Tasks.
Proceedings of the 36th Euromicro Conference on Real-Time Systems, 2024

2023
VSCuda: LLM based CUDA extension for Visual Studio Code.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

WattWiser: Power & Resource-Efficient Scheduling for Multi-Model Multi-GPU Inference Servers.
Proceedings of the 14th International Green and Sustainable Computing Conference, 2023

CoFRIS: Coordinated Frequency and Resource Scaling for GPU Inference Servers.
Proceedings of the 14th International Green and Sustainable Computing Conference, 2023

KRISP: Enabling Kernel-wise RIght-sizing for Spatial Partitioned GPU Inference Servers.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

2022
PowerMorph: QoS-Aware Server Power Reshaping for Data Center Regulation Service.
ACM Trans. Archit. Code Optim., 2022

ScaleServe: a scalable multi-GPU machine learning inference system and benchmarking suite.
Proceedings of the 14th Workshop on General Purpose Processing Using GPU (GPGPU@PPoPP 2022), 2022

GPUCalorie: Floorplan Estimation for GPU Thermal Evaluation.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2022

2021
PAVER: Locality Graph-Based Thread Block Scheduling for GPUs.
ACM Trans. Archit. Code Optim., 2021

MAPA: multi-accelerator pattern allocation policy for multi-tenant GPU servers.
Proceedings of the International Conference for High Performance Computing, 2021

Energy Efficient Task Graph Execution Using Compute Unit Masking in GPUs.
Proceedings of the IEEE/ACM Redefining Scalability for Diversely Heterogeneous Architectures Workshop, 2021

LocalityGuru: A PTX Analyzer for Extracting Thread Block-level Locality in GPGPUs.
Proceedings of the IEEE International Conference on Networking, Architecture and Storage, 2021

ICAP: Designing Inrush Current Aware Power Gating Switch for GPGPU.
Proceedings of the IEEE International Conference on Networking, Architecture and Storage, 2021

LC-MEMENTO: A Memory Model for Accelerated Architectures.
Proceedings of Languages and Compilers for Parallel Computing, 2021

BlockMaestro: Enabling Programmer-Transparent Task-based Execution in GPU Systems.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

2020
GPU-NEST: Characterizing Energy Efficiency of Multi-GPU Inference Servers.
IEEE Comput. Archit. Lett., 2020

Transferable Graph Optimizers for ML Compilers.
Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

BOW: Breathing Operand Windows to Exploit Bypassing in GPUs.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

High-Performance Parallel Radix Sort on FPGA.
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020

2019
Speeding up Collective Communications Through Inter-GPU Re-Routing.
IEEE Comput. Archit. Lett., 2019

Locality-Aware GPU Register File.
IEEE Comput. Archit. Lett., 2019

Long-Term Reliability Management For Multitasking GPGPUs.
Proceedings of the 16th International Conference on Synthesis, 2019

μDPM: Dynamic Power Management for the Microsecond Era.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

CORF: Coalescing Operand Register File for GPUs.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018
Load-Triggered Warp Approximation on GPU.
Proceedings of the International Symposium on Low Power Electronics and Design, 2018

Joint Server and Network Energy Saving in Data Centers for Latency-Sensitive Applications.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

2017
Wireframe: supporting data-dependent parallelism through dependency graph execution in GPUs.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

2016
Squeezing Energy Savings Out Of Similar Data and Computation in GPGPUs.
Tiny Trans. Comput. Sci., 2016

STOMP: Statistical Techniques for Optimizing and Modeling Performance of Blocked Sparse Matrix Vector Multiplication.
Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016

DynSleep: Fine-grained Power Management for a Latency-Critical Data Center Application.
Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016

Peak Efficiency Aware Scheduling for Highly Energy Proportional Servers.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Origami: Folding Warps for Energy Efficient GPUs.
Proceedings of the 2016 International Conference on Supercomputing, 2016

Approximating warps with intra-warp operand value similarity.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Invited - Cross-layer modeling and optimization for electromigration induced reliability.
Proceedings of the 53rd Annual Design Automation Conference, 2016

2015
A Retrospective Look Back on the Road Towards Energy Proportionality.
Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015

2014
Implications of high energy proportional servers on cluster-wide energy proportionality.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

2013
Scaling the Energy Proportionality Wall with KnightShift.
IEEE Micro, 2013

Warped gates: gating aware scheduling and power gating for GPGPUs.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

2012
KnightShift: Scaling the Energy Proportionality Wall through Server-Level Heterogeneity.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

2010
Adaptive and Speculative Slack Simulations of CMPs on CMPs.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Implementing games on pinball machines.
Proceedings of the International Conference on the Foundations of Digital Games, 2010

Teaching Artificial Intelligence and Robotics Via Games.
Proceedings of the First Symposium on Education Advances in Artificial Intelligence, 2010

Teaching Robotics and Computer Science with Pinball Machines.
Proceedings of Educational Robotics and Beyond, 2010
