Cody Hao Yu

Orcid: 0000-0002-9298-6254

Affiliations:
  • University of California, Los Angeles, USA (PhD 2019)


According to our database, Cody Hao Yu authored at least 34 papers between 2014 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Efficiently Programming Large Language Models using SGLang.
CoRR, 2023

RAF: Holistic Compilation for Deep Learning Model Training.
CoRR, 2023

Decoupled Model Schedule for Deep Learning Training.
CoRR, 2023

Efficient Memory Management for Large Language Model Serving with PagedAttention.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

Grape: Practical and Efficient Graphed Execution for Dynamic Deep Neural Networks on GPUs.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

TensorIR: An Abstraction for Automatic Tensorized Program Optimization.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
AutoDSE: Enabling Software Programmers to Design Efficient FPGA Accelerators.
ACM Trans. Design Autom. Electr. Syst., 2022

Tensor Program Optimization with Probabilistic Programs.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DietCode: Automatic Optimization for Dynamic Tensor Programs.
Proceedings of Machine Learning and Systems 2022, 2022

2021
Bring Your Own Codegen to Deep Learning Compiler.
CoRR, 2021

AutoDSE: Enabling Software Programmers Design Efficient FPGA Accelerators.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

MOCHA: Multinode Cost Optimization in Heterogeneous Clouds with Accelerators.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

Lorien: Efficient Deep Learning Workloads Delivery.
Proceedings of the SoCC '21: ACM Symposium on Cloud Computing, 2021

2020
Ansor: Generating High-Performance Tensor Programs for Deep Learning.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

Analysis and Optimization of the Implicit Broadcasts in FPGA HLS to Improve Maximum Frequency.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

2019
Raising an Abstraction Level of Compilation and Optimization for Customized Computing.
PhD thesis, 2019

Customizable Computing - From Single Chip to Datacenters.
Proc. IEEE, 2019

Overcoming Data Transfer Bottlenecks in DNN Accelerators via Layer-Conscious Memory Management.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

2018
AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture.
CoRR, 2018

Best-Effort FPGA Programming: A Few Steps Can Go a Long Way.
CoRR, 2018

TGPA: tile-grained pipeline architecture for low latency CNN inference.
Proceedings of the International Conference on Computer-Aided Design, 2018

From JVM to FPGA: Bridging Abstraction Hierarchy via Optimized Deep Pipelining.
Proceedings of the 10th USENIX Workshop on Hot Topics in Cloud Computing, 2018

Latte: Locality Aware Transformation for High-Level Synthesis.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

S2FA: an accelerator automation framework for heterogeneous computing in datacenters.
Proceedings of the 55th Annual Design Automation Conference, 2018

Automated accelerator generation and optimization with composable, parallel and pipeline architecture.
Proceedings of the 55th Annual Design Automation Conference, 2018

2017
Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs.
Proceedings of the 54th Annual Design Automation Conference, 2017

Bandwidth Optimization Through On-Chip Memory Restructuring for HLS.
Proceedings of the 54th Annual Design Automation Conference, 2017

2016
The SMEM Seeding Acceleration for DNA Sequence Alignment.
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016

Invited - Heterogeneous datacenters: options and opportunities.
Proceedings of the 53rd Annual Design Automation Conference, 2016

Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale.
Proceedings of the Seventh ACM Symposium on Cloud Computing, 2016

2015
Impact of Loop Transformations on Software Reliability.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015

2014
Thermal-Aware On-Line Scheduler for 3-D Many-Core Processor Throughput Optimization.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2014

