Jaewoong Sim

Orcid: 0000-0002-0403-9928

According to our database, Jaewoong Sim authored at least 32 papers between 2012 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
VR-Pipe: Streamlining Hardware Graphics Pipeline for Volume Rendering.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

2024
CuPBoP: Making CUDA a Portable Language.
ACM Trans. Design Autom. Electr. Syst., 2024

InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

GSCore: Efficient Radiance Field Rendering via Architectural Support for 3D Gaussian Splatting.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
CuPBoP: A Framework to Make CUDA Portable.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

NeuRex: A Case for Neural Rendering Acceleration.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

SDM: Sharing-Enabled Disaggregated Memory System with Cache Coherent Compute Express Link.
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023

2022
COX: Exposing CUDA Warp-level Functions to CPUs.
ACM Trans. Archit. Code Optim., 2022

CuPBoP: CUDA for Parallelized and Broad-range Processors.
CoRR, 2022

2021
COX: CUDA on X86 by Exposing Warp-Level Functions to CPUs.
CoRR, 2021

Supporting CUDA for an extended RISC-V GPU architecture.
CoRR, 2021

2020
Batch-Aware Unified Memory Management in GPUs for Irregular Workloads.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019
Thermal-aware processing-in-memory instruction offloading.
J. Parallel Distributed Comput., 2019

Specializing FGPU for Persistent Deep Learning.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

Evaluating and Enhancing Intel® Stratix® 10 FPGAs for Persistent Real-Time AI.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

Why Compete When You Can Work Together: FPGA-ASIC Integration for Persistent RNNs.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

2018
CoolPIM: Thermal-Aware Source Throttling for Efficient PIM Instruction Offloading.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

A Customizable Matrix Multiplication Framework for the Intel HARPv2 Xeon+FPGA Platform: A Deep Learning Case Study.
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

2017
GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

High performance binary neural networks on the Xeon+FPGA™ platform.
Proceedings of the 27th International Conference on Field Programmable Logic and Applications, 2017

Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks?
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

2016
Accelerating Binarized Neural Networks: Comparison of FPGA, CPU, GPU, and ASIC.
Proceedings of the 2016 International Conference on Field-Programmable Technology, 2016

Accelerating recurrent neural networks in analytics servers: Comparison of FPGA, CPU, GPU, and ASIC.
Proceedings of the 26th International Conference on Field Programmable Logic and Applications, 2016

2015
BSSync: Processing Near Memory for Machine Learning Workloads with Bounded Staleness Consistency Models.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
A Configurable and Strong RAS Solution for Die-Stacked DRAM Caches.
IEEE Micro, 2014

Transparent Hardware Management of Stacked DRAM as Part of Memory.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

2013
Resilient die-stacked DRAM caches.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

2012
A performance analysis framework for identifying potential benefits in GPGPU applications.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

FLEXclusion: Balancing cache capacity and on-chip bandwidth via Flexible Exclusion.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

