Youngsok Kim

ORCID: 0000-0002-1015-9969

According to our database, Youngsok Kim authored at least 37 papers between 2014 and 2024.

Bibliography

2024
Gem5-AVX: Extension of the Gem5 Simulator to Support AVX Instruction Sets.
IEEE Access, 2024

AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

2023
Enabling Fine-Grained Spatial Multitasking on Systolic-Array NPUs Using Dataflow Mirroring.
IEEE Trans. Computers, December, 2023

Design and Analysis of a Processing-in-DIMM Join Algorithm: A Case Study with UPMEM DIMMs.
Proc. ACM Manag. Data, 2023

GraNNDis: Efficient Unified Distributed Training Framework for Deep GNNs on Large Clusters.
CoRR, 2023

McCore: A Holistic Management of High-Performance Heterogeneous Multicores.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

SGCN: Exploiting Compressed-Sparse Features in Deep Graph Convolutional Network Accelerators.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Pipe-BD: Pipelined Parallel Blockwise Distillation.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

Occamy: Memory-efficient GPU Compiler for DNN Inference.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

Virtual PIM: Resource-Aware Dynamic DPU Allocation and Workload Scheduling Framework for Multi-DPU PIM Architecture.
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023

2022
GuardiaNN: Fast and Secure On-Device Inference in TrustZone Using Embedded SRAM and Cryptographic Hardware.
Proceedings of the Middleware '22: 23rd International Middleware Conference, Quebec, QC, Canada, November 7, 2022

GCoM: a detailed GPU core model for accurate analytical modeling of modern GPUs.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

SALoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Enabling hard constraints in differentiable neural network and accelerator co-exploration.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Slice-and-Forge: Making Better Use of Caches for Graph Convolutional Network Accelerators.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

Decoupling Schedule, Topology Layout, and Algorithm to Easily Enlarge the Tuning Space of GPU Graph Processing.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
Making a Better Use of Caches for GCN Accelerators with Feature Slicing and Automatic Tile Morphing.
IEEE Comput. Archit. Lett., 2021

Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Dataflow Mirroring: Architectural Support for Highly Efficient Fine-Grained Spatial Multitasking on Systolic-Array NPUs.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

DANCE: Differentiable Accelerator/Network Co-Exploration.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

Thread-Aware Area-Efficient High-Level Synthesis Compiler for Embedded Devices.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020
Real-Time Object Detection System with Multi-Path Neural Networks.
Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, 2020

2019
FlexLearn: Fast and Highly Efficient Brain Simulations Using Flexible On-Chip Learning.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

μLayer: Low Latency On-Device Inference Using Cooperative Single-Layer Acceleration and Processor-Friendly Quantization.
Proceedings of the Fourteenth EuroSys Conference 2019, Dresden, Germany, March 25-28, 2019

2018
Flexon: A Flexible Digital Neuron for Efficient Spiking Neural Network Simulations.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

DCS-ctrl: A Fast and Flexible Device-Control Mechanism for Device-Centric Server Architecture.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017
GPUpd: a fast and scalable multi-GPU architecture using cooperative projection and distribution.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

2016
Efficient footprint caching for Tagless DRAM Caches.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

CloudSwap: A Cloud-Assisted Swap Mechanism for Mobile Devices.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

2015
DCS: a fast and scalable device-centric server architecture.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

2014
ScaleGPU: GPU Architecture for Memory-Unaware GPU Programming.
IEEE Comput. Archit. Lett., 2014

Stealing Webpages Rendered on Your Browser by Exploiting GPU Vulnerabilities.
Proceedings of the 2014 IEEE Symposium on Security and Privacy, 2014

GPUdmm: A high-performance and memory-oblivious GPU architecture using dynamic memory management.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

