Yongjun Park

Youjip Won

Proceedings of the 9th Non-Volatile Memory Systems and Applications Symposium, 2020

Optimization of GPU-based Sparse Matrix Multiplication for Large Sparse Networks.

Proceedings of the 36th IEEE International Conference on Data Engineering, 2020

Convergence-Aware Neural Network Training.

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Navigator: Dynamic Multi-kernel Scheduling to Improve GPU Performance.

Jiho Kim

John Kim

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

PreScaler: an efficient system-aware precision scaling framework on heterogeneous systems.

Seokwon Kang

Kyunghwan Choi

Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

2019

Adaptive Cooperation of Prefetching and Warp Scheduling on GPUs.

IEEE Trans. Computers, 2019

Improving GPU Multitasking Efficiency Using Dynamic Resource Sharing.

IEEE Comput. Archit. Lett., 2019

Microarchitecture-Aware Code Generation for Deep Learning on Single-ISA Heterogeneous Multi-Core Mobile Processors.

IEEE Access, 2019

A compiler-based approach for GPGPU performance calibration using TLP modulation (WIP paper).

Yongseung Yu

Seokwon Kang

Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, 2019

GATE: A Generalized Dataflow-level Approximation Tuning Engine For Data Parallel Architectures.

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

2018

WASP: Selective Data Prefetching with Monitoring Runtime Warp Progress on GPUs.

IEEE Trans. Computers, 2018

Runtime Profiling of OpenCL Workloads Using LLVM-based Code Instrumentation.

Yongseung Yu

Seokwon Kang

Proceedings of the TENCON 2018, 2018

Automated Neural Network Accelerator Generation Framework for Multiple Neural Network Applications.

Proceedings of the TENCON 2018, 2018

Core-level DVFS for Spatial Multitasking GPUs.

Jehee Cha

Jiho Kim

Proceedings of the TENCON 2018, 2018

Automatic code conversion for non-volatile memory.

Proceedings of the 33rd Annual ACM Symposium on Applied Computing, 2018

NN compactor: Minimizing memory and logic resources for small neural networks.

Seongmin Hong

Inho Lee

Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

2017

Selective DRAM cache bypassing for improving bandwidth on DRAM/NVM hybrid main memory systems.

IEICE Electron. Express, 2017

Efficient GPU multitasking with latency minimization and cache boosting.

Jiho Kim

Minsung Chu

IEICE Electron. Express, 2017

A Comparative Study of Programming Environments Exploiting Heterogeneous Systems.

IEEE Access, 2017

A FPGA-based neural accelerator for small IoT devices.

Seongmin Hong

Proceedings of the International SoC Design Conference, 2017

Dynamic Resource Management for Efficient Utilization of Multitasking GPUs.

Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016

An eDRAM-Based Approximate Register File for GPUs.

IEEE Des. Test, 2016

A bypass first policy for energy-efficient last level caches.

Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016

APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPUs.

Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

2015

SKMD: Single Kernel on Multiple Devices for Transparent CPU-GPU Collaboration.

ACM Trans. Comput. Syst., 2015

ELF: maximizing memory-level parallelism for GPUs with coordinated warp and fetch scheduling.

Proceedings of the International Conference for High Performance Computing, 2015

Enabling Efficient Alias Speculation.

Soumyadeep Ghosh

Arun Raman

Proceedings of the 16th ACM SIGPLAN/SIGBED Conference on Languages, 2015

Chimera: Collaborative Preemption for Multitasking on a Shared GPU.

Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

Fine Grain Cache Partitioning Using Per-Instruction Working Blocks.

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2013

Efficient execution of augmented reality applications on mobile programmable accelerators.

Proceedings of the 2013 International Conference on Field-Programmable Technology, 2013

Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012

Libra: Tailoring SIMD Execution Using Heterogeneous Hardware and Dynamic Configurability.

Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Efficient performance scaling of future CGRAs for mobile applications.

Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

Process variation in near-threshold wide SIMD architectures.

Proceedings of the 49th Annual Design Automation Conference 2012, 2012

SIMD defragmenter: efficient ILP realization on data-parallel architectures.

Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

2010

Resource recycling: putting idle resources to work on a composable accelerator.

Proceedings of the 2010 International Conference on Compilers, 2010

2009

A dataflow-centric approach to design low power control paths in CGRAs.

Hyunchul Park

Proceedings of the IEEE 7th Symposium on Application Specific Processors, 2009

Polymorphic pipeline array: a flexible multicore accelerator with virtualized execution for mobile multimedia applications.

Hyunchul Park

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

CGRA express: accelerating execution using dynamic operation fusion.

Hyunchul Park