Youngmin Yi

Orcid: 0000-0001-9802-2109

According to our database, Youngmin Yi authored at least 44 papers between 2002 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
WiseGraph: Optimizing GNN with Joint Workload Partition of Graph and Operations.
Proceedings of the Nineteenth European Conference on Computer Systems, 2024

2021
Exploiting Activation Sparsity for Fast CNN Inference on Mobile GPUs.
ACM Trans. Embed. Comput. Syst., 2021

Performance Evaluation of INT8 Quantized Inference on Mobile GPUs.
IEEE Access, 2021

Minimizing GPU Kernel Launch Overhead in Deep Learning Inference on Mobile GPUs.
Proceedings of the HotMobile '21: The 22nd International Workshop on Mobile Computing Systems and Applications, 2021

Understanding and bridging the gaps in current GNN performance optimizations.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

HybridHadoop: CPU-GPU Hybrid Scheduling in Hadoop.
Proceedings of the HPC Asia 2021: The International Conference on High Performance Computing in Asia-Pacific Region, 2021

2020
Scheduling of Deep Learning Applications Onto Heterogeneous Processors in an Embedded Device.
IEEE Access, 2020

Towards Real-time CNN Inference from a Video Stream on a Mobile GPU (WiP Paper).
Proceedings of the 21st ACM SIGPLAN/SIGBED International Conference on Languages, 2020

BPNet: Branch-pruned Conditional Neural Network for Systematic Time-accuracy Tradeoff.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

GOPipe: A Granularity-Oblivious Programming Framework for Pipelined Stencil Executions on GPU.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
BPNet: Branch-pruned conditional neural network for systematic time-accuracy tradeoff in DNN inference: work-in-progress.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis Companion, 2019

HiWayLib: A Software Framework for Enabling High Performance Communications for Heterogeneous Pipeline Computations.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018
Distributed Video Decoding on Hadoop.
IEICE Trans. Inf. Syst., 2018

Real-Time and Energy-Efficient Face Detection on CPU-GPU Heterogeneous Embedded Platforms.
IEICE Trans. Inf. Syst., 2018

NNsim: fast performance estimation based on sampled simulation of GPGPU kernels for neural networks.
Proceedings of the 55th Annual Design Automation Conference, 2018

2017
Versapipe: a versatile programming framework for pipelined computing on GPU.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

2016
Acceleration of Word2vec Using GPUs.
Proceedings of the Neural Information Processing - 23rd International Conference, 2016

2015
Fast GPU-in-the-loop simulation technique at OpenGL ES API level for Android Graphics Applications.
Proceedings of the 2015 International Symposium on Rapid System Prototyping, 2015

Real-time face detection in Full HD images exploiting both embedded CPU and GPU.
Proceedings of the 2015 IEEE International Conference on Multimedia and Expo, 2015

2014
Efficient parallel CKY parsing using GPUs.
J. Log. Comput., 2014

An efficient parallelization technique for x264 encoder on heterogeneous platforms consisting of CPUs and GPUs.
J. Real Time Image Process., 2014

Real-time integrated face detection and recognition on embedded GPGPUs.
Proceedings of the 12th IEEE Symposium on Embedded Systems for Real-time Multimedia, 2014

Hardware-in-the-loop simulation of Android GPGPU applications.
Proceedings of the 12th IEEE Symposium on Embedded Systems for Real-time Multimedia, 2014

Hardware-in-the-loop Simulation for CPU/GPU Heterogeneous Platforms.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

2013
Active disk meets flash: a case for intelligent SSDs.
Proceedings of the International Conference on Supercomputing, 2013

Fast PCA-based face recognition on GPUs.
Proceedings of the IEEE International Conference on Acoustics, 2013

2012
A cycle-level parallel simulation technique exploiting both space and time parallelism.
Proceedings of the 23rd IEEE International Symposium on Rapid System Prototyping, 2012

2011
Automatic CUDA Code Synthesis Framework for Multicore CPU and GPU Architectures.
Proceedings of the Parallel Processing and Applied Mathematics, 2011

Efficient Parallel CKY Parsing on GPUs.
Proceedings of the 12th International Conference on Parsing Technologies, 2011

An efficient parallel motion estimation algorithm and X264 parallelization in CUDA.
Proceedings of the 2011 Conference on Design and Architectures for Signal and Image Processing, 2011

2009
Parallel scalability in speech recognition.
IEEE Signal Process. Mag., 2009

A timed HW/SW coemulation technique for fast yet accurate system verification.
Proceedings of the 2009 International Conference on Embedded Computer Systems: Architectures, 2009

A fully data parallel WFST-based large vocabulary continuous speech recognition on a graphics processing unit.
Proceedings of the INTERSPEECH 2009, 2009

Scalable HMM based inference engine in large vocabulary continuous speech recognition.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

2007
PeaCE: A hardware-software codesign environment for multimedia embedded systems.
ACM Trans. Design Autom. Electr. Syst., 2007

Fast and Accurate Cosimulation of MPSoC Using Trace-Driven Virtual Synchronization.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2007

Communication Architecture Simulation on the Virtual Synchronization Framework.
Proceedings of the Embedded Computer Systems: Architectures, 2007

2006
Hardware-Software Codesign of Multimedia Embedded Systems: the PeaCE.
Proceedings of the 12th IEEE Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2006), 2006

2005
Trace-driven HW/SW cosimulation using virtual synchronization technique.
Proceedings of the 42nd Design Automation Conference, 2005

Embedded software generation from system level specification for multi-tasking embedded systems.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

2004
Fast design space exploration framework with an efficient performance estimation technique.
Proceedings of the 2nd Workshop on Embedded Systems for Real-Time Multimedia, 2004

2003
Fast and Time-Accurate Cosimulation with OS Scheduler Modeling.
Des. Autom. Embed. Syst., 2003

Virtual synchronization technique with OS modeling for fast and time-accurate cosimulation.
Proceedings of the 1st IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2003

2002
Virtual Synchronization for Fast Distributed Cosimulation of Dataflow Task Graphs.
Proceedings of the 15th International Symposium on System Synthesis (ISSS 2002), 2002

