# Wonyong Sung

According to our database, Wonyong Sung authored at least 150 papers between 1980 and 2019.

## Bibliography

2019

Compression of Deep Neural Networks with Structured Sparse Ternary Coding.

Signal Processing Systems, 2019

Empirical Analysis of Knowledge Distillation Technique for Optimization of Quantized Deep Neural Networks.

CoRR, 2019

Workload-aware Automatic Parallelization for Multi-GPU DNN Training.

Proceedings of the IEEE International Conference on Acoustics, 2019

Memorization Capacity of Deep Neural Networks under Parameter Quantization.

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Workload-aware Automatic Parallelization for Multi-GPU DNN Training.

CoRR, 2018

Single Stream Parallelization of Recurrent Neural Networks for Low Power and Fast Inference.

CoRR, 2018

On-Device End-to-end Speech Recognition with Multi-Step Parallel RNNs.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Massively parallel computation of linear recurrence equations with graphics processing units.

Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, 2018

Fully Neural Network Based Speech Recognition on Mobile and Embedded Devices.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Hierarchical Recurrent Neural Networks for Acoustic Modeling.

Proceedings of the Interspeech 2018, 2018

Character-level Language Modeling with Gated Hierarchical Recurrent Neural Networks.

Proceedings of the Interspeech 2018, 2018

2017

Structured Pruning of Deep Convolutional Neural Networks.

JETC, 2017

Fixed-point optimization of deep neural networks with adaptive step size retraining.

CoRR, 2017

Structured Sparse Ternary Weight Coding of Deep Neural Networks for Efficient Hardware Implementations.

CoRR, 2017

High-throughput decoding of block turbo codes on graphics processing units.

Proceedings of the 2017 IEEE International Workshop on Signal Processing Systems, 2017

Structured sparse ternary weight coding of deep neural networks for efficient hardware implementations.

Proceedings of the 2017 IEEE International Workshop on Signal Processing Systems, 2017

SVD-Softmax: Fast Softmax Approximation on Large Vocabulary Neural Networks.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Fixed-point optimization of deep neural networks with adaptive step size retraining.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Character-level language modeling with hierarchical recurrent neural networks.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016

Dynamic Hand Gesture Recognition for Wearable Devices with Low Complexity Recurrent Neural Networks.

CoRR, 2016

Quantized neural network design under weight capacity constraint.

CoRR, 2016

Generative Transfer Learning between Recurrent Neural Networks.

CoRR, 2016

FPGA Based Implementation of Deep Neural Networks Using On-chip Memory Only.

CoRR, 2016

FPGA-Based Low-Power Speech Recognition with Recurrent Neural Networks.

CoRR, 2016

Character-Level Language Modeling with Hierarchical Recurrent Neural Networks.

CoRR, 2016

Character-Level Incremental Speech Recognition with Recurrent Neural Networks.

CoRR, 2016

Compact Deep Convolutional Neural Networks With Coarse Pruning.

CoRR, 2016

FPGA-Based Low-Power Speech Recognition with Recurrent Neural Networks.

Proceedings of the 2016 IEEE International Workshop on Signal Processing Systems, 2016

Architecture exploration of a programmable neural network processor for embedded systems.

Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016

Dynamic hand gesture recognition for wearable devices with low complexity recurrent neural networks.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2016

Sequence to Sequence Training of CTC-RNNs with Partial Windowing.

Proceedings of the 33rd International Conference on Machine Learning, 2016

Fixed-point performance analysis of recurrent neural networks.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

FPGA based implementation of deep neural networks using on-chip memory only.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Character-level incremental speech recognition with recurrent neural networks.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Learning separable fixed-point kernels for deep convolutional neural networks.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Evaluation of block turbo codes for long-haul optical networks.

Proceedings of the 2016 22nd Asia-Pacific Conference on Communications (APCC), 2016

2015

Low Energy Signal Processing Techniques for Reliability Improvement of High-Density NAND Flash Memory.

Signal Processing Systems, 2015

Resiliency of Deep Neural Networks under Quantization.

CoRR, 2015

Fixed Point Performance Analysis of Recurrent Neural Networks.

CoRR, 2015

Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification.

CoRR, 2015

Single stream parallelization of generalized LSTM-like RNNs on a GPU.

CoRR, 2015

Online Keyword Spotting with a Character-Level Recurrent Neural Network.

CoRR, 2015

Structured Pruning of Deep Convolutional Neural Networks.

CoRR, 2015

Single stream parallelization of generalized LSTM-like RNNs on a GPU.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Fixed point optimization of deep convolutional neural networks for object recognition.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014

Area-Efficient Parallel Syndrome Generators for Linear Block Codes.

Signal Processing Systems, 2014

Rate-0.96 LDPC Decoding VLSI for Soft-Decision Error Correction of NAND Flash Memory.

IEEE Trans. VLSI Syst., 2014

Decision Directed Estimation of Threshold Voltage Distribution in NAND Flash Memory.

IEEE Trans. Signal Processing, 2014

Power Modeling for GPU Architectures Using McPAT.

ACM Trans. Design Autom. Electr. Syst., 2014

Direct and indirect measurement of inter-cell capacitance in NAND flash memory.

Proceedings of the 2014 IEEE Workshop on Signal Processing Systems, 2014

Fixed-point feedforward deep neural network design using weights +1, 0, and -1.

Proceedings of the 2014 IEEE Workshop on Signal Processing Systems, 2014

Fault tolerance analysis of digital feed-forward deep neural networks.

Proceedings of the IEEE International Conference on Acoustics, 2014

X1000 real-time phoneme recognition VLSI using feed-forward deep neural networks.

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

Least Squares Based Coupling Cancelation for MLC NAND Flash Memory with a Small Number of Voltage Sensing Operations.

Signal Processing Systems, 2013

Soft-Decision Error Correction of NAND Flash Memory with a Turbo Product Code.

Signal Processing Systems, 2013

Estimation of NAND Flash Memory Threshold Voltage Distribution for Optimum Soft-Decision Error Correction.

IEEE Trans. Signal Processing, 2013

Load Balanced Resampling for Real-Time Particle Filtering on Graphics Processing Units.

IEEE Trans. Signal Processing, 2013

Signal processing techniques for reliability improvement of sub-20nm NAND flash memory.

Proceedings of the IEEE Workshop on Signal Processing Systems, 2013

DRAM access reduction in GPUs by thread-block scheduling for overlapped data reuse.

Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

GPU based implementation of recursive digital filtering algorithms.

Proceedings of the IEEE International Conference on Acoustics, 2013

Soft-decision decoding with cell to cell interference removed signal in NAND flash memory.

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

Flexible and Expandable Speech Recognition Hardware with Weighted Finite State Transducers.

Signal Processing Systems, 2012

Parallel Computation of Adaptive Filtering Algorithms on Multi-Core Systems.

Signal Processing Systems, 2012

Guest Editors' Introduction.

Signal Processing Systems, 2012

Low energy error correction of NAND Flash memory through soft-decision decoding.

EURASIP J. Adv. Sig. Proc., 2012

Optimal Output Quantization of Binary Input AWGN Channel for Belief-Propagation Decoding of LDPC Codes.

Proceedings of the 2012 IEEE Workshop on Signal Processing Systems, 2012

A simulation-based study for DRAM power reduction strategies in GPGPUs.

Proceedings of the 2012 IEEE International Symposium on Circuits and Systems, 2012

Performance analysis of multi-bank DRAM with increased clock frequency.

Proceedings of the 2012 IEEE International Symposium on Circuits and Systems, 2012

Performance of rate 0.96 (68254, 65536) EG-LDPC code for NAND Flash memory error correction.

Proceedings of IEEE International Conference on Communications, 2012

Least squares based cell-to-cell interference cancelation technique for multi-level cell NAND flash memory.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Multi-user real-time speech recognition with a GPU.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Reducing off-chip memory traffic by selective cache management scheme in GPGPUs.

Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, 2012

2011

Memory Access Optimized VLSI for 5000-Word Continuous Speech Recognition.

Signal Processing Systems, 2011

Memory Access Optimized Implementation of Cyclic and Quasi-Cyclic LDPC Codes on a GPGPU.

Signal Processing Systems, 2011

Trends in Design and Implementation of Signal Processing Systems [In the Spotlight].

IEEE Signal Process. Mag., 2011

Reduced complexity Chase-Pyndiah decoding algorithm for turbo product codes.

Proceedings of the IEEE Workshop on Signal Processing Systems, 2011

Memory access pattern-aware DRAM performance model for multi-core systems.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

Parallel computation of adaptive lattice filters.

Proceedings of the IEEE International Conference on Acoustics, 2011

H- and C-level WFST-based large vocabulary continuous speech recognition on Graphics Processing Units.

Proceedings of the IEEE International Conference on Acoustics, 2011

2010

VLSI Implementation of BCH Error Correction for Multilevel Cell NAND Flash Memory.

IEEE Trans. VLSI Syst., 2010

COPR: a cost-oriented recycling policy for flash translation layer.

IEEE Trans. Consumer Electronics, 2010

A Real-Time FPGA-Based 20 000-Word Speech Recognizer With Optimized DRAM Access.

IEEE Trans. on Circuits and Systems, 2010

VLSI Implementation of a High-Throughput Soft-Bit-Flipping Decoder for Geometric LDPC Codes.

IEEE Trans. on Circuits and Systems, 2010

Adaptive Threshold Technique for Bit-Flipping Decoding of Low-Density Parity-Check Codes.

IEEE Communications Letters, 2010

Parallel implementation of an error diffusion halftoning algorithm with a general purpose graphics processing unit.

Proceedings of the International Conference on Image Processing, 2010

Multi-core and SIMD architecture based implementation of recursive digital filtering algorithms.

Proceedings of the IEEE International Conference on Acoustics, 2010

An FPGA implementation of speech recognition with weighted finite state transducers.

Proceedings of the IEEE International Conference on Acoustics, 2010

Optimization of Number Representations.

Proceedings of the Handbook of Signal Processing Systems, 2010

2009

Compiler-Based Performance Evaluation of an SIMD Processor with a Multi-Bank Memory Unit.

Signal Processing Systems, 2009

Access-Pattern-Aware On-Chip Memory Allocation for SIMD Processors.

IEEE Trans. on CAD of Integrated Circuits and Systems, 2009

Efficient Software-Based Encoding and Decoding of BCH Codes.

IEEE Trans. Computers, 2009

Massively parallel implementation of cyclic LDPC codes on a general purpose graphics processing unit.

Proceedings of the IEEE Workshop on Signal Processing Systems, 2009

Low-power implementation of a high-throughput LDPC decoder for IEEE 802.11n standard.

Proceedings of the IEEE Workshop on Signal Processing Systems, 2009

SIMD processor based implementation of recursive filtering equations.

Proceedings of the IEEE Workshop on Signal Processing Systems, 2009

VLSI Implementation of a Soft Bit-flipping Decoder for PG-LDPC Codes.

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2009), 2009

Scalable HMM based inference engine in large vocabulary continuous speech recognition.

Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

OpenMP-based parallel implementation of a continuous speech recognizer on a multi-core system.

Proceedings of the IEEE International Conference on Acoustics, 2009

VLSI for 5000-word continuous speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2009

2008

Algorithm and Software Optimization of Variable Block Size Motion Estimation for H.264/AVC on a VLIW-SIMD DSP.

Signal Processing Systems, 2008

Strength-Reduced Parallel Chien Search Architecture for Strong BCH Codes.

IEEE Trans. on Circuits and Systems, 2008

Software implementation of Chien search process for strong BCH codes.

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2008), 2008

FPGA-based implementation of a real-time 5000-word continuous speech recognizer.

Proceedings of the 2008 16th European Signal Processing Conference, 2008

Efficient vectorization of SIMD programs with non-aligned and irregular data access hardware.

Proceedings of the 2008 International Conference on Compilers, 2008

2007

Fast Block Mode Decision for H.264/AVC on a Programmable Digital Signal Processor.

Proceedings of the IEEE Workshop on Signal Processing Systems, 2007

Performance Optimization of a Multimedia Player on a Mobile CPU Platform.

Proceedings of the IEEE Workshop on Signal Processing Systems, 2007

Memory Access Reduced Software Implementation of H.264/AVC Sub-pixel Motion Estimation Using Differential Data Encoding.

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2007), 2007

Mobile CPU Based Optimization of Fast Likelihood Computation for Continuous Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2007

2006

Efficient Media Synchronization Method for Video Telephony System.

IEICE Transactions, 2006

A Robust Formant Extraction Algorithm Combining Spectral Peak Picking and Root Polishing.

EURASIP J. Adv. Sig. Proc., 2006

Low-Power High-Throughput BCH Error Correction VLSI Design for Multi-Level Cell NAND Flash Memories.

Proceedings of the IEEE Workshop on Signal Processing Systems, 2006

Performance Evaluation of an SIMD Architecture with a Multi-bank Vector Memory Unit.

Proceedings of the IEEE Workshop on Signal Processing Systems, 2006

An FPGA based SIMD processor with a vector memory unit.

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2006), 2006

Design and Implementation of Speech Recognition on a Softcore Based FPGA.

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005

Guest Editorial.

VLSI Signal Processing, 2005

VLSI Implementation of An Adaptive Equalizer for ATSC Digital TV Receivers.

VLSI Signal Processing, 2005

Compressed Swapping for NAND Flash Memory Based Embedded Systems.

Proceedings of the Embedded Computer Systems: Architectures, 2005

Memory access overhead reduction for a digital color copier implementation using a VLIW digital signal processor.

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2005), 2005

2004

Implementation of an intonational quality assessment system for a handheld device.

Proceedings of the INTERSPEECH 2004, 2004

Implementation of a digital color copier using a VLIW SIMD architecture.

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003

Optimization of power consumption for an ARM7-based multimedia handheld device.

Proceedings of the 2003 International Symposium on Circuits and Systems, 2003

Implementation of a digital copier using TMS320C6414 VLIW DSP processor.

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002

Implementation of an intonational quality assessment system.

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Software optimization of MPEG audio layer-III for a 32 bit RISC processor.

Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems 2002, 2002

2001

A Compiler-Friendly RISC-Based Digital Signal Processor Synthesis and Performance Evaluation.

VLSI Signal Processing, 2001

Multimedia processor-based implementation of an error-diffusion halftoning algorithm exploiting subword parallelism.

IEEE Trans. Circuits Syst. Video Techn., 2001

Combined word-length optimization and high-level synthesis of digital signal processing systems.

IEEE Trans. on CAD of Integrated Circuits and Systems, 2001

A codebook shaping method for perceptual quality improvement of CELP coders.

Proceedings of the 2001 International Symposium on Circuits and Systems, 2001

Feedback-directed memory disambiguation for embedded multimedia VLIW computing.

Proceedings of the 2001 International Symposium on Circuits and Systems, 2001

A block priority based instruction caching scheme for multimedia processors.

Proceedings of the IEEE International Conference on Acoustics, 2001

2000

Memory efficient software synthesis with mixed coding style from dataflow graphs.

IEEE Trans. VLSI Syst., 2000

Variable dimensional algebraic CELP coding of prototype waveforms.

Proceedings of the IEEE International Conference on Acoustics, 2000

1999

A statistical model-based voice activity detection.

IEEE Signal Process. Lett., 1999

An enhanced two-level adaptive multiple branch prediction for superscalar processors.

Journal of Systems Architecture, 1999

A low resolution pulse position coding method for improved excitation modeling of speech transition.

Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

A floating-point to integer C converter with shift reduction for fixed-point digital signal processors.

Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

1998

Fixed-point error analysis and word length optimization of 8×8 IDCT architectures.

IEEE Trans. Circuits Syst. Video Techn., 1998

Memory Efficient Software Synthesis from Dataflow Graph.

Proceedings of the 11th International Symposium on System Synthesis, 1998

A voice activity detector employing soft decision based noise spectrum adaptation.

Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

Optimized Timed Hardware Software Cosimulation without Roll-back.

Proceedings of the 1998 Design, 1998

A Hardware Software Cosimulation Backplane with Automatic Interface Generation.

Proceedings of the ASP-DAC '98, 1998

An Efficient Compiled Simulation System for VLIW Code Verification.

Proceedings of the 31st Annual Simulation Symposium (SS '98), 1998

1997

Adaptive Threshold Error Diffusion Technique for Color Inkjet Printing.

Proceedings of the 1997 International Conference on Image Processing, 1997

A fast direction sequence generation method for CORDIC processors.

Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

Fixed-point C compiler for TMS320C50 digital signal processor.

Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

An Enhanced Two-Level Adaptive Multiple Branch Prediction for Superscalar Processors.

Proceedings of the Euro-Par '97 Parallel Processing, 1997

1995

Simulation-based word-length optimization method for fixed-point digital signal processing systems.

IEEE Trans. Signal Processing, 1995

An integrated hardware-software cosimulation environment for heterogeneous systems prototyping.

Proceedings of the 1995 Conference on Asia Pacific Design Automation, Makuhari, Massa, Chiba, Japan, August 29, 1995

1994

Word-length determination and scaling software for a signal flow block diagram.

Proceedings of ICASSP '94: IEEE International Conference on Acoustics, 1994

1992

Multiprocessor Implementation of Digital Filtering Algorithms Using a Parallel Block Processing Method.

IEEE Trans. Parallel Distrib. Syst., 1992

1980

A 4800 bps LPC vocoder with improved excitation.

Proceedings of the IEEE International Conference on Acoustics, 1980