Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Debunking the CUDA Myth Towards GPU-based AI Systems: Evaluation of the Performance and Programmability of Intel's Gaudi NPU for AI Model Serving.

Yunjae Lee

Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

To FP8 and Back Again: Quantifying the Effects of Reducing Precision on LLM Training Stability.

CoRR, 2024

No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization.

CoRR, 2024

DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Label-Noise Robust Diffusion Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Sparsity-Aware Memory Interface Architecture using Stacked XORNet Compression for Accelerating Pruned-DNN Models.

Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization.

Proceedings of the International Conference on Machine Learning, 2023

Refining Generative Process with Discriminator Guidance in Score-based Diffusion Models.

Proceedings of the International Conference on Machine Learning, 2023

Winning Both the Accuracy of Floating Point Activation and the Simplicity of Integer Arithmetic.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

TF-MVP: Novel Sparsity-Aware Transformer Accelerator with Mixed-Length Vector Pruning.

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022

nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models.

CoRR, 2022

Maximum Likelihood Training of Implicit Nonlinear Diffusion Models.

CoRR, 2022

Maximum Likelihood Training of Implicit Nonlinear Diffusion Model.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Encoding Weights of Irregular Sparsity for Fixed-to-Fixed Model Compression.

Proceedings of the Tenth International Conference on Learning Representations, 2022

AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

2021

Modulating Regularization Frequency for Efficient Compression-Aware Model Training.

CoRR, 2021

Sequential Encryption of Sparse Neural Networks Toward Optimum Representation of Irregular Sparsity.

CoRR, 2021

Q-Rater: Non-Convex Optimization for Post-Training Uniform Quantization.

CoRR, 2021

2020

Adaptive Discrete Event Simulation Systems to Embrace Changes of Requirements Using Event Control Models.

IEEE Trans. Syst. Man Cybern. Syst., 2020

BiQGEMM: matrix multiplication with lookup table for binary-coding-based quantized DNNs.

Proceedings of the International Conference for High Performance Computing, 2020

FleXOR: Trainable Fractional Quantization.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Structured Compression by Weight Encryption for Unstructured Pruning and Quantization.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Learning Low-Rank Approximation for CNNs.

CoRR, 2019

Structured Compression by Unstructured Pruning for Sparse Quantized Neural Networks.

CoRR, 2019

Network Pruning for Low-Rank Binary Indexing.

CoRR, 2019

2018

Simulation-Based Optimization on the System-of-Systems Model via Model Transformation and Genetic Algorithm: A Case Study of Network-Centric Warfare.

Complex., 2018

2013

Integrated hybrid systems modeling and simulation methodology based on HDEVS formalism.

Proceedings of the 2013 Summer Simulation Multiconference, 2013

2012

Design and implementation of event-based DEVS execution environment for faster execution of iterative simulation.

Se Jung Kwon

Tag Gon Kim

Proceedings of the 2012 Spring Simulation Multiconference, 2012
