Jongsoo Park

Orcid: 0000-0002-4750-9440

According to our database, Jongsoo Park authored at least 60 papers between 2007 and 2024.

Bibliography

2024
Wukong: Towards a Scaling Law for Large-Scale Recommendation.
CoRR, 2024

Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation.
CoRR, 2024

2023
75% radiation dose reduction using deep learning reconstruction on low-dose chest CT.
BMC Medical Imaging, December, 2023

MTrainS: Improving DLRM training efficiency using heterogeneous memories.
CoRR, 2023

Shared Microexponents: A Little Shifting Goes a Long Way.
CoRR, 2023

AdaEmbed: Adaptive Embedding for Large-Scale Recommendation Models.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023


2022
RecD: Deduplication for End-to-End Deep Learning Recommendation Model Training Infrastructure.
CoRR, 2022

DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction.
CoRR, 2022

Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022


Efficient Soft-Error Detection for Low-precision Deep Learning Recommendation Models.
Proceedings of the IEEE International Conference on Big Data, 2022

2021
Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale.
IEEE Micro, 2021

First-Generation Inference Accelerator Deployment at Facebook.
CoRR, 2021

High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models.
CoRR, 2021

FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference.
CoRR, 2021

Alternate Model Growth and Pruning for Efficient Training of Recommendation Systems.
Proceedings of the 20th IEEE International Conference on Machine Learning and Applications, 2021

2020
Mixed-Precision Embedding Using a Cache.
CoRR, 2020

Adaptive Dense-to-Sparse Paradigm for Pruning Online Recommendation System with Non-Stationary Data.
CoRR, 2020

2019
Post-Training 4-bit Quantization on Embedding Tables.
CoRR, 2019

Deep Learning Recommendation Model for Personalization and Recommendation Systems.
CoRR, 2019

A Study of BFLOAT16 for Deep Learning Training.
CoRR, 2019

Spatial-Winograd Pruning Enabling Sparse Winograd Convolution.
CoRR, 2019

2018
HPC formulations of optimization algorithms for tensor completion.
Parallel Comput., 2018

Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications.
CoRR, 2018

On Periodic Functions as Regularizers for Quantization of Neural Networks.
CoRR, 2018

Glow: Graph Lowering Compiler Techniques for Neural Networks.
CoRR, 2018

Dynamic fine-grained sparse memory accesses.
Proceedings of the International Symposium on Memory Systems, 2018

2017
Gate scheduling for quantum algorithms.
CoRR, 2017

Enabling Sparse Winograd Convolution by Native Pruning.
CoRR, 2017

Sparse Tensor Factorization on Many-Core Processors with High-Bandwidth Memory.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Faster CNNs with Direct Sparse Convolutions and Guided Pruning.
Proceedings of the 5th International Conference on Learning Representations, 2017

2016
Optimizations in a high-performance conjugate gradient benchmark for IA-based multi- and many-core processors.
Int. J. High Perform. Comput. Appl., 2016

Holistic SparseCNN: Forging the Trident of Accuracy, Speed, and Size.
CoRR, 2016

Automating wavefront parallelization for sparse matrix computations.
Proceedings of the International Conference for High Performance Computing, 2016

An exploration of optimization algorithms for high performance tensor completion.
Proceedings of the International Conference for High Performance Computing, 2016

Sparso: Context-driven Optimizations of Sparse Linear Algebra.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms.
Proceedings of the High Performance Computing - 30th International Conference, 2015

Improving concurrency and asynchrony in multithreaded MPI applications using software offloading.
Proceedings of the International Conference for High Performance Computing, 2015

High-performance algebraic multigrid solver optimized for multi-core based distributed parallel systems.
Proceedings of the International Conference for High Performance Computing, 2015

Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

2014
Sparsifying Synchronization for High-Performance Shared-Memory Sparse Triangular Solver.
Proceedings of the Supercomputing - 29th International Conference, 2014

Navigating the maze of graph analytics frameworks using massive graph datasets.
Proceedings of the International Conference on Management of Data, 2014

Efficient Shared-Memory Implementation of High-Performance Conjugate Gradient Benchmark and its Application to Unstructured Matrices.
Proceedings of the International Conference for High Performance Computing, 2014

Improving Communication Performance and Scalability of Native Applications on Intel Xeon Phi Coprocessor Clusters.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Versatile and scalable parallel histogram construction.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
A framework for low-communication 1-D FFT.
Sci. Program., 2013

Efficient backprojection-based synthetic aperture radar computation with many-core processors.
Sci. Program., 2013

Distributed SociaLite: A Datalog-Based Language for Large-Scale Graph Analysis.
Proc. VLDB Endow., 2013

Location-aware cache management for many-core processors with deep cache hierarchy.
Proceedings of the International Conference for High Performance Computing, 2013

Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors.
Proceedings of the International Conference for High Performance Computing, 2013

2012
CloudRAMSort: fast and efficient large-scale distributed RAM sort on shared-nothing cluster.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012

Billion-particle SIMD-friendly two-point correlation on large-scale HPC cluster systems.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

2010
Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures.
SPAA 2010: Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2010

Fine-grain dynamic instruction placement for L0 scratch-pad memory.
Proceedings of the 2010 International Conference on Compilers, 2010

2008
A Practical Improvement to the Partial Redundancy Elimination in SSA Form.
J. Comput. Sci. Eng., 2008

Efficient Embedded Computing.
Computer, 2008

Hierarchical Instruction Register Organization.
IEEE Comput. Archit. Lett., 2008

An Energy-Efficient Processor Architecture for Embedded Systems.
IEEE Comput. Archit. Lett., 2008

2007
Register pointer architecture for efficient embedded processors.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

