Jongsoo Park

Maxim Naumov

CoRR, 2019

2018

HPC formulations of optimization algorithms for tensor completion.

Shaden Smith

George Karypis

Parallel Comput., 2018

Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications.

CoRR, 2018

On Periodic Functions as Regularizers for Quantization of Neural Networks.

CoRR, 2018

Glow: Graph Lowering Compiler Techniques for Neural Networks.

CoRR, 2018

Dynamic fine-grained sparse memory accesses.

Berkin Akin

Chiachen Chou

Christopher J. Hughes

Rajat Agarwal

Proceedings of the International Symposium on Memory Systems, 2018

2017

Gate scheduling for quantum algorithms.

Gian Giacomo Guerreschi

CoRR, 2017

Enabling Sparse Winograd Convolution by Native Pruning.

Sheng R. Li

Ping Tak Peter Tang

CoRR, 2017

Sparse Tensor Factorization on Many-Core Processors with High-Bandwidth Memory.

Shaden Smith

George Karypis

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Faster CNNs with Direct Sparse Convolutions and Guided Pruning.

Proceedings of the 5th International Conference on Learning Representations, 2017

2016

Optimizations in a high-performance conjugate gradient benchmark for IA-based multi- and many-core processors.

Mikhail Smelyanskiy

Alexander Heinecke

Dhiraj D. Kalamkar

Int. J. High Perform. Comput. Appl., 2016

Holistic SparseCNN: Forging the Trident of Accuracy, Speed, and Size.

CoRR, 2016

Automating wavefront parallelization for sparse matrix computations.

Anand Venkat

Mahdi Soltan Mohammadi

Hongbo Rong

Rajkishore Barik

Michelle Mills Strout

Mary W. Hall

Proceedings of the International Conference for High Performance Computing, 2016

An exploration of optimization algorithms for high performance tensor completion.

Shaden Smith

George Karypis

Proceedings of the International Conference for High Performance Computing, 2016

Sparso: Context-driven Optimizations of Sparse Linear Algebra.

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015

Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms.

Nadathur Rajagopalan Satish

Narayanan Sundaram

Michael J. Anderson

Satya Gautam Vadlamudi

Proceedings of the High Performance Computing - 30th International Conference, 2015

Improving concurrency and asynchrony in multithreaded MPI applications using software offloading.

Proceedings of the International Conference for High Performance Computing, 2015

High-performance algebraic multigrid solver optimized for multi-core based distributed parallel systems.

Proceedings of the International Conference for High Performance Computing, 2015

Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

2014

Sparsifying Synchronization for High-Performance Shared-Memory Sparse Triangular Solver.

Proceedings of the Supercomputing - 29th International Conference, 2014

Navigating the maze of graph analytics frameworks using massive graph datasets.

Nadathur Satish

Narayanan Sundaram

Jiwon Seo

Muhammad Amber Hassaan

Shubho Sengupta

Zhaoming Yin

Pradeep Dubey

Proceedings of the International Conference on Management of Data, 2014

Efficient Shared-Memory Implementation of High-Performance Conjugate Gradient Benchmark and its Application to Unstructured Matrices.

Mikhail Smelyanskiy

Alexander Heinecke

Dhiraj D. Kalamkar

Xing Liu

Yutong Lu

Pradeep Dubey

Proceedings of the International Conference for High Performance Computing, 2014

Improving Communication Performance and Scalability of Native Applications on Intel Xeon Phi Coprocessor Clusters.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Versatile and scalable parallel histogram construction.

Wookeun Jung

Jaejin Lee

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

Distributed SociaLite: A Datalog-Based Language for Large-Scale Graph Analysis.

Proc. VLDB Endow., 2013

Location-aware cache management for many-core processors with deep cache hierarchy.

Richard M. Yoo

Daya Shanker Khudia

Christopher J. Hughes

Daehyun Kim

Proceedings of the International Conference for High Performance Computing, 2013

Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors.

Ganesh Bikshandi

Ping Tak Peter Tang

Pradeep Dubey

Daehyun Kim

Proceedings of the International Conference for High Performance Computing, 2013

2012

CloudRAMSort: fast and efficient large-scale distributed RAM sort on shared-nothing cluster.

Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012

A framework for low-communication 1-D FFT.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

Efficient backprojection-based synthetic aperture radar computation with many-core processors.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

Billion-particle SIMD-friendly two-point correlation on large-scale HPC cluster systems.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

2010

Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures.

JongSoo Park

William J. Dally

Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2010), 2010

Fine-grain dynamic instruction placement for L0 scratch-pad memory.

JongSoo Park

James D. Balfour

William J. Dally

Proceedings of the 2010 International Conference on Compilers, 2010

2008

A Practical Improvement to the Partial Redundancy Elimination in SSA Form.
