Maryam Mehri Dehnavi

Environ. Model. Softw., 2024

SLiM: One-shot Quantized Sparse Plus Low-rank Approximation of LLMs.

Mohammad Mozaffari

CoRR, 2024

A Framework for Fine-Grained Synchronization of Dependent GPU Kernels.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

2023

Lucas Wilkinson

Rodrigo de Queiroga Miranda

Proc. ACM Program. Lang., 2023

Development of a knowledge-sharing parallel computing approach for calibrating distributed watershed hydrologic models.

Environ. Model. Softw., 2023

Runtime Composition of Iterations for Fusing Loop-carried Sparse Dependence.

Michelle Strout

Proceedings of the International Conference for High Performance Computing, 2023

MKOR: Momentum-Enabled Kronecker-Factor-Based Optimizer Using Rank-1 Updates.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022

Randomized Gossiping With Effective Resistance Weights: Performance Guarantees and Applications.

IEEE Trans. Control. Netw. Syst., 2022

A review of parallel computing applications in calibrating watershed hydrologic models.

Environ. Model. Softw., 2022

HyLo: A Hybrid Low-Rank Natural Gradient Descent Method.

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Vectorizing Sparse Matrix Computations with Partially-Strided Codelets.

Zachary Cetinic

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Optimizing sparse computations jointly.

Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

HDagg: Hybrid Aggregation of Loop-carried Dependence Iterations in Sparse Matrix Computations.

Behrooz Zarebavani

Bangtian Liu

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Combining Run-Time Checks and Compile-Time Analysis to Improve Control Flow Auto-Vectorization.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021

Differentiating-based Vectorization for Sparse Kernels.

Zachary Cetinic

CoRR, 2021

Composing Loop-carried Dependence with Other Loops.

CoRR, 2021

TENGraD: Time-Efficient Natural Gradient Descent with Exact Fisher-Block Inversion.

CoRR, 2021

L-DQN: An Asynchronous Limited-Memory Distributed Quasi-Newton Method.

Proceedings of the 2021 60th IEEE Conference on Decision and Control (CDC), 2021

2020

NASOQ: numerically accurate sparsity-oriented QP solver.

ACM Trans. Graph., 2020

MatRox: modular approach for improving data locality in hierarchical (Mat)rix App(Rox)imation.

Bangtian Liu

Saeed Soori

Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

ASYNC: A Cloud Engine with Asynchrony and History for Distributed Machine Learning.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate.

Saeed Soori

Konstantin Mishchenko

Aryan Mokhtari

Mert Gürbüzbalaban

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019

ASYNC: Asynchronous Machine Learning on Distributed Systems.

CoRR, 2019

Sparse computation data dependence simplification for efficient compiler-generated inspectors.

Mahdi Soltan Mohammadi

Catherine Olschanowsky

Anand Venkat

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

2018

MatRox: A Model-Based Algorithm with an Efficient Storage Format for Parallel HSS-Structured Matrix Approximations.

CoRR, 2018

Sparse Matrix Code Dependence Analysis Simplification at Compile Time.

Mahdi Soltan Mohammadi

Ganesh Gopalakrishnan

CoRR, 2018

ParSy: inspection and transformation of sparse matrix computations for parallelism.

Shoaib Kamil

Proceedings of the International Conference for High Performance Computing, 2018

Extending Index-Array Properties for Data Dependence Analysis.

Mahdi Soltan Mohammadi

Proceedings of the Languages and Compilers for Parallel Computing, 2018

Sparsity-Aware Storage Format Selection.

Leila Cheshmi

Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems.

Proceedings of the 47th International Conference on Parallel Processing, 2018

CSTF: Large-Scale Sparse Tensor Factorizations on Distributed Platforms.

Zachary Blanco

Bangtian Liu

Ekanathan Palamadai Natarajan

Proceedings of the 47th International Conference on Parallel Processing, 2018

2017

Avoiding Communication in Proximal Methods for Convex Optimization Problems.

CoRR, 2017

Autotuning divide-and-conquer stencil computations.

Charles E. Leiserson

Concurr. Comput. Pract. Exp., 2017

Power grid safety control via fine-grained multi-persona programmable logic controllers.

Gabriel Salles-Loustau

Proceedings of the 2017 IEEE International Conference on Smart Grid Communications, 2017

Sympiler: transforming sparse matrix codes by decoupling symbolic analysis.

Shoaib Kamil

Proceedings of the International Conference for High Performance Computing, 2017

A Unified Optimization Approach for Sparse Tensor Operations on GPUs.

Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2015

Parallel finite element technique using Gaussian belief propagation.

Yousef El-Kurdi

Warren J. Gross

Dennis Giannacopoulos

Comput. Phys. Commun., 2015

2014

Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax-Wendroff correction stencil.

Int. J. High Perform. Comput. Appl., 2014

Survey on Grid Resource Allocation Mechanisms.

Muhammad Bilal Qureshi

Nasro Min-Allah

Muhammad Shuaib Qureshi

J. Grid Comput., 2014

MIC-SVM: Designing a Highly Efficient Support Vector Machine for Advanced Modern Multi-core and Many-Core Architectures.

Amanda Peters Randles

Guangwen Yang

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Designing a Heuristic Cross-Architecture Combination for Breadth-First Search.

Yang You

David A. Bader

Proceedings of the 43rd International Conference on Parallel Processing, 2014

2013

Parallel Sparse Approximate Inverse Preconditioning on Graphic Processing Units.
