Maryam Mehri Dehnavi

ORCID: 0000-0002-2719-8788

According to our database, Maryam Mehri Dehnavi authored at least 40 papers between 2013 and 2024.

Bibliography

2024
A Framework for Fine-Grained Synchronization of Dependent GPU Kernels.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

2023
Register Tiling for Unstructured Sparsity in Neural Network Inference.
Proc. ACM Program. Lang., 2023

Development of a knowledge-sharing parallel computing approach for calibrating distributed watershed hydrologic models.
Environ. Model. Softw., 2023

Runtime Composition of Iterations for Fusing Loop-carried Sparse Dependence.
Proceedings of the International Conference for High Performance Computing, 2023

MKOR: Momentum-Enabled Kronecker-Factor-Based Optimizer Using Rank-1 Updates.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
Randomized Gossiping With Effective Resistance Weights: Performance Guarantees and Applications.
IEEE Trans. Control. Netw. Syst., 2022

A review of parallel computing applications in calibrating watershed hydrologic models.
Environ. Model. Softw., 2022

HyLo: A Hybrid Low-Rank Natural Gradient Descent Method.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Vectorizing Sparse Matrix Computations with Partially-Strided Codelets.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Optimizing sparse computations jointly.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

HDagg: Hybrid Aggregation of Loop-carried Dependence Iterations in Sparse Matrix Computations.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Combining Run-Time Checks and Compile-Time Analysis to Improve Control Flow Auto-Vectorization.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
Differentiating-based Vectorization for Sparse Kernels.
CoRR, 2021

Composing Loop-carried Dependence with Other Loops.
CoRR, 2021

TENGraD: Time-Efficient Natural Gradient Descent with Exact Fisher-Block Inversion.
CoRR, 2021

L-DQN: An Asynchronous Limited-Memory Distributed Quasi-Newton Method.
Proceedings of the 2021 60th IEEE Conference on Decision and Control (CDC), 2021

2020
NASOQ: numerically accurate sparsity-oriented QP solver.
ACM Trans. Graph., 2020

MatRox: modular approach for improving data locality in hierarchical (Mat)rix App(Rox)imation.
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

ASYNC: A Cloud Engine with Asynchrony and History for Distributed Machine Learning.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019
ASYNC: Asynchronous Machine Learning on Distributed Systems.
CoRR, 2019

Sparse computation data dependence simplification for efficient compiler-generated inspectors.
Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

2018
MatRox: A Model-Based Algorithm with an Efficient Storage Format for Parallel HSS-Structured Matrix Approximations.
CoRR, 2018

Sparse Matrix Code Dependence Analysis Simplification at Compile Time.
CoRR, 2018

ParSy: inspection and transformation of sparse matrix computations for parallelism.
Proceedings of the International Conference for High Performance Computing, 2018

Extending Index-Array Properties for Data Dependence Analysis.
Proceedings of the Languages and Compilers for Parallel Computing, 2018

Sparsity-Aware Storage Format Selection.
Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems.
Proceedings of the 47th International Conference on Parallel Processing, 2018

CSTF: Large-Scale Sparse Tensor Factorizations on Distributed Platforms.
Proceedings of the 47th International Conference on Parallel Processing, 2018

2017
Avoiding Communication in Proximal Methods for Convex Optimization Problems.
CoRR, 2017

Autotuning divide-and-conquer stencil computations.
Concurr. Comput. Pract. Exp., 2017

Power grid safety control via fine-grained multi-persona programmable logic controllers.
Proceedings of the 2017 IEEE International Conference on Smart Grid Communications, 2017

Sympiler: transforming sparse matrix codes by decoupling symbolic analysis.
Proceedings of the International Conference for High Performance Computing, 2017

A Unified Optimization Approach for Sparse Tensor Operations on GPUs.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2015
Parallel finite element technique using Gaussian belief propagation.
Comput. Phys. Commun., 2015

2014
Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax-Wendroff correction stencil.
Int. J. High Perform. Comput. Appl., 2014

Survey on Grid Resource Allocation Mechanisms.
J. Grid Comput., 2014

MIC-SVM: Designing a Highly Efficient Support Vector Machine for Advanced Modern Multi-core and Many-Core Architectures.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Designing a Heuristic Cross-Architecture Combination for Breadth-First Search.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

2013
Parallel Sparse Approximate Inverse Preconditioning on Graphic Processing Units.
IEEE Trans. Parallel Distributed Syst., 2013

