Hongzhang Shan

According to our database, Hongzhang Shan authored at least 40 papers between 1997 and 2019.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.





Accelerating the Performance of Modal Aerosol Module of E3SM Using OpenACC.
Proceedings of the Accelerator Programming Using Directives - 6th International Workshop, 2019

A Novel Multi-level Integrated Roofline Model Approach for Performance Characterization.
Proceedings of the High Performance Computing - 33rd International Conference, 2018

Improving MPI Reduction Performance for Manycore Architectures with OpenMP and Data Compression.
Proceedings of the 2018 IEEE/ACM Performance Modeling, 2018

Performance analysis and optimization of the RAMPAGE metal alloy potential generation software.
Proceedings of the 4th ACM SIGPLAN International Workshop on Software Engineering for Parallel Systems, 2017

A Locality-Based Threading Algorithm for the Configuration-Interaction Method.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Experiences of Applying One-Sided Communication to Nearest-Neighbor Communication.
Proceedings of the 2016 PGAS Applications Workshop, 2016

MPI usage at NERSC: Present and Future.
Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

Parallel implementation and performance optimization of the configuration-interaction method.
Proceedings of the International Conference for High Performance Computing, 2015

Thread-level parallelization and optimization of NWChem for the Intel MIC architecture.
Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015

Evaluation of PGAS Communication Paradigms with Geometric Multigrid.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

UPC++: A PGAS Extension for C++.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Performance Tuning of Fock Matrix and Two-Electron Integral Calculations for NWChem on Leading HPC Platforms.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

A preliminary evaluation of the hardware acceleration of the Cray Gemini interconnect for PGAS languages and comparison with MPI.
SIGMETRICS Perform. Evaluation Rev., 2012

Optimizing the Advanced Accelerator Simulation Framework Synergia Using OpenMP.
Proceedings of the OpenMP in a Heterogeneous World - 8th International Workshop on OpenMP, 2012

A programming model performance study using the NAS parallel benchmarks.
Sci. Program., 2010

Developing a Parameterized Performance Proxy for Sequential Scientific Kernels.
Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications, 2010

HPC global file system performance analysis using a scientific-application derived benchmark.
Parallel Comput., 2009

Performance Analysis of Leading HPC Architectures With Beambeam3D.
Int. J. High Perform. Comput. Appl., 2008

Linearly scaling 3D fragment method for large-scale electronic structure calculations.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Characterizing and predicting the I/O performance of HPC applications using a parameterized synthetic benchmark.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

APEX-Map: a parameterized scalable memory access probe for high-performance computing systems.
Concurr. Comput. Pract. Exp., 2007

Investigation of leading HPC I/O performance using a scientific-application derived benchmark.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Scientific Application Performance on Candidate PetaScale Platforms.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Particles and continuum - Performance modeling and optimization of a high energy colliding beam simulation code.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Performance Analysis of a High Energy Colliding Beam Simulation Code on Four HPC Architectures.
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

Apex-Map: A Global Data Access Benchmark to Analyze HPC Systems and Parallel Programming Paradigms.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Apex-Map: A Synthetic Scalable Benchmark Probe to Explore Data Access Performance on Highly Parallel Systems.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

A Performance Evaluation of the Cray X1 for Scientific Applications.
Proceedings of the High Performance Computing for Computational Science, 2004

Architecture Independent Performance Characterization and Benchmarking for Scientific Applications.
Proceedings of the 12th International Workshop on Modeling, 2004

Performance characteristics of the Cray X1 and their implications for application performance tuning.
Proceedings of the 18th Annual International Conference on Supercomputing, 2004

Message passing and shared address space parallelism on an SMP cluster.
Parallel Comput., 2003

Job Superscheduler Architecture and Performance in Computational Grid Environments.
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

A Comparison of Three Programming Models for Adaptive Applications on the Origin2000.
J. Parallel Distributed Comput., 2002

A Comparison of MPI, SHMEM and Cache-Coherent Shared Address Space Programming Models on a Tightly-Coupled Multiprocessors.
Int. J. Parallel Program., 2001

Design Strategies for Irregularly Adapting Parallel Applications.
Proceedings of the Tenth SIAM Conference on Parallel Processing for Scientific Computing, 2001

Message Passing Vs. Shared Address Space on a Clusters of SMPs.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

Parallel Sorting on Cache-coherent DSM Multiprocessors.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1999

A comparison of MPI, SHMEM and cache-coherent shared address space programming models on the SGI Origin2000.
Proceedings of the 13th international conference on Supercomputing, 1999

Parallel Tree Building on a Range of Shared Address Space Multiprocessors: Algorithms and Application Performance.
Proceedings of the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP '98), March 30, 1998

Application Restructuring and Performance Portability on Shared Virtual Memory and Hardware-Coherent Multiprocessors.
Proceedings of the Sixth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1997