# Guangming Tan

Guangming Tan authored at least 71 papers between 2005 and 2018.

## Bibliography

2018

Quadboost: A Scalable Concurrent Quadtree.

IEEE Trans. Parallel Distrib. Syst., 2018

Design and Implementation of Adaptive SpMV Library for Multicore and Many-Core Architecture.

ACM Trans. Math. Softw., 2018

Automated and precise event detection method for big data in biomedical imaging with support vector machine.

Comput. Syst. Sci. Eng., 2018

Register-based implementation of the sparse general matrix-matrix multiplication on GPUs.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

High-performance genomic analysis framework with in-memory computing.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model.

Proceedings of the 47th International Conference on Parallel Processing, 2018

Accelerating FM-index Search for Genomic Data Processing.

Proceedings of the 47th International Conference on Parallel Processing, 2018

2017

Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning.

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

A performance analysis framework for exploiting GPU microarchitectural capability.

Proceedings of the International Conference on Supercomputing, 2017

RING: NUMA-Aware Message-Batching Runtime for Data-Intensive Applications.

Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017

Quantifying and Mitigating Computational Inefficiency of Genomics Data Analysis.

Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017

2016

Graphine: Programming Graph-Parallel Computation of Large Natural Graphs for Multicore Clusters.

IEEE Trans. Parallel Distrib. Syst., 2016

Accelerating Irregular Computation in Massive Short Reads Mapping on FPGA Co-Processor.

IEEE Trans. Parallel Distrib. Syst., 2016

Locality of Computation for Stencil Optimization.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

Accelerating large-scale genomic analysis with Spark.

Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2016

2015

SuperDragon: A Heterogeneous Parallel System for Accelerating 3D Reconstruction of Cryo-Electron Microscopy Images.

TRETS, 2015

Detection of soft errors in LU decomposition with partial pivoting using algorithm-based fault tolerance.

IJHPCA, 2015

FAST: A Fast Stencil Autotuning Framework Based On An Optimal-solution Space Model.

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Bit Flipping Errors in High Performance Linpack at Exascale and Beyond.

Proceedings of the 44th International Conference on Parallel Processing, 2015

Study on Partitioning Real-World Directed Graphs of Skewed Degree Distribution.

Proceedings of the 44th International Conference on Parallel Processing, 2015

Implementation of Short Read Alignment Algorithm in OpenCL on Xeon Phi Coprocessor.

Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

Application Taxonomy via Algorithmic Commonality for Domain-Specific Architecture Design.

Proceedings of the 22nd IEEE International Conference on High Performance Computing, 2015

A Reliable Distributed Convolutional Neural Network for Biology Image Segmentation.

Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

2014

Exploiting fine-grained parallelism in graph traversal algorithms via lock virtualization on multi-core architecture.

The Journal of Supercomputing, 2014

Accelerating massive short reads mapping for next generation sequencing (abstract only).

Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

Reducing Communication in Parallel Breadth-First Search on Distributed Memory Systems.

Proceedings of the 17th IEEE International Conference on Computational Science and Engineering, 2014

Optimizing stencil code via locality of computation.

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

Scalability study of molecular dynamics simulation on Godson-T many-core architecture.

J. Parallel Distrib. Comput., 2013

Optimizing Parallel Sn Sweeps on Unstructured Grids for Multi-Core Clusters.

J. Comput. Sci. Technol., 2013

Understanding parallelism in graph traversal on multi-core clusters.

Computer Science - R&D, 2013

A Study of Leveraging Memory Level Parallelism for DRAM System on Multi-core/Many-Core Architecture.

Proceedings of the 12th IEEE International Conference on Trust, 2013

SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication.

Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013

ParaInsight: An Assistant for Quantitatively Analyzing Multi-granularity Parallel Region.

Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

Vlock: Lock virtualization mechanism for exploiting fine-grained parallelism in graph traversal algorithms.

Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

2012

A lightweight hybrid hardware/software approach for object-relative memory profiling.

Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

A Case Study of Designing Efficient Algorithm-based Fault Tolerant Application for Exascale Parallelism.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Investigating Memory Optimization of Hash-index for Next Generation Sequencing on Multi-core Architecture.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

PDSEC Introduction.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs.

Proceedings of the International Conference on Supercomputing, 2012

A coarse-grained stream architecture for cryo-electron microscopy images 3D reconstruction.

Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays, 2012

Accelerating Millions of Short Reads Mapping on a Heterogeneous Architecture with FPGA Accelerator.

Proceedings of the 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, 2012

2011

Analysis and performance results of computing betweenness centrality on IBM Cyclops64.

The Journal of Supercomputing, 2011

Dawning Nebulae: A PetaFLOPS Supercomputer with a Heterogeneous Structure.

J. Comput. Sci. Technol., 2011

Numerical assessment of flood hazard risk to people and vehicles in flash floods.

Environmental Modelling and Software, 2011

Fast implementation of DGEMM on Fermi GPU.

Proceedings of the Conference on High Performance Computing Networking, 2011

Poster: revisiting virtual channel memory for performance and fairness on multi-core architecture.

Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Experience of parallelizing cryo-EM 3D reconstruction on a CPU-GPU heterogeneous system.

Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011

Building algorithmically nonstop fault tolerant MPI programs.

Proceedings of the 18th International Conference on High Performance Computing, 2011

Performance analysis and optimization of molecular dynamics simulation on Godson-T many-core processor.

Proceedings of the 8th Conference on Computing Frontiers, 2011

2010

Automatically Tuned Dynamic Programming with an Algorithm-by-Blocks.

Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems, 2010

Preliminary Investigation of Accelerating Molecular Dynamics Simulation on Godson-T Many-Core Processor.

Proceedings of the Euro-Par 2010 Parallel Processing Workshops, 2010

2009

Improving Performance of Dynamic Programming via Parallelism and Locality on Multicore Architectures.

IEEE Trans. Parallel Distrib. Syst., 2009

Extending Amdahl's law in the multicore era.

SIGMETRICS Performance Evaluation Review, 2009

Characterizing Betweenness Centrality Algorithm on Multi-core Architectures.

Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2009

Single-particle 3d reconstruction from cryo-electron microscopy images on GPU.

Proceedings of the 23rd international conference on Supercomputing, 2009

A Parallel Algorithm for Computing Betweenness Centrality.

Proceedings of the ICPP 2009, 2009

High Performance Matrix Multiplication on Many Cores.

Proceedings of the Euro-Par 2009 Parallel Processing, 2009

2008

Experience on optimizing irregular computation for memory hierarchy in manycore architecture.

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

Just-In-Time Locality and Percolation for Optimizing Irregular Applications on a Manycore Architecture.

Proceedings of the Languages and Compilers for Parallel Computing, 2008

2007

Cache oblivious algorithms for nonserial polyadic programming.

The Journal of Supercomputing, 2007

A parallel dynamic programming algorithm on a multi-core architecture.

SPAA 2007: Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2007

Implementation of the Smith-Waterman algorithm on a reconfigurable supercomputing platform.

Proceedings of the 1st international workshop on High-performance reconfigurable computing technology and applications, 2007

2006

Improvement of Performance of MegaBlast Algorithm for DNA Sequence Alignment.

J. Comput. Sci. Technol., 2006

Biology - Locality and parallelism optimization for dynamic programming algorithm in bioinformatics.

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

An experimental study of optimizing bioinformatics applications.

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Improving locality of nonserial polyadic dynamic programming.

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Load Balancing and Parallel Multiple Sequence Alignment with Tree Accumulation.

Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

2005

Load Balancing Algorithm in Cluster-based RNA secondary structure Prediction.

Proceedings of the 4th International Symposium on Parallel and Distributed Computing (ISPDC 2005), 2005

An Optimized Algorithm of High Spatial-temporal Efficiency for MegaBlast.

Proceedings of the 11th International Conference on Parallel and Distributed Systems, 2005

An Efficient Dynamic Programming Algorithm and Implementation for RNA Secondary Structure Prediction.

Proceedings of the Computational Science, 2005

Exploiting Parallelization for RNA Secondary Structure Prediction in Cluster.

Proceedings of the Computational Science, 2005