Hao Wang

Orcid: 0000-0003-3557-6301

Affiliations:
  • Ohio State University, Columbus, OH, USA
  • Virginia Tech, Blacksburg, VA, USA


According to our database, Hao Wang authored at least 58 papers between 2003 and 2022.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2022
An RDMA-enabled In-memory Computing Platform for R-tree on Clusters.
ACM Trans. Spatial Algorithms Syst., June, 2022

Meta-Regulation: Adaptive Adjustment to Block Size and Creation Interval for Blockchain Systems.
IEEE J. Sel. Areas Commun., 2022

2021
Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations.
Int. J. Parallel Program., 2021

NestGPU: Nested Query Processing on GPU.
Proceedings of the 37th IEEE International Conference on Data Engineering, 2021

2020
Automating Incremental and Asynchronous Evaluation for Recursive Aggregate Data Processing.
Proceedings of the 2020 International Conference on Management of Data, 2020

Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations.
Proceedings of the Network and Parallel Computing, 2020

A Feasibility Study for MPI over HDFS.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

GPU-Accelerated Computation of Vietoris-Rips Persistence Barcodes.
Proceedings of the 36th International Symposium on Computational Geometry, 2020

2019
GPU-Based Iterative Medical CT Image Reconstructions.
J. Signal Process. Syst., 2019

SEP-graph: finding shortest execution paths for graph processing under a hybrid framework on GPU.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

HYPHA: a framework based on separation of parallelisms to accelerate persistent homology matrix reduction.
Proceedings of the ACM International Conference on Supercomputing, 2019

Catfish: Adaptive RDMA-enabled R-Tree for Low Latency and High Throughput.
Proceedings of the 39th IEEE International Conference on Distributed Computing Systems, 2019

2018
A Framework for the Automatic Vectorization of Parallel Sort on x86-Based Processors.
IEEE Trans. Parallel Distributed Syst., 2018

Highly Efficient Compensation-Based Parallelism for Wavefront Loops on GPUs.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Software-Defined Software: A Perspective of Machine Learning-Based Software Production.
Proceedings of the 38th IEEE International Conference on Distributed Computing Systems, 2018

Taming irregular applications via advanced dynamic parallelism on GPUs.
Proceedings of the 15th ACM International Conference on Computing Frontiers, 2018

2017
cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on CPU+GPU.
IEEE ACM Trans. Comput. Biol. Bioinform., 2017

Exploring and analyzing the real impact of modern on-package memory on HPC scientific kernels.
Proceedings of the International Conference for High Performance Computing, 2017

Eliminating Irregularities of Protein Sequence Search on Multicore Architectures.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

PaPar: A Parallel Data Partitioning Framework for Big Data Applications.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

A framework for fast and fair evaluation of automata processing hardware.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Fast segmented sort on GPUs.
Proceedings of the International Conference on Supercomputing, 2017

An Enhanced Image Reconstruction Tool for Computed Tomography on GPUs.
Proceedings of the Computing Frontiers Conference, 2017

GPU-UniCache: Automatic Code Generation of Spatial Blocking for Stencils on GPUs.
Proceedings of the Computing Frontiers Conference, 2017

Robotomata: A framework for approximate pattern matching of big data on an automata processor.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

2016
muBLASTP: database-indexed protein sequence search on multicore CPUs.
BMC Bioinform., 2016

AAlign: A SIMD Framework for Pairwise Sequence Alignment on x86-Based Multi- and Many-Core Processors.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Parallel Transposition of Sparse Data Structures.
Proceedings of the 2016 International Conference on Supercomputing, 2016

cuART: Fine-Grained Algebraic Reconstruction Technique for Computed Tomography Images on GPUs.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

2015
A Parallel Algorithm for Game Tree Search Using GPGPU.
IEEE Trans. Parallel Distributed Syst., 2015

ASPaS: A Framework for Automatic SIMDization of Parallel Sorting on x86-based Many-core Processors.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

pDindel: Accelerating indel detection on a multicore CPU architecture with SIMD.
Proceedings of the 5th IEEE International Conference on Computational Advances in Bio and Medical Sciences, 2015

2014
GPU-Aware MPI on RDMA-Enabled Clusters: Design, Implementation and Evaluation.
IEEE Trans. Parallel Distributed Syst., 2014

cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on a GPU.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Petascale Application of a Coupled CPU-GPU Algorithm for Simulation and Analysis of Multiphase Flow Solutions in Porous Medium Systems.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Delivering Parallel Programmability to the Masses via the Intel MIC Ecosystem: A Case Study.
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

2013
Redesigning MPI shared memory communication for large multi-core architecture.
Comput. Sci. Res. Dev., 2013

High-Performance RDMA-based Design of Hadoop MapReduce over InfiniBand.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Extending OpenSHMEM for GPU Computing.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

High-Performance Design of Hadoop RPC with RDMA over InfiniBand.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

Consolidating Applications for Energy Efficiency in Heterogeneous Computing Systems.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

2012
High performance RDMA-based design of HDFS over InfiniBand.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

OMB-GPU: A Micro-Benchmark Suite for Evaluating MPI Libraries on GPU Clusters.
Proceedings of the Recent Advances in the Message Passing Interface, 2012

Understanding the communication characteristics in HBase: What are the fundamental bottlenecks?
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

Optimizing MPI Communication on Multi-GPU Systems Using CUDA Inter-Process Communication.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

High-Performance Design of HBase with RDMA over InfiniBand.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

SSD-Assisted Hybrid Memory to Accelerate Memcached over High Performance Networks.
Proceedings of the 41st International Conference on Parallel Processing, 2012

A Node-based Parallel Game Tree Algorithm Using GPUs.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Scalable Memcached Design for InfiniBand Clusters Using Hybrid Transports.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

2011
MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters.
Comput. Sci. Res. Dev., 2011

Optimizing MPI One Sided Communication on Multi-core InfiniBand Clusters Using Shared Memory Backed Windows.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart.
Proceedings of the International Conference on Parallel Processing, 2011

Memcached Design on High Performance RDMA Capable Interconnects.
Proceedings of the International Conference on Parallel Processing, 2011

Optimized Non-contiguous MPI Datatype Communication for GPU Clusters: Design, Implementation and Evaluation with MVAPICH2.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

MPI Alltoall Personalized Exchange on GPGPU Clusters: Design Alternatives and Benefit.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

2006
An Approach to Exception Handling for Service-Oriented Systems.
Proceedings of the 2006 IEEE International Conference on Web Services (ICWS 2006), 2006

Application-aware Interface for SOAP Communication in Web Services.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

2003
Agora: Grid Community in Vega Grid.
Proceedings of the Grid and Cooperative Computing, Second International Workshop, 2003