Hong An

According to our database, Hong An authored at least 63 papers between 1999 and 2019.

Bibliography

2019
CARS: A contention-aware scheduler for efficient resource management of HPC storage systems.
Parallel Computing, 2019

Improving the Performance of Distributed MXNet with RDMA.
International Journal of Parallel Programming, 2019

DDP-B: A Distributed Dynamic Parallel Framework for Meta-genomics Binary Similarity.
Proceedings of the Network and Parallel Computing, 2019

Improving the Performance of MongoDB with RDMA.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

swFLOW: A Dataflow Deep Learning Framework on Sunway TaihuLight Supercomputer.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

TripletRun: A Dataflow Runtime Simulator and Its Performance Model.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

Interference-Aware I/O Scheduling for Data-Intensive Applications on Hierarchical HPC Storage Systems.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

Redesign NAMD Molecular Dynamics Non-Bonded Force-Field on Sunway Manycore Processor.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

2018
PEPS++: Towards Extreme-Scale Simulations of Strongly Correlated Quantum Many-Particle Models on Sunway TaihuLight.
IEEE Trans. Parallel Distrib. Syst., 2018

Combining Hadoop with MPI to Solve Metagenomics Problems that are both Data- and Compute-intensive.
International Journal of Parallel Programming, 2018

Improving the Performance of Distributed TensorFlow with RDMA.
International Journal of Parallel Programming, 2018

Contention-Aware Resource Scheduling for Burst Buffer Systems.
Proceedings of the 47th International Conference on Parallel Processing, 2018

2017
A Dataflow-Based Runtime Support on a 100P Actual System.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

Refactoring the Molecular Docking Simulation for Heterogeneous, Manycore Processors Systems.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

A hierarchical grid algorithm for accelerating high-performance conjugate gradient benchmark on sunway many-core processor.
Proceedings of the 3rd International Conference on Communication and Information Processing, 2017

Pipelining Computation and Optimization Strategies for Scaling GROMACS on the Sunway Many-Core Processor.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2017

2016
A Flexible Chip Multiprocessor Simulator Dedicated for Thread Level Speculation.
Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, 2016

Parallelizing Back Propagation Neural Network on Speculative Multicores.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

2015
Speculative Parallelism Characterization Profiling in General Purpose Computing Applications.
JCSE, 2015

Optimization and Analysis of Parallel Back Propagation Neural Network on GPU Using CUDA.
Proceedings of the Neural Information Processing - 22nd International Conference, 2015

Local State Reusing for Efficient Model Checking of Multithreaded Programs.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

Parallelizing Block Cryptography Algorithms on Speculative Multicores.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

Optimization of Binomial Option Pricing on Intel MIC Heterogeneous System.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

2014
Exploring speculative procedure and loop level parallelism in SPLASH2.
IJHPSA, 2014

Efficient execution of speculative threads and transactions with hardware transactional memory.
Future Generation Comp. Syst., 2014

A Criticality-Aware DVFS Runtime Utility for Optimizing Power Efficiency of Multithreaded Applications.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Understanding the SIMD Efficiency of Graph Traversal on GPU.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2014

A Compiler Translate Directive-Based Language to Optimized CUDA.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

2013
Phase-Priority based Directory Coherence for Multicore Processor.
CoRR, 2013

Quantitative Analysis of Inter-block Dependence in Speculative Execution.
Proceedings of the 12th IEEE International Conference on Trust, 2013

2012
Priority-based squash reducing methods in thread level speculation.
IJITCC, 2012

FlexBFS: a parallelism-aware implementation of breadth-first search on GPU.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

A Speculative HMMER Search Implementation on GPU.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

CRQ-based fair scheduling on composable multicore architectures.
Proceedings of the International Conference on Supercomputing, 2012

Distributed replay protocol for distributed uniprocessors.
Proceedings of the International Conference on Supercomputing, 2012

SeTM: Efficient Execution of Speculative Threads with Hardware Transactional Memory.
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

VSCP: A Cache Controlling Method for Improving Single Thread Performance in Multicore System.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

Distributed Control Independence for Composable Multi-processors.
Proceedings of the 2012 IEEE/ACIS 11th International Conference on Computer and Information Science, Shanghai, China, May 30, 2012

Value Predicted LogSPoTM: Improve the Parallesim of Thread Level System by Using a Value Predictor.
Proceedings of the 2012 IEEE/ACIS 11th International Conference on Computer and Information Science, Shanghai, China, May 30, 2012

2011
CHMasters: A Scalable and Speed-Efficient Metadata Service in Distributed File System.
Proceedings of the 12th International Conference on Parallel and Distributed Computing, 2011

A Non-blocking Programming Framework for Pipeline Application on Multi-core Platform.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2011

A Priority-Aware NoC to Reduce Squashes in Thread Level Speculation for Chip Multiprocessors.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2011

Exploiting Speculative Thread-Level Parallelism Based on Transactional Memory.
Proceedings of the Third International Conference on Communications and Mobile Computing, 2011

Accelerating Block Cryptography Algorithms in Procedure Level Speculation.
Proceedings of the Seventh International Conference on Computational Intelligence and Security, 2011

2010
FACRA: Flexible-Core Architecture Chip Resource Abstractor.
Proceedings of the 2010 International Conference on Parallel and Distributed Computing, 2010

CuHMMer: A load-balanced CPU-GPU cooperative bioinformatics application.
Proceedings of the 2010 International Conference on High Performance Computing & Simulation, 2010

Pattern-Unit Based Regular Expression Matching with Reconfigurable Function Unit.
Proceedings of the Computational Science and Its Applications, 2010

Dynamic Resource Tuning for Flexible Core Chip Multiprocessors.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2010

The optimization of parallel Smith-Waterman sequence alignment using on-chip memory of GPGPU.
Proceedings of the Fifth International Conference on Bio-Inspired Computing: Theories and Applications, 2010

2009
The Mapping Framework and Optimizing Strategy for Block Cryptography Algorithms on Cell Broadband Engine.
Proceedings of the 2009 International Conference on Parallel and Distributed Computing, 2009

Performance and Power Efficiency Analysis of the Symmetric Cryptograph on Two Stream Processor Architectures.
Proceedings of the Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2009), 2009

Investigation of Factors Impacting Thread-Level Parallelism from Desktop, Multimedia and HPC Applications.
Proceedings of the Fourth International Conference on Frontier of Computer Science and Technology, 2009

A Program Behavior Study of Block Cryptography Algorithms on GPGPU.
Proceedings of the Fourth International Conference on Frontier of Computer Science and Technology, 2009

Scaling the Performance of Tiled Processor Architectures with On-Chip-Network Topology.
Proceedings of the Second International Joint Conference on Computational Sciences and Optimization, 2009

2008
A wire delay scalable stream processor architecture.
Proceedings of the 13th Asia-Pacific Computer Systems Architecture Conference, 2008

Profile guided optimization for dataflow predication.
Proceedings of the 13th Asia-Pacific Computer Systems Architecture Conference, 2008

LogSPoTM: a scalable thread level speculation model based on transactional memory.
Proceedings of the 13th Asia-Pacific Computer Systems Architecture Conference, 2008

2007
Balancing Thread Partition for Efficiently Exploiting Speculative Thread-Level Parallelism.
Proceedings of the Advanced Parallel Processing Technologies, 7th International Symposium, 2007

An Online Profile Guided Optimization Approach for Speculative Parallel Threading.
Proceedings of the Advances in Computer Systems Architecture, 2007

2005
Improving Latency Tolerance of Network Processors Through Simultaneous Multithreading.
Proceedings of the Advanced Parallel Processing Technologies, 6th International Workshop, 2005

2000
Broadcasting Under Network Ignorance Scenario.
Proceedings of the Applied Computing 2000, 2000

1999
A Parallel and Distributed Debugger Implemented with Java.
Proceedings of the TOOLS 1999: 31st International Conference on Technology of Object-Oriented Languages and Systems, 1999

A Java/CORBA Based Universal Framework for Super Server User-End Integrated Environments.
Proceedings of the TOOLS 1999: 31st International Conference on Technology of Object-Oriented Languages and Systems, 1999

