Haohuan Fu

According to our database, Haohuan Fu authored at least 119 papers between 2004 and 2020.

Bibliography

2020
Large-Scale Automatic K-Means Clustering for Heterogeneous Many-Core Supercomputer.
IEEE Trans. Parallel Distributed Syst., 2020

Millimeter-Scale and Billion-Atom Reactive Force Field Simulation on Sunway TaihuLight.
IEEE Trans. Parallel Distributed Syst., 2020

Improving 3-m Resolution Land Cover Mapping through Efficient Learning from an Imperfect 10-m Resolution Map.
Remote. Sens., 2020

Efficient AES implementation on Sunway TaihuLight supercomputer: A systematic approach.
J. Parallel Distributed Comput., 2020

Benchmarking 50-Photon Gaussian Boson Sampling on the Sunway TaihuLight.
CoRR, 2020

Cross-regional oil palm tree counting and detection via multi-level attention domain adaptation network.
CoRR, 2020

Neighbor-list-free molecular dynamics on Sunway TaihuLight supercomputer.
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

Cross-Regional Oil Palm Tree Detection.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Optimizing Finite Volume Method Solvers on Nvidia GPUs.
IEEE Trans. Parallel Distributed Syst., 2019

Performance Tuning and Analysis for Stencil-Based Applications on POWER8 Processor.
ACM Trans. Archit. Code Optim., 2019

Semantic Segmentation-Based Building Footprint Extraction Using Very High-Resolution Satellite Images and Multi-Source GIS Data.
Remote. Sens., 2019

A Real-Time Tree Crown Detection Approach for Large-Scale Remote Sensing Images on FPGAs.
Remote. Sens., 2019

Large-Scale Oil Palm Tree Detection from High-Resolution Satellite Images Using Two-Stage Convolutional Neural Networks.
Remote. Sens., 2019

RedSync: Reducing synchronization bandwidth for distributed deep learning training system.
J. Parallel Distributed Comput., 2019

An automatic performance model-based scheduling tool for coupled climate system models.
J. Parallel Distributed Comput., 2019

NAMSG: An Efficient Method For Training Neural Networks.
CoRR, 2019

SW_GROMACS: accelerate GROMACS on Sunway TaihuLight.
Proceedings of the International Conference for High Performance Computing, 2019

GPU-based 3D cryo-EM reconstruction with key-value streams: poster.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

SunwayLB: Enabling Extreme-Scale Lattice Boltzmann Method Based Computing Fluid Dynamics Simulations on Sunway TaihuLight.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Large-Scale Oil Palm Tree Detection from High-Resolution Remote Sensing Images Using Faster-RCNN.
Proceedings of the 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019

Parallelizing cryo-EM 3D reconstruction on GPU cluster with a partitioned and streamed model.
Proceedings of the ACM International Conference on Supercomputing, 2019

swATOP: Automatically Optimizing Deep Learning Operators on SW26010 Many-Core Processor.
Proceedings of the 48th International Conference on Parallel Processing, 2019

Scaling the Training of Recurrent Neural Networks on Sunway TaihuLight Supercomputer.
Proceedings of the Computational Science - ICCS 2019, 2019

Million-Core-Scalable Simulation of the Elastic Migration Algorithm on Sunway TaihuLight Supercomputer.
Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019

Large-scale Parallel Design for Cryo-EM Structure Determination on Heterogeneous Many-core Architectures.
Proceedings of the 2019 IEEE International Conference on Bioinformatics and Biomedicine, 2019

2018
Optimizing Convolutional Neural Networks on the Sunway TaihuLight Supercomputer.
ACM Trans. Archit. Code Optim., 2018

Application software beyond exascale: challenges and possible trends.
Frontiers Inf. Technol. Electron. Eng., 2018

Will supercomputers be super-data and super-AI machines?
Commun. ACM, 2018

Large-scale hierarchical k-means for heterogeneous many-core supercomputers.
Proceedings of the International Conference for High Performance Computing, 2018

Redesigning LAMMPS for peta-scale and hundred-billion-atom simulation on Sunway TaihuLight.
Proceedings of the International Conference for High Performance Computing, 2018

Simulating the Wenchuan earthquake with accurate surface topography on Sunway TaihuLight.
Proceedings of the International Conference for High Performance Computing, 2018

A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Global Simulation of Planetary Rings on Sunway TaihuLight.
Proceedings of the Computational Science - ICCS 2018, 2018

PLZMA: A Parallel Data Compression Method for Cloud Computing.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2018

Semantic Segmentation Based Building Extraction Method Using Multi-Source GIS Map Datasets and Satellite Imagery.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

swCaffe: A Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

2017
A Fully-Pipelined Hardware Design for Gaussian Mixture Models.
IEEE Trans. Computers, 2017

Parallel Multiclass Support Vector Machine for Remote Sensing Data Classification on Multicore and Many-Core Architectures.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2017

Deep Learning Based Oil Palm Tree Detection and Counting for High-Resolution Remote Sensing Images.
Remote. Sens., 2017

An EnKF-based scheme to optimize hyper-parameters and features for SVM classifier.
Pattern Recognit., 2017

Solving Mesoscale Atmospheric Dynamics Using a Reconfigurable Dataflow Architecture.
IEEE Micro, 2017

Designing and implementing a heuristic cross-architecture combination for graph traversal.
J. Parallel Distributed Comput., 2017

A tetrahedral mesh generation approach for 3D marine controlled-source electromagnetic modeling.
Comput. Geosci., 2017

Chapter Four - Data Flow Computing in Geoscience Applications.
Adv. Comput., 2017

Redesigning CAM-SE for peta-scale climate modeling performance and ultra-high resolution on Sunway TaihuLight.
Proceedings of the International Conference for High Performance Computing, 2017

18.9-Pflops nonlinear earthquake simulation on Sunway TaihuLight: enabling depiction of 18-Hz and 8-meter scenarios.
Proceedings of the International Conference for High Performance Computing, 2017

SW-AES: Accelerating AES Algorithm on the Sunway TaihuLight.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

swDNN: A Library for Accelerating Deep Learning Applications on Sunway TaihuLight.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

26 PFLOPS Stencil Computations for Atmospheric Modeling on Sunway TaihuLight.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Deep convolutional neural network based large-scale oil palm tree detection for high-resolution remote sensing images.
Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium, 2017

An FPGA-based tree crown detection approach for remote sensing images.
Proceedings of the International Conference on Field Programmable Technology, 2017

Exploring the potential of reconfigurable platforms for order book update.
Proceedings of the 27th International Conference on Field Programmable Logic and Applications, 2017

Accelerating Financial Market Server through Hybrid List Design (Abstract Only).
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

A Nanosecond-Level Hybrid Table Design for Financial Market Data Generators.
Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

2016
Czip: A Fast Lossless Compression Algorithm for Climate Data.
Int. J. Parallel Program., 2016

A probabilistic graphical model approach in 30 m land cover mapping with multiple data sources.
CoRR, 2016

The Sunway TaihuLight supercomputer: system and applications.
Sci. China Inf. Sci., 2016

Evaluating the POWER8 Architecture through Optimizing Stencil-Based Algorithms.
Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, 2016

10M-core scalable fully-implicit solver for nonhydrostatic atmospheric dynamics.
Proceedings of the International Conference for High Performance Computing, 2016

Refactoring and optimizing the Community Atmosphere Model (CAM) on the Sunway TaihuLight supercomputer.
Proceedings of the International Conference for High Performance Computing, 2016

TADE: Tight Adaptive Differential Evolution.
Proceedings of the Parallel Problem Solving from Nature - PPSN XIV, 2016

Optimizing Residue Number System on FPGA.
Proceedings of the 2016 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, 2016

Cache-Friendly Design for Complex Spatially-Variable Coefficient Stencils on Many-Core Architectures.
Proceedings of the 23rd IEEE International Conference on High Performance Computing, 2016

Accelerating the 3D Euler atmospheric solver through heterogeneous CPU-GPU platforms.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

Libra: an automated code generation and tuning framework for register-limited stencils on GPUs.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

Generalized GPU Acceleration for Applications Employing Finite-Volume Methods.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

Graph-Oriented Code Transformation Approach for Register-Limited Stencils on GPUs.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

F-CNN: An FPGA-based framework for training Convolutional Neural Networks.
Proceedings of the 27th IEEE International Conference on Application-specific Systems, 2016

Performance optimization of Jacobi stencil algorithms based on POWER8 architecture.
Proceedings of the 27th IEEE International Conference on Application-specific Systems, 2016

Unleashing the performance potential of CPU-GPU platforms for the 3D atmospheric Euler solver.
Proceedings of the 27th IEEE International Conference on Application-specific Systems, 2016

2015
Solving the Global Atmospheric Equations through Heterogeneous Reconfigurable Platforms.
ACM Trans. Reconfigurable Technol. Syst., 2015

Ultra-Scalable CPU-MIC Acceleration of Mesoscale Atmospheric Modeling on Tianhe-2.
IEEE Trans. Computers, 2015

Parallel Genetic Algorithms on Multiple FPGAs.
SIGARCH Comput. Archit. News, 2015

Scaling Support Vector Machines on modern HPC platforms.
J. Parallel Distributed Comput., 2015

Data Reduction Analysis for Climate Data Sets.
Int. J. Parallel Program., 2015

Targeted Mutation: A Novel Mutation Strategy for Differential Evolution.
Proceedings of the 27th IEEE International Conference on Tools with Artificial Intelligence, 2015

Optimizing Complex Spatially-Variant Coefficient Stencils for Seismic Modeling on GPU.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

Optimizing Residue Number Reverse Converters through Bitwise Arithmetic on FPGAs.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

2014
Global-Scale Associations of Vegetation Phenology with Rainfall and Temperature at a High Spatio-Temporal Resolution.
Remote. Sens., 2014

Scaling Reverse Time Migration Performance through Reconfigurable Dataflow Engines.
IEEE Micro, 2014

Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax-Wendroff correction stencil.
Int. J. High Perform. Comput. Appl., 2014

CFIO2: Overlapping Communications and I/O with Computations Using RDMA Technology.
Proceedings of the Network and Parallel Computing, 2014

A High Performance Compression Method for Climate Data.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2014

MIC-SVM: Designing a Highly Efficient Support Vector Machine for Advanced Modern Multi-core and Many-Core Architectures.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Enabling and Scaling a Global Shallow-Water Atmospheric Model on Tianhe-2.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Scaling and analyzing the stencil performance on multi-core and many-core architectures.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

Porting the Princeton Ocean Model to GPUs.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2014

Patra: Parallel tree-reweighted message passing architecture.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

A highly-efficient and green data flow engine for solving Euler atmospheric equations.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

A Fully-Pipelined FPGA Design for Tree-Reweighted Message Passing Algorithm.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

2013
CFIO: A Fast I/O Library for Climate Models.
Proceedings of the 12th IEEE International Conference on Trust, 2013

A peta-scalable CPU-GPU algorithm for global atmospheric simulations.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

Optimize Multidimensional Arrays Queries with Heterogeneous Replica Method.
Proceedings of the IEEE Eighth International Conference on Networking, 2013

Accelerating the 3D Elastic Wave Forward Modeling on GPU and MIC.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Accelerating solvers for global atmospheric equations through mixed-precision data flow engine.
Proceedings of the 23rd International Conference on Field Programmable Logic and Applications, 2013

An FPGA-Based Data Flow Engine for Gaussian Copula Model.
Proceedings of the 21st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2013

Global Atmospheric Simulation on a Reconfigurable Platform.
Proceedings of the 21st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2013

A Scalable Barotropic Mode Solver for the Parallel Ocean Program.
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

Understanding Data Characteristics and Access Patterns in a Cloud Storage System.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012
Revisiting finite difference and spectral migration methods on diverse parallel architectures.
Comput. Geosci., 2012

The Chunk-Locality Index: An Efficient Query Method for Climate Datasets.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

A fully-pipelined expectation-maximization engine for Gaussian Mixture Models.
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

2011
Beyond Traditional Microprocessors for Geoscience High-Performance Computing Applications.
IEEE Micro, 2011

Eliminating the memory bottleneck: an FPGA-based solution for 3D reverse time migration.
Proceedings of the ACM/SIGDA 19th International Symposium on Field Programmable Gate Arrays, 2011

2010
FPGA Designs with Optimized Logarithmic Arithmetic.
IEEE Trans. Computers, 2010

2009
Accelerating Seismic Computations Using Customized Number Representations on FPGAs.
EURASIP J. Embed. Syst., 2009

2008
An efficient admission control for IEEE 802.11 networks based on throughput analyses of (un)saturated channel.
Int. J. Commun. Syst., 2008

Smart Enumeration: A Systematic Approach to Exhaustive Search.
Proceedings of the Integrated Circuit and System Design. Power and Timing Modeling, 2008

Optimizing residue arithmetic on FPGAs.
Proceedings of the 2008 International Conference on Field-Programmable Technology, 2008

2007
Optimizing Logarithmic Arithmetic on FPGAs.
Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines, 2007

2006
Comparing floating-point and logarithmic number representations for reconfigurable acceleration.
Proceedings of the 2006 IEEE International Conference on Field Programmable Technology, 2006

2005
Efficient Multiplexing Protocol for Low Bit Rate Multi-point Video Conferencing.
Proceedings of the Mobile Ad-hoc and Sensor Networks, First International Conference, 2005

Next Generation Networks Architecture and Layered End-to-End QoS Control.
Proceedings of the Parallel and Distributed Processing and Applications, 2005

Efficient Implementation of 3G-324M Protocol Stack for Multimedia Communication.
Proceedings of the 11th International Conference on Parallel and Distributed Systems, 2005

Efficient wireless link bandwidth detection for IEEE 802.11 networks.
Proceedings of IEEE International Conference on Communications, 2005

Object-Oriented Design and Implementations of 3G-324M Protocol Stack.
Proceedings of the Distributed and Parallel Computing, 2005

2004
Performance Evaluations of Replacement Algorithms in Hierarchical Web Caching.
Proceedings of the Advances in Web-Age Information Management: 5th International Conference, 2004

Efficient construction of connected dominating set in wireless ad hoc networks.
Proceedings of the 2004 IEEE International Conference on Mobile Ad-hoc and Sensor Systems, 2004

An Integration Approach of Data Mining with Web Cache Pre-fetching.
Proceedings of the Parallel and Distributed Processing and Applications, 2004