Canqun Yang

According to the csauthors.net database, Canqun Yang authored at least 59 papers between 2005 and 2019.

Bibliography

2019
Application-aware NoC management in GPUs multitasking.
The Journal of Supercomputing, 2019

Toward fault-tolerant hybrid programming over large-scale heterogeneous clusters via checkpointing/restart optimization.
The Journal of Supercomputing, 2019

SCP: Shared Cache Partitioning for High-Performance GEMM.
ACM Transactions on Architecture and Code Optimization, 2019

Low-Cost Image Compressive Sensing with Multiple Measurement Rates for Object Detection.
Sensors, 2019

GARDENIA: A Graph Processing Benchmark Suite for Next-Generation Accelerators.
ACM Journal on Emerging Technologies in Computing Systems, 2019

Reverse Offload Programming on Heterogeneous Systems.
IEEE Access, 2019

The Communication-Overlapped Hybrid Decomposition Parallel Algorithm for Multi-Scale Fluid Simulations.
Proceedings of the 48th International Conference on Parallel Processing, 2019

2018
Orchestrating parallel detection of strongly connected components on GPUs.
Parallel Computing, 2018

Moving from exascale to zettascale computing: challenges and techniques.
Frontiers of IT & EE, 2018

A hybrid deep learning CNN-ELM for age and gender classification.
Neurocomputing, 2018

Collaborative Subspace Graph Hashing for Cross-modal Retrieval.
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, 2018

Auto-tuning Streamed Applications on Intel Xeon Phi.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

UHCL-Darknet: An OpenCL-based Deep Neural Network Framework for Heterogeneous Multi-/Many-core Clusters.
Proceedings of the 47th International Conference on Parallel Processing, 2018

MOCL: an efficient openCL implementation for the matrix-2000 architecture.
Proceedings of the 15th ACM International Conference on Computing Frontiers, 2018

2017
Efficient and high-quality sparse graph coloring on GPUs.
Concurrency and Computation: Practice and Experience, 2017

LU factorization on heterogeneous systems: an energy-efficient approach towards high performance.
Computing, 2017

Dependency-based long short term memory network for drug-drug interaction extraction.
BMC Bioinformatics, 2017

High Performance Detection of Strongly Connected Components in Sparse Graphs on GPUs.
Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, 2017

Projective Hard Thresholding Pursuit for Nonnegative Sparse Recovery.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

Delay Compensated Asynchronous Adam Algorithm for Deep Neural Networks.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

Implementing and Evaluating OpenCL on an ARMv8 Multi-Core CPU.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

Efficient and Portable ALS Matrix Factorization for Recommender Systems.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Automatic density clustering with multiple kernels for high-dimension bioinformatics data.
Proceedings of the 2017 IEEE International Conference on Bioinformatics and Biomedicine, 2017

2016
Evaluating Multiple Streams on Heterogeneous Platforms.
Parallel Processing Letters, 2016

623 Tflop/s HPCG run on Tianhe-2: Leveraging millions of hybrid cores.
International Journal of High Performance Computing Applications, 2016

Accurate, validated and fast evaluation of elementary symmetric functions and its application.
Applied Mathematics and Computation, 2016

Bilateral Sampling Randomized Singular Value Decomposition.
Proceedings of the 17th International Conference on Parallel and Distributed Computing, 2016

Accurate Evaluation of Bivariate Polynomials.
Proceedings of the 17th International Conference on Parallel and Distributed Computing, 2016

Accelerator-Centered Programming on Heterogeneous Systems.
Proceedings of the 17th International Conference on Parallel and Distributed Computing, 2016

Streaming Applications on Heterogeneous Platforms.
Proceedings of Network and Parallel Computing, 2016

Monaural Speech Separation on Many Integrated Core Architecture.
Proceedings of Computer Engineering and Technology - 20th CCF Conference, 2016

Accelerating Nyström Kernel Independent Component Analysis with Many Integrated Core Architecture.
Proceedings of Computer Engineering and Technology - 20th CCF Conference, 2016

Evaluating the Performance Impact of Multiple Streams on the MIC-Based Heterogeneous Platform.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

High Performance Parallel Graph Coloring on GPGPUs.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

An Energy-Efficient Implementation of LU Factorization on Heterogeneous Systems.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

mAMBER: A CPU/MIC collaborated parallel framework for AMBER on Tianhe-2 supercomputer.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2016

2015
An Efficient Clique-Based Algorithm of Compute Nodes Allocation for In-memory Checkpoint System.
Proceedings of High Performance Computing - 30th International Conference, 2015

Large-Scale Neo-Heterogeneous Programming and Optimization of SNP Detection on Tianhe-2.
Proceedings of High Performance Computing - 30th International Conference, 2015

Design and Implementation of a Highly Efficient DGEMM for 64-Bit ARMv8 Multi-core Processors.
Proceedings of the 44th International Conference on Parallel Processing, 2015

Implementation of an Accurate and Efficient Compensated DGEMM for 64-bit ARMv8 Multi-Core Processors.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

FT-Offload: A Scalable Fault-Tolerance Programing Model on MIC Cluster.
Proceedings of Algorithms and Architectures for Parallel Processing, 2015

The Challenge of Scaling Genome Big Data Analysis Software on TH-2 Supercomputer.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2015

mAMBER: Accelerating Explicit Solvent Molecular Dynamic with Intel Xeon Phi Many-Integrated Core Coprocessors.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2015

2014
MilkyWay-2 supercomputer: system and application.
Frontiers Comput. Sci., 2014

HPCG: Preliminary Evaluation and Optimization on Tianhe-2 CPU-only Nodes.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

2013
Exploiting hierarchy parallelism for molecular dynamics on a petascale heterogeneous system.
J. Parallel Distrib. Comput., 2013

OpenACC to Intel Offload: Automatic Translation and Optimization.
Proceedings of Computer Engineering and Technology - 17th CCF Conference, 2013

MIC acceleration of short-range molecular dynamics simulations.
Proceedings of the First International Workshop on Code Optimisation for Multi and Many Cores, 2013

2012
Parallelizing SOR for GPGPUs using alternate loop tiling.
Parallel Computing, 2012

A Fast Parallel Implementation of Molecular Dynamics with the Morse Potential on a Heterogeneous Petascale Supercomputer.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

2011
Optimizing Linpack Benchmark on GPU-Accelerated Petascale Supercomputer.
J. Comput. Sci. Technol., 2011

2010
TH-1: China's first petaflop supercomputer.
Frontiers Comput. Sci. China, 2010

Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing.
Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010

2009
Solving 2D Nonlinear Unsteady Convection-Diffusion Equations on Heterogenous Platforms with Multiple GPUs.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

GPU Acceleration of High-Speed Collision Molecular Dynamics Simulation.
Proceedings of the Ninth IEEE International Conference on Computer and Information Technology, 2009

2008
Low Power Optimization for MPI Collective Operations.
Proceedings of the 9th International Conference for Young Computer Scientists, 2008

Exploiting Energy Saving Opportunity of Barrier Operation in MPI Programs.
Proceedings of the Second Asia International Conference on Modelling and Simulation, 2008

OSS: Efficient Compiler Approach for Selecting Optimal Strip Size on the Imagine Stream Processor.
Proceedings of the 22nd International Conference on Advanced Information Networking and Applications, 2008

2005
Improving the Performance of GCC by Exploiting IA-64 Architectural Features.
Proceedings of Advances in Computer Systems Architecture, 10th Asia-Pacific Conference, 2005
