Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

2018

Orchestrating parallel detection of strongly connected components on GPUs.

Parallel Comput., 2018

Tuning Streamed Applications on Intel Xeon Phi: A Machine Learning Based Approach.

CoRR, 2018

Auto-tuning Streamed Applications on Intel Xeon Phi.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

MOCL: an efficient OpenCL implementation for the Matrix-2000 architecture.

Proceedings of the 15th ACM International Conference on Computing Frontiers, 2018

2017

Implementation and Performance Evaluation of Recommender Algorithms Based on Multi-/Many-core Platforms.

计算机科学, 2017

Performance Analysis of GPU Programs Towards Better Memory Hierarchy Design.

计算机科学, 2017

Efficient and high-quality sparse graph coloring on GPUs.

Concurr. Comput. Pract. Exp., 2017

LU factorization on heterogeneous systems: an energy-efficient approach towards high performance.

Computing, 2017

High Performance Detection of Strongly Connected Components in Sparse Graphs on GPUs.

Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, 2017

Implementing and Evaluating OpenCL on an ARMv8 Multi-Core CPU.

Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

Efficient and Portable ALS Matrix Factorization for Recommender Systems.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

High Performance Coordinate Descent Matrix Factorization for Recommender Systems.

Proceedings of the Computing Frontiers Conference, 2017

2016

Evaluating Multiple Streams on Heterogeneous Platforms.

Parallel Process. Lett., 2016

Streaming Applications on Heterogeneous Platforms.

Proceedings of the Network and Parallel Computing, 2016

Evaluating the Performance Impact of Multiple Streams on the MIC-Based Heterogeneous Platform.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

High Performance Parallel Graph Coloring on GPGPUs.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

An Energy-Efficient Implementation of LU Factorization on Heterogeneous Systems.

Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

2015

An Efficient Clique-Based Algorithm of Compute Nodes Allocation for In-memory Checkpoint System.

Proceedings of the High Performance Computing - 30th International Conference, 2015

2014

OpenMC: Towards Simplifying Programming for TianHe Supercomputers.

J. Comput. Sci. Technol., 2014

2013

Exploiting hierarchy parallelism for molecular dynamics on a petascale heterogeneous system.

J. Parallel Distributed Comput., 2013

OpenACC to Intel Offload: Automatic Translation and Optimization.

Proceedings of the Computer Engineering and Technology - 17th CCF Conference, 2013

MIC acceleration of short-range molecular dynamics simulations.

Proceedings of the First International Workshop on Code Optimisation for Multi and Many Cores, 2013

2012

MPtostream: an OpenMP compiler for CPU-GPU heterogeneous parallel systems.

Sci. China Inf. Sci., 2012

2011

Power Optimization for GPU Programs Based on Software Prefetching.

Yisong Lin

Tao Tang

Guibin Wang

Proceedings of the IEEE 10th International Conference on Trust, 2011

Cache Miss Analysis for GPU Programs Based on Stack Distance Profile.

Tao Tang

Xuejun Yang

Yisong Lin

Proceedings of the 2011 International Conference on Distributed Computing Systems, 2011

2010

Optimization and Implementation of LBM Benchmark on Multithreaded GPU.

Proceedings of the International Conference on Data Storage and Data Engineering, 2010

Improving scratchpad allocation with demand-driven data tiling.

Proceedings of the 2010 International Conference on Compilers, 2010

Optimizing Stencil Application on Multi-thread GPU Architecture Using Stream Programming Model.

Proceedings of the Architecture of Computing Systems, 2010

A Data Communication Scheduler for Stream Programs on CPU-GPU Platform.

Tao Tang

Xinhai Xu

Yisong Lin

Proceedings of the 10th IEEE International Conference on Computer and Information Technology, 2010

2009

SRF Coloring: Stream Register File Allocation via Graph Coloring.

J. Comput. Sci. Technol., 2009

Program Optimization of Stencil Based Application on the GPU-Accelerated System.

Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2009

Program Optimization of Array-Intensive SPEC2k Benchmarks on Multithreaded GPU Using CUDA and Brook+.

Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

2008

Optimizing scientific application loops on stream processors.

Proceedings of the 2008 ACM SIGPLAN/SIGBED Conference on Languages, 2008

Model-guided strip size selection for minimal execution time on Imagine stream processor.

Proceedings of 8th IEEE International Conference on Computer and Information Technology, 2008

2007

Implementation and Optimization of Dense LU Decomposition on the Stream Processor.

Proceedings of the Parallel Processing and Applied Mathematics, 2007

Implementation and Optimization of Sparse Matrix-Vector Multiplication on Imagine Stream Processor.

Proceedings of the Parallel and Distributed Processing and Applications, 2007

Architecture-Based Optimization for Mapping Scientific Applications to Imagine.

Proceedings of the Parallel and Distributed Processing and Applications, 2007

Evaluation of Transcendental Functions on Imagine Architecture.

Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

Implementation and Evaluation of Jacobi Iteration on the Imagine Stream Processor.

Proceedings of the High Performance Computing, 2007

Tao Tang