Xiaoming Li

Proceedings of the International Symposium on Memory Systems, 2024

2023

A Distributed Pricing Strategy for Edge Computation Offloading Optimization in Autonomous Driving.

IEEE Netw., September, 2023

On Memory Codelets: Prefetching, Recoding, Moving and Streaming Data.

Dawson Fox

Jose Monsalve Diaz

CoRR, 2023

DEMAC: A Platform for Education in High-performance Computing, Bridging the Gap Between Users and Hardware.

Proceedings of the Workshop on Computer Architecture Education, 2023

Memory Transfer Decomposition: Exploring Smart Data Movement Through Architecture-Aware Strategies.

Johannes Doerfert

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

A gem5 Implementation of the Sequential Codelet Model: Reducing Overhead and Expanding the Software Memory Interface.

Dawson Fox

Rafael A. Herrera Guaitero

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Towards Fault Tolerance and Resilience in the Sequential Codelet Model.

Diego A. Roa Perdomo

Proceedings of the High Performance Computing - 10th Latin American Conference, 2023

2022

Chiplets and the Codelet Model.

Dawson Fox

Rafael A. Herrera Guaitero

CoRR, 2022

Programming Autonomous Machines.

CoRR, 2022

Automatic Asynchronous Execution of Synchronously Offloaded OpenMP Target Regions.

Thomas Applencourt

Johannes Doerfert

Proceedings of the Eighth IEEE/ACM Workshop on the LLVM Compiler Infrastructure in HPC, 2022

Programming Autonomous Machines: Special Session Paper.

Proceedings of the International Conference on Embedded Software, 2022

2021

Fast Monotonicity Preserving Text Sorting On GPU.

Haoke Xu

Proceedings of the IEEE International Performance, 2021

An Efficient Shuffle-Light FFT Library.

Salvatore Servodio

Proceedings of the IEEE International Performance, 2021

2020

G-Code Re-compilation and Optimization for Faster 3D Printing.

Proceedings of the Languages and Compilers for Parallel Computing, 2020

Fast Convolutional Neural Networks with Fine-Grained FFTs.

Yulin Zhang

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2018

CSSMT: Compiler Based Software Simultaneous Multithreading (SMT).

Yuanfang Chen

Qingchuan Shi

Proceedings of the 26th Euromicro International Conference on Parallel, 2018

2017

A scalable interface-resolved simulation of particle-laden flow using the lattice Boltzmann method.

Parallel Comput., 2017

Scalable Top-K Query Processing Using Graphics Processing Unit.

Yulin Zhang

Hui Fang

Proceedings of the Languages and Compilers for Parallel Computing, 2017

Improving Retrieval Effectiveness for Temporal-Constrained Top-K Query Processing.

Proceedings of the Information Retrieval Technology, 2017

2015

Network and Parallel Computing.

Ching-Hsien Hsu

Xuanhua Shi

Int. J. Parallel Program., 2015

FreshBreeze: A Data Flow Approach for Meeting DDDAS Challenges.

Proceedings of the International Conference on Computational Science, 2015

A Thread Merging Transformation to Improve Throughput of Multiple Programs.

Yuanfang Chen

Proceedings of the 29th IEEE International Conference on Advanced Information Networking and Applications, 2015

2014

Page Classifier and Placer: A Scheme of Managing Hybrid Caches.

Proceedings of the Network and Parallel Computing, 2014

Input-adaptive parallel sparse fast Fourier transform for stream processing.

Shuo Chen

Proceedings of the 2014 International Conference on Supercomputing, 2014

A Dataflow Programming Language and its Compiler for Streaming Systems.

Proceedings of the International Conference on Computational Science, 2014

2013

An Input-Adaptive Algorithm for High Performance Sparse Fast Fourier Transform.

Shuo Chen

Proceedings of the Languages and Compilers for Parallel Computing, 2013

A hybrid GPU/CPU FFT library for large FFT problems.

Shuo Chen

Proceedings of the IEEE 32nd International Performance Computing and Communications Conference, 2013

2012

Static micro-scheduling: Resource contention relief in multithreaded programs.

Yuanfang Chen

Proceedings of the 31st IEEE International Performance Computing and Communications Conference, 2012

2011

A Code Merging Optimization Technique for GPU.

Ryan Taylor

Proceedings of the Languages and Compilers for Parallel Computing, 2011

Using GPUs to compute large out-of-card FFTs.

Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Source Code Partitioning in Program Optimization.

Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011

2010

Software-based branch predication for AMD GPUs.

Ryan Taylor

SIGARCH Comput. Archit. News, 2010

An empirically tuned 2D and 3D FFT library on CUDA GPU.

Proceedings of the 24th International Conference on Supercomputing, 2010

A Micro-benchmark Suite for AMD GPUs.

Ryan Taylor

Proceedings of the 39th International Conference on Parallel Processing, 2010

2009

DFT Performance Prediction in FFTW.

Proceedings of the Languages and Compilers for Parallel Computing, 2009

Iterative layer-based raytracing on CUDA.

Alejandro Segovia

Guang R. Gao

Proceedings of the 28th International Performance Computing and Communications Conference, 2009

Performance modeling for DFT algorithms in FFTW.

Proceedings of the 23rd International Conference on Supercomputing, 2009

CUDA Memory Optimizations for Large Data-Structures in the Gravit Simulator.

Juergen Ributzka

Proceedings of the ICPPW 2009, 2009

A control-structure splitting optimization for GPGPU.

Snaider Carrillo