Yi Yang

ORCID: 0000-0003-1462-5100

Affiliations:
  • NEC Laboratories America, Department of Computing Systems Architecture, Princeton, NJ, USA
  • North Carolina State University, Department of Electrical and Computer Engineering, Raleigh, NC, USA (former)


According to our database, Yi Yang authored at least 34 papers between 2010 and 2022.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2022
DyCo: Dynamic, Contextualized AI Models.
ACM Trans. Embed. Comput. Syst., November, 2022

2021
F3S: Free Flow Fever Screening.
CoRR, 2021

F³S: Free Flow Fever Screening.
Proceedings of the IEEE International Conference on Smart Computing, 2021

Magic-Pipe: self-optimizing video analytics pipelines.
Proceedings of the Middleware '21: 22nd International Middleware Conference, Québec City, Canada, December 6, 2021

UAC: An Uncertainty-Aware Face Clustering Algorithm.
Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

2017
Accelerating deep neural network training with inconsistent stochastic gradient descent.
Neural Networks, 2017

2016
Optimizing memory efficiency for deep convolutional neural networks on GPUs.
Proceedings of the International Conference for High Performance Computing, 2016

BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing.
Proceedings of the 2016 International Conference on Supercomputing, 2016

HppCnn: A High-Performance, Portable Deep-Learning Library for GPGPUs.
Proceedings of the 45th International Conference on Parallel Processing, 2016

2015
CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications.
J. Comput. Sci. Technol., 2015

BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing.
CoRR, 2015

Automatic and Efficient Data Host-Device Communication for Many-Core Coprocessors.
Proceedings of the Languages and Compilers for Parallel Computing, 2015

Automatic data placement into GPU on-chip memory resources.
Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015

Revisiting ILP Designs for Throughput-Oriented GPGPU Architecture.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

2014
CUDA-NP: realizing nested thread-level parallelism in GPGPU applications.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

COMP: Compiler Optimizations for Manycore Processors.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs.
Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

A Case for a Flexible Scalar Unit in SIMT Architecture.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Automating and optimizing data transfers for many-core coprocessors.
Proceedings of the 2014 International Conference on Supercomputing, 2014

Warp-level divergence in GPUs: Characterization, impact, and mitigation.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

A Highly Efficient FFT Using Shared-Memory Multiplexing.
Proceedings of the Numerical Computations with GPUs, 2014

2013
Locality principle revisited: A probability-based quantitative approach.
J. Parallel Distributed Comput., 2013

The Implementation of a High Performance GPGPU Compiler.
Int. J. Parallel Program., 2013

Semi-automatic restructuring of offloadable tasks for many-core accelerators.
Proceedings of the International Conference for High Performance Computing, 2013

Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement.
Proceedings of the International Conference on Supercomputing, 2013

2012
A unified optimizing compiler framework for different GPGPU architectures.
ACM Trans. Archit. Code Optim., 2012

Apricot: an optimizing compiler and productivity tool for x86-compatible many-core coprocessors.
Proceedings of the International Conference on Supercomputing, 2012

Fixing Performance Bugs: An Empirical Study of Open-Source GPGPU Programs.
Proceedings of the 41st International Conference on Parallel Processing, 2012

CPU-assisted GPGPU on fused CPU-GPU architectures.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

Shared memory multiplexing: a novel way to improve GPGPU throughput.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Many-thread aware instruction-level parallelism: architecting shader cores for GPU computing.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2010
An optimizing compiler for GPGPU programs with input-data sharing.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

A GPGPU compiler for memory optimization and parallelism management.
Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2010

Accelerating MATLAB Image Processing Toolbox functions on GPUs.
Proceedings of the 3rd Workshop on General Purpose Processing on Graphics Processing Units, 2010
