Cliff Young

Orcid: 0000-0003-2172-1651

According to our database, Cliff Young authored at least 49 papers between 1994 and 2023.

Bibliography

2023
TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

2022
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts.
CoRR, 2022

2021
Atari's ANTIC: My Favorite Microprocessor.
IEEE Micro, 2021

Best Papers From Hot Chips 32.
IEEE Micro, 2021

The Design Process for Google's Training Chips: TPUv2 and TPUv3.
IEEE Micro, 2021

Exploring the Limits of Concurrency in ML Training on Google TPUs.
Proceedings of Machine Learning and Systems 2021, 2021

Ten Lessons From Three Generations Shaped Google's TPUv4i: Industrial Product.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

2020
Exploring the limits of Concurrency in ML Training on Google TPUs.
CoRR, 2020

A domain-specific supercomputer for training deep neural networks.
Commun. ACM, 2020

Sparse GPU kernels for deep learning.
Proceedings of the International Conference for High Performance Computing, 2020


Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Google's Training Chips Revealed: TPUv2 and TPUv3.
Proceedings of the IEEE Hot Chips 32 Symposium, 2020

Bit-Parallel Vector Composability for Neural Acceleration.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

2019
MLPerf Training Benchmark.
CoRR, 2019

2018
Motivation for and Evaluation of the First Tensor Processing Unit.
IEEE Micro, 2018

A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution.
IEEE Micro, 2018

A domain-specific architecture for deep neural networks.
Commun. ACM, 2018

Mesh-TensorFlow: Deep Learning for Supercomputers.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017
In-Datacenter Performance Analysis of a Tensor Processing Unit.
CoRR, 2017


2016
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation.
CoRR, 2016

2014

2013
Hardware support for fine-grained event-driven computation in Anton 2.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013

2011
VLIW Processors.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Anton, A Special-Purpose Molecular Simulation Machine.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Overcoming Communication Latency Barriers in Massively Parallel Scientific Computation.
IEEE Micro, 2011

2010
Exploiting 162-Nanosecond End-to-End Communication Latency on Anton.
Proceedings of the Conference on High Performance Computing Networking, 2010

2009
A 32x32x32, spatially distributed 3D FFT in four microseconds on Anton.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009


2008
Anton, a special-purpose machine for molecular dynamics simulation.
Commun. ACM, 2008

Hierarchical simulation-based verification of Anton, a special-purpose parallel machine.
Proceedings of the 26th International Conference on Computer Design, 2008

High-throughput pairwise point interactions in Anton, a specialized machine for molecular dynamics simulation.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

Incorporating flexibility in Anton, a specialized machine for molecular dynamics simulation.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

Simulation and embedded software development for Anton, a parallel machine with heterogeneous multicore ASICs.
Proceedings of the 6th International Conference on Hardware/Software Codesign and System Synthesis, 2008

2006
Architectures and Algorithms for Biomolecular Simulation.
Proceedings of the 2006 USENIX Annual Technical Conference, Boston, MA, USA, May 30, 2006

2005
Embedded computing - a VLIW approach to architecture, compilers, and tools.
Morgan Kaufmann, ISBN: 978-1-55860-766-8, 2005

2001
Instruction scheduling for instruction level parallel processors.
Proc. IEEE, 2001

Protium, an Infrastructure for Partitioned Applications.
Proceedings of HotOS-VIII: 8th Workshop on Hot Topics in Operating Systems, 2001

2000
Comparing and Combining Profiles.
J. Instr. Level Parallelism, 2000

Coherence Communication Prediction in Shared-Memory Multiprocessors.
Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

1999
Static correlated branch prediction.
ACM Trans. Program. Lang. Syst., 1999

1998
Better Global Scheduling Using Path Profiles.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

1997
Near-optimal Intraprocedural Branch Alignment.
Proceedings of the ACM SIGPLAN '97 Conference on Programming Language Design and Implementation (PLDI), 1997

1996
An Analysis of Dynamic Branch Prediction Schemes on System Workloads.
Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

1995
Performance issues in correlated branch prediction schemes.
Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995

A Comparative Analysis of Schemes for Correlated Branch Prediction.
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

1994
Secure Short-Cut Routing for Mobile IP.
Proceedings of the USENIX Summer 1994 Technical Conference, 1994

Improving the Accuracy of Static Branch Prediction Using Branch Correlation.
Proceedings of ASPLOS-VI, 1994

