We stand with Ukraine

Noam Shazeer

Affiliations:

Google

According to our database¹, Noam Shazeer authored at least 51 papers between 1999 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

Hot Chips Keynote.

[DOI]

Proceedings of the IEEE Hot Chips 37 Symposium, 2025

2023

Scaling Up Models and Data with t5x and seqio.

[DOI]

J. Mach. Learn. Res., 2023

PaLM: Scaling Language Modeling with Pathways.

[DOI]

Aakanksha Chowdhery

,

,

,

,

,

,

,

Hyung Won Chung

,

,

Sebastian Gehrmann

,

,

,

Sasha Tsvyashchenko

,

,

,

,

,

,

Vinodkumar Prabhakaran

,

,

,

,

,

,

,

,

,

,

,

Anselm Levskaya

,

Sanjay Ghemawat

,

,

Henryk Michalewski

,

,

,

,

,

,

Daphne Ippolito

,

,

,

,

Alexander Spiridonov

,

,

,

Shivani Agrawal

,

,

,

Thanumalayan Sankaranarayana Pillai

,

,

Aitor Lewkowycz

,

,

,

Oleksandr Polozov

,

,

,

,

,

,

,

Michele Catasta

,

,

Kathy Meier-Hellstern

,

,

,

,

J. Mach. Learn. Res., 2023

2022

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity.

[DOI]

,

,

J. Mach. Learn. Res., 2022

Scaling Up Models and Data with t5x and seqio.

[DOI]

CoRR, 2022

Designing Effective Sparse Expert Models.

[DOI]

,

,

,

,

,

,

,

CoRR, 2022

LaMDA: Language Models for Dialog Applications.

[DOI]

Romal Thoppilan

,

Daniel De Freitas

,

,

,

Apoorv Kulshreshtha

,

,

,

,

,

,

,

,

Huaixiu Steven Zheng

,

,

Marcelo Menegali

,

,

,

Dmitry Lepikhin

,

,

,

,

,

,

,

,

Chung-Ching Chang

,

,

,

,

Kathleen S. Meier-Hellstern

,

Meredith Ringel Morris

,

,

Renelito Delos Santos

,

,

,

Ben Zevenbergen

,

Vinodkumar Prabhakaran

,

,

,

,

Alejandra Molina

,

Erin Hoffman-John

,

,

,

,

,

,

Viktoriya Kuzmina

,

,

,

Rachel Bernstein

,

,

Blaise Agüera y Arcas

,

,

,

,

CoRR, 2022

2021

Primer: Searching for Efficient Transformers for Language Modeling.

[DOI]

,

,

,

,

,

CoRR, 2021

GSPMD: General and Scalable Parallelization for ML Computation Graphs.

[DOI]

,

,

,

Blake A. Hechtman

,

,

,

,

Dmitry Lepikhin

,

,

Marcello Maggioni

,

,

,

,

,

,

CoRR, 2021

Searching for Efficient Transformers for Language Modeling.

[DOI]

,

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding.

[DOI]

Dmitry Lepikhin

,

,

,

,

,

,

,

,

Proceedings of the 9th International Conference on Learning Representations, 2021

Do Transformer Modifications Transfer Across Implementations and Applications?

[DOI]

,

Hyung Won Chung

,

,

,

Thibault Févry

,

,

Karishma Malkan

,

,

,

,

,

,

,

,

,

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.

[DOI]

,

,

,

,

,

,

,

,

J. Mach. Learn. Res., 2020

Talking-Heads Attention.

[DOI]

,

,

,

,

CoRR, 2020

GLU Variants Improve Transformer.

[DOI]

CoRR, 2020

Faster Transformer Decoding: N-gram Masked Self-Attention.

[DOI]

,

,

,

CoRR, 2020

How Much Knowledge Can You Pack Into the Parameters of a Language Model?

[DOI]

,

,

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

2019

Fast Transformer Decoding: One Write-Head is All You Need.

[DOI]

CoRR, 2019

High Resolution Medical Image Analysis with Spatial Partitioning.

[DOI]

,

,

,

,

,

Panagiotis Korfiatis

,

Travis M. Drucker

,

Daniel J. Blezek

,

CoRR, 2019

Corpora Generation for Grammatical Error Correction.

[DOI]

Jared Lichtarge

,

,

,

,

,

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Music Transformer: Generating Music with Long-Term Structure.

[DOI]

Cheng-Zhi Anna Huang

,

,

Jakob Uszkoreit

,

,

Curtis Hawthorne

,

,

,

Matthew D. Hoffman

,

Monica Dinculescu

,

Proceedings of the 7th International Conference on Learning Representations, 2019

2018

Weakly Supervised Grammatical Error Correction using Iterative Decoding.

[DOI]

Jared Lichtarge

,

Christopher Alberti

,

,

,

CoRR, 2018

An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation.

[DOI]

Cheng-Zhi Anna Huang

,

,

Jakob Uszkoreit

,

,

Curtis Hawthorne

,

,

Matthew D. Hoffman

,

CoRR, 2018

Image Transformer.

[DOI]

,

,

Jakob Uszkoreit

,

,

,

CoRR, 2018

Blockwise Parallel Decoding for Deep Autoregressive Models.

[DOI]

,

,

Jakob Uszkoreit

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Mesh-TensorFlow: Deep Learning for Supercomputers.

[DOI]

,

,

,

,

,

Penporn Koanantakool

,

,

,

,

,

,

Blake A. Hechtman

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Adafactor: Adaptive Learning Rates with Sublinear Memory Cost.

[DOI]

,

Proceedings of the 35th International Conference on Machine Learning, 2018

Image Transformer.

[DOI]

,

,

Jakob Uszkoreit

,

,

,

,

Proceedings of the 35th International Conference on Machine Learning, 2018

Fast Decoding in Sequence Models Using Discrete Latent Variables.

[DOI]

,

,

,

,

,

Jakob Uszkoreit

,

Proceedings of the 35th International Conference on Machine Learning, 2018

Generating Wikipedia by Summarizing Long Sequences.

[DOI]

,

,

,

,

,

,

Proceedings of the 6th International Conference on Learning Representations, 2018

HydraNets: Specialized Dynamic Architectures for Efficient Inference.

[DOI]

Ravi Teja Mullapudi

,

William R. Mark

,

,

Kayvon Fatahalian

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Tensor2Tensor for Neural Machine Translation.

[DOI]

,

,

,

François Chollet

,

,

,

,

,

Nal Kalchbrenner

,

,

,

,

Jakob Uszkoreit

Proceedings of the 13th Conference of the Association for Machine Translation in the Americas, 2018

The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation.

[DOI]

,

,

,

,

Wolfgang Macherey

,

George F. Foster

,

,

,

,

,

,

Jakob Uszkoreit

,

,

,

,

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017

One Model To Learn Them All.

[DOI]

,

,

,

,

,

,

Jakob Uszkoreit

CoRR, 2017

Attention is All you Need.

[DOI]

,

,

,

Jakob Uszkoreit

,

,

,

,

Illia Polosukhin

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer.

[DOI]

,

Azalia Mirhoseini

,

Krzysztof Maziarz

,

,

,

Geoffrey E. Hinton

,

Proceedings of the 5th International Conference on Learning Representations, 2017

2016

Sparse Non-negative Matrix Language Modeling.

[DOI]

,

,

Trans. Assoc. Comput. Linguistics, 2016

Swivel: Improving Embeddings by Noticing What's Missing.

[DOI]

,

,

,

CoRR, 2016

Exploring the Limits of Language Modeling.

[DOI]

Rafal Józefowicz

,

,

,

,

CoRR, 2016

NN-Grams: Unifying Neural Network and n-Gram Language Models for Speech Recognition.

[DOI]

Babak Damavandi

,

,

,

Antoine Bruguier

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

End-to-end text-dependent speaker verification.

[DOI]

,

,

,

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

2015

Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks.

[DOI]

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Sparse non-negative matrix language modeling for skip-grams.

[DOI]

,

,

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Pruning sparse non-negative matrix n-gram language models.

[DOI]

,

,

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Sparse non-negative matrix language modeling for geo-annotated query session data.

[DOI]

,

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation.

[DOI]

,

,

CoRR, 2014

2010

Variational Program Inference.

[DOI]

,

CoRR, 2010

2002

A probabilistic approach to solving crossword puzzles.

[DOI]

Michael L. Littman

,

,

Noam M. Shazeer

Artif. Intell., 2002

1999

Solving Crossword Puzzles as Probabilistic Constraint Satisfaction.

[DOI]

Noam M. Shazeer

,

Michael L. Littman

,

Proceedings of the Sixteenth National Conference on Artificial Intelligence and Eleventh Conference on Innovative Applications of Artificial Intelligence, 1999

Solving Crosswords with PROVERB.

[DOI]

Michael L. Littman

,

,

Noam M. Shazeer

Proceedings of the Sixteenth National Conference on Artificial Intelligence and Eleventh Conference on Innovative Applications of Artificial Intelligence, 1999

PROVERB: The Probabilistic Cruciverbalist.

[DOI]

,

Noam M. Shazeer

,

Michael L. Littman

,

Sushant Agarwal

,

Catherine M. Cheves

,

Joseph Fitzgerald

,

,

,

Shannon Pollard

,

Karl Weinmeister

Proceedings of the Sixteenth National Conference on Artificial Intelligence and Eleventh Conference on Innovative Applications of Artificial Intelligence, 1999
