Mostofa Patwary

According to our database, Mostofa Patwary authored at least 37 papers between 2019 and 2025.

Bibliography

2025
Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset.
CoRR, August 2025

Fusing LLM Capabilities with Routing Data.
CoRR, July 2025

Llama-Nemotron: Efficient Reasoning Models.
CoRR, May 2025

Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning.
CoRR, April 2025

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training.
CoRR, April 2025

Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning.
CoRR, April 2025

Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning.
CoRR, April 2025

MIND: Math Informed syNthetic Dialogues for Pretraining LLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Maximize Your Data's Potential: Enhancing LLM Accuracy with Two-Phase Pretraining.
CoRR, 2024

LLM Pruning and Distillation in Practice: The Minitron Approach.
CoRR, 2024

Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models.
CoRR, 2024

Data, Data Everywhere: A Guide for Pretraining Dataset Construction.
CoRR, 2024

Nemotron-4 340B Technical Report.
CoRR, 2024

Nemotron-4 15B Technical Report.
CoRR, 2024

Compact Language Models via Pruning and Knowledge Distillation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

LLM-Evolve: Evaluation for LLM's Evolving Capability on Benchmarks.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Data, Data Everywhere: A Guide for Pretraining Dataset Construction.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023
Context Generation Improves Open Domain Question Answering.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

Adding Instructions during Pretraining: Effective way of Controlling Toxicity in Language Models.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

2022
Factuality Enhanced Language Models for Open-Ended Text Generation.
CoRR, 2022

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model.
CoRR, 2022

Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Factuality Enhanced Language Models for Open-Ended Text Generation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Keynote Talk 2: Training Large Language Models: Challenges and Opportunities.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Evaluating Parameter Efficient Learning for Generation.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Multi-Stage Prompting for Knowledgeable Dialogue Generation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021
Efficient Large-Scale Language Model Training on GPU Clusters.
CoRR, 2021

Efficient large-scale language model training on GPU clusters using megatron-LM.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021

End-to-End Training of Neural Retrievers for Open-Domain Question Answering.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Local Knowledge Powered Conversational Agents.
CoRR, 2020

MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

BioMegatron: Larger Biomedical Domain Language Model.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Training Question Answering Models From Synthetic Data.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Large Scale Multi-Actor Generative Dialog Modeling.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism.
CoRR, 2019

DisCo: Physics-Based Unsupervised Discovery of Coherent Structures in Spatiotemporal Systems.
Proceedings of the 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2019

