Mostofa Patwary

According to our database, Mostofa Patwary authored at least 37 papers between 2019 and 2025.

Bibliography

2025
Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset.
CoRR, August 2025

Fusing LLM Capabilities with Routing Data.
CoRR, July 2025

Llama-Nemotron: Efficient Reasoning Models.
CoRR, May 2025

Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning.
CoRR, April 2025

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training.
CoRR, April 2025

Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning.
CoRR, April 2025

Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning.
CoRR, April 2025

MIND: Math Informed syNthetic Dialogues for Pretraining LLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Maximize Your Data's Potential: Enhancing LLM Accuracy with Two-Phase Pretraining.
CoRR, 2024

LLM Pruning and Distillation in Practice: The Minitron Approach.
CoRR, 2024

Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models.
CoRR, 2024

Data, Data Everywhere: A Guide for Pretraining Dataset Construction.
CoRR, 2024

Nemotron-4 340B Technical Report.
CoRR, 2024

Nemotron-4 15B Technical Report.
CoRR, 2024

Compact Language Models via Pruning and Knowledge Distillation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

LLM-Evolve: Evaluation for LLM's Evolving Capability on Benchmarks.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Data, Data Everywhere: A Guide for Pretraining Dataset Construction.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023
Context Generation Improves Open Domain Question Answering.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

Adding Instructions during Pretraining: Effective way of Controlling Toxicity in Language Models.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

2022
Factuality Enhanced Language Models for Open-Ended Text Generation.
CoRR, 2022

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model.
CoRR, 2022

Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Factuality Enhanced Language Models for Open-Ended Text Generation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Keynote Talk 2: Training Large Language Models: Challenges and Opportunities.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Evaluating Parameter Efficient Learning for Generation.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Multi-Stage Prompting for Knowledgeable Dialogue Generation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021
Efficient Large-Scale Language Model Training on GPU Clusters.
CoRR, 2021

Efficient large-scale language model training on GPU clusters using megatron-LM.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021

End-to-End Training of Neural Retrievers for Open-Domain Question Answering.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Local Knowledge Powered Conversational Agents.
CoRR, 2020

MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

BioMegatron: Larger Biomedical Domain Language Model.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Training Question Answering Models From Synthetic Data.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Large Scale Multi-Actor Generative Dialog Modeling.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism.
CoRR, 2019

DisCo: Physics-Based Unsupervised Discovery of Coherent Structures in Spatiotemporal Systems.
Proceedings of the 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2019

