Jared Casper

According to our database, Jared Casper authored at least 22 papers between 2004 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Nemotron-4 15B Technical Report.
CoRR, 2024

2022
Reducing Activation Recomputation in Large Transformer Models.
CoRR, 2022

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model.
CoRR, 2022

2021
Efficient Large-Scale Language Model Training on GPU Clusters.
CoRR, 2021

Efficient large-scale language model training on GPU clusters using Megatron-LM.
Proceedings of the International Conference for High Performance Computing, 2021

2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism.
CoRR, 2019

2016

2015
Domain specific hardware acceleration.
PhD thesis, 2015

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin.
CoRR, 2015

2014
Deep Speech: Scaling up end-to-end speech recognition.
CoRR, 2014

Hardware acceleration of database operations.
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

2012
A case of system-level hardware/software co-design and co-verification of a commodity multi-processor system with custom hardware.
Proceedings of the 10th International Conference on Hardware/Software Codesign and System Synthesis, 2012

2011
Hardware acceleration of transactional memory on commodity systems.
Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, 2011

2010
A practical concurrent binary search tree.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Transactional predication: high-performance concurrent sets and maps for STM.
Proceedings of the 29th Annual ACM Symposium on Principles of Distributed Computing, 2010

Eigenbench: A simple exploration tool for orthogonal TM characteristics.
Proceedings of the 2010 IEEE International Symposium on Workload Characterization, 2010

FARM: A Prototyping Environment for Tightly-Coupled, Heterogeneous Architectures.
Proceedings of the 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2010

2007
An effective hybrid transactional memory system with strong isolation guarantees.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

A Scalable, Non-blocking Approach to Transactional Memory.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

A practical FPGA-based framework for novel CMP research.
Proceedings of the ACM/SIGDA 15th International Symposium on Field Programmable Gate Arrays, 2007

ATLAS: a chip-multiprocessor with transactional memory support.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

2004
The Vector-Thread Architecture.
IEEE Micro, 2004

