Nolan Dey

According to our database, Nolan Dey authored at least 10 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training.
CoRR, May 2025

Don't be lazy: CompleteP enables compute-efficient deep transformers.
CoRR, May 2025

Neuron-based explanations of neural networks sacrifice completeness and interpretability.
Trans. Mach. Learn. Res., 2025

Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Sparse maximal update parameterization: A holistic approach to sparse training dynamics.
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems (NeurIPS 2024), 2024

2023
Position Interpolation Improves ALiBi Extrapolation.
CoRR, 2023

BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model.
CoRR, 2023

Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster.
CoRR, 2023

2020
37,000 Human-Planned Robotic Grasps With Six Degrees of Freedom.
IEEE Robotics Autom. Lett., 2020

Identifying and interpreting tuning dimensions in deep networks.
CoRR, 2020
