Gavia Gray

According to our database, Gavia Gray authored at least 3 papers between 2024 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of five.

Bibliography

2025
Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training.
CoRR, May, 2025

Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

