Gavia Gray

According to our database, Gavia Gray authored at least 3 papers between 2024 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of five.

Bibliography

2025
Power Lines: Scaling laws for weight decay and batch size in LLM pre-training.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

