Daria Soboleva

ORCID: 0009-0003-2654-3767

According to our database, Daria Soboleva authored at least 10 papers between 2021 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
GQA-μP: The maximal parameterization update for grouped query attention.
CoRR, May, 2026

2025
Batch Tiling on Attention: Efficient Mixture of Experts Training on Wafer-Scale Processors.
Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, 2025

MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models.
Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, 2025

Power Lines: Scaling laws for weight decay and batch size in LLM pre-training.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Agent-Based Insight into Eco-Choices: Simulating the Fast Fashion Shift.
CoRR, 2024

2023
Position Interpolation Improves ALiBi Extrapolation.
CoRR, 2023

BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model.
CoRR, 2023

SlimPajama-DC: Understanding Data Combinations for LLM Training.
CoRR, 2023

2021
Replacing Human Audio with Synthetic Audio for on-Device Unspoken Punctuation Prediction.
Proceedings of the IEEE International Conference on Acoustics, 2021