Ziwei Ji

This is a disambiguation page; it contains multiple papers by persons with the same or a similar name.

Bibliography

2025
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation.
CoRR, July, 2025

Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Efficient Document Ranking with Learnable Late Interactions.
CoRR, 2024

Think before you speak: Training Language Models With Pause Tokens.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Depth Dependence of μP Learning Rates in ReLU MLPs.
CoRR, 2023

2022
Convex Analysis at Infinity: An Introduction to Astral Space.
CoRR, 2022

Reproducibility in Optimization: Theoretical Framework and Limits.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Agnostic Learnability of Halfspaces via Logistic Loss.
Proceedings of the International Conference on Machine Learning, 2022

Actor-critic is implicitly biased towards high entropy optimal policies.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Early-stopped neural networks are consistent.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Fast margin maximization via dual acceleration.
Proceedings of the 38th International Conference on Machine Learning, 2021

Generalization bounds via distillation.
Proceedings of the 9th International Conference on Learning Representations, 2021

Characterizing the implicit bias via a primal-dual analysis.
Proceedings of the Algorithmic Learning Theory, 2021

2020
Directional convergence and alignment in deep learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Neural tangent kernels, transportation mappings, and universal approximation.
Proceedings of the 8th International Conference on Learning Representations, 2020

Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks.
Proceedings of the 8th International Conference on Learning Representations, 2020

Gradient descent follows the regularization path for general losses.
Proceedings of the Conference on Learning Theory, 2020

2019
A refined primal-dual analysis of the implicit bias.
CoRR, 2019

Gradient descent aligns the layers of deep linear networks.
Proceedings of the 7th International Conference on Learning Representations, 2019

The implicit bias of gradient descent on nonseparable data.
Proceedings of the Conference on Learning Theory, 2019

2018
Risk and parameter convergence of logistic regression.
CoRR, 2018

Social Welfare and Profit Maximization from Revealed Preferences.
Proceedings of the Web and Internet Economics - 14th International Conference, 2018

2017
Wikidata Vandalism Detection - The Loganberry Vandalism Detector at WSDM Cup 2017.
CoRR, 2017

2016
The Beachcombers' Problem: Walking and Searching from an Inner Point of a Line.
Proceedings of the Language and Automata Theory and Applications, 2016

