Ruoyu Sun

ORCID: 0000-0003-2487-5322

Affiliations:

Chinese University of Hong Kong, Shenzhen, China
University of Illinois Urbana-Champaign, IL, USA (former)
University of Minnesota, Department of ECE, MN, USA (PhD)

According to our database, Ruoyu Sun authored at least 79 papers between 2012 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Links

Online presence:

on ruoyus.github.io
on orcid.org

Bibliography

2024

MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning.

CoRR, 2024

Adam-mini: Use Fewer Learning Rates To Gain More.

CoRR, 2024

On the Convergence of Adam under Non-uniform Smoothness: Separability from SGDM and Beyond.

CoRR, 2024

Why Transformers Need Adam: A Hessian Perspective.

CoRR, 2024

AceGPT, Localizing Large Language Models in Arabic.

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Provable Adaptivity of Adam under Non-uniform Smoothness.

Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

How Graph Neural Networks Learn: Lessons from Training Dynamics.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

LEMON: Lossless model expansion.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Bridging the Gap: Rademacher Complexity in Robust and Standard Generalization.

Proceedings of the Thirty Seventh Annual Conference on Learning Theory, June 30, 2024

2023

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models.

CoRR, 2023

How Graph Neural Networks Learn: Lessons from Training Dynamics in Function Space.

CoRR, 2023

AceGPT, Localizing Large Language Models in Arabic.

Abdulmohsen Alharthi

CoRR, 2023

Restricted Generative Projection for One-Class Classification and Anomaly Detection.

CoRR, 2023

Double Dynamic Sparse Training for GANs.

Naira Hovakimyan

CoRR, 2023

Invariant Layers for Graphs with Nodes of Different Types.

CoRR, 2023

PAC-Bayesian Spectrally-Normalized Bounds for Adversarially Robust Generalization.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Balanced Training for Sparse GANs.

Naira Hovakimyan

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

NTK-SAP: Improving neural network pruning by aligning training dynamics.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

A GNN-Guided Predict-and-Search Framework for Mixed-Integer Linear Programming.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022

Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity.

SIAM J. Optim., December, 2022

Suboptimal Local Minima Exist for Wide Neural Networks with Smooth Activations.

Math. Oper. Res., November, 2022

On the Benefit of Width for Neural Networks: Disappearance of Basins.

SIAM J. Optim., September, 2022

Adversarial Rademacher Complexity of Deep Neural Networks.

CoRR, 2022

On the landscape of one-hidden-layer sparse networks and beyond.

Artif. Intell., 2022

Adam Can Converge Without Any Modification On Update Rules.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Stability Analysis and Generalization Bounds of Adversarial Training.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Does Momentum Change the Implicit Regularization on Separable Data?

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DigGAN: Discriminator gradIent Gap Regularization for GAN Training with Limited Data.

Alexander G. Schwing

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Separation of Metabolites and Macromolecules for Short-TE <sup>1</sup>H-MRSI Using Learned Component-Specific Representations.

IEEE Trans. Medical Imaging, 2021

Two Symmetrized Coordinate Descent Methods Can Be O(n<sup>2</sup>) Times Slower Than the Randomized Version.

SIAM J. Optim., 2021

Worst-case complexity of cyclic coordinate descent: O(n<sup>2</sup>) gap with randomized version.

Math. Program., 2021

Towards Understanding the Impact of Model Size on Differential Private Classification.

CoRR, 2021

Federated Semi-Supervised Learning with Class Distribution Mismatch.

Tsung-Hui Chang

CoRR, 2021

Momentum Doesn't Change the Implicit Bias.

CoRR, 2021

Achieving Small Test Error in Mildly Overparameterized Neural Networks.

CoRR, 2021

On a Faster R-Linear Convergence Rate of the Barzilai-Borwein Method.

CoRR, 2021

When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Faster Directional Convergence of Linear Neural Networks under Spherically Symmetric Data.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

RMSprop converges with proper hyper-parameter.

Proceedings of the 9th International Conference on Learning Representations, 2021

PenDer: Incorporating Shape Constraints via Penalized Derivatives.

Arinbjörn Kolbeinsson

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

The Global Landscape of Neural Networks: An Overview.

Rayadurgam Srikant

IEEE Signal Process. Mag., 2020

On the Efficiency of Random Permutation for ADMM and Coordinate Descent.

Math. Oper. Res., 2020

Landscape of Sparse Linear Network: A Brief Investigation.

CoRR, 2020

Global Convergence and Induced Kernels of Gradient-Based Meta-Learning with Neural Nets.

CoRR, 2020

DEED: A General Quantization Scheme for Communication Efficiency in Bits.

CoRR, 2020

A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Towards a Better Global Loss Landscape of GANs.

Alexander G. Schwing

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2019

Globally Optimal Joint Uplink Base Station Association and Beamforming.

IEEE Trans. Commun., 2019

Optimization for deep learning: theory and algorithms.

CoRR, 2019

Sub-Optimal Local Minima Exist for Almost All Over-parameterized Neural Networks.

CoRR, 2019

Understanding Limitation of Two Symmetrized Orders by Worst-case Complexity.

CoRR, 2019

On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization.

Proceedings of the 7th International Conference on Learning Representations, 2019

Max-Sliced Wasserstein Distance and Its Use for GANs.

Ishan Deshpande

David A. Forsyth

Alexander G. Schwing

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Over-Parameterized Deep Neural Networks Have No Strict Local Minima For Any Continuous Activations.

CoRR, 2018

Adding One Neuron Can Eliminate All Bad Local Minima.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Understanding the Loss Surface of Neural Networks for Binary Classification.

Rayadurgam Srikant

Proceedings of the 35th International Conference on Machine Learning, 2018

Understanding the Loss Surface of Single-Layered Neural Networks for Binary Classification.

Proceedings of the 6th International Conference on Learning Representations, 2018

2016

Guaranteed Matrix Completion via Non-Convex Factorization.

IEEE Trans. Inf. Theory, 2016

Worst-case Complexity of Cyclic Coordinate Descent: $O(n^2)$ Gap with Randomized Version.

CoRR, 2016

Optimization algorithms for big data with application in wireless networks.

Proceedings of the Big Data over Networks, 2016

2015

Interference Alignment Using Finite and Dependent Channel Extensions: The Single Beam Case.

IEEE Trans. Inf. Theory, 2015

Joint Downlink Base Station Association and Power Control for Max-Min Fairness: Computation and Complexity.

IEEE J. Sel. Areas Commun., 2015

Globally Optimal Joint Uplink Base Station Association and Beamforming.

CoRR, 2015

Interference alignment via Feasible Point Pursuit.

Nicholas D. Sidiropoulos

Proceedings of the 16th IEEE International Workshop on Signal Processing Advances in Wireless Communications, 2015

Improved Iteration Complexity Bounds of Cyclic Block Coordinate Descent for Convex Problems.

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Guaranteed Matrix Completion via Nonconvex Factorization.

Proceedings of the IEEE 56th Annual Symposium on Foundations of Computer Science, 2015

2014

Cross-Layer Provision of Future Cellular Networks: A WMMSE-based approach.

Meisam Razaviyayn

IEEE Signal Process. Mag., 2014

Cross Layer Provision of Future Cellular Networks.

Meisam Razaviyayn

CoRR, 2014

Globally optimal joint uplink base station association and power control for max-min fairness.

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

Joint Base Station Clustering and Beamformer Design for Partial Coordinated Transmission in Heterogeneous Networks.

IEEE J. Sel. Areas Commun., 2013

Two Performance-limiting Factors for Interference Alignment: Channel Diversity Order and the Number of Data Streams Per User.

CoRR, 2013

Long-term transmit point association for coordinated multipoint transmission by stochastic optimization.

Proceedings of the 14th IEEE Workshop on Signal Processing Advances in Wireless Communications, 2013

2012

Robust SINR-Constrained MISO Downlink Beamforming: When is Semidefinite Programming Relaxation Tight?

EURASIP J. Wirel. Commun. Netw., 2012

Joint Base Station Clustering and Beamformer Design for Partial Coordinated Transmission in Heterogenous Networks

CoRR, 2012

Optimal joint base station assignment and power allocation in a cellular network.

Proceedings of the 13th IEEE International Workshop on Signal Processing Advances in Wireless Communications, 2012

Joint transceiver design and base station clustering for heterogeneous networks.

Meisam Razaviyayn

Proceedings of the Conference Record of the Forty Sixth Asilomar Conference on Signals, 2012
