Tan M. Nguyen

CoRR, October, 2025

Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures.

CoRR, October, 2025

Expert Merging in Sparse Mixture of Experts with Nash Bargaining.

CoRR, October, 2025

Activation Steering with a Feedback Controller.

CoRR, October, 2025

On Linear Mode Connectivity of Mixture-of-Experts Architectures.

CoRR, September, 2025

The Blessing and Curse of Dimensionality in Safety Alignment.

Laziz U. Abdullaev

CoRR, July, 2025

Revisiting Transformers with Insights from Image Filtering.

Laziz U. Abdullaev

Maksim Tkachenko

CoRR, June, 2025

Resolving Memorization in Empirical Diffusion Model for Manifold Data in High-Dimensional Spaces.

CoRR, May, 2025

Tree-Sliced Wasserstein Distance with Nonlinear Projection.

CoRR, May, 2025

Spherical Tree-Sliced Wasserstein Distance.

CoRR, March, 2025

MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling.

CoRR, March, 2025

Distance-Based Tree-Sliced Wasserstein Distance.

CoRR, March, 2025

CAMEx: Curvature-aware Merging of Experts.

CoRR, February, 2025

Learning and predicting dynamics of compositional multiphase mixtures using Graph Neural Networks.

Duc Thach Son Vu

Weiqing Ren

J. Comput. Phys., 2025

Equivariant Polynomial Functional Networks.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Tree-Sliced Wasserstein Distance with Nonlinear Projection.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Tree-Sliced Wasserstein Distance: A Geometric Perspective.

Hoang V. Tran

Huyen Trang Pham

Tho Tran Huu

Thanh T. Chu

Tam Le

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Promoting Ensemble Diversity with Interactive Bayesian Distributional Robustness for Fine-tuning Foundation Models.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Demystifying the Token Dynamics of Deep Selective State Space Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Equivariant Neural Functional Networks for Transformers.

Thanh Tran

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Distance-Based Tree-Sliced Wasserstein Distance.

Hoang V. Tran

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Spherical Tree-Sliced Wasserstein Distance.

Hoang V. Tran

Thanh T. Chu

Huyen Trang Pham

Tam Le

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MoLEx: Mixture of Layer Experts for Fine-tuning with Sparse Upcycling.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Tight Clusters Make Specialized Experts.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

CAMEx: Curvature-aware Merging of Experts.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Transformer Meets Twicing: Harnessing Unattended Residual Information.

Laziz U. Abdullaev

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

An Attention-based Framework for Fair Contrastive Learning.

Stefan K. Nielsen

CoRR, 2024

MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts.

CoRR, 2024

A Clifford Algebraic Approach to E(n)-Equivariant High-order Graph Neural Networks.

CoRR, 2024

Equivariant Polynomial Functional Networks.

CoRR, 2024

Equivariant Neural Functional Networks for Transformers.

Thanh Tran

CoRR, 2024

Demystifying the Token Dynamics of Deep Selective State Space Models.

CoRR, 2024

Monomial Matrix Group Equivariant Neural Functional Networks.

CoRR, 2024

A Primal-Dual Framework for Transformers and Neural Networks.

CoRR, 2024

Elliptical Attention.

CoRR, 2024

Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis.

CoRR, 2024

Tree-Sliced Wasserstein Distance on a System of Lines.

CoRR, 2024

PIDformer: Transformer Meets Control Theory.

CoRR, 2024

Revisiting Kernel Attention with Correlated Gaussian Process Representation.

Proceedings of the Uncertainty in Artificial Intelligence, 2024

Monomial Matrix Group Equivariant Neural Functional Networks.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

PIDformer: Transformer Meets Control Theory.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Features Model.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Beyond Vanilla Variational Autoencoders: Detecting Posterior Collapse in Conditional and Hierarchical Variational Autoencoders.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

From Coupled Oscillators to Graph Neural Networks: Reducing Over-smoothing via a Kuramoto Model-based Approach.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023

ARist: An effective API argument recommendation approach.

J. Syst. Softw., October, 2023

Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals.

Tam Nguyen

Richard G. Baraniuk

CoRR, 2023

p-Laplacian Transformer.

CoRR, 2023

Revisiting Over-smoothing and Over-squashing Using Ollivier-Ricci Curvature.

Proceedings of the International Conference on Machine Learning, 2023

Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data.

Proceedings of the International Conference on Machine Learning, 2023

A Primal-Dual Framework for Transformers and Neural Networks.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

A Probabilistic Framework for Pruning Transformers Via a Finite Admixture of Keys.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

DeepGRAND: Deep Graph Neural Diffusion.

Proceedings of the 57th Asilomar Conference on Signals, Systems, and Computers, ACSSC 2023, Pacific Grove, CA, USA, October 29, 2023

2022

Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent.

SIAM J. Imaging Sci., 2022

Robustify Transformers with Robust Kernel Density Estimation.

CoRR, 2022

Improving Generative Flow Networks with Path Regularization.

CoRR, 2022

Transformer with Fourier Integral Attentions.

CoRR, 2022

Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization.

Proceedings of the Mathematical and Scientific Machine Learning, 2022

Improving Transformers with Probabilistic Attention Keys.

Proceedings of the International Conference on Machine Learning, 2022

GRAND++: Graph Neural Diffusion with A Source Term.

Proceedings of the Tenth International Conference on Learning Representations, 2022

2021

Transformer with a Mixture of Gaussian Keys.

CoRR, 2021

How Does Momentum Benefit Deep Neural Networks Architecture Design? A Few Case Studies.

CoRR, 2021

Heavy Ball Neural Ordinary Differential Equations.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

API parameter recommendation based on language model and program analysis.

Proceedings of the 28th Asia-Pacific Software Engineering Conference, 2021

2020

Dual Dynamic Inference: Enabling More Efficient, Adaptive, and Controllable Deep Inference.

IEEE J. Sel. Top. Signal Process., 2020

Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent.

CoRR, 2020

MomentumRNN: Integrating Momentum into Recurrent Neural Networks.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Neural Networks with Recurrent Generative Feedback.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2019

InfoCNF: An Efficient Conditional Continuous Normalizing Flow with Adaptive Solvers.

CoRR, 2019

Learning Near-optimal Convex Combinations of Basis Models with Generalization Guarantees.
