Shiwei Liu

ORCID: 0009-0001-1255-4436

Affiliations:
  • University of Oxford, Mathematical Institute, UK
  • University of Texas at Austin, TX, USA
  • Eindhoven University of Technology, Eindhoven, The Netherlands (PhD)


According to our database, Shiwei Liu authored at least 90 papers between 2019 and 2025.

Bibliography

2025
Data-Adaptive Weight-Ensembling for Multi-task Model Fusion.
Int. J. Comput. Vis., August, 2025

LOST: Low-rank and Sparse Pre-training for Large Language Models.
CoRR, August, 2025

GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling.
CoRR, June, 2025

Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning.
CoRR, June, 2025

GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching.
CoRR, June, 2025

AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs.
CoRR, June, 2025

A Technical Study into 0.5B Reasoning Language Models.
CoRR, June, 2025

LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning.
CoRR, June, 2025

Revisiting Flatness-Aware Optimization in Continual Learning With Orthogonal Gradient Projection.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2025

Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution.
CoRR, May, 2025

NeuroTrails: Training with Dynamic Sparse Heads as the Key to Effective Ensembling.
CoRR, May, 2025

SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers.
CoRR, February, 2025

Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam.
CoRR, February, 2025

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More.
CoRR, February, 2025

The Curse of Depth in Large Language Models.
CoRR, February, 2025

O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning.
CoRR, January, 2025

FS-GNN: Improving Fairness in Graph Neural Networks via Joint Sparsification.
Neurocomputing, 2025

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Composable Interventions for Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Full-Rank No More: Low-Rank Weight Training for Modern Speech Recognition Models.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
Proceedings of the Conference on Parsimony and Learning, 2025

Outlier-weighed Layerwise Sampling for LLM Fine-tuning.
Findings of the Association for Computational Linguistics, 2025

Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Condense, Don't Just Prune: Enhancing Efficiency and Performance in MoE Layer Pruning.
CoRR, 2024

(PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork.
CoRR, 2024

From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients.
CoRR, 2024

Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion.
CoRR, 2024

OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning.
CoRR, 2024

E2ENet: Dynamic Sparse Feature Fusion for Accurate and Efficient 3D Medical Image Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Q-Hitter: A Better Token Oracle for Efficient LLM Inference via Sparse-Quantized KV Cache.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Dynamic Data Pruning for Automatic Speech Recognition.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Advancing Dynamic Sparse Training by Exploring Optimization Opportunities.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

CaM: Cache Merging for Memory-efficient LLMs Inference.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

AdaMerging: Adaptive Model Merging for Multi-Task Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

NeurRev: Train Better Sparse Neural Network Practically via Neuron Revitalization.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

HRBP: Hardware-friendly Regrouping towards Block-based Pruning for Sparse CNN Training.
Proceedings of the Conference on Parsimony and Learning, 2024

Sparse Sounds: Exploring Low-Dimensionality in Music Generation Model.
Proceedings of the IEEE International Conference on Big Data, 2024

2023
Don't Be So Dense: Sparse-to-Sparse GAN Training Without Sacrificing Performance.
Int. J. Comput. Vis., October, 2023

Supervised Feature Selection with Neuron Evolution in Sparse Neural Networks.
Trans. Mach. Learn. Res., 2023

The Counterattack of CNNs in Self-Supervised Learning: Larger Kernel Size might be All You Need.
CoRR, 2023

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity.
CoRR, 2023

Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity.
CoRR, 2023

Ten Lessons We Have Learned in the New "Sparseland": A Short Handbook for Sparse Neural Network Researchers.
CoRR, 2023

REST: Enhancing Group Robustness in DNNs Through Reweighted Sparse Training.
Proceedings of the Machine Learning and Knowledge Discovery in Databases: Research Track, 2023

Enhancing Adversarial Training via Reweighting Optimization Trajectory.
Proceedings of the Machine Learning and Knowledge Discovery in Databases: Research Track, 2023

Towards Data-Agnostic Pruning At Initialization: What Makes a Good Sparse Mask?
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Don't just prune by magnitude! Your mask topology is a secret weapon.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Dynamic Sparsity Is Channel-Level Sparsity Learner.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models.
Proceedings of the International Conference on Machine Learning, 2023

Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication.
Proceedings of the International Conference on Machine Learning, 2023

Are Large Kernels Better Teachers than Transformers for ConvNets?
Proceedings of the International Conference on Machine Learning, 2023

Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!
Proceedings of the Eleventh International Conference on Learning Representations, 2023

More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Revisiting Pruning at Initialization Through the Lens of Ramanujan Graph.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Data Augmented Flatness-aware Gradient Projection for Continual Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Many-Task Federated Learning: A New Problem Setting and A Simple Baseline.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Lottery Pools: Winning More by Interpolating Tickets without Increasing Training or Inference Cost.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
A brain-inspired algorithm for training highly sparse neural networks.
Mach. Learn., 2022

More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity.
CoRR, 2022

Superposing Many Tickets into One: A Performance Booster for Sparse Neural Network Training.
CoRR, 2022

Achieving Personalized Federated Learning with Sparse Local Models.
CoRR, 2022

Dynamic Sparse Network for Time Series Classification: Learning What to "See".
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

You Can Have Better Graph Neural Networks by Not Training Weights at All: Finding Untrained GNNs Tickets.
Proceedings of the Learning on Graphs Conference, 2022

The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Efficient and effective training of sparse recurrent neural networks.
Neural Comput. Appl., 2021

Sparse evolutionary deep learning with over one million artificial neurons on commodity hardware.
Neural Comput. Appl., 2021

FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training with Dynamic Sparsity.
CoRR, 2021

Sparse Training via Boosting Pruning Plasticity with Neuroregeneration.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training.
Proceedings of the 38th International Conference on Machine Learning, 2021

Selfish Sparse RNN Training.
Proceedings of the 38th International Conference on Machine Learning, 2021

Hierarchical Semantic Segmentation using Psychometric Learning.
Proceedings of the Asian Conference on Machine Learning, 2021

2020
Topological Insights in Sparse Neural Networks.
CoRR, 2020

Topological Insights into Sparse Neural Networks.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2020

Learning Sparse Neural Networks for Better Generalization.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Network Performance Optimization with Real Time Traffic Prediction in Data Center Network.
Proceedings of the European Conference on Optical Communications, 2020

2019
On improving deep learning generalization with adaptive sparse connectivity.
CoRR, 2019

Intrinsically Sparse Long Short-Term Memory Networks.
CoRR, 2019
