Shiwei Liu

ORCID: 0009-0001-1255-4436

Affiliations:
  • University of Oxford, Mathematical Institute, UK
  • University of Texas at Austin, TX, USA
  • Eindhoven University of Technology, Eindhoven, The Netherlands (PhD)


According to our database, Shiwei Liu authored at least 90 papers between 2019 and 2025.

Bibliography

2025
Data-Adaptive Weight-Ensembling for Multi-task Model Fusion.
Int. J. Comput. Vis., August, 2025

LOST: Low-rank and Sparse Pre-training for Large Language Models.
CoRR, August, 2025

GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling.
CoRR, June, 2025

Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning.
CoRR, June, 2025

GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching.
CoRR, June, 2025

AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs.
CoRR, June, 2025

A Technical Study into 0.5B Reasoning Language Models.
CoRR, June, 2025

LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning.
CoRR, June, 2025

Revisiting Flatness-Aware Optimization in Continual Learning With Orthogonal Gradient Projection.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2025

Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution.
CoRR, May, 2025

NeuroTrails: Training with Dynamic Sparse Heads as the Key to Effective Ensembling.
CoRR, May, 2025

SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers.
CoRR, February, 2025

Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam.
CoRR, February, 2025

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More.
CoRR, February, 2025

The Curse of Depth in Large Language Models.
CoRR, February, 2025

O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning.
CoRR, January, 2025

FS-GNN: Improving Fairness in Graph Neural Networks via Joint Sparsification.
Neurocomputing, 2025

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Composable Interventions for Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Full-Rank No More: Low-Rank Weight Training for Modern Speech Recognition Models.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
Proceedings of the Conference on Parsimony and Learning, 2025

Outlier-weighed Layerwise Sampling for LLM Fine-tuning.
Findings of the Association for Computational Linguistics, 2025

Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Condense, Don't Just Prune: Enhancing Efficiency and Performance in MoE Layer Pruning.
CoRR, 2024

(PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork.
CoRR, 2024

From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients.
CoRR, 2024

Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion.
CoRR, 2024

OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning.
CoRR, 2024

E2ENet: Dynamic Sparse Feature Fusion for Accurate and Efficient 3D Medical Image Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Q-Hitter: A Better Token Oracle for Efficient LLM Inference via Sparse-Quantized KV Cache.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Dynamic Data Pruning for Automatic Speech Recognition.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Advancing Dynamic Sparse Training by Exploring Optimization Opportunities.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

CaM: Cache Merging for Memory-efficient LLMs Inference.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

AdaMerging: Adaptive Model Merging for Multi-Task Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

NeurRev: Train Better Sparse Neural Network Practically via Neuron Revitalization.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

HRBP: Hardware-friendly Regrouping towards Block-based Pruning for Sparse CNN Training.
Proceedings of the Conference on Parsimony and Learning, 2024

Sparse Sounds: Exploring Low-Dimensionality in Music Generation Model.
Proceedings of the IEEE International Conference on Big Data, 2024

2023
Don't Be So Dense: Sparse-to-Sparse GAN Training Without Sacrificing Performance.
Int. J. Comput. Vis., October, 2023

Supervised Feature Selection with Neuron Evolution in Sparse Neural Networks.
Trans. Mach. Learn. Res., 2023

The Counterattack of CNNs in Self-Supervised Learning: Larger Kernel Size might be All You Need.
CoRR, 2023

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity.
CoRR, 2023

Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity.
CoRR, 2023

Ten Lessons We Have Learned in the New "Sparseland": A Short Handbook for Sparse Neural Network Researchers.
CoRR, 2023

REST: Enhancing Group Robustness in DNNs Through Reweighted Sparse Training.
Proceedings of the Machine Learning and Knowledge Discovery in Databases: Research Track, 2023

Enhancing Adversarial Training via Reweighting Optimization Trajectory.
Proceedings of the Machine Learning and Knowledge Discovery in Databases: Research Track, 2023

Towards Data-Agnostic Pruning At Initialization: What Makes a Good Sparse Mask?
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Don't just prune by magnitude! Your mask topology is a secret weapon.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Dynamic Sparsity Is Channel-Level Sparsity Learner.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models.
Proceedings of the International Conference on Machine Learning, 2023

Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication.
Proceedings of the International Conference on Machine Learning, 2023

Are Large Kernels Better Teachers than Transformers for ConvNets?
Proceedings of the International Conference on Machine Learning, 2023

Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!
Proceedings of the Eleventh International Conference on Learning Representations, 2023

More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Revisiting Pruning at Initialization Through the Lens of Ramanujan Graph.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Data Augmented Flatness-aware Gradient Projection for Continual Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Many-Task Federated Learning: A New Problem Setting and A Simple Baseline.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Lottery Pools: Winning More by Interpolating Tickets without Increasing Training or Inference Cost.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
A brain-inspired algorithm for training highly sparse neural networks.
Mach. Learn., 2022

More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity.
CoRR, 2022

Superposing Many Tickets into One: A Performance Booster for Sparse Neural Network Training.
CoRR, 2022

Achieving Personalized Federated Learning with Sparse Local Models.
CoRR, 2022

Dynamic Sparse Network for Time Series Classification: Learning What to "See".
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

You Can Have Better Graph Neural Networks by Not Training Weights at All: Finding Untrained GNNs Tickets.
Proceedings of the Learning on Graphs Conference, 2022

The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Efficient and effective training of sparse recurrent neural networks.
Neural Comput. Appl., 2021

Sparse evolutionary deep learning with over one million artificial neurons on commodity hardware.
Neural Comput. Appl., 2021

FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training with Dynamic Sparsity.
CoRR, 2021

Sparse Training via Boosting Pruning Plasticity with Neuroregeneration.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training.
Proceedings of the 38th International Conference on Machine Learning, 2021

Selfish Sparse RNN Training.
Proceedings of the 38th International Conference on Machine Learning, 2021

Hierarchical Semantic Segmentation using Psychometric Learning.
Proceedings of the Asian Conference on Machine Learning, 2021

2020
Topological Insights in Sparse Neural Networks.
CoRR, 2020

Topological Insights into Sparse Neural Networks.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2020

Learning Sparse Neural Networks for Better Generalization.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Network Performance Optimization with Real Time Traffic Prediction in Data Center Network.
Proceedings of the European Conference on Optical Communications, 2020

2019
On improving deep learning generalization with adaptive sparse connectivity.
CoRR, 2019

Intrinsically Sparse Long Short-Term Memory Networks.
CoRR, 2019
