Shwai He

According to our database¹, Shwai He authored at least 32 papers between 2021 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

Dense Video Understanding with Gated Residual Tokenization.

CoRR, September, 2025

DualSparse-MoE: Coordinating Tensor/Neuron-Level Sparsity with Expert Partition and Reconstruction.

CoRR, August, 2025

CogniPair: From LLM Chatbots to Conscious AI Agents - GNWT-Based Multi-Agent Digital Twins for Social Pairing - Dating & Hiring Applications.

CoRR, June, 2025

CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs.

CoRR, May, 2025

SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning.

CoRR, April, 2025

Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts.

CoRR, March, 2025

Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques.

Trans. Mach. Learn. Res., 2025

Towards counterfactual fairness through auxiliary variables.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Towards counterfactual fairness thorough auxiliary variables.

CoRR, 2024

Fair Diagnosis: Leveraging Causal Modeling to Mitigate Medical Bias.

CoRR, 2024

Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers.

CoRR, 2024

What Matters in Transformers? Not All Attention is Needed.

CoRR, 2024

Demystifying the Compression of Mixture-of-Experts Through a Unified Framework.

CoRR, 2024

RESSA: Repair Sparse Vision-Language Models via Sparse Cross-Modality Adaptation.

Shwai He

Tianlong Chen

CoRR, 2024

Accurate prediction of antibody function and structure using bio-inspired antibody language model.

Briefings Bioinform., 2024

Loki: Low-rank Keys for Efficient Sparse Attention.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Reformatted Alignment.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning.

CoRR, 2023

MerA: Merging Pretrained Adapters For Few-Shot Learning.

CoRR, 2023

SD-Conv: Towards the Parameter-Efficiency of Dynamic Convolution.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

NeuralSlice: Neural 3D Triangle Mesh Reconstruction via Slicing 4D Tetrahedral Meshes.

Proceedings of the International Conference on Machine Learning, 2023

Merging Experts into One: Improving Computational Efficiency of Mixture of Experts.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

PAD-Net: An Efficient Framework for Dynamic Networks.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

Cherry Hypothesis: Identifying the Cherry on the Cake for Dynamic Networks.

CoRR, 2022

SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters.

CoRR, 2022

Vega-MT: The JD Explore Academy Translation System for WMT22.

CoRR, 2022

When Sparsity Meets Dynamic Convolution.

CoRR, 2022

Vega-MT: The JD Explore Academy Machine Translation System for WMT22.

Proceedings of the Seventh Conference on Machine Translation, 2022

SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

2021

Multi-modal Attention Network for Stock Movements Prediction.

Shwai He

Shi Gu

CoRR, 2021
