Zihan Qiu

ORCID: 0009-0008-3991-0817

According to our database, Zihan Qiu authored at least 30 papers between 2021 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
A survey on multilingual large language models: corpora, alignment, and bias.
Frontiers Comput. Sci., November, 2025

Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling.
CoRR, July, 2025

A Controllable Examination for Long-Context Language Models.
CoRR, June, 2025

Qwen3 Technical Report.
CoRR, May, 2025

Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free.
CoRR, May, 2025

Neo-TKGC: Enhancing Temporal Knowledge Graph Completion with Integrated Node Weights and Future Information.
Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining, 2025

A Closer Look into Mixture-of-Experts in Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Layerwise Recurrent Router for Mixture-of-Experts.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Post-hoc Reward Calibration: A Case Study on Length Bias.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Qwen2.5 Technical Report.
CoRR, 2024

Reconstructing Global Daily CO2 Emissions via Machine Learning.
CoRR, 2024

GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory.
CoRR, 2024

A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias.
CoRR, 2024

HyperMoE: Paying Attention to Unselected Experts in Mixture of Experts via Dynamic Transfer.
CoRR, 2024

DSIFNet: Implicit feature network for nasal cavity and vestibule segmentation from 3D head CT.
Comput. Medical Imaging Graph., 2024

ValueCSV: Evaluating Core Socialist Values Understanding in Large Language Models.
Proceedings of the Natural Language Processing and Chinese Computing, 2024

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Unlocking Emergent Modularity in Large Language Models.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Empirical Study on Updating Key-Value Memories in Transformer Feed-forward Layers.
Proceedings of the Second Tiny Papers Track at ICLR 2024, 2024

Unlocking Continual Learning Abilities in Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Heterogenous Memory Augmented Neural Networks.
CoRR, 2023

Emergent Mixture-of-Experts: Can Dense Pre-trained Transformers Benefit from Emergent Modular Structures?
CoRR, 2023

2022
Supported Policy Optimization for Offline Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021
Detection of Advertising Users Based on K-SMOTE and Ensemble Learning.
Proceedings of the Human Centered Computing - 7th International Conference, 2021

Academic Article Classification Algorithm Based on Pre-trained Model and Keyword Extraction.
Proceedings of the Computer Supported Cooperative Work and Social Computing, 2021

ResConvE: Deeper Convolution-Based Knowledge Graph Embeddings.
Proceedings of the Computer Supported Cooperative Work and Social Computing, 2021

A University Portrait System Incorporating Academic Social Network.
Proceedings of the Computer Supported Cooperative Work and Social Computing, 2021

