Zangwei Zheng

According to our database, Zangwei Zheng authored at least 19 papers between 2021 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers.
CoRR, 2024

Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization.
CoRR, 2024

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models.
CoRR, 2024

2023
Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline.
CoRR, 2023

Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models.
CoRR, 2023

InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning.
CoRR, 2023

Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

A Study on Transformer Configuration and Training Objective.
Proceedings of the International Conference on Machine Learning, 2023

Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

CAME: Confidence-guided Adaptive Memory Efficient Optimization.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

CowClip: Reducing CTR Prediction Model Training Time from 12 Hours to 10 Minutes on 1 GPU.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Prompt Vision Transformer for Domain Generalization.
CoRR, 2022

Deeper vs Wider: A Revisit of Transformer Configuration.
CoRR, 2022

CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU.
CoRR, 2022

2021
Multi-source Few-shot Domain Adaptation.
CoRR, 2021

Sparse-MLP: A Fully-MLP Architecture with Conditional Computation.
CoRR, 2021

Scene-aware Learning Network for Radar Object Detection.
Proceedings of the ICMR '21: International Conference on Multimedia Retrieval, 2021

Prototypical Cross-Domain Self-Supervised Learning for Few-Shot Unsupervised Domain Adaptation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
