We stand with Ukraine

Simiao Zuo

ORCID: 0009-0002-8014-3150

According to our database, Simiao Zuo authored at least 30 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

m3BERT: A Modern, Multi-lingual, Matryoshka Bidirectional Encoder.

CoRR, May, 2026

2025

Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

2024

Task Oriented In-Domain Data Augmentation.

CoRR, 2024

Evoke: Evoking Critical Thinking Abilities in LLMs via Reviewer-Author Prompt Editing.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Task Oriented In-Domain Data Augmentation.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023

On Training, Inference, and Sample Efficiencies of Language Models.

PhD thesis, 2023

Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms.

Authors include Alexander Bukharin.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Less is More: Task-aware Layer-wise Distillation for Language Model Compression.

Proceedings of the International Conference on Machine Learning, 2023

SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process.

Proceedings of the International Conference on Machine Learning, 2023

Machine Learning Force Fields with Data Cost Aware Training.

Authors include Alexander Bukharin.

Proceedings of the International Conference on Machine Learning, 2023

DeepTagger: Knowledge Enhanced Named Entity Recognition for Web-Based Ads Queries.

Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023

Context-Aware Query Rewriting for Improving Users' Search Experience on E-commerce Websites.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), 2023

2022

Efficient Long Sequence Modeling via State Space Augmented Transformer.

CoRR, 2022

DiP-GNN: Discriminative Pre-Training of Graph Neural Networks.

CoRR, 2022

Differentially Private Estimation of Hawkes Process.

CoRR, 2022

MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Self-Training with Differentiable Teacher.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022, 2022

Adversarially Regularized Policy Learning Guided by Trajectory Optimization.

Proceedings of the Learning for Dynamics and Control Conference, 2022

PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance.

Authors include Alexander Bukharin.

Proceedings of the International Conference on Machine Learning, 2022

Taming Sparsely Activated Transformer with Stochastic Experts.

Proceedings of the Tenth International Conference on Learning Representations, 2022

No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models.

Proceedings of the Tenth International Conference on Learning Representations, 2022

2021

Adversarial Training as Stackelberg Game: An Unrolled Optimization Approach.

CoRR, 2021

Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

A Hypergradient Approach to Robust Regression without Correspondence.

Proceedings of the 9th International Conference on Learning Representations, 2021

Adversarial Regularization as Stackelberg Game: An Unrolled Optimization Approach.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

ARCH: Efficient Adversarial Regularized Training with Caching.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization.

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020

Transformer Hawkes Process.

Proceedings of the 37th International Conference on Machine Learning, 2020

2019

Tensor maps for synchronizing heterogeneous shape collections.

Authors include Chandrajit Bajaj.

ACM Trans. Graph., 2019

2018

Image Score: How to Select Useful Samples.

CoRR, 2018
