Shuming Ma

According to our database, Shuming Ma authored at least 101 papers between 2010 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits.
CoRR, 2024

2023
GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

When an Image is Worth 1,024 x 1,024 Words: A Case Study in Computational Pathology.
CoRR, 2023

Auto-ICL: In-Context Learning without Human Supervision.
CoRR, 2023

BitNet: Scaling 1-bit Transformers for Large Language Models.
CoRR, 2023

Kosmos-2.5: A Multimodal Literate Model.
CoRR, 2023

Retentive Network: A Successor to Transformer for Large Language Models.
CoRR, 2023

LongNet: Scaling Transformers to 1,000,000,000 Tokens.
CoRR, 2023

Kosmos-2: Grounding Multimodal Large Language Models to the World.
CoRR, 2023

Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus.
CoRR, 2023

On the Pareto Front of Multilingual Neural Machine Translation.
CoRR, 2023

Language Is Not All You Need: Aligning Perception with Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

On the Pareto Front of Multilingual Neural Machine Translation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Magneto: A Foundation Transformer.
Proceedings of the International Conference on Machine Learning, 2023

Are More Layers Beneficial to Graph Transformers?
Proceedings of the Eleventh International Conference on Learning Representations, 2023

TRIP: Accelerating Document-level Multilingual Pre-training via Triangular Document-level Pre-training on Parallel Data Triplets.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

HanoiT: Enhancing Context-aware Translation via Selective Context.
Proceedings of the Database Systems for Advanced Applications, 2023

GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

A Length-Extrapolatable Transformer.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Discourse-Centric Evaluation of Document-level Machine Translation with a New Densely Annotated Parallel Corpus of Novels.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
TRIP: Triangular Document-level Pre-training for Multilingual Language Models.
CoRR, 2022

TorchScale: Transformers at Scale.
CoRR, 2022

A Bilingual Parallel Corpus with Discourse Annotations.
CoRR, 2022

Foundation Transformers.
CoRR, 2022

Towards Multilingual Transitivity and Bidirectional Multilingual Agreement for Multilingual Document-level Machine Translation.
CoRR, 2022

GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation.
CoRR, 2022

HLT-MT: High-resource Language-specific Training for Multilingual Neural Machine Translation.
CoRR, 2022

Language Models are General-Purpose Interfaces.
CoRR, 2022

On the Representation Collapse of Sparse Mixture of Experts.
CoRR, 2022

DeepNet: Scaling Transformers to 1,000 Layers.
CoRR, 2022

Phrase-level Adversarial Example Generation for Neural Machine Translation.
CoRR, 2022

SMDT: Selective Memory-Augmented Neural Document Translation.
CoRR, 2022

On the Representation Collapse of Sparse Mixture of Experts.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

UM4: Unified Multilingual Multiple Teacher-Student Model for Zero-Resource Neural Machine Translation.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

High-resource Language-specific Training for Multilingual Neural Machine Translation.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Zero-shot Cross-lingual Transfer of Prompt-based Tuning with a Unified Multilingual Prompt.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

PAEG: Phrase-level Adversarial Example Generation for Neural Machine Translation.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

StableMoE: Stable Routing Strategy for Mixture of Experts.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

XLM-E: Cross-lingual Language Model Pre-training via ELECTRA.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Towards Making the Most of Cross-Lingual Transfer for Zero-Shot Neural Machine Translation.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation.
CoRR, 2021

XLM-E: Cross-lingual Language Model Pre-training via ELECTRA.
CoRR, 2021

DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders.
CoRR, 2021

mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs.
CoRR, 2021

BlonD: An Automatic Evaluation Metric for Document-level Machine Translation.
CoRR, 2021

Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task.
Proceedings of the Sixth Conference on Machine Translation, 2021

Learning to Select Relevant Knowledge for Neural Machine Translation.
Proceedings of the Natural Language Processing and Chinese Computing, 2021

Smart-Start Decoding for Neural Machine Translation.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Improving Multilingual Neural Machine Translation with Auxiliary Source Languages.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Zero-Shot Cross-Lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Multilingual Agreement for Multilingual Neural Machine Translation.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

How Does Distilled Data Complexity Impact the Quality and Confidence of Non-Autoregressive Machine Translation?
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020
Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method.
IEEE Trans. Knowl. Data Eng., 2020

XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders.
CoRR, 2020

Multimodal Matching Transformer for Live Commenting.
Proceedings of the ECAI 2020 - 24th European Conference on Artificial Intelligence, 29 August-8 September 2020, Santiago de Compostela, Spain, 2020

Improving Neural Machine Translation with Soft Template Prediction.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

A Simple and Effective Unified Encoder for Document-Level Machine Translation.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Alternating Language Modeling for Cross-Lingual Pre-Training.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Towards easier and faster sequence labeling for natural language processing: A search-based probabilistic online learning framework (SAPO).
Inf. Sci., 2019

Group, Extract and Aggregate: Summarizing a Large Amount of Finance News for Forex Movement Prediction.
CoRR, 2019

Recursive Graphical Neural Networks for Text Classification.
CoRR, 2019

Predicting Popular News Comments Based on Multi-Target Text Matching Model.
Proceedings of the Natural Language Processing and Chinese Computing, 2019

A Deep Reinforced Sequence-to-Set Model for Multi-Label Classification.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Key Fact as Pivot: A Two-Stage Model for Low Resource Table-to-Text Generation.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Hierarchical Encoder with Auxiliary Supervision for Neural Table-to-Text Generation: Learning Better Representation for Tables.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Unsupervised Machine Commenting with Neural Variational Topic Model.
CoRR, 2018

A Deep Reinforced Sequence-to-Set Model for Multi-Label Text Classification.
CoRR, 2018

Identifying High-Quality Chinese News Comments Based on Multi-Target Text Matching Model.
CoRR, 2018

Decoding-History-Based Adaptive Control of Attention for Neural Machine Translation.
CoRR, 2018

Accelerating Graph-Based Dependency Parsing with Lock-Free Parallel Perceptron.
Proceedings of the Natural Language Processing and Chinese Computing, 2018

Query and Output: Generating Words by Querying Distributed Word Representations for Paraphrase Generation.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

A Hierarchical End-to-End Model for Jointly Improving Text Summarization and Sentiment Classification.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Phrase-level Self-Attention Networks for Universal Sentence Encoding.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Semantic-Unit-Based Dilated Convolution for Multi-Label Text Classification.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Does Higher Order LSTM Have Better Accuracy for Segmenting and Labeling Sequence Data?
Proceedings of the 27th International Conference on Computational Linguistics, 2018

SGM: Sequence Generation Model for Multi-label Classification.
Proceedings of the 27th International Conference on Computational Linguistics, 2018

A Neural Question Answering Model Based on Semi-Structured Tables.
Proceedings of the 27th International Conference on Computational Linguistics, 2018

Deconvolution-Based Global Decoding for Neural Machine Translation.
Proceedings of the 27th International Conference on Computational Linguistics, 2018

Automatic Academic Paper Rating Based on Modularized Hierarchical Convolutional Neural Network.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

Bag-of-Words as Target for Neural Machine Translation.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

Autoencoder as Assistant Supervisor: Improving Text Representation for Chinese Social Media Text Summarization.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

Global Encoding for Abstractive Summarization.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
Complex Structure Leads to Overfitting: A Structure Regularization Decoding Method for Natural Language Processing.
CoRR, 2017

Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method.
CoRR, 2017

Stochastic Strictly Contractive Peaceman-Rachford Splitting Method.
CoRR, 2017

Label Embedding Network: Learning Label Representation for Soft Training of Deep Networks.
CoRR, 2017

A Semantic Relevance Based Neural Network for Text Summarization and Text Simplification.
CoRR, 2017

Lock-Free Parallel Perceptron for Graph-based Dependency Parsing.
CoRR, 2017

A Generic Online Parallel Learning Framework for Large Margin Models.
CoRR, 2017

Transfer Deep Learning for Low-Resource Chinese Word Segmentation with a Novel Neural Network.
Proceedings of the Natural Language Processing and Chinese Computing, 2017

meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting.
Proceedings of the 34th International Conference on Machine Learning, 2017

Improving Semantic Relevance for Sequence-to-Sequence Learning of Chinese Social Media Text Summarization.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

2016
A New Recurrent Neural CRF for Learning Non-linear Edge Features.
CoRR, 2016

2010
The Application of Leader-Member Exchange Theory on Improving Team Performance.
Proceedings of the International Conference on E-Business and E-Government, 2010

