Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms.

Siyu Yuan

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

GETMusic: Generating Music Tracks with a Unified Representation and Diffusion Framework.

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

MuPT: A Generative Symbolic Music Pretrained Transformer.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Memories are One-to-Many Mapping Alleviators in Talking Face Generation.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

NaturalSpeech: End-to-End Text-to-Speech Synthesis With Human-Level Quality.

IEEE Trans. Pattern Anal. Mach. Intell., June, 2024

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey.

CoRR, 2024

BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec.

CoRR, 2024

Foundation Models for Music: A Survey.

CoRR, 2024

Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement.

CoRR, 2024

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers.

CoRR, 2024

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling.

CoRR, 2024

MuPT: A Generative Symbolic Music Pretrained Transformer.

CoRR, 2024

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis.

CoRR, 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.

CoRR, 2024

Beyond Language Models: Byte Models are Digital World Simulators.

CoRR, 2024

Codec-Superb @ SLT 2024: A Lightweight Benchmark For Neural Audio Codec Models.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

UniAudio 1.5: Large Language Model-Driven Audio Codec is A Few-Shot Audio Task Learner.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

TaskBench: Benchmarking Large Language Models for Task Automation.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Empowering Diffusion Models on the Embedding Space for Text Generation.

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

UniStyle: Unified Style Modeling for Speaking Style Captioning and Stylistic Speech Synthesis.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

FlashSpeech: Efficient Zero-Shot Speech Synthesis.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Contrastive Context-Speech Pretraining for Expressive Text-to-Speech Synthesis.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

TiVA: Time-Aligned Video-to-Audio Generation.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

CoMoSVC: Consistency Model-Based Singing Voice Conversion.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

UniAudio: Towards Universal Audio Generation with Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

PromptTTS 2: Describing and Generating Voices with Text Prompt.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

GAIA: Zero-shot Talking Avatar Generation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

PyramidCodec: Hierarchical Codec for Long-form Music Generation in Audio Domain.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Regeneration Learning: A Learning Paradigm for Data Generation.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

StableFace: Analyzing and Improving Motion Stability for Talking Face Generation.

IEEE J. Sel. Top. Signal Process., November, 2023

Neural Text-to-Speech Synthesis

Xu Tan

Artificial Intelligence: Foundations, Theory, and Algorithms, Springer, ISBN: 978-981-99-0826-4, 2023

Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis.

CoRR, 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation.

CoRR, 2023

MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation.

CoRR, 2023

PromptTTS 2: Describing and Generating Voices with Text Prompt.

CoRR, 2023

EmoGen: Eliminating Subjective Bias in Emotional Music Generation.

CoRR, 2023

Extract and Attend: Improving Entity Translation in Neural Machine Translation.

CoRR, 2023

MuseCoco: Generating Symbolic Music from Text.

CoRR, 2023

Deliberate then Generate: Enhanced Prompting Framework for Text Generation.

CoRR, 2023

GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework.

CoRR, 2023

ResiDual: Transformer with Dual Residual Connections.

CoRR, 2023

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers.

CoRR, 2023

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace.

CoRR, 2023

FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model.

CoRR, 2023

A Study on ReLU and Softmax in Transformer.

CoRR, 2023

ERA-Solver: Error-Robust Adams Solver for Fast Sampling of Diffusion Probabilistic Models.

CoRR, 2023

N-Gram Nearest Neighbor Machine Translation.

CoRR, 2023

WuYun: Exploring hierarchical skeleton-guided melody generation using knowledge-enhanced deep learning.

CoRR, 2023

AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Context-Aware Talking-Head Video Editing.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Revisiting Learning Paradigms for Multimedia Data Generation.

Xu Tan

Proceedings of the 31st ACM International Conference on Multimedia, 2023

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

CLaMP: Contrastive Language-Music Pre-Training for Cross-Modal Symbolic Music Information Retrieval.

Proceedings of the 24th International Society for Music Information Retrieval Conference, 2023

ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis Based on Frequency Modulation.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

HiFace: High-Fidelity 3D Face Reconstruction by Learning Static and Dynamic Details.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

PromptTTS: Controllable Text-To-Speech With Text Descriptions.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Towards Understanding Omission in Dialogue Summarization.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Extract and Attend: Improving Entity Translation in Neural Machine Translation.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

TranSFormer: Slow-Fast Transformer for Machine Translation.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

DiffusionNER: Boundary Diffusion for Named Entity Recognition.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech.

CoRR, 2022

Difformer: Empowering Diffusion Model on Embedding Space for Text Generation.

CoRR, 2022

Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A Study of Syntactic Multi-Modality in Non-Autoregressive Machine Translation.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

ReLyMe: Improving Lyric-to-Melody Generation by Incorporating Lyric-Melody Relationships.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

MeloForm: Generating Melody with Musical Form based on Expert Systems and Neural Networks.

Proceedings of the 23rd International Society for Music Information Retrieval Conference, 2022

PDAugment: Data Augmentation by Pitch and Duration Adjustments for Automatic Lyrics Transcription.

Proceedings of the 23rd International Society for Music Information Retrieval Conference, 2022

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Analyzing and Mitigating Interference in Neural Architecture Search.

Proceedings of the International Conference on Machine Learning, 2022

PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior.

Proceedings of the Tenth International Conference on Learning Representations, 2022

A Study on the Efficacy of Model Pre-Training In Developing Neural Text-to-Speech System.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Transformer-S2A: Robust and Efficient Speech-to-Animation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Mask the Correct Tokens: An Embarrassingly Simple Approach for Error Correction.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Non-Autoregressive Sequence Generation.

Jiatao Gu

Xu Tan

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022

ProphetChat: Enhancing Dialogue Generation with Simulation of Future Conversation.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Revisiting Over-Smoothness in Text to Speech.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Adaptive Logit Adjustment Loss for Long-Tailed Visual Recognition.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition.

CoRR, 2021

AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style.

CoRR, 2021

A Survey on Neural Speech Synthesis.

CoRR, 2021

PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Driven Adaptive Prior.

CoRR, 2021

Improving Long-Tailed Classification from Instance Level.

CoRR, 2021

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Speech-T: Transducer for Text to Speech and Beyond.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

A Tutorial on AI Music Composition.

Xu Tan

Xiaobing Li

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search.

Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

Adaptive Text to Speech for Spontaneous Style.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Cross-Domain Speech Recognition with Unsupervised Character-Level Distribution Matching.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Survey on Low-Resource Neural Machine Translation.

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

AdaSpeech: Adaptive Text to Speech for Custom Voice.

Proceedings of the 9th International Conference on Learning Representations, 2021

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

Proceedings of the 9th International Conference on Learning Representations, 2021

DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021.

Proceedings of the Blizzard Challenge 2021, virtual, October 23, 2021, 2021

MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training.

Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling.

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

UWSpeech: Speech to Speech Translation for Unwritten Languages.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis.

CoRR, 2020

Neural Architecture Search with GBDT.

CoRR, 2020

LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning.

CoRR, 2020

VESR-Net: The Winning Solution to Youku Video Enhancement and Super-Resolution Challenge.

CoRR, 2020

MPNet: Masked and Permuted Pre-training for Language Understanding.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Semi-Supervised Neural Architecture Search.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

PopMAG: Pop Music Accompaniment Generation.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

DualLip: A System for Joint Lip Reading and Generation.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition.

Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

DeepSinger: Singing Voice Synthesis with Data Mined From the Web.

Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

MultiSpeech: Multi-Speaker Text to Speech with Transformer.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Neural Machine Translation with Error Correction.

Kaitao Song

Xu Tan

Jianfeng Lu

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation.

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

IAD: A Benchmark Dataset and a New Method for Illegal Advertising Classification.

Proceedings of the ECAI 2020 - 24th European Conference on Artificial Intelligence, Santiago de Compostela, Spain, August 29 - September 8, 2020

A Study of Non-autoregressive Model for Sequence Generation.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

SimulSpeech: End-to-End Simultaneous Speech to Text Translation.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Beyond Error Propagation: Language Branching Also Affects the Accuracy of Sequence Generation.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

A Study of Multilingual Neural Machine Translation.

CoRR, 2019

Microsoft Research Asia's Systems for WMT19.

CoRR, 2019

Efficient Bidirectional Neural Machine Translation.

CoRR, 2019

Hard but Robust, Easy but Sensitive: How Encoder and Decoder Perform in Neural Machine Translation.

Tianyu He

Xu Tan

Tao Qin

CoRR, 2019

Language Graph Distillation for Low-Resource Machine Translation.

CoRR, 2019

Microsoft Research Asia's Systems for WMT19.

Proceedings of the Fourth Conference on Machine Translation, 2019

FastSpeech: Fast, Robust and Controllable Text to Speech.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Deliberation Learning for Image-to-Image Translation.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

MASS: Masked Sequence to Sequence Pre-training for Language Generation.

Proceedings of the 36th International Conference on Machine Learning, 2019

Almost Unsupervised Text to Speech and Automatic Speech Recognition.

Proceedings of the 36th International Conference on Machine Learning, 2019

Multilingual Neural Machine Translation with Knowledge Distillation.

Proceedings of the 7th International Conference on Learning Representations, 2019

Representation Degeneration Problem in Training Natural Language Generation Models.

Proceedings of the 7th International Conference on Learning Representations, 2019

Multilingual Neural Machine Translation with Language Clustering.

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Collaborative learning between cloud and end devices: an empirical study on location prediction.

Proceedings of the 4th ACM/IEEE Symposium on Edge Computing, 2019

Knowledge Distillation from BERT in Pre-Training and Fine-Tuning for Polyphone Disambiguation.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Unsupervised Pivot Translation for Distant Languages.

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Tied Transformers: Neural Machine Translation with Shared Encoder and Decoder.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Sentence-Wise Smooth Regularization for Sequence to Sequence Learning.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Achieving Human Parity on Automatic Chinese to English News Translation.

Marcin Junczys-Dowmunt

CoRR, 2018

Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

FRAGE: Frequency-Agnostic Word Representation.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Dense Information Flow for Neural Machine Translation.

Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

Progressive Blockwise Knowledge Distillation for Neural Network Acceleration.

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Model-Level Dual Learning.

Proceedings of the 35th International Conference on Machine Learning, 2018

Beyond Error Propagation in Neural Machine Translation: Characteristics of Language Also Matter.

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Double Path Networks for Sequence to Sequence Learning.

Proceedings of the 27th International Conference on Computational Linguistics, 2018

2015

Structured Visual Feature Learning for Classification via Supervised Probabilistic Tensor Factorization.

IEEE Trans. Multim., 2015

2014

Characterizing and Modeling Package Dynamics in Express Shipping Service Network.

Proceedings of the 2014 IEEE International Congress on Big Data, Anchorage, AK, USA, June 27, 2014

2013

Supervised Nonnegative Tensor Factorization with Maximum-Margin Constraint.

Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013

2012

Logistic Tensor Regression for Classification.

Proceedings of the Intelligent Science and Intelligent Data Engineering, 2012

Nonnegative Matrix Factorization for Multimodality Data from Multi-source Domain.

Proceedings of the Eighth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2012

Xu Tan
