DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis.

Yu Gu

Qiushi Zhu

Guangzhi Lei

Chao Weng

Dan Su

Proceedings of the IEEE International Conference on Acoustics, 2024

Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

MM-LLMs: Recent Advances in MultiModal Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

A High Fidelity and Low Complexity Neural Audio Coding.

CoRR, 2023

DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis.

CoRR, 2023

Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Compressed MoE ASR Model Based on Knowledge Distillation and Quantization.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Multi-mode Neural Speech Coding Based on Deep Generative Networks.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

TriNet: Stabilizing Self-Supervised Learning From Complete or Slow Collapse.

Proceedings of the IEEE International Conference on Acoustics, 2023

UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional Speech Synthesis.

IEEE Signal Process. Lett., 2022

The DKU-Tencent System for the VoxCeleb Speaker Recognition Challenge 2022.

CoRR, 2022

DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs.

Songxiang Liu

Dan Su

Dong Yu

CoRR, 2022

3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

End-to-End Voice Conversion with Information Perturbation.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis.

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Multi-Channel Speaker Diarization Using Spatial Features for Meetings.

Proceedings of the IEEE International Conference on Acoustics, 2022

The CUHK-Tencent Speaker Diarization System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2022

SpeechMoE2: Mixture-of-Experts Model with Improved Routing.

Proceedings of the IEEE International Conference on Acoustics, 2022

VCVTS: Multi-Speaker Video-to-Speech Synthesis Via Cross-Modal Knowledge Transfer from Voice Conversion.

Proceedings of the IEEE International Conference on Acoustics, 2022

Consistent Training and Decoding for End-to-End Speech Recognition Using Lattice-Free MMI.

Proceedings of the IEEE International Conference on Acoustics, 2022

Simple Attention Module Based Speaker Verification with Iterative Noisy Label Detection.

Proceedings of the IEEE International Conference on Acoustics, 2022

DP-DWA: Dual-Path Dynamic Weight Attention Network with Streaming DFSMN-SAN for Automatic Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2022

Referee: Towards Reference-Free Cross-Speaker Style Transfer with Low-Quality Data for Expressive Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2022

Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning.

Songxiang Liu

Dan Su

Dong Yu

CoRR, 2021

Bilateral Denoising Diffusion Models.

CoRR, 2021

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio.

CoRR, 2021

Spoken Style Learning with Multi-modal Hierarchical Context Encoding for Conversational Text-to-Speech Synthesis.

CoRR, 2021

VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention.

CoRR, 2021

Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Exploring Cross-lingual Singing Voice Synthesis Using Speech Data.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Glow-WaveGAN: Learning Speech Representations from GAN-Based Variational Auto-Encoder for High Fidelity Flow-Based Speech Synthesis.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Controllable Context-Aware Conversational Speech Synthesis.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

TeNC: Low Bit-Rate Speech Coding with VQ-VAE and GAN.

Proceedings of the ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, October 18, 2021

FastSVC: Fast Cross-Domain Singing Voice Conversion With Feature-Wise Linear Modulation.

Proceedings of the 2021 IEEE International Conference on Multimedia and Expo, 2021

A Joint Training Framework of Multi-Look Separator and Speaker Embedding Extractor for Overlapped Speech.

Proceedings of the IEEE International Conference on Acoustics, 2021

Contrastive Separative Coding for Self-Supervised Representation Learning.

Proceedings of the IEEE International Conference on Acoustics, 2021

Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input.

Proceedings of the IEEE International Conference on Acoustics, 2021

Replay and Synthetic Speech Detection with Res2Net Architecture.

Proceedings of the IEEE International Conference on Acoustics, 2021

Sandglasset: A Light Multi-Granularity Self-Attentive Network for Time-Domain Speech Separation.

Proceedings of the IEEE International Conference on Acoustics, 2021

Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition.

Liqiang He

Dan Su

Dong Yu

Proceedings of the IEEE International Conference on Acoustics, 2021

DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Latency-Controlled Neural Architecture Search for Streaming Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Tune-In: Training Under Negative Environments with Interference for Attention Networks Simulating Cocktail Party Effect.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

A Framework for Adapting DNN Speaker Embedding Across Languages.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

On the localness modeling for the self-attention based end-to-end speech synthesis.

Neural Networks, 2020

Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training.

CoRR, 2020

Audio-Visual Multi-Channel Recognition of Overlapped Speech.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

DurIAN: Duration Informed Attention Network for Speech Synthesis.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Speech-XLNet: Unsupervised Acoustic Model Pretraining for Self-Attention Networks.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transferring Source Style in Non-Parallel Voice Conversion.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Multi-Look Keyword Spotting.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

DFSMN-SAN with Persistent Memory Model for Automatic Speech Recognition.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-To-End Accent Conversion Without Using Native Utterances.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Multi-Level Deep Neural Network Adaptation for Speaker Verification Using MMD and Consistency Regularization.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Mixup-Breakdown: A Consistency Training Method for Improving Generalization of Speech Separation Models.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Speaker-Aware Target Speaker Enhancement by Jointly Learning with Speaker Embedding Extraction.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Integration of Multi-Look Beamformers for Multi-Channel Keyword Spotting.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Random Gossip BMUF Process for Neural Language Modeling.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Enhancing End-to-End Multi-Channel Speech Separation Via Spatial Feature Learning.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks.

CoRR, 2019

DurIAN: Duration Informed Attention Network For Multimodal Synthesis.

CoRR, 2019

Maximizing Mutual Information for Tacotron.

CoRR, 2019

Phrase-Level Class based Language Model for Mandarin Smart Speaker Query Recognition.

CoRR, 2019

End-to-End Multi-Channel Speech Separation.

CoRR, 2019

Extract, Adapt and Recognize: An End-to-End Neural Network for Corrupted Monaural Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Teach an All-rounder with Experts in Different Domains.

Zhao You

Dan Su

Dong Yu

Proceedings of the IEEE International Conference on Acoustics, 2019

Joint Training of Complex Ratio Mask Based Beamformer and Acoustic Model for Noise Robust ASR.

Proceedings of the IEEE International Conference on Acoustics, 2019

Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2019

Learning Discriminative Features in Sequence Training without Requiring Framewise Labelled Data.

Proceedings of the IEEE International Conference on Acoustics, 2019

Investigating End-to-end Speech Recognition for Mandarin-English Code-switching.

Proceedings of the IEEE International Conference on Acoustics, 2019

Component Fusion: Learning Replaceable Language Model Component for End-to-end Speech Recognition System.

Proceedings of the IEEE International Conference on Acoustics, 2019

Boundary Discriminative Large Margin Cosine Loss for Text-independent Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, 2019

Multi-band PIT and Model Integration for Improved Multi-channel Speech Separation.

Proceedings of the IEEE International Conference on Acoustics, 2019

Improving Speech Enhancement with Phonetic Embedding Features.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Syllable-Dependent Discriminative Learning for Small Footprint Text-Dependent Speaker Verification.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Automatic Prosodic Structure Labeling using DNN-BGRU-CRF Hybrid Neural Network.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Prosodic Structure Prediction using Deep Self-attention Neural Network.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

Improving Attention-Based End-to-End ASR Systems with Sequence-Based Loss Functions.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Speech Super-Resolution Using Parallel WaveNet.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Text-Dependent Speech Enhancement for Small-Footprint Robust Keyword Detection.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Deep Discriminative Embeddings for Duration Robust Speaker Verification.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

Chiral Buckybowl Molecules.

Symmetry, 2017

Dan Su
