Songxiang Liu

ORCID: 0000-0002-0943-2446

According to our database, Songxiang Liu authored at least 52 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Adaptive Coordinated Control of an Assistive Lower-Limb Exoskeleton for Hemiparetic Patients.

IEEE Robotics Autom. Lett., May, 2026

UniAudio 2.0: A Unified Audio Language Model with Text-Aligned Factorized Audio Tokenization.

CoRR, February, 2026

HeartMuLa: A Family of Open Sourced Music Foundation Models.

CoRR, January, 2026

V-FAT: Benchmarking Visual Fidelity Against Text-bias.

CoRR, January, 2026

2025

MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning.

CoRR, December, 2025

Omni-AutoThink: Adaptive Multimodal Reasoning via Reinforcement Learning.

CoRR, December, 2025

Kimi-Audio Technical Report.

CoRR, April, 2025

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens.

CoRR, March, 2025

ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

2024

Application of artificial intelligence technology in the field of orthopedics: a narrative review.

Artif. Intell. Rev., January, 2024

InstructTTS: Modelling Expressive TTS in Discrete Latent Space With Natural Language Style Prompt.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models.

CoRR, 2024

Codec-Superb @ SLT 2024: A Lightweight Benchmark For Neural Audio Codec Models.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

UniAudio: Towards Universal Audio Generation with Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation.

CoRR, 2023

The Singing Voice Conversion Challenge 2023.

Wen-Chin Huang

Lester Phillip Violeta

CoRR, 2023

HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec.

CoRR, 2023

InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt.

CoRR, 2023

NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

The Singing Voice Conversion Challenge 2023.

Wen-Chin Huang

Lester Phillip Violeta

Songxiang Liu

Jiatong Shi

Tomoki Toda

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs.

Songxiang Liu

Dan Su

Dong Yu

CoRR, 2022

ASR-Robust Natural Language Understanding on ASR-GLUE dataset.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Referee: Towards Reference-Free Cross-Speaker Style Transfer with Low-Quality Data for Expressive Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Speech Emotion Recognition Using Sequential Capsule Networks.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Exemplar-Based Emotive Speech Synthesis.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning.

Songxiang Liu

Dan Su

Dong Yu

CoRR, 2021

ASR-GLUE: A New Multi-task Benchmark for ASR-Robust Natural Language Understanding.

CoRR, 2021

VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention.

CoRR, 2021

Exploring Cross-lingual Singing Voice Synthesis Using Speech Data.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

FastSVC: Fast Cross-Domain Singing Voice Conversion With Feature-Wise Linear Modulation.

Proceedings of the 2021 IEEE International Conference on Multimedia and Expo, 2021

DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Transferring Source Style in Non-Parallel Voice Conversion.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Defense Against Adversarial Attacks on Spoofing Countermeasures of ASV.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

End-To-End Voice Conversion Via Cross-Modal Knowledge Distillation for Dysarthric Speech Reconstruction.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

End-To-End Accent Conversion Without Using Native Utterances.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Multi-Layer Content Interaction Through Quaternion Product for Visual Question Answering.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Jointly Trained Conversion Model and WaveNet Vocoder for Non-Parallel Voice Conversion Using Mel-Spectrograms and Phonetic Posteriorgrams.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Recurrent Neural Network Language Model Training Using Natural Gradient.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Speech Emotion Recognition Using Capsule Networks.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

End-to-end Code-switched TTS with Mix of Monolingual Recordings.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Adversarial Attacks on Spoofing Countermeasures of Automatic Speaker Verification.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

The HCCL-CUHK System for the Voice Conversion Challenge 2018.

Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Feature Based Adaptation for Speaking Style Synthesis.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018
