Shansong Liu

Orcid: 0000-0001-6202-5615

According to our database, Shansong Liu authored at least 37 papers between 2017 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Unison: Harmonizing Motion, Speech, and Sound for Human-Centric Audio-Video Generation.

CoRR, May, 2026

High-Fidelity Generative Audio Compression at 0.275kbps.

CoRR, February, 2026

MuMu-LLaMA: Multi-modal music understanding and generation via large language models.

Expert Syst. Appl., 2026

2025

Rare Word Recognition and Translation Without Fine-Tuning via Task Vector in Speech Models.

CoRR, December, 2025

YuE: Scaling Open Foundation Models for Long-Form Music Generation.

CoRR, March, 2025

Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

Unified Pretraining Target Based Video-Music Retrieval with Music Rhythm and Video Optical Flow Information.

Proceedings of the IEEE International Conference on Acoustics, 2024

HumTrans: A Novel Open-Source Dataset for Humming Melody Transcription and Beyond.

Proceedings of the IEEE International Conference on Acoustics, 2024

Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

M²UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models.

CoRR, 2023

Prosody Modeling with 3D Visual Information for Expressive Video Dubbing.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022

Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

A Hierarchical Speaker Representation Framework for One-shot Singing Voice Conversion.

Xu Li

Shansong Liu

Ying Shan

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Exploiting Cross Domain Acoustic-to-Articulatory Inverted Features for Disordered Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Recent Progress in the CUHK Dysarthric Speech Recognition System.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Adversarial Data Augmentation for Disordered Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition.

Jiajun Deng

Fabian Ritter Gutierrez

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Development of the CUHK Elderly Speech Recognition System for Neurocognitive Disorder Detection Using the DementiaBank Corpus.

Proceedings of the IEEE International Conference on Acoustics, 2021

Bayesian Transformer Language Models for Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2021

Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Neural Architecture Search for Speech Recognition.

CoRR, 2020

Exploiting Cross-Domain Visual Feature Generation for Disordered Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Investigation of Data Augmentation Techniques for Disordered Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Exploiting Visual Features Using Bayesian Gated Neural Networks for Disordered Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

On the Use of Pitch Features for Disordered Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

The CUHK Dysarthric Speech Recognition Systems for English and Cantonese.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Bayesian and Gaussian Process Neural Networks for Large Vocabulary Continuous Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Comprehensive simulation of metagenomic sequencing data with non-uniform sampling distribution.

Quant. Biol., 2018

Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Gaussian Process Neural Networks for Speech Recognition.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Limited-Memory BFGS Optimization of Recurrent Neural Network Language Models for Speech Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Reading the Underlying Information From Massive Metagenomic Sequencing Data.

Proc. IEEE, 2017
