Shiyin Kang

ORCID: 0000-0001-8304-5260

According to our database¹, Shiyin Kang authored at least 68 papers between 2009 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.


Bibliography

2026

How Should LLMs Listen While Speaking? A Study of User-Stream Routing in Full-Duplex Spoken Dialogue.

CoRR, May, 2026

Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model.

CoRR, April, 2026

2025

InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue.

CoRR, October, 2025

AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation.

IEEE Trans. Multim., 2025

2024

An End-to-End Approach for Chord-Conditioned Song Generation.

CoRR, 2024

Foundation Models for Music: A Survey.

CoRR, 2024

ChatMusician: Understanding and Generating Music Intrinsically with LLM.

CoRR, 2024

SongCreator: Lyrics-based Universal Song Generation.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

An End-to-End Approach for Chord-Conditioned Song Generation.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SCNet: Sparse Compression Network for Music Source Separation.

Proceedings of the IEEE International Conference on Acoustics, 2024

Multi-View MidiVAE: Fusing Track- and Bar-View Representations for Long Multi-Track Symbolic Music Generation.

Proceedings of the IEEE International Conference on Acoustics, 2024

Generating Stereophonic Music with Single-Stage Language Models.

Proceedings of the IEEE International Conference on Acoustics, 2024

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts.

Proceedings of the IEEE International Conference on Acoustics, 2024

ChatMusician: Understanding and Generating Music Intrinsically with LLM.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

AdaMesh: Personalized Facial Expressions and Head Poses for Speech-Driven 3D Facial Animation.

CoRR, 2023

Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

GTN-Bailando: Genre Consistent Long-Term 3D Dance Generation Based on Pre-Trained Genre Token Network.

Proceedings of the IEEE International Conference on Acoustics, 2023

CB-Conformer: Contextual Biasing Conformer for Biased Word Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2023

TFCnet: Time-Frequency Domain Corrector for Speech Separation.

Proceedings of the IEEE International Conference on Acoustics, 2023

Context-Aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion.

CoRR, 2022

Efficient Text Analysis with Pre-Trained Neural Network Models.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Disentangling Content and Fine-Grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion.

Proceedings of the IEEE International Conference on Acoustics, 2022

Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2022

FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Exemplar-Based Emotive Speech Synthesis.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Exploring Cross-lingual Singing Voice Synthesis Using Speech Data.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Adversarially Learning Disentangled Speech Representations for Robust Multi-Factor Voice Conversion.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

The Huya Multi-Speaker and Multi-Style Speech Synthesis System for M2VoC Challenge 2020.

Proceedings of the IEEE International Conference on Acoustics, 2021

Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

On the localness modeling for the self-attention based end-to-end speech synthesis.

Neural Networks, 2020

DurIAN: Duration Informed Attention Network for Speech Synthesis.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transferring Source Style in Non-Parallel Voice Conversion.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-To-End Accent Conversion Without Using Native Utterances.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

DurIAN: Duration Informed Attention Network For Multimodal Synthesis.

CoRR, 2019

Maximizing Mutual Information for Tacotron.

CoRR, 2019

One-Shot Voice Conversion with Global Speaker Embeddings.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Multimedia Simultaneous Translation System for Minority Language Communication with Mandarin.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2019

A Compact Framework for Voice Conversion Using WaveNet Conditioned on Phonetic Posteriorgrams.

Proceedings of the IEEE International Conference on Acoustics, 2019

Automatic Prosodic Structure Labeling using DNN-BGRU-CRF Hybrid Neural Network.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Prosodic Structure Prediction using Deep Self-attention Neural Network.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

Speech Super-Resolution Using Parallel WaveNet.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Neural Network Language Modeling with Letter-Based Features and Importance Sampling.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Feature Based Adaptation for Speaking Style Synthesis.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2016

Personalized, Cross-Lingual TTS Using Phonetic Posteriorgrams.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Phonetic posteriorgrams for many-to-one voice conversion without parallel data training.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

Learning cross-lingual information with multilingual BLSTM for speech synthesis of low-resource languages.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends.

IEEE Signal Process. Mag., 2015

Integrating acoustic and state-transition models for free phone recognition in L2 English speech using multi-distribution deep neural networks.

Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2015

Voice conversion using deep Bidirectional Long Short-Term Memory based Recurrent Neural Networks.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A deep recurrent approach for acoustic-to-articulatory inversion.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014

Statistical parametric speech synthesis using weighted multi-distribution deep belief network.

Shiyin Kang, Helen M. Meng

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013

Lexical stress detection for L2 English speech using deep belief networks.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Multi-distribution deep belief network for speech synthesis.

Shiyin Kang, Xiaojun Qian, Helen Meng

Proceedings of the IEEE International Conference on Acoustics, 2013

2010

HMM based TTS for mixed language text.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Comparison of Syllable/Phone HMM Based Mandarin TTS.

Proceedings of the 20th International Conference on Pattern Recognition, 2010

2009

Syllable HMM based Mandarin TTS and comparison with concatenative TTS.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Voiced/unvoiced decision algorithm for HMM-based speech synthesis.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
