Shiyin Kang

Orcid: 0000-0001-8304-5260

According to our database, Shiyin Kang authored at least 57 papers between 2009 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
ChatMusician: Understanding and Generating Music Intrinsically with LLM.
CoRR, 2024

Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation.
CoRR, 2024

2023
MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

AdaMesh: Personalized Facial Expressions and Head Poses for Speech-Driven 3D Facial Animation.
CoRR, 2023

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts.
CoRR, 2023

Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis.
CoRR, 2023

GTN-Bailando: Genre Consistent long-Term 3D Dance Generation Based on Pre-Trained Genre Token Network.
Proceedings of the IEEE International Conference on Acoustics, 2023

CB-Conformer: Contextual Biasing Conformer for Biased Word Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

TFCnet: Time-Frequency Domain Corrector for Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Context-Aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion.
CoRR, 2022

Efficient Text Analysis with Pre-Trained Neural Network Models.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information.
Proceedings of the Interspeech 2022, 2022

Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.
Proceedings of the Interspeech 2022, 2022

Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information.
Proceedings of the Interspeech 2022, 2022

Disentangling Content and Fine-Grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2022

Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2022

FullSubNet+: Channel Attention Fullsubnet with Complex Spectrograms for Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Exemplar-Based Emotive Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Exploring Cross-lingual Singing Voice Synthesis Using Speech Data.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Adversarially Learning Disentangled Speech Representations for Robust Multi-Factor Voice Conversion.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

The Huya Multi-Speaker and Multi-Style Speech Synthesis System for M2voc Challenge 2020.
Proceedings of the IEEE International Conference on Acoustics, 2021

Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
On the localness modeling for the self-attention based end-to-end speech synthesis.
Neural Networks, 2020

DurIAN: Duration Informed Attention Network for Speech Synthesis.
Proceedings of the Interspeech 2020, 2020

Transferring Source Style in Non-Parallel Voice Conversion.
Proceedings of the Interspeech 2020, 2020

Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-To-End Accent Conversion Without Using Native Utterances.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
DurIAN: Duration Informed Attention Network For Multimodal Synthesis.
CoRR, 2019

Maximizing Mutual Information for Tacotron.
CoRR, 2019

One-Shot Voice Conversion with Global Speaker Embeddings.
Proceedings of the Interspeech 2019, 2019

Multimedia Simultaneous Translation System for Minority Language Communication with Mandarin.
Proceedings of the Interspeech 2019, 2019

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT.
Proceedings of the Interspeech 2019, 2019

Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019

A Compact Framework for Voice Conversion Using Wavenet Conditioned on Phonetic Posteriorgrams.
Proceedings of the IEEE International Conference on Acoustics, 2019

Automatic Prosodic Structure Labeling using DNN-BGRU-CRF Hybrid Neural Network.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Prosodic Structure Prediction using Deep Self-attention Neural Network.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Speech Super-Resolution Using Parallel WaveNet.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis.
Proceedings of the Interspeech 2018, 2018

Neural Network Language Modeling with Letter-Based Features and Importance Sampling.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Feature Based Adaptation for Speaking Style Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2016
Personalized, Cross-Lingual TTS Using Phonetic Posteriorgrams.
Proceedings of the Interspeech 2016, 2016

Phonetic posteriorgrams for many-to-one voice conversion without parallel data training.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

Learning cross-lingual information with multilingual BLSTM for speech synthesis of low-resource languages.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends.
IEEE Signal Process. Mag., 2015

Integrating acoustic and state-transition models for free phone recognition in L2 English speech using multi-distribution deep neural networks.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2015

Voice conversion using deep Bidirectional Long Short-Term Memory based Recurrent Neural Networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A deep recurrent approach for acoustic-to-articulatory inversion.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014
Statistical parametric speech synthesis using weighted multi-distribution deep belief network.
Proceedings of the INTERSPEECH 2014, 2014

2013
Lexical stress detection for L2 English speech using deep belief networks.
Proceedings of the INTERSPEECH 2013, 2013

Multi-distribution deep belief network for speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2013

2010
HMM based TTS for mixed language text.
Proceedings of the INTERSPEECH 2010, 2010

Comparison of Syllable/Phone HMM Based Mandarin TTS.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

2009
Syllable HMM based Mandarin TTS and comparison with concatenative TTS.
Proceedings of the INTERSPEECH 2009, 2009

Voiced/unvoiced decision algorithm for HMM-based speech synthesis.
Proceedings of the INTERSPEECH 2009, 2009

