Ya Li
ORCID: 0000-0002-6284-5039
Affiliations:
- Beijing University of Posts and Telecommunications, School of Artificial Intelligence, Beijing, China
- Chinese Academy of Sciences (CAS), Institute of Automation, National Laboratory of Pattern Recognition, Beijing, China (PhD 2012)
According to our database, Ya Li authored at least 97 papers between 2009 and 2025.
Bibliography
2025
CoRR, July, 2025
CoRR, March, 2025
Beyond Surface Simplicity: Revealing Hidden Reasoning Attributes for Precise Commonsense Diagnosis.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
DepressionMLP: A Multi-Layer Perceptron Architecture for Automatic Depression Level Prediction via Facial Keypoints and Action Units.
IEEE Trans. Circuits Syst. Video Technol., September, 2024
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab and Convolutional Recurrent Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
IEEE Trans. Affect. Comput., 2024
CoRR, 2024
CoRR, 2024
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model.
CoRR, 2024
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
A Preliminary Study on Automatic Pronunciation Error Detection for Hearing-impaired Children.
Proceedings of the 10th International Conference on Communication and Information Processing, 2024
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
Concss: Contrastive-based Context Comprehension for Dialogue-Appropriate Prosody in Conversational Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
2023
ACM Trans. Multim. Comput. Commun. Appl., 2023
Dual Attention and Element Recalibration Networks for Automatic Depression Level Prediction.
IEEE Trans. Affect. Comput., 2023
CoRR, 2023
Mining High-quality Samples from Raw Data and Majority Voting Method for Multimodal Emotion Recognition.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
CMCU-CSS: Enhancing Naturalness via Commonsense-based Multi-modal Context Understanding in Conversational Speech Synthesis.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Exploring the interpretability in speech-based adolescent depression detection by SHAP.
Proceedings of the 9th International Conference on Communication and Information Processing, 2023
GaitParse: Gait Parsing Algorithm with Self-Supervised Fine-Tuning for Gait Recognition.
Proceedings of the 9th International Conference on Communication and Information Processing, 2023
M²-CTTS: End-to-End Multi-Scale Multi-Modal Conversational Text-to-Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
2022
Selective Element and Two Orders Vectorization Networks for Automatic Depression Severity Diagnosis via Facial Changes.
IEEE Trans. Circuits Syst. Video Technol., 2022
Depressioner: Facial dynamic representation for automatic depression level prediction.
Expert Syst. Appl., 2022
A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis.
Proceedings of the 24th IEEE International Workshop on Multimedia Signal Processing, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Automatic Respiratory Sound Classification Via Multi-Branch Temporal Convolutional Network.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
Automatic Depression Level Assessment from Speech By Long-Term Global Information Embedding.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
2021
Int. J. Autom. Comput., 2021
2020
Int. J. Autom. Comput., 2020
2019
Int. J. Autom. Comput., 2019
Discriminative Video Representation with Temporal Order for Micro-expression Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019
2018
Investigating Deep Neural Network Adaptation for Generating Exclamatory and Interrogative Speech in Mandarin.
J. Signal Process. Syst., 2018
Investigation of Multimodal Features, Classifiers and Fusion Methods for Emotion Recognition.
CoRR, 2018
Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop, 2018
Multimodal Continuous Emotion Recognition with Data Augmentation Using Recurrent Neural Networks.
Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop, 2018
BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018
2017
Quantitative intonation modeling of interrogative sentences for Mandarin speech synthesis.
Speech Commun., 2017
J. Ambient Intell. Humaniz. Comput., 2017
Continuous Multimodal Emotion Prediction Based on Long Short Term Memory Recurrent Neural Network.
Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, Mountain View, CA, USA, October 23, 2017
Investigating Efficient Feature Representation Methods and Training Objective for BLSTM-Based Phone Duration Prediction.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the Blizzard Challenge 2017, Stockholm, Sweden, August 25, 2017
2016
Investigating Effect of Rich Syntactic Features on Mandarin Prosodic Boundaries Prediction.
J. Signal Process. Syst., 2016
CoRR, 2016
Text-based sentential stress prediction using continuous lexical embedding for Mandarin speech synthesis.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
End-to-end keywords spotting based on connectionist temporal classification for Mandarin.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Improving Prosodic Boundaries Prediction for Mandarin Speech Synthesis by Using Enhanced Embedding Feature and Model Fusion Approach.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
The Parameterized Phoneme Identity Feature as a Continuous Real-Valued Vector for Neural Network Based Speech Synthesis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Long short term memory recurrent neural network based encoding method for emotion recognition in video.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016
Proceedings of the Pattern Recognition - 7th Chinese Conference, 2016
Proceedings of the Blizzard Challenge 2016, Cupertino, CA, USA, September 16, 2016
2015
Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech.
Speech Commun., 2015
Long Short Term Memory Recurrent Neural Network based Multimodal Dimensional Emotion Recognition.
Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015
From simulated speech to natural speech, what are the robust features for emotion recognition?
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015
2014
Phonological influences on the realization of final lowering evidence from dialogue Chinese Mandarin.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014
Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge, 2014
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Context features based pre-selection and weight prediction in concatenation speech synthesis system.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Efficient voice activity detection algorithm based on sub-band temporal envelope and sub-band long-term signal variability.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Investigating effect of rich syntactic features on Mandarin prosodic phrase boundaries prediction.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Improving generation performance of speech emotion recognition by denoising autoencoders.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
A novel hybrid mandarin speech synthesis system using different base units for model training and concatenation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014
2013
A novel unit selection method for concatenation speech system using similarity measure.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013
Proceedings of the Chinese Lexical Semantics - 14th Workshop, 2013
Proceedings of the 2nd IAPR Asian Conference on Pattern Recognition, 2013
Bayesian Inference Based Temporal Modeling for Naturalistic Affective Expression Classification.
Proceedings of the 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, 2013
2012
J. Multimodal User Interfaces, 2012
2011
EURASIP J. Adv. Signal Process., 2011
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Proceedings of the 17th International Congress of Phonetic Sciences, 2011
Proceedings of the Affective Computing and Intelligent Interaction, 2011
2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Proceedings of the Blizzard Challenge 2010, Kansai Science City, Japan, September 25, 2010
2009
Proceedings of the Blizzard Challenge 2009, Edinburgh, Scotland, UK, September 4, 2009