Jingbei Li

Orcid: 0000-0002-6284-5979

According to our database, Jingbei Li authored at least 23 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Evaluating and Rewarding LALMs for Expressive Role-Play TTS via Mean Continuation Log-Probability.

CoRR, January, 2026

2025

VoxRole: A Comprehensive Benchmark for Evaluating Speech-Based Role-Playing Agents.

CoRR, September, 2025

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model.

CoRR, June, 2025

DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

2023

DiCLET-TTS: Diffusion Model Based Cross-Lingual Emotion Transfer for Text-to-Speech - A Study Between English and Mandarin.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing.

CoRR, 2023

2022

Inferring Speaking Styles from Multi-modal Conversational Context by Multi-scale Relational Graph Convolutional Networks.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Neufa: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism.

Proceedings of the IEEE International Conference on Acoustics, 2022

Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Spoken Style Learning with Multi-modal Hierarchical Context Encoding for Conversational Text-to-Speech Synthesis.

CoRR, 2021

Dependency Parsing based Semantic Representation Learning with Graph Neural Network for Enhancing Expressiveness of Text-to-Speech.

CoRR, 2021

Adversarially Learning Disentangled Speech Representations for Robust Multi-Factor Voice Conversion.

CoRR, 2021

Adversarially Learning Disentangled Speech Representations for Robust Multi-Factor Voice Conversion.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Towards Multi-Scale Style Control for Expressive Speech Synthesis.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Joint Face Detection and Landmark Localization Based on an Extremely Lightweight Network.

Proceedings of the Image and Graphics - 11th International Conference, 2021

Syntactic Representation Learning For Neural Network Based TTS with Syntactic Parse Tree Traversal.

Proceedings of the IEEE International Conference on Acoustics, 2021

Emotion Controllable Speech Synthesis Using Emotion-Unlabeled Dataset with the Assistance of Cross-Domain Speech Emotion Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2021

2019

Knowledge-Based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018

Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs.

Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Wiener Loss: A Strong Correlative Loss Applied to Conditional GAN for Color Prediction.

Proceedings of the 2018 International Conference on Artificial Intelligence and Virtual Reality, 2018

Multi-modal Multi-scale Speech Expression Evaluation in Computer-Assisted Language Learning.

Proceedings of the Artificial Intelligence and Mobile Services - AIMS 2018, 2018
