Jason Li

Affiliations:
  • NVIDIA, Santa Clara, CA, USA


According to our database, Jason Li authored at least 21 papers between 2018 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
Align2Speak: Improving TTS for Low Resource Languages via ASR-Guided Online Preference Optimization.
CoRR, September, 2025

Frame-Stacked Local Transformers For Efficient Multi-Codebook Speech Generation.
CoRR, September, 2025

HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

NanoCodec: Towards High-Quality Ultra Fast Speech LLM Inference.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Open Full-duplex Voice Agent with Speech-to-Speech Language Model.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment.
CoRR, 2024

Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SALM: Speech-Augmented Language Model with in-Context Learning for Speech Recognition and Translation.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
ACE-VC: Adaptive and Controllable Voice Conversion Using Explicitly Disentangled Self-Supervised Speech Representations.
Proceedings of the IEEE International Conference on Acoustics, 2023

2021
Adapting TTS models For New Speakers using Transfer Learning.
CoRR, 2021

Cross-Language Transfer Learning and Domain Adaptation for End-to-End Automatic Speech Recognition.
Proceedings of the 2021 IEEE International Conference on Multimedia and Expo, 2021

2020
Mellotron: Multispeaker Expressive Voice Synthesis by Conditioning on Rhythm, Pitch and Global Style Tokens.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Quartznet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
NeMo: a toolkit for building AI applications using Neural Modules.
CoRR, 2019

Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks.
CoRR, 2019

Jasper: An End-to-End Convolutional Neural Acoustic Model.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018
Training Neural Speech Recognition Systems with Synthetic Speech Augmentation.
CoRR, 2018

