Jinchuan Tian

Orcid: 0000-0002-2129-471X

According to our database, Jinchuan Tian authored at least 59 papers between 2020 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Online Register for Dual-Mode Self-Supervised Speech Models: Mitigating The Lack of Future Context.
CoRR, February, 2026

Bagpiper: Solving Open-Ended Audio Tasks via Rich Captions.
CoRR, February, 2026

Optimizing Conversational Quality in Spoken Dialogue Systems with Reinforcement Learning from AI Feedback.
CoRR, January, 2026

Do Neural Codecs Generalize? A Controlled Study Across Unseen Languages and Non-Speech Tasks.
CoRR, January, 2026

Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception.
CoRR, January, 2026

BSCodec: A Band-Split Neural Codec for High-Quality Universal Audio Reconstruction.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2026, 2026

Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

2025
Adapting Speech Language Model to Singing Voice Synthesis.
CoRR, December, 2025

VSpeechLM: A Visual Speech Language Model for Visual Text-to-Speech Task.
CoRR, November, 2025

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM.
CoRR, October, 2025

UALM: Unified Audio Language Model for Understanding, Generation and Reasoning.
CoRR, October, 2025

Chain-of-Thought Reasoning in Streaming Full-Duplex End-to-End Spoken Dialogue Systems.
CoRR, October, 2025

SpeechIQ: Speech Intelligence Quotient Across Cognitive Levels in Voice Understanding Large Language Models.
CoRR, July, 2025

ARECHO: Autoregressive Evaluation via Chain-Based Hypothesis Optimization for Speech Multi-Metric Estimation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

ESPnet-SpeechLM: An Open Speech Language Model Toolkit.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

A Visual Speech Language Model for Visual Text-to-Speech Task.
Proceedings of the 7th ACM International Conference on Multimedia in Asia, 2025

OpusLM: A Family of Open Unified Speech Language Models.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Context-Driven Dynamic Pruning for Large Speech Foundation Models.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

The Text-to-speech in the Wild (TITW) Database.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Exploring Linear Variant Transformers and k-NN Memory Inference for Long-Form ASR.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Chain-of-Thought Training for Open E2E Spoken Dialogue Systems.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Preference Alignment Improves Language Model-Based TTS.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Continual Pre-training for Codec-Based Speech LLMs: Balancing Understanding and Generation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

VERSA-v2: A Modular and Scalable Toolkit for Speech and Audio Evaluation with Expanded Metrics, Visualization, and LLM Integration.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Evaluating Self-Supervised Speech Models Via Text-Based LLMs.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

SpeechIQ: Speech-Agentic Intelligence Quotient Across Cognitive Levels in Voice Understanding by Large Language Models.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild.
CoRR, 2024

Text-To-Speech Synthesis In The Wild.
CoRR, 2024

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs For Audio, Music, and Speech.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

CMU's IWSLT 2024 Offline Speech Translation System: A Cascaded Approach For Long-Form Robustness.
Proceedings of the 21st International Conference on Spoken Language Translation, 2024

On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

UniAudio: Towards Universal Audio Generation with Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

AutoPrep: An Automatic Preprocessing Framework for In-The-Wild Speech Data.
Proceedings of the IEEE International Conference on Acoustics, 2024

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.
Proceedings of the IEEE International Conference on Acoustics, 2024

Towards Robust Speech Representation Learning for Thousands of Languages.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Integrating Lattice-Free MMI Into End-to-End Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation.
CoRR, 2023

HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec.
CoRR, 2023

The MineTrans Systems for IWSLT 2023 Offline Speech Translation and Speech-to-Speech Translation Tasks.
Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Bayes Risk CTC: Controllable CTC Alignment in Sequence-to-Sequence Tasks.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Improving Mandarin End-to-End Speech Recognition With Word N-Gram Language Model.
IEEE Signal Process. Lett., 2022

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR.
CoRR, 2022

Integrate Lattice-Free MMI into End-to-End Speech Recognition.
CoRR, 2022

Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Consistent Training and Decoding for End-to-End Speech Recognition Using Lattice-Free MMI.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Layer Reduction: Accelerating Conformer-Based Self-Supervised Model via Layer Consistency.
CoRR, 2021

2020
A Random Gossip BMUF Process for Neural Language Modeling.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

