Zhengkun Tian

Orcid: 0000-0002-0469-3049

According to our database¹, Zhengkun Tian authored at least 42 papers between 2019 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models.

CoRR, October, 2025

2024

SceneFake: An initial dataset and benchmarks for scene fake audio detection.

Pattern Recognit., 2024

MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research.

CoRR, 2024

MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

2023

Transfer knowledge for punctuation prediction via adversarial training.

Speech Commun., April, 2023

CPPF: A contextual and post-processing-free model for automatic speech recognition.

CoRR, 2023

Peak-First CTC: Reducing the Peak Latency of CTC Models by Applying Peak-First Regularization.

Proceedings of the IEEE International Conference on Acoustics, 2023

TST: Time-Sparse Transducer for Automatic Speech Recognition.

Proceedings of the Artificial Intelligence - Third CAAI International Conference, 2023

2022

Hybrid Autoregressive and Non-Autoregressive Transformer Models for Speech Recognition.

IEEE Signal Process. Lett., 2022

SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection.

CoRR, 2022

System Fingerprints Detection for DeepFake Audio: An Initial Dataset and Investigation.

CoRR, 2022

ADD 2022: the First Audio Deep Synthesis Detection Challenge.

CoRR, 2022

Reducing language context confusion for end-to-end code-switching automatic speech recognition.

CoRR, 2022

Fully Automated End-to-End Fake Audio Detection.

Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Reducing multilingual context confusion for end-to-end code-switching automatic speech recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

ADD 2022: the first Audio Deep Synthesis Detection Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2022

End-to-End Network Based on Transformer for Automatic Detection of Covid-19.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Half-Truth: A Partially Fake Audio Detection Dataset.

CoRR, 2021

TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition.

CoRR, 2021

Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT.

CoRR, 2021

RNN-Transducer with Language Bias for End-to-End Mandarin-English Code-Switching Speech Recognition.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Hierarchically Attending Time-Frequency and Channel Features for Improving Speaker Verification.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Half-Truth: A Partially Fake Audio Detection Dataset.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Continual Learning for Fake Audio Detection.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2021

One In A Hundred: Selecting the Best Predicted Sequence from Numerous Candidates for Speech Recognition.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

A Large-Scale Chinese Multimodal NER Dataset with Speech Clues.

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020

Deep imitator: Handwriting calligraphy imitation via deep attention networks.

Pattern Recognit., 2020

Adversarial Transfer Learning for Punctuation Restoration.

CoRR, 2020

Focal Loss for Punctuation Prediction.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Synchronous Transformers for end-to-end Speech Recognition.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Integrating Whole Context to Sequence-to-sequence Speech Recognition.

CoRR, 2019

Self-Attention Transducers for End-to-End Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019