Jiatong Shi

Orcid: 0000-0002-9050-8304

According to our database, Jiatong Shi authored at least 63 papers between 2018 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Wav2Gloss: Generating Interlinear Glossed Text from Speech.
CoRR, 2024

Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and KiSing-v2.
CoRR, 2024

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models.
CoRR, 2024

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer.
CoRR, 2024

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
An iteration-based interactive attention network for 3D point cloud registration.
Neurocomputing, December, 2023

A dynamic graph aggregation framework for 3D point cloud registration.
Eng. Appl. Artif. Intell., April, 2023

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond.
CoRR, 2023

HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model.
CoRR, 2023

EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Multilingual and Low Resource Scenarios.
CoRR, 2023

Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction.
CoRR, 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation.
CoRR, 2023

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.
CoRR, 2023

Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech.
CoRR, 2023

A Systematic Exploration of Joint-training for Singing Voice Synthesis.
CoRR, 2023

The Singing Voice Conversion Challenge 2023.
CoRR, 2023

Exploration on HuBERT with Multiple Resolutions.
CoRR, 2023

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark.
CoRR, 2023

Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation.
CoRR, 2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
CoRR, 2023

CMU's IWSLT 2023 Simultaneous Speech Translation System.
Proceedings of the 20th International Conference on Spoken Language Translation, 2023


Phoneix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation With Phoneme Distribution Predictor.
Proceedings of the IEEE International Conference on Acoustics, 2023

Enhancing Speech-To-Speech Translation with Multiple TTS Targets.
Proceedings of the IEEE International Conference on Acoustics, 2023

Bridging Speech and Textual Pre-Trained Models With Unsupervised ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

Euro: Espnet Unsupervised ASR Open-Source Toolkit.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Massively Multilingual ASR with Auxiliary CTC Objectives.
Proceedings of the IEEE International Conference on Acoustics, 2023

Findings of the 2023 ML-Superb Challenge: Pre-Training And Evaluation Over More Languages And Beyond.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

The Singing Voice Conversion Challenge 2023.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Evaluating Self-Supervised Speech Models on a Taiwanese Hokkien Corpus.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

UniLG: A Unified Structure-aware Framework for Lyrics Generation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer.
Comput. Speech Lang., 2022

4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders.
CoRR, 2022

Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis.
CoRR, 2022

On Compressing Sequences for Self-Supervised Speech Models.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Superb @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

CMU's IWSLT 2022 Dialect Speech Translation System.
Proceedings of the 19th International Conference on Spoken Language Translation, 2022


Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection.
Proceedings of the Interspeech 2022, 2022

VQ-T: RNN Transducers using Vector-Quantized Prediction Network States.
Proceedings of the Interspeech 2022, 2022

Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis.
Proceedings of the Interspeech 2022, 2022

SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy.
Proceedings of the Interspeech 2022, 2022

Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation.
Proceedings of the Interspeech 2022, 2022

Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation.
Proceedings of the Interspeech 2022, 2022

Towards end-to-end Speaker Diarization with Generalized Neural Speaker Clustering.
Proceedings of the IEEE International Conference on Acoustics, 2022

Training Strategies for Automatic Song Writing: A Unified Framework Perspective.
Proceedings of the IEEE International Conference on Acoustics, 2022

SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Leveraging deep learning with audio analytics to predict the success of crowdfunding projects.
J. Supercomput., 2021

ESPnet2-TTS: Extending the Edge of TTS Research.
CoRR, 2021

ESPnet-ST IWSLT 2021 Offline Speech Translation System.
Proceedings of the 18th International Conference on Spoken Language Translation, 2021

SUPERB: Speech Processing Universal PERformance Benchmark.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation.
Proceedings of the IEEE International Conference on Acoustics, 2021

Sequence-To-Sequence Singing Voice Synthesis With Perceptual Entropy Loss.
Proceedings of the IEEE International Conference on Acoustics, 2021

Recent Developments on Espnet Toolkit Boosted By Conformer.
Proceedings of the IEEE International Conference on Acoustics, 2021

Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

Cross-Lingual Transfer for Speech Processing Using Acoustic Language Similarity.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
Context-Aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training.
Proceedings of the Interspeech 2020, 2020

Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning.
Proceedings of the Interspeech 2020, 2020

2018
Identifying Impact Factors of Question Quality in Online Health Q&A Communities: an Empirical Analysis on MedHelp.
Proceedings of the 22nd Pacific Asia Conference on Information Systems, 2018

