Yiwen Shao

According to our database, Yiwen Shao authored at least 33 papers between 2018 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Unlocking Strong Supervision: A Data-Centric Study of General-Purpose Audio Pre-Training Methods.
CoRR, March, 2026

JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments.
CoRR, February, 2026

Towards Comprehensive Semantic Speech Embeddings for Chinese Dialects.
CoRR, January, 2026

TagSpeech: End-to-End Multi-Speaker ASR and Diarization with Fine-Grained Temporal Grounding.
CoRR, January, 2026

AzeroS: Extending LLM to Speech with Self-Generated Instruction-Free Tuning.
CoRR, January, 2026

DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Revisiting Audio-language Pretraining for Learning General-purpose Audio Representation.
CoRR, November, 2025

Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding.
CoRR, November, 2025

Taming the Chaos: Coordinated Autoscaling for Heterogeneous and Disaggregated LLM Inference.
CoRR, August, 2025

VietASR: Achieving Industry-level Vietnamese ASR with 50-hour labeled data and Large-Scale Speech Pretraining.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Efficient Multilingual ASR Finetuning via LoRA Language Experts.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Efficient Scaling for LLM-based ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024
Advancing Multi-Talker ASR Performance With Large Language Models.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Spatialemb: Extract and Encode Spatial Information for 1-Stage Multi-Channel Multi-Speaker ASR on Arbitrary Microphone Arrays.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

RIR-SF: Room Impulse Response Based Spatial Feature for Target Speech Recognition in Multi-Channel Multi-Speaker Scenarios.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

UniX-Encoder: A Universal X-Channel Speech Encoder for AD-HOC Microphone Array Speech Processing.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023
Early diagnosis and mechanistic understanding of citrus Huanglongbing via sun-induced chlorophyll fluorescence.
Comput. Electron. Agric., December, 2023

RIR-SF: Room Impulse Response Based Spatial Feature for Multi-channel Multi-talker ASR.
CoRR, 2023

Challenges and Insights: Exploring 3D Spatial Features and Complex Networks on the MISP Dataset.
CoRR, 2023

2022
Defense against Adversarial Attacks on Hybrid Speech Recognition using Joint Adversarial Fine-tuning with Denoiser.
CoRR, 2022

Chunking Defense for Adversarial Attacks on ASR.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Defense against Adversarial Attacks on Hybrid Speech Recognition System using Adversarial Fine-tuning with Denoiser.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021
Adversarial Attacks and Defenses for Speech Recognition Systems.
CoRR, 2021

2020
PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Speaker Diarization with Region Proposal Network.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019
Practices of backuping homomorphically encrypted databases.
Frontiers Comput. Sci., 2019

Using ASR Methods for OCR.
Proceedings of the 2019 International Conference on Document Analysis and Recognition, 2019

Espresso: A Fast End-to-End Neural Speech Recognition Toolkit.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
A Novel Normalization Method for Autocorrelation Function for Pitch Detection and for Speech Activity Detection.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Use of Pitch Continuity for Robust Speech Activity Detection.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

CryptZip: Squeezing out the Redundancy in Homomorphically Encrypted Backup Data.
Proceedings of the 9th Asia-Pacific Workshop on Systems, 2018
