Yi-Jen Shih

ORCID: 0000-0003-3481-3117

According to our database, Yi-Jen Shih authored at least 14 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Can Speech LLMs Think while Listening?
CoRR, October, 2025

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Unifying Model and Layer Fusion for Speech Foundation Models.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks.
CoRR, 2024

Measuring Sound Symbolism In Audio-Visual Models.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Self-Supervised Speech Models For Word-Level Stuttered Speech Detection.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Interface Design for Self-Supervised Speech Models.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SpeechCLIP+: Self-Supervised Multi-Task Representation Learning for Speech Via CLIP and Speech-Image Data.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Integrating Self-Supervised Speech Model with Pseudo Word-Level Targets from Visually-Grounded Speech Model.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023
Theme Transformer: Symbolic Music Generation With Theme-Conditioned Transformer.
IEEE Trans. Multim., 2023

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models.
CoRR, 2023

M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

