Chao-Han Huck Yang

Samuel Yen-Chi Chen

Proceedings of the IEEE International Symposium on Circuits and Systems, 2025

OpusLM: A Family of Open Unified Speech Language Models.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

VoiceNoNG: Robust High-Quality Speech Editing Model without Hallucinations.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Towards Neural Scaling Laws for Time Series Foundation Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Fugatto 1: Foundational Generative Audio Transformer Opus 1.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

A Quantum Circuit-Based Compression Perspective for Parameter-Efficient Learning.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Audio Large Language Models Can Be Descriptive Speech Quality Evaluators.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Revise, Reason, and Recognize: LLM-Based Emotion Recognition via Emotion-Specific Prompts and ASR Error Correction.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Chain-of-Thought Prompting for Speech Translation.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Projection Valued-based Quantum Machine Learning Adapting to Differential Privacy Algorithm for Word-level Lipreading.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

CoVoGER: A Multilingual Multitask Benchmark for Speech-to-text Generative Error Correction with Large Language Models.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Spoken Conversational Agents with Large Language Models.

Huck Yang

Andreas Stolcke

Larry P. Heck

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Extending Automatic Machine Translation Evaluation to Book-Length Documents.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

SpeechIQ: Speech-Agentic Intelligence Quotient Across Cognitive Levels in Voice Understanding by Large Language Models.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

NeKo: Cross-Modality Post-Recognition Error Correction with Tasks-Guided Mixture-of-Experts Language Model.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), 2025

LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), 2025

MISP-Meeting: A Real-World Dataset with Multimodal Cues for Long-form Meeting Transcription and Summarization.

Hang Chen

Jia-Chen Gu

Jun Du

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Multi-Domain Audio Question Answering in the DCASE 2025 Challenge.

Dataset, April, 2024

A Perturbation Approach to Differential Privacy for Deep Learning based Speech Processing.

PhD thesis, 2024

Leveraging Pre-Trained Neural Networks to Enhance Machine Learning with Variational Quantum Circuits.

CoRR, 2024

NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts.

CoRR, 2024

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks.

Fabian Ritter Gutierrez

CoRR, 2024

EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation.

CoRR, 2024

Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction.

CoRR, 2024

Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models.

CoRR, 2024

Investigating Training Strategies and Model Robustness of Low-Rank Adaptation for Language Modeling in Speech Recognition.

CoRR, 2024

Large Language Model Based Generative Error Correction: A Challenge and Baselines For Speech Recognition, Speaker Tagging, and Emotion Recognition.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

An Investigation of Incorporating Mamba For Speech Enhancement.

Rong Chao

Wen-Huang Cheng

Moreno La Quatra

Szu-Wei Fu

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

QTRL: Toward Practical Quantum Reinforcement Learning via Quantum-Train.

Chen-Yu Liu

Chu-Hsuan Abraham Lin

Kuan-Cheng Chen

Min-Hsiu Hsieh

Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2024

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition.

Chen Chen

Ruizhe Li

Yuchen Hu

Engsiong Chng

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Can Whisper Perform Speech-Based In-Context Learning?

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification.

Venkatesh Ravichandran

Phani Sankar Nidadavolu

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue.

Guan-Ting Lin

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Hot-Fixing Wake Word Recognition for End-to-End ASR Via Neural Model Reprogramming.

Phani Sankar Nidadavolu

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Exploiting A Quantum Multiple Kernel Learning Approach For Low-Resource Spoken Command Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Towards ASR Robust Spoken Language Understanding Through in-Context Learning with Word Confusion Networks.

Kevin Everson

Yile Gu

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Bayesian Example Selection Improves In-Context Learning for Speech, Text and Visual Modalities.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on Riemannian Gradient Descent With Illustrations of Speech Processing.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Generative error correction for code-switching speech recognition using large language models.

Eng Siong Chng

CoRR, 2023

A Neural State-Space Model Approach to Efficient Speech Separation.

CoRR, 2023

Treatment Learning Causal Transformer for Noisy Image Classification.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

Pessimistic Model Selection for Offline Deep Reinforcement Learning.

Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2023

HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models.

Chen Chen

Yuchen Hu

Chng Eng Siong

Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems, 2023

Inference and Denoise: Causal Inference-Based Neural Speech Enhancement.

Tsun-An Hsieh

Proceedings of the 33rd IEEE International Workshop on Machine Learning for Signal Processing, 2023

Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Parameter-Efficient Learning for Text-to-Speech Accent Adaptation.

Li-Jen Yang

Jen-Tzung Chien

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Parameter-Efficient Learning Approach to Arabic Dialect Identification with Pre-Trained General-Purpose Speech Model.

Srijith Radhakrishnan

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Multi-dimensional Deep Structured State Space Approach to Speech Enhancement Using Small-footprint Models.

Pin-Jui Ku

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Differentially Private Adapters for Parameter Efficient Acoustic Modeling.

Chun-Wei Ho

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

How to Estimate Model Transferability of Pre-Trained Speech Models?

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Neural State-Space Modeling Approach to Efficient Speech Separation.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Certified Robustness of Quantum Classifiers Against Adversarial Examples Through Quantum Noise.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition.

Srijith Radhakrishnan

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Causalainer: Causal Explainer for Automatic Video Summarization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Low-Rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition.

Yu Yu

Jari Kolehmainen

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Generative Speech Recognition Error Correction With Large Language Models and Task-Activating Prompting.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Enhancing Privacy Preservation with Quantum Computing for Word-Level Audio-Visual Speech Recognition.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022

Low-Resource Music Genre Classification with Advanced Neural Model Reprogramming.

CoRR, 2022

Theoretical Error Performance Analysis for Variational Quantum Circuit Based Functional Regression.

CoRR, 2022

Treatment Learning Transformer for Noisy Image Classification.

CoRR, 2022

Non-local Attention Improves Description Generation for Retinal Images.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition.

I-Fan Chen

Andreas Stolcke

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

An Ensemble Teacher-Student Learning Approach with Poisson Sub-sampling to Differential Privacy Preserving Speech Recognition.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

A Study on Joint Modeling and Data Augmentation of Multi-Modalities for Audio-Visual Scene Classification.

Yannan Wang

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Causal Video Summarizer for Video Exploration.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2022

A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Mitigating Closed-Model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer.

Hu Hu

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Training a Resilient Q-network against Observational Interference.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Attention Based Bidirectional Convolutional LSTM for High-Resolution Radio Tomographic Imaging.

IEEE Trans. Circuits Syst. II Express Briefs, 2021

A Study of Low-Resource Speech Commands Recognition based on Adversarial Reprogramming.

CoRR, 2021

QTN-VQC: An End-to-End Learning framework for Quantum Neural Networks.

CoRR, 2021

A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification.

Hu Hu

CoRR, 2021

Longer Version for "Deep Context-Encoding Network for Retinal Image Captioning".

CoRR, 2021

Causal Inference Q-Network: Toward Resilient Reinforcement Learning.

CoRR, 2021

DeepOpht: Medical Report Generation for Retinal Images via Deep Models and Visual Explanation.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30 - September 3, 2021

Voice2Series: Reprogramming Acoustic Models for Time Series Classification.

Yun-Yun Tsai

Proceedings of the 38th International Conference on Machine Learning, 2021

Robust Unsupervised Multi-Object Tracking In Noisy Environments.

Proceedings of the 2021 IEEE International Conference on Image Processing, 2021

Deep Context-Encoding Network For Retinal Image Captioning.

Proceedings of the 2021 IEEE International Conference on Image Processing, 2021

Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition.

Xiaoli Ma

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

A Two-Stage Approach to Device-Robust Acoustic Scene Classification.

Yannan Wang

Jun Du

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Multi-Task Language Modeling for Improving Speech Recognition of Rare Words.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation.

Yannan Wang

Jun Du

CoRR, 2020

Variational Quantum Circuits for Deep Reinforcement Learning.

IEEE Access, 2020

Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Wavelet Channel Attention Module With A Fusion Network For Single Image Deraining.

Hao-Hsiang Yang

Yu-Chiang Frank Wang

Proceedings of the IEEE International Conference on Image Processing, 2020

Y-Net: Multi-Scale Feature Aggregation Network With Wavelet Structure Similarity Loss Function For Single Image Dehazing.

Hao-Hsiang Yang

Yi-Chang James Tsai

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Enhanced Adversarial Strategically-Timed Attacks Against Deep Reinforcement Learning.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Submodular Rank Aggregation on Score-Based Permutations for Distributed Automatic Speech Recognition.

Javier Tejedor

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Tensor-To-Vector Regression for Multi-Channel Speech Enhancement Based on Tensor-Train Network.
