Yangyang Shi

Proceedings of the Forty-second International Conference on Machine Learning, 2025

From Global to Local: Mamba-Based Hierarchical Registration for Respiratory Lung Deformation.

Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2025

Distance-Aware and Knowledge-Driven Vision Mamba U-Net for Radiotherapy Dose Prediction.

Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2025

MMW: Side Talk Rejection Multi-Microphone Whisper On Smart Glasses.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

AutoMixer: Checkpoint Artifacts as Automatic Data Mixers.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Self-Calibration Method of Displacement Sensor in AMB-Rotor System Based on Magnetic Bearing Current Control.

IEEE Trans. Ind. Electron., May, 2024

SyncFlow: Toward Temporally Aligned Joint Audio-Video Generation from Text.

CoRR, 2024

MASV: Speaker Verification with Global and Local Context Mamba.

CoRR, 2024

Llama Guard 3-1B-INT4: Compact and Efficient Safeguard for Human-AI Conversations.

CoRR, 2024

MFF-FTNet: Multi-scale Feature Fusion across Frequency and Temporal Domains for Time Series Forecasting.

CoRR, 2024

High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching.

CoRR, 2024

Speech ReaLLM - Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time.

CoRR, 2024

Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications.

CoRR, 2024

Not All Weights Are Created Equal: Enhancing Energy Efficiency in On-Device Streaming Speech Recognition.

CoRR, 2024

FADI-AEC: Fast Score Based Diffusion Model Guided by Far-end Signal for Acoustic Echo Cancellation.

CoRR, 2024

StegoType: Surface Typing from Egocentric Cameras.

Adjunct Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 2024

Data Efficient Reflow for Few Step Audio Generation.

Wei-Ning Hsu

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Foleygen: Visually-Guided Audio Generation.

Proceedings of the 34th IEEE International Workshop on Machine Learning for Signal Processing, 2024

Characterizing the Histology Spatial Intersections Between Tumor-Infiltrating Lymphocytes and Tumors for Survival Prediction of Cancers Via Graph Contrastive Learning.

Proceedings of the Machine Learning in Medical Imaging - 15th International Workshop, 2024

Speech ReaLLM - Real-time Speech Recognition with Multimodal Language Models by Teaching the Flow of Time.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases.

Liangzhen Lai

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Folding Attention: Memory and Power Optimization for On-Device Transformer-Based Streaming Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

Stack-and-Delay: A New Codebook Pattern for Music Generation.

Proceedings of the IEEE International Conference on Acoustics, 2024

On the Open Prompt Challenge in Conditional Audio Generation.

Proceedings of the IEEE International Conference on Acoustics, 2024

In-Context Prompt Editing for Conditional Audio Generation.

Proceedings of the IEEE International Conference on Acoustics, 2024

Scheduled Execution-Based Binary Indirect Call Targets Refinement.

Computer Security - ESORICS 2024, 2024

Scaling Parameter-Constrained Language Models with Quality Data.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

Target-Aware Language Modeling via Granular Data Sampling.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Tumor Micro-Environment Interactions Guided Graph Learning for Survival Analysis of Human Cancers from Whole-Slide Pathological Images.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

LLM-QAT: Data-Free Quantization Aware Training for Large Language Models.

Findings of the Association for Computational Linguistics, 2024

2023

Model Reference Adaptive Compensation and Robust Controller for Magnetic Bearing Systems With Strong Persistent Disturbances.

IEEE Trans. Ind. Electron., November, 2023

Characterizing the Survival-Associated Interactions Between Tumor-Infiltrating Lymphocytes and Tumors From Pathological Images and Multi-Omics Data.

IEEE Trans. Medical Imaging, October, 2023

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch.

CoRR, 2023

Enhance audio generation controllability through representation similarity regularization.

CoRR, 2023

Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition.

CoRR, 2023

DISGO: Automatic End-to-End Evaluation for Scene Text OCR.

CoRR, 2023

Biased Self-supervised Learning for ASR.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Multi-Head State Space Model for Speech Recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

SCA: Streaming Cross-Attention Alignment For Echo Cancellation.

Proceedings of the IEEE International Conference on Acoustics, 2023

Improving fast-slow Encoder based Transducer with Streaming Deliberation.

Proceedings of the IEEE International Conference on Acoustics, 2023

TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for Pytorch.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Towards Zero-Shot Multilingual Transfer for Code-Switched Responses.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Binary and Ternary Natural Language Generation.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Revisiting Sample Size Determination in Natural Language Understanding.

Ernie Chang

Muhammad Hassan Rashid

Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

Position Extraction of Ultralow-Speed Gimbal Servo System With Linear Hall Sensors.

Haitao Li

Bangcheng Han

IEEE Trans. Ind. Electron., 2022

Synergistic Digital Twin and Holographic Augmented-Reality-Guided Percutaneous Puncture of Respiratory Liver Tumor.

IEEE Trans. Hum. Mach. Syst., 2022

LiCo-Net: Linearized Convolution Network for Hardware-efficient Keyword Spotting.

CoRR, 2022

Learning a Dual-Mode Speech Recognition Model VIA Self-Pruning.

Ozlem Kalinli

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Streaming parallel transducer beam search with fast slow cascaded encoders.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Transformer Transducer based Speech Recognition Using Non-Causal Convolution.

Proceedings of the IEEE International Conference on Acoustics, 2022

Gadgets Splicing: Dynamic Binary Transformation for Precise Rewriting.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

2021

TorchAudio: Building Blocks for Audio and Speech Processing.

CoRR, 2021

Transferring Voice Knowledge for Acoustic Event Detection: An Empirical Study.

CoRR, 2021

Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios.

CoRR, 2021

A multiple-relaxation-time collision model by Hermite expansion.

Xiaowen Shan

Xuhui Li

CoRR, 2021

Versatile multi-constrained planning for thermal ablation of large liver tumors.

Comput. Medical Imaging Graph., 2021

Streaming Attention-Based Models with Augmented Memory for End-To-End Speech Recognition.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Internal Motion Estimation during Free-Breathing via External/Internal Correlation Model.

Proceedings of the IEEE International Conference on Real-time Computing and Robotics, 2021

Transformer-Based Acoustic Modeling for Streaming Speech Synthesis.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Dynamic Encoder Transducer: A Flexible Solution for Trading Off Accuracy for Latency.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Dissecting User-Perceived Latency of On-Device E2E Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Collaborative Training of Acoustic Encoders for Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Transformer in Action: A Comparative Study of Transformer-Based Acoustic Models for Large Scale Speech Recognition Applications.

Proceedings of the IEEE International Conference on Acoustics, 2021

Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2021

On Lattice-Free Boosted MMI Training of HMM and CTC-Based Full-Context ASR Models.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition.

CoRR, 2020

Incorporating Android Code Smells into Java Static Code Metrics for Security Risk Prediction of Android Applications.

Proceedings of the 20th IEEE International Conference on Software Quality, 2020

Functional code clone detection with syntax and semantics fusion learning.

Proceedings of ISSTA '20: 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2020

Streaming Transformer-Based Acoustic Models Using Self-Attention with Augmented Memory.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Weak-Attention Suppression for Transformer Based Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Mining Effective Negative Training Samples for Keyword Spotting.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Region Proposal Network Based Small-Footprint Keyword Spotting.

IEEE Signal Process. Lett., 2019

Knowledge Distillation for Recurrent Neural Network Language Modeling with Trust Regularization.

Proceedings of the IEEE International Conference on Acoustics, 2019

End-to-end Speech Recognition Using a High Rank LSTM-CTC Based Model.

Mei-Yuh Hwang

Xin Lei

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

A review of "linear programming computation" by Ping-Qi Pan.

Lei-Hong Zhang

Wenxing Zhu

Eur. J. Oper. Res., 2018

Robust Control for a Magnetically Suspended Control Moment Gyro with Strong Gyroscopic Effects.

Proceedings of IECON 2018, 2018

2017

Ultra-lightweight Block Cipher Algorithm (PFP) Based on Feistel Structure.

计算机科学 (Computer Science), 2017

2016

Deep LSTM based Feature Mapping for Query Classification.

Proceedings of NAACL HLT 2016, 2016

Recurrent Support Vector Machines For Slot Tagging In Spoken Language Understanding.

Proceedings of NAACL HLT 2016, 2016

2015

Integrating meta-information into recurrent neural network language models.

Speech Commun., 2015

Recurrent neural network language model adaptation with curriculum learning.

Martha A. Larson

Comput. Speech Lang., 2015

RNN-based labeled data generation for spoken language understanding.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Contextual spoken language understanding using recurrent neural networks.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A factorization network based method for multi-lingual domain classification.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Semi-supervised slot tagging in spoken language understanding using recurrent transductive support vector machines.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Spoken language understanding using long short-term memory neural networks.

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Cluster based Chinese abbreviation modeling.

Yi-Cheng Pan

Mei-Yuh Hwang

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013

Classifying the socio-situational settings of transcripts of spoken discourses.

Speech Commun., 2013

K-Component Adaptive Recurrent Neural Network Language Models.

Proceedings of the Text, Speech, and Dialogue - 16th International Conference, 2013

Recurrent neural networks for language understanding.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Exploiting the succeeding words in recurrent neural network language models.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Speed up of recurrent neural network language models with sentence independent subsampling stochastic gradient descent.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

K-component recurrent neural network language models using curriculum learning.

Martha A. Larson

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012

Adaptive Language Modeling with a Set of Domain Dependent Models.

Proceedings of the Text, Speech and Dialogue - 15th International Conference, 2012

TUD at MediaEval 2012 genre tagging task: Multi-modality video categorization with one-vs-all classifiers.

Peng Xu

Martha A. Larson

Working Notes Proceedings of the MediaEval 2012 Workshop, 2012

MediaEval 2012 Tagging Task: Prediction based on One Best List and Confusion Networks.

Working Notes Proceedings of the MediaEval 2012 Workshop, 2012

Towards Recurrent Neural Networks Language Models with Linguistic and Contextual Features.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Dynamic Bayesian socio-situational setting classification.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Combining Topic Specific Language Models.

Proceedings of the Text, Speech and Dialogue - 14th International Conference, 2011

Socio-situational setting classification based on language use.
