Wei Han

Pan Zhou

Shuicheng Yan

CoRR, March, 2026

2025

Spiking Variational Graph Representation Inference for Video Summarization.

IEEE Trans. Image Process., 2025

PREMISE: Matching-based Prediction for Accurate Review Recommendation.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Hyperbolic-Constraint Point Cloud Reconstruction from Single RGB-D Images.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Riemann-based Multi-scale Attention Reasoning Network for Text-3D Retrieval.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Two are better than one: Context window extension with multi-grained self-injection.

CoRR, 2024

INSTRAUG: Automatic Instruction Augmentation for Multimodal Instruction Fine-tuning.

CoRR, 2024

Self-Adaptive Sampling for Accurate Video Question Answering on Image Text Models.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

RoboVQA: Multimodal Long-Horizon Reasoning for Robotics.

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Retrieval Augmented End-to-End Spoken Dialog Models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

Dialogue Relation Extraction with Document-Level Heterogeneous Graph Attention Networks.

Cogn. Comput., March, 2023

SLM: Bridge the thin gap between speech and text foundation models.

CoRR, 2023

Multimodal Modeling For Spoken Language Identification.

CoRR, 2023

SAS Video-QA: Self-Adaptive Sampling for Efficient Video Question-Answering.

CoRR, 2023

AudioPaLM: A Large Language Model That Can Speak and Listen.

CoRR, 2023

Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding.

CoRR, 2023

Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction.

Sharifah Mahani Aljunied

Lidong Bing

CoRR, 2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages.

CoRR, 2023

Noise2Music: Text-conditioned Music Generation with Diffusion Models.

Christian Havnø Frank

CoRR, 2023

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

Label Aware Speech Representation Learning For Language Identification.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Speech Aware Dialog System Technology Challenge (DSTC11).

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Accelerating RNN-T Training and Inference Using CTC Guidance.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Efficient Domain Adaptation for Speech Foundation Models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

SLM: Bridge the Thin Gap Between Speech and Text Foundation Models.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition.

IEEE J. Sel. Top. Signal Process., 2022

Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data.

CoRR, 2022

Unsupervised Data Selection via Discrete Speech Representation for ASR.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Universal Paralinguistic Speech Representations Using Self-Supervised Conformers.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Improving The Latency And Quality Of Cascaded Encoders.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

MM-Align: Learning Optimal Transport-based Alignment Dynamics for Fast and Accurate Inference on Missing Modality Sequences.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

SAT: Improving Semi-Supervised Text Classification with Simple Instance-Adaptive Self-Training.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

SANCL: Multimodal Review Helpfulness Prediction with Selective Attention and Natural Contrastive Learning.

Proceedings of the 29th International Conference on Computational Linguistics, 2022

DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification.

Proceedings of the 29th International Conference on Computational Linguistics, 2022

2021

Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models.

CoRR, 2021

RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Exploring Targeted Universal Adversarial Perturbations to End-to-End ASR Models.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Bridging the Gap Between Streaming and Non-Streaming ASR Systems by Distilling Ensembles of CTC and RNN-T Models.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis.

Louis-Philippe Morency

Proceedings of the ICMI '21: International Conference on Multimodal Interaction, 2021

Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling.

Proceedings of the 9th International Conference on Learning Representations, 2021

FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

A Better and Faster end-to-end Model for Streaming ASR.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Improving Streaming Automatic Speech Recognition with Non-Streaming Model Distillation on Unsupervised Data.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

w2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition.

CoRR, 2020

Universal ASR: Unify and Improve Streaming ASR with Full-context Modeling.

CoRR, 2020

Improved Noisy Student Training for Automatic Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Conformer: Convolution-augmented Transformer for Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Streaming Object Detection for 3-D Point Clouds.

Proceedings of the European Conference on Computer Vision (ECCV 2020), 2020

Scalability in Perception for Autonomous Driving: Waymo Open Dataset.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Learning compact neural network representations with structural priors
