Chong Zhang

Orcid: 0000-0002-2162-4344

Affiliations:
  • Alibaba Group, Speech Lab of DAMO Academy, Singapore
  • National University of Singapore, Department of Electrical and Computer Engineering, Singapore (PhD 2017)


According to our database, Chong Zhang authored at least 37 papers between 2015 and 2025.

Bibliography

2025
OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment.
CoRR, June, 2025

Online Audio-Visual Autoregressive Speaker Extraction.
CoRR, June, 2025

Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction.
CoRR, May, 2025

InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation.
CoRR, March, 2025

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction.
CoRR, January, 2025

Conditional Latent Diffusion-Based Speech Enhancement via Dual Context Learning.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Fine-Tuning Channel-Pruned Deep Model via Knowledge Distillation.
J. Comput. Sci. Technol., November, 2024

Tuning Large Language Model for Speech Recognition With Mixed-Scale Re-Tokenization.
IEEE Signal Process. Lett., 2024

Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions.
CoRR, 2024

Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers.
CoRR, 2024

Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

SPGM: Prioritizing Local Features for Enhanced Speech Separation Performance.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Are Soft Prompts Good Zero-Shot Learners for Speech Recognition?
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Loss Masking Is Not Needed In Decoder-Only Transformer For Discrete-Token-Based ASR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023
ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention.
CoRR, 2023

deHuBERT: Disentangling Noise in a Self-supervised Model for Robust Speech Recognition.
CoRR, 2023

ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Dual-Memory Multi-Modal Learning for Continual Spoken Keyword Spotting with Confidence Selection and Diversity Enhancement.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Dual Acoustic Linguistic Self-supervised Representation Learning for Cross-Domain Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adapter-tuning with Effective Token-dependent Representation Shift for Automatic Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adaptive Knowledge Distillation Between Text and Speech Pre-Trained Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Contrastive Speech Mixup for Low-Resource Keyword Spotting.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

deHuBERT: Disentangling Noise in a Self-Supervised Model for Robust Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Auxiliary Pooling Layer For Spoken Language Understanding.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022
I2CR: Improving Noise Robustness on Keyword Spotting Using Inter-Intra Contrastive Regularization.
CoRR, 2022

2019
A Cost-Sensitive Deep Belief Network for Imbalanced Classification.
IEEE Trans. Neural Networks Learn. Syst., 2019

2018
A Multi-State Diagnosis and Prognosis Framework with Feature Learning for Tool Condition Monitoring.
CoRR, 2018

Gated Recurrent Units Based Neural Network For Tool Condition Monitoring.
Proceedings of the 2018 International Joint Conference on Neural Networks, 2018

2017
Multiobjective Deep Belief Networks Ensemble for Remaining Useful Life Estimation in Prognostics.
IEEE Trans. Neural Networks Learn. Syst., 2017

A data-driven prognostics framework for tool remaining useful life estimation in tool condition monitoring.
Proceedings of the 22nd IEEE International Conference on Emerging Technologies and Factory Automation, 2017

2016
Training cost-sensitive Deep Belief Networks on imbalance data problems.
Proceedings of the 2016 International Joint Conference on Neural Networks, 2016

2015
Deep Belief Networks Ensemble with Multi-objective Optimization for Failure Diagnosis.
Proceedings of the 2015 IEEE International Conference on Systems, Man, and Cybernetics, 2015
