Ziyang Ma
ORCID: 0000-0002-8195-3262
Affiliations:
- Shanghai Jiao Tong University, Department of Computer Science and Engineering, AI Institute, MoE Key Lab of Artificial Intelligence, Shanghai, China
- Shandong University, School of Computer Science and Technology, Shandong, China (until 2022)
According to our database, Ziyang Ma authored at least 69 papers between 2021 and 2025.
Links
Online presence:
- on ziyang.tech
- on orcid.org
Bibliography
2025
CoRR, July, 2025
NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025.
CoRR, June, 2025
MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation.
CoRR, June, 2025
Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling.
CoRR, May, 2025
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models.
CoRR, May, 2025
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix.
CoRR, May, 2025
Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation.
CoRR, May, 2025
CoRR, April, 2025
CoRR, March, 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens.
CoRR, March, 2025
CoRR, February, 2025
CoRR, January, 2025
MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization.
CoRR, January, 2025
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Proceedings of the Findings of the Association for Computational Linguistics, 2025
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Proceedings of the Findings of the Association for Computational Linguistics, 2025
ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
CoRR, 2024
Progressive Residual Extraction based Pre-training for Speech Representation Learning.
CoRR, 2024
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens.
CoRR, 2024
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.
CoRR, 2024
CoRR, 2024
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge.
CoRR, 2024
Proceedings of the IEEE Spoken Language Technology Workshop, 2024
Proceedings of the IEEE Spoken Language Technology Workshop, 2024
1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem.
Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024
MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition.
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024
Improving Emotion Recognition with Pre-Trained Models, Multimodality, and Contextual Information.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024
The X-Lance Technical Report for Interspeech 2024 Speech Processing using Discrete Speech Unit Challenge.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Hourglass-AVSR: Down-Up Sampling-Based Computational Efficiency Model for Audio-Visual Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the Findings of the Association for Computational Linguistics, 2024
Proceedings of the Findings of the Association for Computational Linguistics, 2024
2023
Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation.
CoRR, 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing based Data Augmentation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Front-End Adapter: Adapting Front-End Input of Speech Based Self-Supervised Learning for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Fast-Hubert: an Efficient Training Framework for Self-Supervised Speech Representation Learning.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
2022
2021
Proceedings of the MMAsia '21: ACM Multimedia Asia, Gold Coast, Australia, December 1, 2021