Zhikang Niu

ORCID: 0009-0002-2709-9381

According to our database, Zhikang Niu authored at least 19 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence.
CoRR, October, 2025

SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization.
CoRR, October, 2025

DiSTAR: Diffusion over a Scalable Token Autoregressive Representation for Speech Generation.
CoRR, October, 2025

UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models.
CoRR, October, 2025

Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis.
CoRR, September, 2025

MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows.
CoRR, August, 2025

Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment.
CoRR, May, 2025

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix.
CoRR, May, 2025

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting.
CoRR, April, 2025

URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models.
CoRR, February, 2025

Deep Learning-Based Real-Time Precise Pose Estimation Using Differential Magnetic Signals in the Dual-Robot Processing System.
IEEE Trans. Instrum. Meas., 2025

Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

A Progressive Generation Framework with Speech Pre-trained Model for Expressive Voice Conversion.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
NDVQ: Robust Neural Audio Codec With Normal Distribution-Based Vector Quantization.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

2023
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
