Shengpeng Ji

ORCID: 0000-0003-0988-5266

According to our database, Shengpeng Ji authored at least 45 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.



Bibliography

2025
TAP: Parameter-efficient Task-Aware Prompting for Adverse Weather Removal.
CoRR, August, 2025

Open-set Cross Modal Generalization via Multimodal Unified Representation.
CoRR, July, 2025

IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models.
CoRR, May, 2025

WavReward: Spoken Dialogue Models With Generalist Reward Evaluators.
CoRR, May, 2025

Astrea: A MOE-based Visual Understanding Model with Progressive Alignment.
CoRR, March, 2025

Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis.
CoRR, February, 2025

WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models.
CoRR, February, 2025

OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios.
CoRR, January, 2025

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words?
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Language-Codec: Bridging Discrete Codec Representations and Speech Language Models.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Enhancing Multimodal Unified Representations for Cross Modal Generalization.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Speech Watermarking with Discrete Intermediate Representations.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Generating Neural Networks for Diverse Networking Classification Tasks via Hardware-Aware Neural Architecture Search.
IEEE Trans. Computers, February, 2024

LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval.
CoRR, 2024

WavChat: A Survey of Spoken Dialogue Models.
CoRR, 2024

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup.
CoRR, 2024

MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization.
CoRR, 2024

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
CoRR, 2024

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.
CoRR, 2024

ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling.
CoRR, 2024

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec.
CoRR, 2024

Unlocking the Potential of Multimodal Unified Discrete Representation through Training-Free Codebook Optimization and Hierarchical Alignment.
CoRR, 2024

Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models.
CoRR, 2024

MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech.
CoRR, 2024

SyncTalklip: Highly Synchronized Lip-Readable Speaker Generation with Multi-Task Learning.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

TextrolSpeech: A Text Style Control Speech Corpus with Codec Language Text-to-Speech Models.
Proceedings of the IEEE International Conference on Acoustics, 2024

AudioVSR: Enhancing Video Speech Recognition with Audio Data.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models.
CoRR, 2023

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias.
CoRR, 2023

2022
Coded Distributed Computing Schemes with Fewer Output Functions.
Proceedings of the 6th International Conference on Computer Science and Artificial Intelligence, 2022

