Jinzheng He

ORCID: 0009-0003-3024-9624

According to our database, Jinzheng He authored at least 31 papers between 2020 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception.

CoRR, October, 2025

WavReward: Spoken Dialogue Models With Generalist Reward Evaluators.

CoRR, May, 2025

Qwen2.5-Omni Technical Report.

CoRR, March, 2025

WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models.

CoRR, February, 2025

WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes.

CoRR, 2024

SongTrans: An unified song transcription and alignment method for lyrics and notes.

CoRR, 2024

MulliVC: Multi-lingual Voice Conversion With Cycle Consistency.

CoRR, 2024

Qwen2-Audio Technical Report.

CoRR, 2024

Qwen2 Technical Report.

CoRR, 2024

GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Wav2SQL: Direct Generalizable Speech-To-SQL Parsing.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts.

CoRR, 2023

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation.

CoRR, 2023

UniSinger: Unified End-to-End Singing Voice Synthesis With Cross-Modality Information Matching.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

RMSSinger: Realistic-Music-Score based Singing Voice Synthesis.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

M4Singer: A Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DualSign: Semi-Supervised Sign Language Production with Balanced Multi-Modal Multi-Task Dual Transformation.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Multi-task Learning-Driven Volume and Slice Level Contrastive Learning for 3D Medical Image Classification.

Jiayuan Zhu, Shujun Wang, Jinzheng He, Carola-Bibiane Schönlieb, Lequan Yu

Proceedings of the Computational Mathematics Modeling in Cancer Analysis, 2022

Flow-Based Unconstrained Lip to Speech Generation.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2020

PopMAG: Pop Music Accompaniment Generation.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020
