Yan Rong

Orcid: 0009-0006-0857-203X

According to our database, Yan Rong authored at least 16 papers between 2010 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Audio-DeepThinker: Progressive Reasoning-Aware Reinforcement Learning for High-Quality Chain-of-Thought Emergence in Audio Language Models.
CoRR, April, 2026

2025
PhyAVBench: A Challenging Audio Physics-Sensitivity Benchmark for Physically Grounded Text-to-Audio-Video Generation.
CoRR, December, 2025

AudioGenie-Reasoner: A Training-Free Multi-Agent Framework for Coarse-to-Fine Audio Deep Reasoning.
CoRR, September, 2025

Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Human-like Audiobook Generation.
CoRR, April, 2025

AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Using Domain Adapter and Prompt Learner for Few-Shot Teaching Action Recognition in videos.
Proceedings of the International Joint Conference on Neural Networks, 2025

A Hybrid CNN-Transformer Network for Fine-Grained Tongue Multi-Attribute Classification with Tongue Group Attribute Mask Training in Traditional Chinese Medicine Diagnosis.
Proceedings of the Advanced Intelligent Computing Technology and Applications, 2025

Towards Controllable Speech Synthesis in the Era of Large Language Models: A Systematic Survey.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey.
CoRR, 2024

Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion.
CoRR, 2024

Segment Anything for Videos: A Systematic Survey.
CoRR, 2024

Cued Speech-Integrated Audio-Visual Variational Autoencoder for Speech Enhancement.
Proceedings of the Social Robotics - 16th International Conference, 2024

SFERNet: Student Facial Expression Recognition Using Superpixel-Assisted Global Semantic Enhancement and Fine-Grained Features.
Proceedings of the Advanced Intelligent Computing Technology and Applications, 2024

2015
PAS: An Efficient Privacy-Preserving Multidimensional Aggregation Scheme for Smart Grid.
Int. J. Distributed Sens. Networks, 2015

2010
Closed-Loop Stiffness Modeling and Stiffness Performance Analysis for Multi-axis Process System.
Proceedings of the Intelligent Robotics and Applications - Third International Conference, 2010

