Xulong Zhang

ORCID: 0000-0001-7005-992X

Affiliations:

Lab of Large Audio Model (LLAM), Shanghai, China
Ping An Technology, Shenzhen, China
Fudan University, Shanghai, China (PhD 2021)

According to our database, Xulong Zhang authored at least 84 papers between 2013 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

DIVA: Harnessing the Representation Divergence in Unified Multimodal Models for Mutual Reinforcement.

CoRR, May, 2026

From Inheritance to Saturation: Disentangling the Evolution of Visual Redundancy for Architecture-Aware MLLM Inference Acceleration.

CoRR, April, 2026

Evolvable Embodied Agent for Robotic Manipulation via Long Short-Term Reflection and Optimization.

CoRR, April, 2026

Attention-weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied to Speech Emotion Recognition.

CoRR, February, 2026

MIRRORTALK: Forging Personalized Avatars Via Disentangled Style and Hierarchical Motion Control.

CoRR, January, 2026

Head-Aware Visual Cropping: Enhancing Fine-Grained VQA with Attention-Guided Subimage.

Junfei Xie

Peng Pan

Xulong Zhang

CoRR, January, 2026

CARE: Multi-Task Pretraining for Latent Continuous Action Representation in Robot Control.

CoRR, January, 2026

2025

Knowledge distillation for financial large language models: a systematic review of strategies, applications, and evaluation.

Frontiers Inf. Technol. Electron. Eng., October, 2025

Logic Consistency Makes Large Language Models Personalized Reasoning Teachers.

Proceedings of the International Joint Conference on Neural Networks, 2025

Bridging the Modality Gap: Semantic-Calibrated Zero-shot Speech Emotion Captioning.

Jianzong Wang

Xulong Zhang

Xiaoyang Qu

Proceedings of the International Joint Conference on Neural Networks, 2025

Rano: Restorable Speaker Anonymization via Conditional Invertible Neural Network.

Jianzong Wang

Xulong Zhang

Xiaoyang Qu

Proceedings of the International Joint Conference on Neural Networks, 2025

Turbo-TTS: Enhancing Diffusion Model TTS with an Improved ODE Solver.

Proceedings of the Neural Information Processing - 32nd International Conference, 2025

CycleFlow: Leveraging Cycle Consistency in Flow Matching for Speaker Style Adaptation.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Homogeneous Graph Extraction: An Approach to Learning Heterogeneous Graph Embedding.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Graph Contrastive Learning with Decoupled Augmentation.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

Semi-Supervised Self-Learning Enhanced Music Emotion Recognition.

CoRR, 2024

EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization.

Proceedings of the International Joint Conference on Neural Networks, 2024

ConTuner: Singing Voice Beautifying with Pitch and Expressiveness Condition.

Proceedings of the International Joint Conference on Neural Networks, 2024

QLSC: A Query Latent Semantic Calibrator for Robust Extractive Question Answering.

Proceedings of the International Joint Conference on Neural Networks, 2024

EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning.

Proceedings of the International Joint Conference on Neural Networks, 2024

MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion.

Proceedings of the International Joint Conference on Neural Networks, 2024

Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation.

Proceedings of the International Joint Conference on Neural Networks, 2024

RREH: Reconstruction Relations Embedded Hashing for Semi-paired Cross-Modal Retrieval.

Proceedings of the Advanced Intelligent Computing Technology and Applications, 2024

Enhancing Emotion Recognition in Conversation Through Emotional Cross-Modal Fusion and Inter-class Contrastive Learning.

Proceedings of the Advanced Intelligent Computing Technology and Applications, 2024

EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model.

Proceedings of the IEEE International Conference on Acoustics, 2024

ED-TTS: Multi-Scale Emotion Modeling Using Cross-Domain Emotion Diarization for Emotional Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2024

Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval.

Proceedings of the IEEE International Conference on Acoustics, 2024

IDEAW: Robust Neural Audio Watermarking with Invertible Dual-Embedding.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Medical Speech Symptoms Classification via Disentangled Representation.

Proceedings of the 27th International Conference on Computer Supported Cooperative Work in Design, 2024

RSET: Remapping-Based Sorting Method for Emotion Transfer Speech Synthesis.

Proceedings of the Web and Big Data - 8th International Joint Conference, 2024

2023

Melody Generation from Lyrics with Local Interpretability.

ACM Trans. Multim. Comput. Commun. Appl., 2023

DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks.

CoRR, 2023

Machine Unlearning Methodology base on Stochastic Teacher Network.

CoRR, 2023

Symbolic & Acoustic: Multi-domain Music Emotion Modeling for Instrumental Music.

CoRR, 2023

Sparks of Large Audio Models: A Survey and Outlook.

CoRR, 2023

PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Investigation of Music Emotion Recognition Based on Segmented Semi-Supervised Learning.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model.

Proceedings of the International Joint Conference on Neural Networks, 2023

FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework.

Proceedings of the 35th IEEE International Conference on Tools with Artificial Intelligence, 2023

AOSR-Net: All-in-One Sandstorm Removal Network.

Proceedings of the 35th IEEE International Conference on Tools with Artificial Intelligence, 2023

Contrastive Latent Space Reconstruction Learning for Audio-Text Retrieval.

Proceedings of the 35th IEEE International Conference on Tools with Artificial Intelligence, 2023

Improving EEG-based Emotion Recognition by Fusing Time-Frequency and Spatial Representations.

Proceedings of the IEEE International Conference on Acoustics, 2023

Dynamic Alignment Mask CTC: Improved Mask CTC With Aligned Cross Entropy.

Proceedings of the IEEE International Conference on Acoustics, 2023

QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2023

VQ-CL: Learning Disentangled Speech Representations with Contrastive Learning and Vector Quantization.

Proceedings of the IEEE International Conference on Acoustics, 2023

Learning Speech Representations with Flexible Hidden Feature Dimensions.

Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Music Genre Classification from multi-modal Properties of Music and Genre Correlations Perspective.

Proceedings of the IEEE International Conference on Acoustics, 2023

DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation.

Proceedings of the IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2023

CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding.

Proceedings of the IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2023

CLN-VC: Text-Free Voice Conversion Based on Fine-Grained Style Control and Contrastive Learning with Negative Samples Augmentation.

Proceedings of the IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2023

Research on the Impact of Executive Shareholding on New Investment in Enterprises Based on Multivariable Linear Regression Model.

Proceedings of the Web and Big Data - 7th International Joint Conference, 2023

A Hierarchy-Based Analysis Approach for Blended Learning: A Case Study with Chinese Students.

Proceedings of the Web and Big Data - 7th International Joint Conference, 2023

Stock Volatility Prediction Based on Transformer Model Using Mixed-Frequency Data.

Proceedings of the Web and Big Data - 7th International Joint Conference, 2023

An Empirical Study of Attention Networks for Semantic Segmentation.

Proceedings of the Web and Big Data - 7th International Joint Conference, 2023

Symbolic and Acoustic: Multi-domain Music Emotion Modeling for Instrumental Music.

Proceedings of the Advanced Data Mining and Applications - 19th International Conference, 2023

Voice Conversion with Denoising Diffusion Probabilistic GAN Models.

Proceedings of the Advanced Data Mining and Applications - 19th International Conference, 2023

Machine Unlearning Methodology Based on Stochastic Teacher Network.

Proceedings of the Advanced Data Mining and Applications - 19th International Conference, 2023

2022

Boosting Star-GANs for Voice Conversion with Contrastive Discriminator.

CoRR, 2022

Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition.

Proceedings of the 18th International Conference on Mobility, Sensing and Networking, 2022

Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach.

Proceedings of the 18th International Conference on Mobility, Sensing and Networking, 2022

Improving Imbalanced Text Classification with Dynamic Curriculum Learning.

Proceedings of the 18th International Conference on Mobility, Sensing and Networking, 2022

Semi-Supervised Learning Based on Reference Model for Low-resource TTS.

Proceedings of the 18th International Conference on Mobility, Sensing and Networking, 2022

MetaSpeech: Speech Effects Switch Along with Environment for Metaverse.

Proceedings of the 18th International Conference on Mobility, Sensing and Networking, 2022

Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data.

Proceedings of the 18th International Conference on Mobility, Sensing and Networking, 2022

Tiny-Sepformer: A Tiny Time-Domain Transformer Network For Speech Separation.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MetaSID: Singer Identification with Domain Adaptation for Metaverse.

Proceedings of the International Joint Conference on Neural Networks, 2022

Singer Identification for Metaverse with Timbral and Middle-Level Perceptual Features.

Proceedings of the International Joint Conference on Neural Networks, 2022

TDASS: Target Domain Adaptation Speech Synthesis Framework for Multi-speaker Low-Resource TTS.

Proceedings of the International Joint Conference on Neural Networks, 2022

MDCNN-SID: Multi-scale Dilated Convolution Network for Singer Identification.

Proceedings of the International Joint Conference on Neural Networks, 2022

SUSing: SU-net for Singing Voice Synthesis.

Proceedings of the International Joint Conference on Neural Networks, 2022

Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar.

Proceedings of the 34th IEEE International Conference on Tools with Artificial Intelligence, 2022

Boosting StarGANs for Voice Conversion with Contrastive Discriminator.

Proceedings of the Neural Information Processing - 29th International Conference, 2022

nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-Shot Multi-speaker text-to-speech.

Proceedings of the IEEE International Conference on Acoustics, 2022

DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning.

Proceedings of the IEEE International Conference on Acoustics, 2022

Avqvc: One-Shot Voice Conversion By Vector Quantization With Applying Contrastive Learning.

Proceedings of the IEEE International Conference on Acoustics, 2022

Shallow Diffusion Motion Model for Talking Face Generation from Speech.

Proceedings of the Web and Big Data - 6th International Joint Conference, 2022

2021

Singer Identification Using Deep Timbre Feature Learning with KNN-NET.

Proceedings of the IEEE International Conference on Acoustics, 2021

Cyclegean: Cycle Generative Enhanced Adversarial Network for Voice Conversion.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

TGAVC: Improving Autoencoder Voice Conversion with Text-Guided and Adversarial Training.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Music Artist Classification with WaveNet Classifier for Raw Waveform Audio Data.

CoRR, 2020

Comparison for Improvements of Singing Voice Detection System Based on Vocal Separation.

CoRR, 2020

2017

流行音乐主旋律提取技术综述 (Review on Main Melody Extraction from Pop Music).

计算机科学 (Computer Science), 2017

2013

Probability-Symmetric Storage Allocation for Distributed Storage Systems Based on Network Coding.

Int. J. Online Biomed. Eng., 2013
