Wei Xue

ORCID: 0000-0002-4942-7748

Affiliations:
  • Hong Kong Baptist University, Division of Emerging Interdisciplinary Areas, Hong Kong
  • Imperial College London, Department of Electrical and Electronic Engineering, UK
  • Chinese Academy of Sciences (CAS), Institute of Automation, Beijing, China (PhD 2015, Pattern Recognition and Intelligent Systems)


According to our database, Wei Xue authored at least 120 papers between 2016 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts.
Int. J. Comput. Vis., April, 2026

Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling.
CoRR, April, 2026

ClawNet: Human-Symbiotic Agent Network for Cross-User Autonomous Cooperation.
CoRR, April, 2026

ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing.
CoRR, April, 2026

Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing.
CoRR, April, 2026

SqueezeComposer: Temporal Speed-up is A Simple Trick for Long-form Music Composing.
CoRR, March, 2026

DC-W2S: Dual-Consensus Weak-to-Strong Training for Reliable Process Reward Modeling in Biological Reasoning.
CoRR, March, 2026

MemFly: On-the-Fly Memory Optimization via Information Bottleneck.
CoRR, February, 2026

Reinforcement Learning of Large Language Models for Interpretable Credit Card Fraud Detection.
CoRR, January, 2026

Crisis-Bench: Benchmarking Strategic Ambiguity and Reputation Management in Large Language Models.
CoRR, January, 2026

UniSH: Unifying Scene and Human Reconstruction in a Feed-Forward Pass.
CoRR, January, 2026

CoCoGesture: Towards coherent co-speech 3D gesture generation in the wild.
Inf. Fusion, 2026

Lighthouse: A Self-Reconfiguring Sociotechnical Infrastructure for the Unforeseen Long-Tail of Urban Crisis.
Proceedings of the Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems, 2026

WenetSpeech-Yue: A Large-Scale Cantonese Speech Corpus with Multi-dimensional Annotation.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Inference-time Scaling for Diffusion-based Audio Super-resolution.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

VMChill: A Dataset for Fine-Grained Visual-Musical Synergy.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
CogniEdit: Dense Gradient Flow Optimization for Fine-Grained Image Editing.
CoRR, December, 2025

PrismAudio: Decomposed Chain-of-Thoughts and Multi-dimensional Rewards for Video-to-Audio Generation.
CoRR, November, 2025

Every Angle is Worth a Second Glance: Mining Kinematic Skeletal Structures From Multi-View Joint Cloud.
IEEE Trans. Vis. Comput. Graph., October, 2025

PodEval: A Multimodal Evaluation Framework for Podcast Audio Generation.
CoRR, October, 2025

MeanFlowSE: One-Step Generative Speech Enhancement via MeanFlow.
CoRR, September, 2025

Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks.
CoRR, September, 2025

WoW: Towards a World omniscient World model Through Embodied Interaction.
CoRR, September, 2025

UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice.
CoRR, September, 2025

ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing.
CoRR, June, 2025

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix.
CoRR, May, 2025

J1: Exploring Simple Test-Time Scaling for LLM-as-a-Judge.
CoRR, May, 2025

CMD: Controllable Multiview Diffusion for 3D Editing and Progressive Generation.
CoRR, May, 2025

Co³Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion.
CoRR, May, 2025

AudioX: Diffusion Transformer for Anything-to-Audio Generation.
CoRR, March, 2025

YuE: Scaling Open Foundation Models for Long-Form Music Generation.
CoRR, March, 2025

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens.
CoRR, March, 2025

Audio-FLAN: A Preliminary Release.
CoRR, February, 2025

VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer.
CoRR, February, 2025

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis.
CoRR, February, 2025

CMD: Controllable Multiview Diffusion for 3D Editing and Progressive Generation.
Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference, 2025

MelodyEdit: Zero-shot Music Editing with Disentangled Inversion Control.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

OmniAudio: Generating Spatial Audio from 360-Degree Video.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

MoE-SVD: Structured Mixture-of-Experts LLMs Compression via Singular Value Decomposition.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Delta Decompression for MoE-based LLMs Compression.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Empowering World Models with Reflection for Embodied Video Prediction.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MuPT: A Generative Symbolic Music Pretrained Transformer.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Co³Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

AIRA: Activation-Informed Low-Rank Adaptation for Large Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Efficient Fine-Tuning of Large Models Via Nested Low-Rank Adaptation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Graceful Forgetting in Generative Language Models.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Llasa+: Free Lunch for Accelerated and Streaming Llama-Based Speech Synthesis.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

FlashAudio: Rectified Flow for Fast and High-Fidelity Text-to-Audio Generation.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

BayesKD: Bayesian Knowledge Distillation for Compact LLMs in Constrained Fine-tuning Scenarios.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Boosting Policy and Process Reward Models with Monte Carlo Tree Search in Open-Domain QA.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Importance Weighting Can Help Large Language Models Self-Improve.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Deep Cross-Modal Retrieval Between Spatial Image and Acoustic Speech.
IEEE Trans. Multim., 2024

SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model.
CoRR, 2024

Foundation Cures Personalization: Recovering Facial Personalized Models' Prompt Consistency.
CoRR, 2024

pTSE-T: Presentation Target Speaker Extraction using Unaligned Text Cues.
CoRR, 2024

EVA: An Embodied World Model for Future Video Anticipation.
CoRR, 2024

FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation.
CoRR, 2024

You Know What I'm Saying: Jailbreak Attack via Implicit Reference.
CoRR, 2024

PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion.
CoRR, 2024

HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts.
CoRR, 2024

AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems.
CoRR, 2024

NoRA: Nested Low-Rank Adaptation for Efficient Fine-Tuning Large Models.
CoRR, 2024

STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs.
CoRR, 2024

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions.
CoRR, 2024

M-LRM: Multi-view Large Reconstruction Model.
CoRR, 2024

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling.
CoRR, 2024

LLMs Meet Multimodal Generation and Editing: A Survey.
CoRR, 2024

MuPT: A Generative Symbolic Music Pretrained Transformer.
CoRR, 2024

RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation.
CoRR, 2024

ChatMusician: Understanding and Generating Music Intrinsically with LLM.
CoRR, 2024

Dirichlet Continual Learning: Tackling Catastrophic Forgetting in NLP.
Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2024

Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Discovering Sparsity Allocation for Layer-wise Pruning of Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

FlashSpeech: Efficient Zero-Shot Speech Synthesis.
Proceedings of the 32nd ACM International Conference on Multimedia, 2024

Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation.
Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024

ComposerX: Multi-Agent Symbolic Music Composition With LLMs.
Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024

COMOSVC: Consistency Model-Based Singing Voice Conversion.
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

DetKDS: Knowledge Distillation Search for Object Detectors.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

PyramidCodec: Hierarchical Codec for Long-form Music Generation in Audio Domain.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

AttnZero: Efficient Attention Discovery for Vision Transformers.
Proceedings of the European Conference on Computer Vision (ECCV 2024), 2024

Auto-GAS: Automated Proxy Discovery for Training-Free Generative Architecture Search.
Proceedings of the European Conference on Computer Vision (ECCV 2024), 2024

Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-Speech Gesture Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024


FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection.
CoRR, 2023

Continual Learning with Dirichlet Generative-based Rehearsal.
CoRR, 2023

ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation.
CoRR, 2023

Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings.
CoRR, 2023

MARBLE: Music Audio Representation Benchmark for Universal Evaluation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

LyricWhiz: Robust Multilingual Zero-Shot Lyrics Transcription by Whispering to ChatGPT.
Proceedings of the 24th International Society for Music Information Retrieval Conference, 2023

NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis Based on Frequency Modulation.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

GCC-Speaker: Target Speaker Localization with Optimal Speaker-Dependent Weighting in Multi-Speaker Scenarios.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

MoMusic: A Motion-Driven Human-AI Collaborative Music Composition and Performing System.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Deep Audio-Visual Beamforming for Speaker Localization.
IEEE Signal Process. Lett., 2022

Pathway to Future Symbiotic Creativity.
CoRR, 2022

2021
Speech Enhancement Based on Modulation-Domain Parametric Multichannel Kalman Filtering.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Neural Kalman Filtering for Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Causal System Identification based Compensation for Reverberation-Robust DOA Estimation.
Proceedings of the 29th European Signal Processing Conference, 2021

2020
Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi-Task Learning.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

The JD AI Speaker Verification System for the FFSVC 2020 Challenge.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019
Noise Covariance Matrix Estimation for Rotating Microphone Arrays.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Direct-Path Signal Cross-Correlation Estimation for Sound Source Localization in Reverberation.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018
Modulation-Domain Multichannel Kalman Filtering for Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Binaural Mask-Informed Speech Enhancement for Hearing Aids with Head Tracking.
Proceedings of the 16th International Workshop on Acoustic Signal Enhancement, 2018

Multichannel Kalman Filtering for Speech Enhancement.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Modulation-Domain Parametric Multichannel Kalman Filtering for Speech Enhancement.
Proceedings of the 26th European Signal Processing Conference, 2018

Estimation of the Noise Covariance Matrix for Rotating Sensor Arrays.
Proceedings of the 52nd Asilomar Conference on Signals, Systems, and Computers, 2018

2017
Frequency-domain under-modelled blind system identification based on cross power spectrum and sparsity regularization.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

2016
Under-modelled blind system identification for time delay estimation in reverberant environments.
Proceedings of the IEEE International Workshop on Acoustic Signal Enhancement, 2016

Cross-correlation based under-modelled multichannel blind acoustic system identification with sparsity regularization.
Proceedings of the 24th European Signal Processing Conference, 2016

