Xinyuan Qian

ORCID: 0000-0002-9511-6713

Affiliations:
  • University of Science and Technology Beijing (USTB), Beijing, China
  • National University of Singapore (NUS), Singapore (former)
  • Queen Mary University of London (QMUL), London, UK (former, PhD)


According to our database, Xinyuan Qian authored at least 56 papers between 2014 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
VP-SelDoA: Visual-prompted Selective DoA Estimation of Target Sound via Semantic-Spatial Matching.
CoRR, July, 2025

SAV-SE: Scene-Aware Audio-Visual Speech Enhancement With Selective State Space Model.
IEEE J. Sel. Top. Signal Process., May, 2025

Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture.
CoRR, April, 2025

Enhancing Real-World Active Speaker Detection With Multi-Modal Extraction Pre-Training.
IEEE Trans. Multim., 2025

Improving Bird Vocalization Recognition in Open-Set Cross-Corpus Scenarios With Semantic Feature Reconstruction and Dual Strategy Scoring.
IEEE Signal Process. Lett., 2025

Analytic Class Incremental Learning for Sound Source Localization With Privacy Protection.
IEEE Signal Process. Lett., 2025

Breaking Through the Spike: Spike Window Decoding for Accelerated and Precise Automatic Speech Recognition.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

M2PAIR: A High-Quality Acoustic Impulse Response Computation Model.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Audio-Visual Temporal Forgery Detection Using Embedding-Level Fusion and Multi-Dimensional Contrastive Loss.
IEEE Trans. Circuits Syst. Video Technol., August, 2024

Deep Cross-Modal Retrieval Between Spatial Image and Acoustic Speech.
IEEE Trans. Multim., 2024

M3TTS: Multi-modal text-to-speech of multi-scale style control for dubbing.
Pattern Recognit. Lett., 2024

pTSE-T: Presentation Target Speaker Extraction using Unaligned Text Cues.
CoRR, 2024

Mamba in Speech: Towards an Alternative to Self-Attention.
CoRR, 2024

Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention.
CoRR, 2024

Semi-supervised Speaker Localization with Gaussian-Like Pseudo-labeling.
Proceedings of the Social Robotics - 16th International Conference, 2024

MMAL: Multi-Modal Analytic Learning for Exemplar-Free Audio-Visual Class Incremental Tasks.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

ListenFormer: Responsive Listening Head Generation with Non-autoregressive Transformers.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

An Exploration of Length Generalization in Transformer-Based Speech Enhancement.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Transmitted and Aggregated Self-Attention for Automatic Speech Recognition.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

GLMB 3D Speaker Tracking with Video-Assisted Multi-Channel Audio Optimization Functions.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Visually Guided Binaural Audio Generation with Cross-Modal Consistency.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

LOCSELECT: Target Speaker Localization with an Auditory Selective Hearing Mechanism.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Text-Queried Target Sound Event Localization.
Proceedings of the 32nd European Signal Processing Conference, 2024

2023
Speech-Oriented Sparse Attention Denoising for Voice User Interface Toward Industry 5.0.
IEEE Trans. Ind. Informatics, 2023

Neural-Free Attention for Monaural Speech Enhancement Toward Voice User Interface for Consumer Electronics.
IEEE Trans. Consumer Electron., 2023

A Time-Frequency Attention Module for Neural Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Device Features Based on Linear Transformation With Parallel Training Data for Replay Speech Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Audio-Visual Cross-Attention Network for Robotic Speaker Tracking.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

L³ F-TOUCH: A Wireless GelSight With Decoupled Tactile and Three-Axis Force Sensing.
IEEE Robotics Autom. Lett., 2023

Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions.
CoRR, 2023

Audio Visual Speaker Localization from EgoCentric Views.
CoRR, 2023

Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Miniaturised Camera-based Multi-Modal Tactile Sensor.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Ripple Sparse Self-Attention for Monaural Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Self-Convolution for Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Stream Attention Based U-Net for L3DAS23 Challenge.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Audio-Visual Tracking of Concurrent Speakers.
IEEE Trans. Multim., 2022

Deep Audio-Visual Beamforming for Speaker Localization.
IEEE Signal Process. Lett., 2022

Speaker Extraction With Co-Speech Gestures Cue.
IEEE Signal Process. Lett., 2022

Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception.
CoRR, 2022

Iterative Sound Source Localization for Unknown Number of Sources.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021
Three-Dimensional Speaker Localization: Audio-Refined Visual Scaling Factor Estimation.
IEEE Signal Process. Lett., 2021

SLoClas: A Database for Joint Sound Localization and Classification.
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021

Is Someone Speaking?: Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

GCC-PHAT with Speech-oriented Attention for Robotic Sound Source Localization.
Proceedings of the IEEE International Conference on Robotics and Automation, 2021

Multi-Target DoA Estimation with an Audio-Visual Fusion Mechanism.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020
Audio-Visual Multi-Speaker Tracking Based on the GLMB Framework.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019
Multi-Speaker Tracking From an Audio-Visual Sensing Device.
IEEE Trans. Multim., 2019

LOCATA challenge: speaker localization with a planar array.
CoRR, 2019

Accurate Target Annotation in 3D from Multimodal Streams.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

2018
3D Mouth Tracking from a Compact Microphone Array Co-Located with a camera.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017
3D audio-visual speaker tracking with an adaptive particle filter.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

2014
Profile driven dataflow optimisation of mean shift visual tracking.
Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014
