Xinyuan Qian

ORCID: 0000-0002-9511-6713

Affiliations:

University of Science and Technology Beijing (USTB), Beijing, China
National University of Singapore (NUS), Singapore (former)
Queen Mary University of London (QMUL), London, UK (former, PhD)

According to our database¹, Xinyuan Qian authored at least 60 papers between 2014 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

Understanding Dynamic Auditory and Tactile Perception for Water Filling Level Estimation.
Int. J. Soc. Robotics, October, 2025

Dual-Path Transformer-Based GAN for Co-speech Gesture Synthesis.
Int. J. Soc. Robotics, October, 2025

Region-Specific Audio Tagging for Spatial Sound.
Co-authors include Mark D. Plumbley.
CoRR, September, 2025

VP-SelDoA: Visual-prompted Selective DoA Estimation of Target Sound via Semantic-Spatial Matching.
CoRR, July, 2025

SAV-SE: Scene-Aware Audio-Visual Speech Enhancement With Selective State Space Model.
Co-authors include Leibny Paola García-Perera.
IEEE J. Sel. Top. Signal Process., May, 2025

Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture.
CoRR, April, 2025

Enhancing Real-World Active Speaker Detection With Multi-Modal Extraction Pre-Training.
Co-authors include Rohan Kumar Das.
IEEE Trans. Multim., 2025

SSDQ: Target Speaker Extraction via Semantic and Spatial Dual Querying.
IEEE Signal Process. Lett., 2025

Improving Bird Vocalization Recognition in Open-Set Cross-Corpus Scenarios With Semantic Feature Reconstruction and Dual Strategy Scoring.
Co-authors include Björn W. Schuller.
IEEE Signal Process. Lett., 2025

Analytic Class Incremental Learning for Sound Source Localization With Privacy Protection.
IEEE Signal Process. Lett., 2025

Breaking Through the Spike: Spike Window Decoding for Accelerated and Precise Automatic Speech Recognition.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

M2PAIR: A High-Quality Acoustic Impulse Response Computation Model.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Audio-Visual Temporal Forgery Detection Using Embedding-Level Fusion and Multi-Dimensional Contrastive Loss.
IEEE Trans. Circuits Syst. Video Technol., August, 2024

Deep Cross-Modal Retrieval Between Spatial Image and Acoustic Speech.
IEEE Trans. Multim., 2024

M3TTS: Multi-modal text-to-speech of multi-scale style control for dubbing.
Pattern Recognit. Lett., 2024

pTSE-T: Presentation Target Speaker Extraction using Unaligned Text Cues.
CoRR, 2024

Mamba in Speech: Towards an Alternative to Self-Attention.
Co-authors include Eliathamby Ambikairajah.
CoRR, 2024

Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention.
CoRR, 2024

Semi-supervised Speaker Localization with Gaussian-Like Pseudo-labeling.
Proceedings of the Social Robotics - 16th International Conference, 2024

MMAL: Multi-Modal Analytic Learning for Exemplar-Free Audio-Visual Class Incremental Tasks.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

ListenFormer: Responsive Listening Head Generation with Non-autoregressive Transformers.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

An Exploration of Length Generalization in Transformer-Based Speech Enhancement.
Co-authors include Eliathamby Ambikairajah.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Transmitted and Aggregated Self-Attention for Automatic Speech Recognition.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

GLMB 3D Speaker Tracking with Video-Assisted Multi-Channel Audio Optimization Functions.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Visually Guided Binaural Audio Generation with Cross-Modal Consistency.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

LOCSELECT: Target Speaker Localization with an Auditory Selective Hearing Mechanism.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Text-Queried Target Sound Event Localization.
Proceedings of the 32nd European Signal Processing Conference, 2024

2023

Speech-Oriented Sparse Attention Denoising for Voice User Interface Toward Industry 5.0.
IEEE Trans. Ind. Informatics, 2023

Neural-Free Attention for Monaural Speech Enhancement Toward Voice User Interface for Consumer Electronics.
IEEE Trans. Consumer Electron., 2023

A Time-Frequency Attention Module for Neural Speech Enhancement.
Co-authors include Eliathamby Ambikairajah.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Device Features Based on Linear Transformation With Parallel Training Data for Replay Speech Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Audio-Visual Cross-Attention Network for Robotic Speaker Tracking.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

L³ F-TOUCH: A Wireless GelSight With Decoupled Tactile and Three-Axis Force Sensing.
Co-authors include Kaspar Althoefer.
IEEE Robotics Autom. Lett., 2023

Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions.
Co-authors include Philip J. B. Jackson.
CoRR, 2023

Audio Visual Speaker Localization from EgoCentric Views.
CoRR, 2023

Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Miniaturised Camera-based Multi-Modal Tactile Sensor.
Co-authors include Kaspar Althoefer.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Ripple Sparse Self-Attention for Monaural Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Self-Convolution for Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Stream Attention Based U-Net for L3DAS23 Challenge.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Audio-Visual Tracking of Concurrent Speakers.
Co-authors include Maurizio Omologo and Andrea Cavallaro.
IEEE Trans. Multim., 2022

Deep Audio-Visual Beamforming for Speaker Localization.
IEEE Signal Process. Lett., 2022

Speaker Extraction With Co-Speech Gestures Cue.
IEEE Signal Process. Lett., 2022

Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception.
CoRR, 2022

Iterative Sound Source Localization for Unknown Number of Sources.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Three-Dimensional Speaker Localization: Audio-Refined Visual Scaling Factor Estimation.
IEEE Signal Process. Lett., 2021

SLoClas: A Database for Joint Sound Localization and Classification.
Co-authors include Amine El Abridi.
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021

Is Someone Speaking?: Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection.
Co-authors include Rohan Kumar Das and Mike Zheng Shou.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

GCC-PHAT with Speech-oriented Attention for Robotic Sound Source Localization.
Proceedings of the IEEE International Conference on Robotics and Automation, 2021

Multi-Target DoA Estimation with an Audio-Visual Fusion Mechanism.
Co-authors include Maulik C. Madhavi.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

Audio-Visual Multi-Speaker Tracking Based on the GLMB Framework.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019

Multi-Speaker Tracking From an Audio-Visual Sensing Device.
Co-authors include Maurizio Omologo and Andrea Cavallaro.
IEEE Trans. Multim., 2019

LOCATA challenge: speaker localization with a planar array.
Co-authors include Andrea Cavallaro and Maurizio Omologo.
CoRR, 2019

Accurate Target Annotation in 3D from Multimodal Streams.
Co-authors include Alessio Xompero, Maurizio Omologo and Andrea Cavallaro.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

2018

3D Mouth Tracking from a Compact Microphone Array Co-Located with a camera.
Co-authors include Alessio Xompero, Andrea Cavallaro and Maurizio Omologo.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017

3D audio-visual speaker tracking with an adaptive particle filter.
Co-authors include Maurizio Omologo and Andrea Cavallaro.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

2014

Profile driven dataflow optimisation of mean shift visual tracking.
Co-authors include Deepayan Bhowmik, Andrew M. Wallace, Robert J. Stewart and Greg J. Michaelson.
Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014