Ruijie Tao

Orcid: 0000-0003-0021-5661

According to our database, Ruijie Tao authored at least 34 papers between 2020 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

CueNet: Robust Audio-Visual Speaker Extraction through Cross-Modal Cue Mining and Interaction.

CoRR, March, 2026

2025

Ego4D: Around the World in 3,600 Hours of Egocentric Video.

Santhosh Kumar Ramakrishnan

Christoph Feichtenhofer

Kiran K. Somasundaram

Giovanni Maria Farinella

IEEE Trans. Pattern Anal. Mach. Intell., November, 2025

Addressing Gradient Misalignment in Data-Augmented Training for Robust Speech Deepfake Detection.

CoRR, September, 2025

QAMO: Quality-aware Multi-centroid One-class Learning For Speech Deepfake Detection.

CoRR, September, 2025

Enhancing Real-World Active Speaker Detection With Multi-Modal Extraction Pre-Training.

IEEE Trans. Multim., 2025

A Benchmark for Multi-Speaker Anonymization.

IEEE Trans. Inf. Forensics Secur., 2025

Unified Audio Event Detection.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

I²TTS: Image-Indicated Immersive Text-to-Speech Synthesis with Spatial Perception.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2025

Leveraging Language Information for Target Language Extraction.

Mehmet Sinan Yildirim

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2025

Voice Conversion Augmentation for Speaker Recognition on Defective Datasets.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2025

Interpolating Speaker Identities in Embedding Space for Data Expansion.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2025

2024

Deep Cross-Modal Retrieval Between Spatial Image and Acoustic Speech.

IEEE Trans. Multim., 2024

Target Speech Diarization with Multimodal Prompts.

CoRR, 2024

Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention.

CoRR, 2024

Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, 2024

Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-Talker Speech.

Proceedings of the IEEE International Conference on Acoustics, 2024

Prompt-Driven Target Speech Diarization.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Self-Supervised Training of Speaker Encoder With Multi-Modal Diverse Positive Pairs.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

USED: Universal Speaker Extraction and Diarization.

Junyi Ao

Mehmet Sinan Yildirim

CoRR, 2023

Target Active Speaker Detection with Audio-visual Cues.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Speaker Recognition with Two-Step Multi-Modal Deep Cleansing.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Selective Listening by Synchronizing Speech With Lips.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

I4U System Description for NIST SRE'20 CTS Challenge.

CoRR, 2022

Self-Supervised Speaker Recognition with Loss-Gated Learning.

Proceedings of the IEEE International Conference on Acoustics, 2022

Ego4D: Around the World in 3,000 Hours of Egocentric Video.

Santhosh Kumar Ramakrishnan

Christoph Feichtenhofer

Kiran K. Somasundaram

Giovanni Maria Farinella

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Ego4D: Around the World in 3,000 Hours of Egocentric Video.

Santhosh Kumar Ramakrishnan

Christoph Feichtenhofer

Kiran K. Somasundaram

Giovanni Maria Farinella

CoRR, 2021

Is Someone Speaking?: Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Muse: Multi-Modal Target Speaker Extraction with Visual Cues.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

HLT-NUS Submission for NIST 2019 Multimedia Speaker Recognition Evaluation.

CoRR, 2020

Audio-Visual Speaker Recognition with a Cross-Modal Discriminative Network.

Ruijie Tao

Rohan Kumar Das

Haizhou Li

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

HLT-NUS Submission for 2019 NIST Multimedia Speaker Recognition Evaluation.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020
