Songjun Cao

According to our database, Songjun Cao authored at least 24 papers between 2020 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Diffusion Reconstruction towards Generalizable Audio Deepfake Detection.

CoRR, April, 2026

Controllable Spoken Dialogue Generation: An LLM-Driven Grading System for K-12 Non-Native English Learners.

CoRR, April, 2026

Thinking with Constructions: A Benchmark and Policy Optimization for Visual-Text Interleaved Geometric Reasoning.

CoRR, March, 2026

Leveraging large multimodal models for audio-video deepfake detection: a pilot study.

CoRR, February, 2026

Detect All-Type Deepfake Audio: Wavelet Prompt Tuning for Enhanced Auditory Perception.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

PseudoVC: Improving One-shot Voice Conversion with Pseudo Paired Data.

CoRR, June, 2025

Neural Codec Source Tracing: Toward Comprehensive Attribution in Open-Set Condition.

CoRR, January, 2025

SonarGuard2: Ultrasonic Face Liveness Detection Based on Adaptive Doppler Effect Feature Extraction.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Monotonic Attention for Robust Text-to-Speech Synthesis in Large Language Model Frameworks.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

MPE-TTS: Customized Emotion Zero-Shot Text-To-Speech Using Multi-Modal Prompt.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

M-MoE: Mixture of Mixture-of-Expert Model for CTC-based Streaming Multilingual ASR.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

2024

A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

2023

DistillW2V2: A Small and Streaming Wav2vec 2.0 Based ASR Model.

CoRR, 2023

2022

A practical framework for multi-domain speech recognition and an instance sampling method to neural language modeling.

CoRR, 2022

Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving CTC-Based Speech Recognition Via Knowledge Transferring from Pre-Trained Language Models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Improving Hybrid CTC/Attention End-to-end Speech Recognition with Pretrained Acoustic and Language Model.

CoRR, 2021

Improving Speech Recognition Accuracy of Local POI Using Geographical Models.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Explore wav2vec 2.0 for Mispronunciation Detection.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improving Accent Identification and Accented Speech Recognition Under a Framework of Self-Supervised Learning.

Keqi Deng

Songjun Cao

Long Ma

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improving Streaming Transformer Based ASR Under a Framework of Self-Supervised Learning.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improving Hybrid CTC/Attention End-to-End Speech Recognition with Pretrained Acoustic and Language Models.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Multi-head Monotonic Chunkwise Attention For Online Speech Recognition.

CoRR, 2020
