Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement.

Wangyou Zhang

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Disentangled Representation Learning for Environment-agnostic Speaker Recognition.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models.

Ahmed Hussen Abdelaziz

Shinji Watanabe

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

To what extent can ASV systems naturally defend against spoofing attacks?

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?

Ahmed Hussen Abdelaziz

Shinji Watanabe

Barry-John Theobald

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Improving Design of Input Condition Invariant Speech Enhancement.

Wangyou Zhang

Jee-weon Jung

Yanmin Qian

Proceedings of the IEEE International Conference on Acoustics, 2024

Improving Audio Captioning Models with Fine-Grained Audio Features, Text Embedding Supervision, and LLM Mix-Up Augmentation.

Proceedings of the IEEE International Conference on Acoustics, 2024

VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks.

Proceedings of the IEEE International Conference on Acoustics, 2024

VoxMM: Rich Transcription of Conversations in the Wild.

Proceedings of the IEEE International Conference on Acoustics, 2024

AugSumm: Towards Generalizable Speech Summarization Using Synthetic Labels from Large Language Models.

Proceedings of the IEEE International Conference on Acoustics, 2024

One Model to Rule Them All? Towards End-to-End Joint Speaker Diarization and Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

Understanding Probe Behaviors Through Variational Bounds of Mutual Information.

Kwanghee Choi

Jee-weon Jung

Shinji Watanabe

Proceedings of the IEEE International Conference on Acoustics, 2024

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.

Proceedings of the IEEE International Conference on Acoustics, 2024

On the Evaluation of Speech Foundation Models for Spoken Language Understanding.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network.

CoRR, 2023

VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge.

CoRR, 2023

Multi-Dataset Co-Training with Sharpness-Aware Optimization for Audio Anti-spoofing.

Hye-jin Shim

Jee-weon Jung

Tomi Kinnunen

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Disentangled Representation Learning for Multilingual Speaker Recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Towards Single Integrated Spoofing-aware Speaker Verification Embeddings.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Encoder-decoder Multimodal Speaker Change Detection.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Curriculum Learning for Self-supervised Speaker Verification.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Absolute Decision Corrupts Absolutely: Conservative Online Speaker Diarisation.

Proceedings of the IEEE International Conference on Acoustics, 2023

Advancing the Dimensionality Reduction of Speaker Embeddings for Speaker Diarisation: Disentangling Noise and Informing Speech Activity.

Proceedings of the IEEE International Conference on Acoustics, 2023

In Search of Strong Embedding Extractors for Speaker Diarisation.

Proceedings of the IEEE International Conference on Acoustics, 2023

High-Resolution Embedding Extractor for Speaker Diarisation.

Proceedings of the IEEE International Conference on Acoustics, 2023

Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Disentangled representation learning for multilingual speaker recognition.

CoRR, 2022

Large-scale learning of generalised representations for speaker recognition.

CoRR, 2022

Selective Kernel Attention for Robust Speaker Verification.

Sung Hwan Mun

Jee-weon Jung

Nam Soo Kim

CoRR, 2022

SASV Challenge 2022: A Spoofing Aware Speaker Verification Challenge Evaluation Plan.

CoRR, 2022

Frequency and Multi-Scale Selective Kernel Attention for Speaker Verification.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Automatic Speaker Verification Spoofing and Deepfake Detection Using Wav2vec 2.0 and Data Augmentation.

Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Baseline Systems for the First Spoofing-Aware Speaker Verification Challenge: Score and Embedding Fusion.

Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

SASV 2022: The First Spoofing-Aware Speaker Verification Challenge.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Pushing the limits of raw waveform speaker recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Attentive Max Feature Map and Joint Training for Acoustic Scene Classification.

Proceedings of the IEEE International Conference on Acoustics, 2022

Multi-Scale Speaker Embedding-Based Graph Attention Networks For Speaker Diarisation.

Proceedings of the IEEE International Conference on Acoustics, 2022

AASIST: Audio Anti-Spoofing Using Integrated Spectro-Temporal Graph Attention Networks.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Disentangled dimensionality reduction for noise-robust speaker diarisation.

CoRR, 2021

End-to-End Spectro-Temporal Graph Attention Networks for Speaker Verification Anti-Spoofing and Speech Deepfake Detection.

CoRR, 2021

Attentive Max Feature Map for Acoustic Scene Classification with Joint Learning considering the Abstraction of Classes.

CoRR, 2021

Learning Metrics from Mean Teacher: A Supervised Learning Method for Improving the Generalization of Speaker Verification System.

CoRR, 2021

Graph Attention Networks for Anti-Spoofing.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Adapting Speaker Embeddings for Speaker Diarisation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Three-Class Overlapped Speech Detection Using a Convolutional Recurrent Neural Network.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

DCASENET: An Integrated Pretrained Deep Neural Network for Detecting and Classifying Acoustic Scenes and Events.

Proceedings of the IEEE International Conference on Acoustics, 2021

Graph Attention Networks for Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Capturing scattered discriminative information using a deep architecture in acoustic scene classification.

CoRR, 2020

Integrated Replay Spoofing-aware Text-independent Speaker Verification.

CoRR, 2020

Improved RawNet with Filter-wise Rescaling for Text-independent Speaker Verification using Raw Waveforms.

CoRR, 2020

A study on the role of subsidiary information in replay attack spoofing detection.

CoRR, 2020

Knowledge Distillation in Acoustic Scene Classification.

IEEE Access, 2020

Selective Deep Speaker Embedding Enhancement for Speaker Verification.

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Self-Supervised Pre-Training with Acoustic Configurations for Replay Spoofing Detection.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Segment Aggregation for Short Utterances Speaker Verification Using Raw Waveforms.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Acoustic Scene Classification Using Audio Tagging.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improved RawNet with Feature Map Scaling for Text-Independent Speaker Verification Using Raw Waveforms.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Audio Tag Representation Guided Dual Attention Network for Acoustic Scene Classification.

Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020

2019

Cosine similarity-based adversarial process.

CoRR, 2019

Replay Attack Detection with Complementary High-Resolution Information Using End-to-End DNN for the ASVspoof 2019 Challenge.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

RawNet: Advanced End-to-End Deep Neural Network Using Raw Waveforms for Text-Independent Speaker Verification.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-End Losses Based on Speaker Basis Vectors and All-Speaker Hard Negative Mining for Speaker Verification.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Acoustic Scene Classification Using Teacher-Student Learning with Soft-Labels.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Distilling the Knowledge of Specialist Deep Neural Networks in Acoustic Scene Classification.

Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events 2019 (DCASE 2019), 2019

Short Utterance Compensation in Speaker Verification via Cosine-Based Teacher-Student Learning of Speaker Embeddings.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Replay attack spoofing detection system using replay noise by multi-task learning.

CoRR, 2018

Replay Spoofing Detection System for Automatic Speaker Verification Using Multi-Task Learning of Noise Classes.

Proceedings of the Conference on Technologies and Applications of Artificial Intelligence, 2018

Avoiding Speaker Overfitting in End-to-End DNNs Using Raw Waveform for Text-Independent Speaker Verification.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

A Complete End-to-End Speaker Verification System Using Deep Neural Networks: From Raw Signals to Verification Result.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

DNN based multi-level feature ensemble for acoustic scene classification.

Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2018

2017

Joint Training of Expanded End-to-End DNN for Text-Dependent Speaker Verification.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

DNN-Based Audio Scene Classification for DCASE2017: Dual Input Features, Balancing Cost, and Stochastic Data Duplication.

Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2017

Jee-Weon Jung
