Heinrich Dinkel

Orcid: 0000-0003-4330-8980

According to our database¹, Heinrich Dinkel authored at least 56 papers between 2015 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text.

CoRR, May, 2026

ACAVCaps: Enabling large-scale training for fine-grained and diverse audio understanding.

CoRR, March, 2026

The Interspeech 2026 Audio Encoder Capability Challenge for Large Audio Language Models.

CoRR, March, 2026

DashengTokenizer: One layer is enough for unified audio understanding and generation.

CoRR, February, 2026

2025

MiDashengLM: Efficient Audio Understanding with General Audio Captions.

CoRR, August, 2025

MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks.

CoRR, July, 2025

GLAP: General contrastive audio-text pretraining across domains and languages.

CoRR, June, 2025

Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering.

CoRR, March, 2025

The ICME 2025 Audio Encoder Capability Challenge.

CoRR, January, 2025

X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

GLCLAP: A Novel Contrastive Learning Pre-trained Model for Contextual Biasing in ASR.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

2024

Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding.

CoRR, 2024

Scaling up masked audio encoder learning for general audio classification.

CoRR, 2024

Bridging Language Gaps in Audio-Text Retrieval.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Streaming Audio Transformers for Online Audio Tagging.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Scaling up masked audio encoder learning for general audio classification.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

CED: Consistent Ensemble Distillation for Audio Tagging.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

Understanding temporally weakly supervised training: A case study for keyword spotting.

CoRR, 2023

Streaming Audio Transformers for Online Audio Tagging.

CoRR, 2023

Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Av-Sepformer: Cross-Attention Sepformer for Audio-Visual Target Speaker Extraction.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022

An Empirical Study of Weakly Supervised Audio Tagging Embeddings for General Audio Representations.

Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

UniKW-AT: Unified Keyword Spotting and Audio Tagging.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Category-Adapted Sound Event Enhancement with Weakly Labeled Data.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Pseudo Strong Labels for Large Scale Weakly Supervised Audio Tagging.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Towards Duration Robust Weakly Supervised Sound Event Detection.

Heinrich Dinkel, Mengyue Wu, Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., 2021

DEPA: Self-Supervised Audio Embedding for Depression Detection.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Audio Caption in a Car Setting with a Sentence-Level Loss.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

A Lightweight Framework for Online Voice Activity Detection in the Wild.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

A Lightweight Approach for Semi-Supervised Sound Event Detection with Unsupervised Data Augmentation.

Xinyu Cai, Heinrich Dinkel

Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE 2021), 2021

A Contrastive Semi-Supervised Learning Framework For Anomaly Sound Detection.

Xinyu Cai, Heinrich Dinkel

Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE 2021), 2021

2020

GPVAD: Towards noise robust voice activity detection via weakly supervised sound event detection.

CoRR, 2020

Dual-Adversarial Domain Adaptation for Generalized Replay Attack Detection.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Voice Activity Detection in the Wild via Weakly Supervised Sound Event Detection.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Duration Robust Weakly Supervised Sound Event Detection.

Heinrich Dinkel, Kai Yu

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Multiple Sound Sources Localization from Coarse to Fine.

Proceedings of the Computer Vision - ECCV 2020, 2020

A CRNN-GRU Based Reinforcement Learning Approach to Audio Captioning.

Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020

2019

What does a Car-ssette tape tell?

CoRR, 2019

Text-based Depression Detection: What Triggers An Alert.

Heinrich Dinkel, Mengyue Wu, Kai Yu

CoRR, 2019

Duration robust sound event detection.

Heinrich Dinkel, Kai Yu

CoRR, 2019

The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Audio Caption: Listen and Tell.

Mengyue Wu, Heinrich Dinkel, Kai Yu

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

2018

Investigating Raw Wave Deep Neural Networks for End-to-End Speaker Spoofing Detection.

Heinrich Dinkel, Yanmin Qian, Kai Yu

IEEE ACM Trans. Audio Speech Lang. Process., 2018

Covariance Based Deep Feature for Text-Dependent Speaker Verification.

Proceedings of the Intelligence Science and Big Data Engineering, 2018

2017

Deep Feature Engineering for Noise Robust Spoofing Detection.

IEEE ACM Trans. Audio Speech Lang. Process., 2017

Small-footprint convolutional neural network for spoofing detection.

Heinrich Dinkel, Yanmin Qian, Kai Yu

Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

End-to-end spoofing detection with raw waveform CLDNNS.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

2016

Overview of BTAS 2016 speaker anti-spoofing competition.

Ricardo Paranhos Velloso Violato, Flávio Olmos Simões, Mário Uliani Neto, Marcus de Assis Angeloni

Proceedings of the 8th IEEE International Conference on Biometrics Theory, Applications and Systems, 2016

2015

Robust deep feature for spoofing detection - the SJTU system for ASVspoof 2015 challenge.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015