Bhiksha Raj

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

MACE: Leveraging Audio for Evaluating Audio Captioning Systems.

Satvik Dixit

Proceedings of the IEEE International Conference on Acoustics, 2025

Tessellated Linear Model for Age Prediction from Voice.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

PhoniTale: Phonologically Grounded Mnemonic Generation for Typologically Distant Language Pairs.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

CAARMA: Class Augmentation with Adversarial Mixup Regularization.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

SVeritas: Benchmark for Robust Speaker Verification under Diverse Conditions.

Massa Baali

Sarthak Bisht

Francisco Teixeira

Kateryna Shapovalenko

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding.

Utsav Prabhu

Ravi Teja N. V. S. Chappa

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

CoLMbo: Speaker Language Model for Descriptive Profiling.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

On the Robust Approximation of ASR Metrics.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Lost in Transcription, Found in Distribution Shift: Demystifying Hallucination in Speech Foundation Models.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Audio Entailment: Assessing Deductive Reasoning for Audio Understanding.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

On Catastrophic Inheritance of Large Foundation Models.

J. Data-centric Mach. Learn. Res., 2024

A closer look at reinforcement learning-based automatic speech recognition.

Comput. Speech Lang., 2024

XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation.

CoRR, 2024

Perturbation Ontology based Graph Attention Networks.

CoRR, 2024

FLAASH: Flow-Attention Adaptive Semantic Hierarchical Fusion for Multi-Modal Tobacco Content Analysis.

Page Daniel Dobbs

CoRR, 2024

On the Diversity of Synthetic Data and its Impact on Training Large Language Models.

CoRR, 2024

What Do Speech Foundation Models Not Learn About Speech?

CoRR, 2024

Improving Speaker Representations Using Contrastive Losses on Multi-scale Features.

CoRR, 2024

RelUNet: Relative Channel Fusion U-Net for Multichannel Speech Enhancement.

CoRR, 2024

Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection.

CoRR, 2024

Efficient Autoregressive Audio Modeling via Next-Scale Prediction.

CoRR, 2024

Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?

CoRR, 2024

Emergent Interpretable Symbols and Content-Style Disentanglement via Variance-Invariance Constraints.

CoRR, 2024

From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking.

Matthew Johnson-Roberson

Xiaonan Huang

CoRR, 2024

ControlVAR: Exploring Controllable Visual Autoregressive Modeling.

CoRR, 2024

ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models.

Xin Li

CoRR, 2024

Improving Membership Inference in ASR Model Auditing with Perturbed Loss Features.

CoRR, 2024

Learning with Noisy Foundation Models.

CoRR, 2024

AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition.

CoRR, 2024

Evaluating and Improving Continual Learning in Spoken Language Understanding.

CoRR, 2024

Customizable Perturbation Synthesis for Robust SLAM Benchmarking.

Matthew Johnson-Roberson

Xiaonan Huang

CoRR, 2024

AugSumm: towards generalizable speech summarization using synthetic labels from large language model.

CoRR, 2024

Privacy-Oriented Manipulation of Speaker Representations.

IEEE Access, 2024

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs For Audio, Music, and Speech.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

PDAF: A Phonetic Debiasing Attention Framework For Speaker Verification.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Metric from Human: Zero-shot Monocular Metric Depth Estimation via Test-time Adaptation.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Slight Corruption in Pre-training Data Makes Better Diffusion Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

R-BASS: Relevance-aided Block-wise Adaptation for Speech Summarization.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition.

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

DeWinder: Single-Channel Wind Noise Reduction using Ultrasound Sensing.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Domain Adaptation for Contrastive Audio-Language Models.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

PAM: Prompting Audio-Language Models for Audio Quality Assessment.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SELM: Enhancing Speech Emotion Recognition for Out-of-Domain Scenarios.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Fashion Image Retrieval with Occlusion.

Proceedings of the Pattern Recognition - 27th International Conference, 2024

Completing Visual Objects via Bridging Generation and Segmentation.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

A General Framework for Learning from Weak Supervision.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

uSee: Unified Speech Enhancement And Editing with Conditional Diffusion Models.

Proceedings of the IEEE International Conference on Acoustics, 2024

Improving Continual Learning of Acoustic Scene Classification via Mutual Information Optimization.

Proceedings of the IEEE International Conference on Acoustics, 2024

Fixed Inter-Neuron Covariability Induces Adversarial Robustness.

Proceedings of the IEEE International Conference on Acoustics, 2024

AugSumm: Towards Generalizable Speech Summarization Using Synthetic Labels from Large Language Models.

Proceedings of the IEEE International Conference on Acoustics, 2024

Dynamic-Superb: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark For Speech.

Proceedings of the IEEE International Conference on Acoustics, 2024

Prompting Audios Using Acoustic Properties for Emotion Representation.

Proceedings of the IEEE International Conference on Acoustics, 2024

Training Audio Captioning Models without Audio.

Dimitra Emmanouilidou

Huaming Wang

Proceedings of the IEEE International Conference on Acoustics, 2024

Importance of Negative Sampling in Weak Label Learning.

Proceedings of the IEEE International Conference on Acoustics, 2024

R^2-Bench: Benchmarking the Robustness of Referring Perception Models Under Perturbations.

Proceedings of the Computer Vision - ECCV 2024, 2024

Synergistic Global-Space Camera and Human Reconstruction from Videos.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Continual Contrastive Spoken Language Understanding.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Understanding political polarization using language models: A dataset and method.

AI Mag., September, 2023

SphereFace Revived: Unifying Hyperspherical Face Recognition.

IEEE Trans. Pattern Anal. Mach. Intell., 2023

AOE-Net: Entities Interactions Modeling with Adaptive Attention Mechanism for Temporal Action Proposals Generation.

Int. J. Comput. Vis., 2023

FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding in Open World.

Utsav Prabhu

CoRR, 2023

Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms.

CoRR, 2023

LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model.

CoRR, 2023

Completing Visual Objects via Bridging Generation and Segmentation.

CoRR, 2023

Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech.

CoRR, 2023

Rethinking Audiovisual Segmentation with Semantic Quantization and Decomposition.

CoRR, 2023

Fixed Inter-Neuron Covariability Induces Adversarial Robustness.

CoRR, 2023

Training on Foveated Images Improves Robustness to Adversarial Attacks.

CoRR, 2023

UTOPIA: Unconstrained Tracking Objects without Preliminary Examination via Cross-Domain Adaptation.

CoRR, 2023

PaintSeg: Training-free Segmentation via Painting.

CoRR, 2023

Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations.

CoRR, 2023

GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content.

CoRR, 2023

Improving Perceptual Quality, Intelligibility, and Acoustics on VoIP Platforms.

CoRR, 2023

Approach to Learning Generalized Audio Representation Through Batch Embedding Covariance Regularization and Constant-Q Transforms.

CoRR, 2023

Synergy between human and machine approaches to sound/scene recognition and processing: An overview of ICASSP special session.

CoRR, 2023

SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning.

CoRR, 2023

Understanding Political Polarisation using Language Models: A dataset and method.

CoRR, 2023

Fairness Continual Learning Approach to Semantic Scene Understanding in Open-World Environments.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Training on Foveated Images Improves Robustness to Adversarial Attacks.

Aqsa Kashaf

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Weakly-Supervised Audio-Visual Segmentation.

Shentong Mo

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

PaintSeg: Painting Pixels for Training-free Segmentation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Rethinking Voice-Face Correlation: A Geometry View.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

There is more than one kind of robustness: Fooling Whisper with adversarial examples.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

BASS: Block-wise Adaptation for Speech Summarization.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

How Many Perturbations Break This Model? Evaluating Robustness Beyond Adversarial Accuracy.

Proceedings of the International Conference on Machine Learning, 2023

FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

SoftMatch: Addressing the Quantity-Quality Tradeoff in Semi-supervised Learning.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Pairwise Similarity Learning is SimPLE.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Robust Referring Video Object Segmentation with Cyclic Structural Consensus.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, 2023

Paaploss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, 2023

Privacy-Preserving Automatic Speaker Diarization.

Proceedings of the IEEE International Conference on Acoustics, 2023

An Approach to Ontological Learning from Weak Labels.

Proceedings of the IEEE International Conference on Acoustics, 2023

Towards Noise-Tolerant Speech-Referring Video Object Segmentation: Bridging Speech and Text.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Token Prediction as Implicit Classification to Identify LLM-Generated Text.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding.

Ngan Le

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Espnet-Summ: Introducing a Novel Large Dataset, Toolkit, and a Cross-Corpora Evaluation of Speech Summarization Systems.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Panoramic Video Salient Object Detection with Ambisonic Audio Guidance.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning.

CoRR, 2022

Describing emotions with acoustic property prompts for speech emotion recognition.

CoRR, 2022

XNOR-FORMER: Learning Accurate Approximations in Long Speech Transformers.

Roshan Sharma

CoRR, 2022

Unifying the Discrete and Continuous Emotion labels for Speech Emotion Recognition.

CoRR, 2022

Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models.

Hadi Abdullah

CoRR, 2022

USB: A Unified Semi-supervised Learning Benchmark.

CoRR, 2022

Online Video Instance Segmentation via Robust Context Fusion.

CoRR, 2022

Not all broken defenses are equal: The dead angles of adversarial accuracy.

CoRR, 2022

R^2VOS: Robust Referring Video Object Segmentation via Relational Multimodal Cycle Consistency.

CoRR, 2022

Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction.

CoRR, 2022

Bear the Query in Mind: Visual Grounding with Query-conditioned Convolution.

CoRR, 2022

FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning.

CoRR, 2022

On the pragmatism of using binary classifiers over data intensive neural network classifiers for detection of COVID-19 from voice.

CoRR, 2022

HEAR 2021: Holistic Evaluation of Audio Representations.

CoRR, 2022

Ontological Learning from Weak Labels.

CoRR, 2022

USB: A Unified Semi-supervised Learning Benchmark for Classification.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Improving Speech Enhancement through Fine-Grained Speech Characteristics.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Towards End-to-End Private Automatic Speaker Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Recent improvements of ASR models in the face of adversarial attacks.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Positional Encoding for Capturing Modality Specific Cadence for Emotion Detection.

Hira Dhamyal

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

SphereFace2: Binary Classification is All You Need for Deep Face Recognition.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Cross-utterance context for multimodal video transcription.

Roshan Sharma

Proceedings of the 56th Asilomar Conference on Signals, Systems, and Computers, ACSSC 2022, Pacific Grove, CA, USA, October 31, 2022

2021

Discriminative Dictionary Learning for Autism Spectrum Disorder Identification.

Frontiers Comput. Neurosci., 2021

Training image classifiers using Semi-Weak Label Data.

Anxiang Zhang

Ankit Shah

CoRR, 2021

Constant Random Perturbations Provide Adversarial Robustness with Minimal Effect on Accuracy.

CoRR, 2021

Identifying Actions for Sound Event Classification.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021

Detection and Evaluation of Human and Machine Generated Speech in Spoofing Attacks on Automatic Speaker Verification Systems.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

HEAR: Holistic Evaluation of Audio Representations.

Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, 2021

Masked Proxy Loss for Text-Independent Speaker Verification.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improving Weakly Supervised Sound Event Detection with Self-Supervised Auxiliary Tasks.

Francisco Sepúlveda Teixeira

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Self-Supervised 3D Face Reconstruction via Conditional Estimation.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

The Right to Talk: An Audio-Visual Transformer Approach.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Contrast and Order Representations for Video Self-supervised Learning.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

FoolHD: Fooling Speaker Identification by Highly Imperceptible Adversarial Disturbances.

Ali Shahin Shamsabadi

Proceedings of the IEEE International Conference on Acoustics, 2021

Towards Adversarial Robustness Via Compact Feature Representations.

Proceedings of the IEEE International Conference on Acoustics, 2021

High-Frequency Adversarial Defense for Speech and Audio.

Proceedings of the IEEE International Conference on Acoustics, 2021

The in-the-Wild Speech Medical Corpus.

Proceedings of the IEEE International Conference on Acoustics, 2021

Sequential Randomized Smoothing for Adversarially Robust Speech Recognition.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Point3D: tracking actions as moving points with 3D CNNs.

Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020

Mask Proxy Loss for Text-Independent Speaker Recognition.

CoRR, 2020

Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection.

CoRR, 2020

Exploring Optimal DNN Architecture for End-to-End Beamformers Based on Time-frequency References.

Yuichiro Koyama

CoRR, 2020

Efficient Integration of Multi-channel Information for Speaker-independent Speech Separation.

Yuichiro Koyama

Oluwafemi Azeez

CoRR, 2020

Exploring the Best Loss Function for DNN-Based Low-latency Speech Enhancement with Temporal Convolutional Networks.

CoRR, 2020

Is normalization indispensable for training deep neural network?

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Sherlock: A Crowd-sourced System For Automatic Tagging Of Indoor Floor Plans.

Khaled A. Harras

Proceedings of the 17th IEEE International Conference on Mobile Ad Hoc and Sensor Systems, 2020

Automatic In-the-wild Dataset Annotation with Deep Generalized Multiple Instance Learning.

Maria Joana Correia

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Controlled AutoEncoders to Generate Faces from Voices.

Proceedings of the Advances in Visual Computing - 15th International Symposium, 2020

Hide and Speak: Towards Deep Neural Networks for Speech Steganography.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

The Phonetic Bases of Vocal Expressed Emotion: Natural versus Acted.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Hierarchical Routing Mixture of Experts.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Optimal Strategies For Comparing Covariates To Solve Matching Problems.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Exploiting Non-Linear Redundancy for Neural Model Compression.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Artificial Creative Intelligence: Breaking the Imitation Barrier.

Proceedings of the Eleventh International Conference on Computational Creativity, 2020

Deriving Compact Feature Representations Via Annealed Contraction.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Sound Event Detection in the DCASE 2017 Challenge.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

W-Net BF: DNN-based Beamformer Using Joint Training Approach.

Yuichiro Koyama

CoRR, 2019

Detecting gender differences in perception of emotion in crowdsourced data.

CoRR, 2019

Non-Determinism in Neural Networks for Adversarial Robustness.

CoRR, 2019

Reconstructing faces from voices.

Yandong Wen

CoRR, 2019

Nonlinear Semi-Parametric Models for Survival Analysis.

CoRR, 2019

Hide and Speak: Deep Neural Networks for Speech Steganography.

CoRR, 2019

Face Reconstruction from Voice using Generative Adversarial Networks.

Yandong Wen

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Neural Regression Trees.

Proceedings of the International Joint Conference on Neural Networks, 2019

Learning Sound Events from Webly Labeled Data.

Ankit Shah

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Disjoint Mapping Network for Cross-modal Matching of Voices and Faces.

Proceedings of the 7th International Conference on Learning Representations, 2019

Human Behaviour Recognition Using Wifi Channel State Information.

Proceedings of the IEEE International Conference on Acoustics, 2019

Time Signal Classification Using Random Convolutional Features.

Abelino Jiménez

Proceedings of the IEEE International Conference on Acoustics, 2019

Cross Modal Audio Search and Retrieval with Joint Embeddings Based on Text and Audio.

Shuayb Zarar

Proceedings of the IEEE International Conference on Acoustics, 2019

Optimizing Neural Network Embeddings Using a Pair-Wise Loss for Text-Independent Speaker Verification.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

In-the-Wild End-to-End Detection of Speech Affecting Diseases.

M. Joana Correia

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

AudioPairBank: towards a large-scale tag-pair-based audio content analysis.

EURASIP J. Audio Speech Music. Process., 2018

Optimal Strategies for Matching and Retrieval Problems by Comparing Covariates.

CoRR, 2018

A Closer Look at Weak Label Learning for Audio Events.

Ankit Shah

CoRR, 2018

NELS - Never-Ending Learner of Sounds.

CoRR, 2018

Speech Analytics for Medical Applications.

Proceedings of the Text, Speech, and Dialogue - 21st International Conference, 2018

Querying Depression Vlogs.

M. Joana Correia

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Analysing Speech for Clinical Applications.

Proceedings of the Statistical Language and Speech Processing, 2018

Classifier Risk Estimation Under Limited Labeling Resources.

Proceedings of the Advances in Knowledge Discovery and Data Mining, 2018

Mining Multimodal Repositories for Speech Affecting Diseases.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Interactive Evaluation of Classifiers Under Limited Resources.

Proceedings of the 17th IEEE International Conference on Machine Learning and Applications, 2018

A Corrective Learning Approach for Text-Independent Speaker Verification.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Content-Based Representations of Audio Using Siamese Neural Networks.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Acoustic Scene Classification Using Discrete Random Hashing for Laplacian Kernel Machines.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Voice Impersonation Using Generative Adversarial Networks.

Yang Gao

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Framework for Evaluation of Sound Event Detection in Web Videos.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Audition for multimedia computing.

Proceedings of the Frontiers of Multimedia Research, 2018

2017

The Incredible Shrinking Neural Network: New Perspectives on Learning Representations Through The Lens of Pruning.

CoRR, 2017

On the Origin of Deep Learning.

Haohan Wang

Eric P. Xing

CoRR, 2017

Be Careful What You Backpropagate: A Case For Linear Output Activations & Gradient Boosting.

CoRR, 2017

Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data.

CoRR, 2017

A two factor transformation for speaker verification through ℓ1 comparison.

Proceedings of the 2017 IEEE Workshop on Information Forensics and Security, 2017

Inferring room semantics using acoustic monitoring.

Khaled A. Harras

Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Audio Content Based Geotagging in Multimedia.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Audio event and scene recognition: A unified approach using strongly and weakly labeled data.

Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

Supervised monaural source separation based on autoencoders.

[BibT_eX]

[DOI]

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Privacy preserving Distance computation using somewhat-trusted third parties.

[BibT_eX]

[DOI]

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Discovering sound concepts and acoustic relations in text.

[BibT_eX]

[DOI]

Ndapandula Nakashole

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

An approach for self-training audio event detectors using web data.

[BibT_eX]

[DOI]

Proceedings of the 25th European Signal Processing Conference, 2017

DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System.

[BibT_eX]

[DOI]

Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2017

DCASE 2017 Task 1: Acoustic Scene Classification Using Shift-Invariant Kernels and Random Features.

[BibT_eX]

[DOI]

Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2017

SphereFace: Deep Hypersphere Embedding for Face Recognition.

[BibT_eX]

[DOI]

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Topic and Prosodic Modeling for Interruption Management in Multi-User Multitasking Communication Interactions.

[BibT_eX]

[DOI]

Nia Peters

Mohammad Javad Taghizadeh

Griffin D. Romigh

Proceedings of the 2017 AAAI Fall Symposia, Arlington, Virginia, USA, November 9-11, 2017, 2017

The REVERB Challenge: A Benchmark Task for Reverberation-Robust ASR Techniques.

[BibT_eX]

[DOI]

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016

Binary Sparse Coding of Convolutive Mixtures for Sound Localization and Separation via Spatialization.

[BibT_eX]

[DOI]

Afsaneh Asaei

IEEE Trans. Signal Process., 2016

Learning Model-Based Sparsity via Projected Gradient Descent.

[BibT_eX]

[DOI]

Petros T. Boufounos

IEEE Trans. Inf. Theory, 2016

A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research.

[BibT_eX]

[DOI]

EURASIP J. Adv. Signal Process., 2016

An Approach for Self-Training Audio Event Detectors Using Web Data.

[BibT_eX]

[DOI]

CoRR, 2016

AudioSentibank: Large-scale Semantic Ontology of Acoustic Concepts for Audio Content Analysis.

[BibT_eX]

[DOI]

CoRR, 2016

Environmental Noise Embeddings for Robust Speech Recognition.

[BibT_eX]

[DOI]

Suyoun Kim

Ian R. Lane

CoRR, 2016

Content-based Video Indexing and Retrieval Using Corr-LDA.

[BibT_eX]

[DOI]

Rahul Radhakrishnan Iyer

CoRR, 2016

Features and Kernels for Audio Event Recognition.

[BibT_eX]

[DOI]

CoRR, 2016

Adaptation of SVM for MIL for inferring the polarity of movies and movie reviews.

[BibT_eX]

[DOI]

Maria Joana Correia

Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Audio Event Detection using Weakly Labeled Data.

[BibT_eX]

[DOI]

Proceedings of the 2016 ACM Conference on Multimedia, 2016

Forensic anthropometry from voice: An articulatory-phonetic approach.

[BibT_eX]

[DOI]

Deniz Gençaga

Proceedings of the 39th International Convention on Information and Communication Technology, 2016

Short-term analysis for estimating physical parameters of speakers.

[BibT_eX]

[DOI]

James Baker

Proceedings of the 4th International Conference on Biometrics and Forensics, 2016

Formant manipulations in voice disguise by mimicry.

[BibT_eX]

[DOI]

Deniz Gençaga

Proceedings of the 4th International Conference on Biometrics and Forensics, 2016

On the Appropriateness of Complex-Valued Neural Networks for Speech Enhancement.

[BibT_eX]

[DOI]

Lukas Drude

Reinhold Haeb-Umbach

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Viral Spread via Entertainment and Voice-Messaging Among Telephone Users in India.

[BibT_eX]

[DOI]

Proceedings of the Eighth International Conference on Information and Communication Technologies and Development, 2016

Weakly supervised scalable audio content analysis.

[BibT_eX]

[DOI]

Ramón Fernandez Astudillo

Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

The relationship of voice onset time and Voice Offset Time to physical age.

[BibT_eX]

[DOI]

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Crowdsourced Video Subtitling with Adaptation Based on User-Corrected Lattices.

[BibT_eX]

[DOI]

João Miranda

Proceedings of the Advances in Speech and Language Technologies for Iberian Languages, 2016

Detecting Psychological Distress in Adults Through Transcriptions of Clinical Interviews.

[BibT_eX]

[DOI]

Maria Joana Correia

Proceedings of the Advances in Speech and Language Technologies for Iberian Languages, 2016

Experiments on the DCASE Challenge 2016: Acoustic Scene Classification and Sound Event Detection in Real Life Recording.

[BibT_eX]

[DOI]

Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2016

The Best of Both Worlds: Combining Data-Independent and Data-Driven Approaches for Action Recognition.

[BibT_eX]

[DOI]

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016

2015

Compositional Models for Audio Processing: Uncovering the structure of sound mixtures.

[BibT_eX]

[DOI]

IEEE Signal Process. Mag., 2015

A Survey: Time Travel in Deep Learning Space: An Introduction to Deep Learning Models and How Deep Learning Models Evolved from the Initial Ideas.

[BibT_eX]

[DOI]

Haohan Wang

CoRR, 2015

Privacy-Preserving Multi-Document Summarization.

[BibT_eX]

[DOI]

Luís Marujo

José Portelo

Wang Ling

David Martins de Matos

CoRR, 2015

Handcrafted Local Features are Convolutional Neural Networks.

[BibT_eX]

[DOI]

CoRR, 2015

Unsupervised Fusion Weight Learning in Multiple Classifier Systems.

[BibT_eX]

[DOI]

CoRR, 2015

Secure Modular Hashing.

[BibT_eX]

[DOI]

Proceedings of the 2015 IEEE International Workshop on Information Forensics and Security, 2015

Complex recurrent neural networks for denoising speech signals.

[BibT_eX]

[DOI]

Keiichi Osako

Proceedings of the 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2015

CMU Informedia@TRECVID 2015: MED/SIN/LNK/SED.

[BibT_eX]

[DOI]

Proceedings of the 2015 TREC Video Retrieval Evaluation, 2015

Rapid development of public health education systems in low-literacy multilingual environments: combating ebola through voice messaging.

[BibT_eX]

[DOI]

Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2015

Locality constrained transitive distance clustering on speech data.

[BibT_eX]

[DOI]

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Privacy-preserving Query-by-Example Speech Search.

[BibT_eX]

[DOI]

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Reducing communication overhead in distributed learning by an order of magnitude (almost).

[BibT_eX]

[DOI]

Anders Øland

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A novel ranking method for multiple classifier systems.

[BibT_eX]

[DOI]

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Beyond Gaussian Pyramid: Multi-skip Feature Stacking for action recognition.

[BibT_eX]

[DOI]

Zhen-Zhong Lan

Ming Lin

Xuanchong Li

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Efficient autism spectrum disorder prediction with eye movement: A machine learning framework.

[BibT_eX]

[DOI]

Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015

2014

Bach in 2014: Music Composition with Recurrent Neural Network.

[BibT_eX]

[DOI]

I-Ting Liu

Bhiksha Ramakrishnan

CoRR, 2014

Informedia @ TRECVID 2014.

[BibT_eX]

[DOI]

Proceedings of the 2014 TREC Video Retrieval Evaluation, 2014

Privacy-Preserving Important Passage Retrieval.

[BibT_eX]

[DOI]

Luís Marujo

José Portelo

David Martins de Matos

Proceedings of the 1st International Workshop on Privacy-Preserving IR: When Information Retrieval Meets Privacy and Security co-located with 37th Annual International ACM SIGIR conference, 2014

Privacy-preserving speaker verification using secure binary embeddings.

[BibT_eX]

[DOI]

Proceedings of the 37th International Convention on Information and Communication Technology, 2014

Post-masking: a hybrid approach to array processing for speech recognition.

[BibT_eX]

[DOI]

Amir R. Moghimi

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Active-set Newton algorithm for non-negative sparse coding of audio.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2014

Iterative Bayesian word segmentation for unsupervised vocabulary discovery from phoneme lattices.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2014

Privacy-preserving speaker verification using garbled GMMs.

[BibT_eX]

[DOI]

Proceedings of the 22nd European Signal Processing Conference, 2014

Detecting sound objects in audio recordings.

[BibT_eX]

[DOI]

Proceedings of the 22nd European Signal Processing Conference, 2014

2013

Active-Set Newton Algorithm for Overcomplete Non-Negative Representations of Audio.

[BibT_eX]

[DOI]

Jort Florent Gemmeke

IEEE Trans. Speech Audio Process., 2013

Privacy-Preserving Speaker Verification and Identification Using Gaussian Mixture Models.

[BibT_eX]

[DOI]

IEEE Trans. Speech Audio Process., 2013

Privacy-Preserving Speech Processing: Cryptographic and String-Matching Frameworks Show Promise.

[BibT_eX]

[DOI]

IEEE Signal Process. Mag., 2013

Measuring prevalence of other-oriented transactive contributions using an automated measure of speech style accommodation.

[BibT_eX]

[DOI]

Int. J. Comput. Support. Collab. Learn., 2013

Robust 1-bit Compressive Sensing via Gradient Support Pursuit.

[BibT_eX]

[DOI]

Petros T. Boufounos

CoRR, 2013

Informedia@TRECVID 2013.

[BibT_eX]

[DOI]

Proceedings of the 2013 TREC Video Retrieval Evaluation, 2013

Swara Histogram Based Structural Analysis And Identification Of Indian Classical Ragas.

[BibT_eX]

[DOI]

Pranay Dighe

Harish Karnick

Proceedings of the 14th International Society for Music Information Retrieval Conference, 2013

A Comparative Study Of Indian And Western Music Forms.

[BibT_eX]

[DOI]

Parul Agarwal

Harish Karnick

Proceedings of the 14th International Society for Music Information Retrieval Conference, 2013

Secure binary embeddings of front-end factor analysis for privacy preserving speaker verification.

[BibT_eX]

[DOI]

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Discriminatively trained dependency language modeling for conversational speech recognition.

[BibT_eX]

[DOI]

Benjamin Lambert

Leibny Paola García-Perera

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Ensemble approach in speaker verification.

[BibT_eX]

[DOI]

Juan Arturo Nolazco-Flores

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Scale independent raga identification using chromagram patterns and swara based features.

[BibT_eX]

[DOI]

Proceedings of the 2013 IEEE International Conference on Multimedia and Expo Workshops, 2013

Doppler based speed estimation of vehicles using passive sensor.

[BibT_eX]

[DOI]

Proceedings of the 2013 IEEE International Conference on Multimedia and Expo Workshops, 2013

Speaker tracking with spherical microphone arrays.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2013

Optimization of the DET curve in speaker verification under noisy conditions.

[BibT_eX]

[DOI]

Leibny Paola García-Perera

Juan Arturo Nolazco-Flores

Proceedings of the IEEE International Conference on Acoustics, 2013

Unsupervised hierarchical structure induction for deeper semantic analysis of audio.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2013

Speaker verification using Secure Binary Embeddings.

[BibT_eX]

[DOI]

Proceedings of the 21st European Signal Processing Conference, 2013

Event detection in short duration audio using Gaussian Mixture Model and Random Forest Classifier.

[BibT_eX]

[DOI]

Proceedings of the 21st European Signal Processing Conference, 2013

A hierarchical system for word discovery exploiting DTW-based initialization.

[BibT_eX]

[DOI]

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Unsupervised word segmentation from noisy input.

[BibT_eX]

[DOI]

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012

Large Margin Gaussian Mixture Models with Differential Privacy.

[BibT_eX]

[DOI]

IEEE Trans. Dependable Secur. Comput., 2012

Microphone Array Processing for Distant Speech Recognition: From Close-Talking Microphones to Far-Field Sensors.

[BibT_eX]

[DOI]

IEEE Signal Process. Mag., 2012

Ultrasonic Doppler Sensing in HCI.

[BibT_eX]

[DOI]

IEEE Pervasive Comput., 2012

The Markov selection model for concurrent speech recognition.

[BibT_eX]

[DOI]

Juan Arturo Nolazco-Flores

Neurocomputing, 2012

Informedia @TRECVID 2012.

[BibT_eX]

[DOI]

Proceedings of the 2012 TREC Video Retrieval Evaluation, 2012

Optimization of the DET curve in speaker verification.

[BibT_eX]

[DOI]

L. Paola García-Perera

Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Unsupervised Structure Discovery for Semantic Analysis of Audio.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

Privacy-Preserving Speaker Authentication.

[BibT_eX]

[DOI]

Proceedings of the Information Security - 15th International Conference, 2012

Language identification using spectro-temporal patch features.

[BibT_eX]

[DOI]

Proceedings of the ISCA Workshop on Statistical And Perceptual Audition, 2012

Microphone Array Post-filter based on Spatially-Correlated Noise Measurements for Distant Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Plagiarism Detection in Polyphonic Music using Monaural Signal Separation.

[BibT_eX]

[DOI]

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Exploiting Temporal Sequence Structure for Semantic Analysis of Multimedia.

[BibT_eX]

[DOI]

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Structured sparse coding for microphone array location calibration.

[BibT_eX]

[DOI]

Proceedings of the ISCA Workshop on Statistical And Perceptual Audition, 2012

Predicting Idea Co-Construction in Speech Data using Insights from Sociolinguistics.

[BibT_eX]

[DOI]

Proceedings of the Future of Learning: Proceedings of the 10th International Conference of the Learning Sciences, 2012

Attacking a privacy preserving music matching algorithm.

[BibT_eX]

[DOI]

José Portelo

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Privacy-preserving speaker verification as password matching.

[BibT_eX]

[DOI]

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Audio event detection from acoustic unit occurrence patterns.

[BibT_eX]

[DOI]

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Spectrographic seam patterns for discriminative word spotting.

[BibT_eX]

[DOI]

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

An Unsupervised Dynamic Bayesian Network Approach to Measuring Speech Style Accommodation.

[BibT_eX]

[DOI]

Carolyn Penstein Rosé

Proceedings of the EACL 2012, 2012

Microphone array processing for distant speech recognition: Spherical arrays.

[BibT_eX]

[DOI]

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

Microphone array processing for distant speech recognition: Towards real-world deployment.

[BibT_eX]

[DOI]

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

Introduction.

[BibT_eX]

[DOI]

Proceedings of the Techniques for Noise Robustness in Automatic Speech Recognition, 2012

The Basics of Automatic Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the Techniques for Noise Robustness in Automatic Speech Recognition, 2012

The Problem of Robustness in Automatic Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the Techniques for Noise Robustness in Automatic Speech Recognition, 2012

2011

Missing Data Imputation for Time-Frequency Representations of Audio Signals.

[BibT_eX]

[DOI]

J. Signal Process. Syst., 2011

Efficient Protocols for Principal Eigenvector Computation over Private Data.

[BibT_eX]

[DOI]

Trans. Data Priv., 2011

Preface.

[BibT_eX]

[DOI]

Martin Heckmann

Speech Commun., 2011

A Unifying Analysis of Projected Gradient Descent for $\ell_p$-constrained Least Squares.

[BibT_eX]

[DOI]

CoRR, 2011

Privacy Preserving Spam Filtering.

[BibT_eX]

[DOI]

Mehrbod Sharifi

CoRR, 2011

On the combination of voice prompt suppression with maximum kurtosis beamforming.

[BibT_eX]

[DOI]

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2011

Block-wise incremental adaptation algorithm for maximum kurtosis beamforming.

[BibT_eX]

[DOI]

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2011

Learning contextual relevance of audio segments using discriminative models over AUD sequences.

[BibT_eX]

[DOI]

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2011

A Comparison of Latent Variable Models For Conversation Analysis.

[BibT_eX]

[DOI]

Proceedings of the SIGDIAL 2011 Conference, 2011

Phoneme-Dependent NMF for Speech Enhancement in Monaural Mixtures.

[BibT_eX]

[DOI]

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Privacy Preserving Speaker Verification Using Adapted GMMs.

[BibT_eX]

[DOI]

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A Paradigm for Limited Vocabulary Speech Recognition Based on Redundant Spectro-Temporal Feature Sets.

[BibT_eX]

[DOI]

Tony Ezzat

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Unsupervised Learning of Acoustic Unit Descriptors for Audio Content Representation and Classification.

[BibT_eX]

[DOI]

Mark Harvilla

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A paired test for recognizer selection with untranscribed data.

[BibT_eX]

[DOI]

James Baker

Proceedings of the IEEE International Conference on Acoustics, 2011

Privacy preserving probabilistic inference with Hidden Markov Models.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2011

Gammatone sub-band magnitude-domain dereverberation for ASR.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2011

An iterative least-squares technique for dereverberation.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2011

On the implementation of a secure musical database matching.

[BibT_eX]

[DOI]

Proceedings of the 19th European Signal Processing Conference, 2011

The automatic assessment of knowledge integration processes in project teams.

[BibT_eX]

[DOI]

Proceedings of the 9th International Conference on Computer Supported Collaborative Learning, 2011

Maximum kurtosis beamforming with a subspace filter for distant speech recognition.

[BibT_eX]

[DOI]

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

An information filter for voice prompt suppression.

[BibT_eX]

[DOI]

Proceedings of the Conference Record of the Forty Fifth Asilomar Conference on Signals, 2011

Greedy sparsity-constrained optimization.

[BibT_eX]

[DOI]

Petros Boufounos

Proceedings of the Conference Record of the Forty Fifth Asilomar Conference on Signals, 2011

Reconstructing Noise-Corrupted Spectrographic Components for Robust Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the Robust Speech Recognition of Uncertain or Missing Data, 2011

2010

Scalable Audio-Content Analysis.

[BibT_eX]

[DOI]

EURASIP J. Audio Speech Music. Process., 2010

Privacy Preserving Protocols for Eigenvector Computation.

[BibT_eX]

[DOI]

Proceedings of the Privacy and Security Issues in Data Mining and Machine Learning, 2010

Large Margin Multiclass Gaussian Classification with Differential Privacy.

[BibT_eX]

[DOI]

Proceedings of the Privacy and Security Issues in Data Mining and Machine Learning, 2010

Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers.

[BibT_eX]

[DOI]

Shantanu Rane

Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

The use of sense in unsupervised training of acoustic models for ASR systems.

[BibT_eX]

[DOI]

Benjamin Lambert

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Ungrounded independent non-negative factor analysis.

[BibT_eX]

[DOI]

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Non-negative matrix factorization based compensation of music for automatic speech recognition.

[BibT_eX]

[DOI]

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Creating a linguistic plausibility dataset with non-expert annotators.

[BibT_eX]

[DOI]

Benjamin Lambert

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Spectrogram dimensionality reduction with independence constraints.

[BibT_eX]

[DOI]

Kevin W. Wilson

Proceedings of the IEEE International Conference on Acoustics, 2010

Synthesizing speech from Doppler signals.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2010

Ultrasonic sensing for robust speech recognition.

[BibT_eX]

[DOI]

Sundararajan Srinivasan

Tony Ezzat

Proceedings of the IEEE International Conference on Acoustics, 2010

Latent-variable decomposition based dereverberation of monaural and multi-channel signals.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2010

Learning-based auditory encoding for robust speech recognition.

[BibT_eX]

[DOI]

Yu-Hsiang Bosco Chiu

Proceedings of the IEEE International Conference on Acoustics, 2010

A hybrid physical and statistical dynamic articulatory framework incorporating analysis-by-synthesis for improved phone classification.

[BibT_eX]

[DOI]

Ziad Al Bawab

Proceedings of the IEEE International Conference on Acoustics, 2010

Non-negative Hidden Markov Modeling of Audio with Application to Source Separation.

[BibT_eX]

[DOI]

Gautham J. Mysore

Proceedings of the Latent Variable Analysis and Signal Separation, 2010

2009

A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Signal separation for robust speech recognition based on phase difference information obtained in the frequency domain.

[BibT_eX]

[DOI]

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Towards fusion of feature extraction and acoustic model training: a top down process for robust speech recognition.

[BibT_eX]

[DOI]

Yu-Hsiang Bosco Chiu

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Deriving vocal tract shapes from electromagnetic articulograph data via geometric adaptation and matching.

[BibT_eX]

[DOI]

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Probabilistic Factorization of Non-negative Data with Entropic Co-occurrence Constraints.

[BibT_eX]

[DOI]

Gautham J. Mysore

Proceedings of the Independent Component Analysis and Signal Separation, 2009

One-handed gesture recognition using ultrasonic Doppler sonar.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2009

A joint decoding algorithm for multiple-example-based addition of words to a pronunciation lexicon.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2009

Word Particles Applied to Information Retrieval.

[BibT_eX]

[DOI]

Evandro B. Gouvêa

Proceedings of the Advances in Information Retrieval, 2009

2008

Probabilistic Latent Variable Models as Nonnegative Factorizations.

[BibT_eX]

[DOI]

Comput. Intell. Neurosci., 2008

Regularized non-negative matrix factorization with temporal dependencies for speech denoising.

[BibT_eX]

[DOI]

Kevin W. Wilson

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Speech denoising using nonnegative matrix factorization with priors.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2008

Sparse and shift-invariant feature extraction from non-negative data.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2008

Ultrasonic Doppler sensor for speaker recognition.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2008

Analysis-by-synthesis features for speech recognition.

[BibT_eX]

[DOI]

Ziad Al Bawab

Proceedings of the IEEE International Conference on Acoustics, 2008

Recognizing talking faces from acoustic Doppler reflections.

[BibT_eX]

[DOI]

Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2008), 2008

2007

Soft Mask Methods for Single-Channel Speaker Separation.

[BibT_eX]

[DOI]

Aarthi M. Reddy

IEEE Trans. Speech Audio Process., 2007

Ultrasonic Doppler Sensor for Voice Activity Detection.

[BibT_eX]

[DOI]

Rongquiang Hu

IEEE Signal Process. Lett., 2007

An FFT-Based Companding Front End for Noise-Robust Automatic Speech Recognition.

[BibT_eX]

[DOI]

EURASIP J. Audio Speech Music. Process., 2007

Sparse Overcomplete Latent Variable Decomposition of Counts Data.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 20, 2007

Probabilistic deduction of symbol mappings for extension of lexicons.

[BibT_eX]

[DOI]

Evandro B. Gouvêa

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Sparse Overcomplete Decomposition for Single Channel Speaker Separation.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2007

Bandwidth Expansion with a Pólya Urn Model.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2007

Supervised and Semi-supervised Separation of Sounds from Single-Channel Mixtures.

[BibT_eX]

[DOI]

Proceedings of the Independent Component Analysis and Signal Separation, 2007

Sensor and Data Systems, Audio-Assisted Cameras and Acoustic Doppler Sensors.

[BibT_eX]

[DOI]

Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007

Acoustic Doppler sonar for gait recognition.

[BibT_eX]

[DOI]

Proceedings of the Fourth IEEE International Conference on Advanced Video and Signal Based Surveillance, 2007

2006

An acoustic Doppler-Based Front End for Hands Free spoken User Interfaces.

[BibT_eX]

[DOI]

Proceedings of the 2006 IEEE ACL Spoken Language Technology Workshop, 2006

An integrated approach to improve speech recognition rate for non-native speakers.

[BibT_eX]

[DOI]

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Latent Dirichlet Decomposition for Single Channel Speaker Separation.

[BibT_eX]

[DOI]

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005

Missing-feature approaches in speech recognition.

[BibT_eX]

[DOI]

IEEE Signal Process. Mag., 2005

Voice driven applications in non-stationary and chaotic environment.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2005

Recognizing speech from simultaneous speakers.

[BibT_eX]

[DOI]

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Bandwidth expansion of narrowband speech using non-negative matrix factorization.

[BibT_eX]

[DOI]

Dhananjay Bansal

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

A Comparison Between Spoken Queries and Menu-Based Interfaces for In-car Digital Music Selection.

[BibT_eX]

[DOI]

Proceedings of the Human-Computer Interaction, 2005

A Companding Front End for Noise-Robust Automatic Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Feature compensation with secondary sensor measurements for robust speech recognition.

[BibT_eX]

[DOI]

Proceedings of the 13th European Signal Processing Conference, 2005

Speech Recognizer Based Maximum Likelihood Beamforming.

[BibT_eX]

[DOI]

Manuel Jesus Reyes-Gomez

Proceedings of the Speech Separation by Humans and Machines, 2005

2004

Classification in Likelihood Spaces.

[BibT_eX]

[DOI]

Technometrics, 2004

Likelihood-maximizing beamforming for robust hands-free speech recognition.

[BibT_eX]

[DOI]

IEEE Trans. Speech Audio Process., 2004

A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition.

[BibT_eX]

[DOI]

Speech Commun., 2004

Reconstruction of missing features for robust speech recognition.

[BibT_eX]

[DOI]

Speech Commun., 2004

A Speech-in List-out Approach to Spoken User Interfaces.

[BibT_eX]

[DOI]

Proceedings of HLT-NAACL 2004: Short Papers, Boston, Massachusetts, USA, May 2-7, 2004, 2004

Spokenquery: an alternate approach to choosing items with speech.

[BibT_eX]

[DOI]

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Soft mask estimation for single channel speaker separation.

[BibT_eX]

[DOI]

Aarthi M. Reddy

Proceedings of the ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing, 2004

A minimum mean squared error estimator for single channel speaker separation.

[BibT_eX]

[DOI]

Aarthi M. Reddy

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

On tracking noise with linear dynamical system models.

[BibT_eX]

[DOI]

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003

Speech-recognizer-based filter optimization for microphone array processing.

[BibT_eX]

[DOI]

IEEE Signal Process. Lett., 2003

Classifier-based non-linear projection for adaptive endpointing of continuous speech.

[BibT_eX]

[DOI]

Comput. Speech Lang., 2003

Classification with free energy at raised temperatures.

[BibT_eX]

[DOI]

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Design of the CMU sphinx-4 decoder.

[BibT_eX]

[DOI]

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Tracking noise via dynamical systems with a continuum of states.

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Lossless compression of language model structure and word identifiers.

Edward W. D. Whittaker

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Multi-channel source separation by factorial HMMs.

Manuel J. Reyes Gomez

Dan Ellis

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002

Automatic generation of subword units for speech recognition systems.

IEEE Trans. Speech Audio Process., 2002

The MERL SpokenQuery information retrieval system: a system for retrieving pertinent documents from a spoken query.

Peter Wolf

Proceedings of the 2002 IEEE International Conference on Multimedia and Expo, 2002

Speech recognizer-based microphone array processing for robust hands-free speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2002

2001

Comparison of width-wise and length-wise language model compression.

Edward W. D. Whittaker

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Quantization-based language model compression.

Edward W. D. Whittaker

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Calibration of microphone arrays for improved speech recognition.

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

A boosting approach for confidence scoring.

Pedro J. Moreno

Beth Logan

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Speech in Noisy Environments: robust automatic segmentation, feature extraction, and hypothesis combination.

Proceedings of the IEEE International Conference on Acoustics, 2001

2000

Structured redefinition of sound units by merging and splitting for improved speech recognition.

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Classifier-based mask estimation for missing feature methods of robust speech recognition.

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Reconstruction of damaged spectrographic features for robust speech recognition.

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Automatic generation of phone sets and lexical transcriptions.

Proceedings of the IEEE International Conference on Acoustics, 2000

1999

Domain adduced state tying for cross-domain acoustic modelling.

Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Automatic clustering and generation of contextual questions for tied states in hidden Markov models.

Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

1998

Data-driven environmental compensation for speech recognition: A unified approach.

Pedro J. Moreno

Speech Commun., 1998

Inference of missing spectrographic features for robust speech recognition.

Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

1997

The effects of background music on speech recognition accuracy.

Vipul N. Parikh