Shri Narayanan

Orcid: 0000-0002-1052-6204

Affiliations:
  • University of Southern California, Signal Analysis and Interpretation Lab, Los Angeles, USA


According to our database, Shri Narayanan authored at least 888 papers between 1993 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Awards

ACM Fellow

ACM Fellow 2023, "For contributions to speech, language, multimedia processing, affective computing, and their human-centered applications".

IEEE Fellow

IEEE Fellow 2009, "For contributions to human-centric multimodal signal processing and applications".

Bibliography

2024
The Strong Pull of Prior Knowledge in Large Language Models and Its Impact on Emotion Recognition.
CoRR, 2024

The NeurIPS 2023 Machine Learning for Audio Workshop: Affective Audio Benchmarks and Novel Data.
CoRR, 2024

Knowledge-guided EEG Representation Learning.
CoRR, 2024

Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing?
CoRR, 2024

Understanding Stress, Burnout, and Behavioral Patterns in Medical Residents Using Large-scale Longitudinal Wearable Recordings.
CoRR, 2024

A Multi-Perspective Machine Learning Approach to Evaluate Police-Driver Interaction in Los Angeles.
CoRR, 2024

2023
An Engineering View on Emotions and Speech: From Analysis and Predictive Models to Responsible Human-Centered Applications.
Proc. IEEE, October, 2023

A study of bias mitigation strategies for speaker recognition.
Comput. Speech Lang., April, 2023

Modeling inter-individual differences in ambulatory-based multimodal signals via metric learning: a case study of personalized well-being estimation of healthcare workers.
Frontiers Digit. Health, March, 2023

Cross Modal Video Representations for Weakly Supervised Active Speaker Localization.
IEEE Trans. Multim., 2023

Explainable Severity ranking via pairwise n-hidden comparison: a case study of glaucoma.
CoRR, 2023

Audio-visual child-adult speaker classification in dyadic interactions.
CoRR, 2023

FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things.
CoRR, 2023

Scaling Representation Learning from Ubiquitous ECG with State-Space Models.
CoRR, 2023

Does Video Summarization Require Videos? Quantifying the Effectiveness of Language in Video Summarization.
CoRR, 2023

Foundation Model Assisted Automatic Speech Emotion Recognition: Transcribing, Annotating, and Augmenting.
CoRR, 2023

Emotion-Aligned Contrastive Learning Between Images and Music.
CoRR, 2023

Learning Behavioral Representations of Routines From Large-scale Unlabeled Wearable Time-series Data Streams using Hawkes Point Process.
CoRR, 2023

Unlocking Foundation Models for Privacy-Enhancing Speech Understanding: An Early Study on Low Resource Speech Training Leveraging Label-guided Synthetic Speech Content.
CoRR, 2023

GPT-FL: Generative Pre-trained Model-Assisted Federated Learning.
CoRR, 2023

Understanding Spoken Language Development of Children with ASD Using Pre-trained Speech Embeddings.
CoRR, 2023

Socio-Linguistic Characteristics of Coordinated Inauthentic Accounts.
CoRR, 2023

TrustSER: On the Trustworthiness of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition.
CoRR, 2023

MovieCLIP: Visual Scene Recognition in Movies.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

SEAR: Semantically-grounded Audio Representations.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

MM-AU: Towards Multimodal Understanding of Advertisement Videos.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

FedMultimodal: A Benchmark for Multimodal Federated Learning.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

Tensor Embedding: A Supervised Framework for Human Behavioral Data Mining and Prediction.
Proceedings of the 11th IEEE International Conference on Healthcare Informatics, 2023

FedAudio: A Federated Learning Benchmark for Audio Tasks.
Proceedings of the IEEE International Conference on Acoustics, 2023

Can Knowledge of End-to-End Text-to-Speech Models Improve Neural Midi-to-Audio Synthesis Systems?
Proceedings of the IEEE International Conference on Acoustics, 2023

Toward Privacy-Enhancing Ambulatory-Based Well-Being Monitoring: Investigating User Re-Identification Risk in Multimodal Data.
Proceedings of the IEEE International Conference on Acoustics, 2023

A Context-Aware Computational Approach for Measuring Vocal Entrainment in Dyadic Conversations.
Proceedings of the IEEE International Conference on Acoustics, 2023

A Dataset for Audio-Visual Sound Event Detection in Movies.
Proceedings of the IEEE International Conference on Acoustics, 2023

Using Emotion Embeddings to Transfer Knowledge between Emotions, Languages, and Annotation Formats.
Proceedings of the IEEE International Conference on Acoustics, 2023

Leveraging Label Correlations in a Multi-Label Setting: a Case Study in Emotion.
Proceedings of the IEEE International Conference on Acoustics, 2023

Contextually-Rich Human Affect Perception Using Multimodal Scene Information.
Proceedings of the IEEE International Conference on Acoustics, 2023

On the Role of Visual Context in Enriching Music Representations.
Proceedings of the IEEE International Conference on Acoustics, 2023

Multimodal Estimation Of Change Points Of Physiological Arousal During Driving.
Proceedings of the IEEE International Conference on Acoustics, 2023

Signal Processing Grand Challenge 2023 - E-Prevention: Sleep Behavior as an Indicator of Relapses in Psychotic Patients.
Proceedings of the IEEE International Conference on Acoustics, 2023

Navigating and Reaching Therapeutic Goals with Dynamical Systems in Conversation-Based Interventions.
Proceedings of the IEEE International Conference on Acoustics, 2023

Designing and Evaluating Speech Emotion Recognition Systems: A Reality Check Case Study with IEMOCAP.
Proceedings of the IEEE International Conference on Acoustics, 2023

Domain Adaptation for Sentiment Analysis Using Robust Internal Representations.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Character Coreference Resolution in Movie Screenplays.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Context Unlocks Emotions: Text-based Emotion Classification Dataset Auditing with Large Language Models.
Proceedings of the 11th International Conference on Affective Computing and Intelligent Interaction, 2023

PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models.
Proceedings of the 11th International Conference on Affective Computing and Intelligent Interaction, 2023

2022
Robust Character Labeling in Movie Videos: Data Resources and Self-Supervised Feature Adaptation.
IEEE Trans. Multim., 2022

Joint Multi-Dimensional Model for Global and Time-Series Annotations.
IEEE Trans. Affect. Comput., 2022

Modeling Vocal Entrainment in Conversational Speech Using Deep Unsupervised Learning.
IEEE Trans. Affect. Comput., 2022

Multi-Label Multi-Task Deep Learning for Behavioral Coding.
IEEE Trans. Affect. Comput., 2022

Editorial: Intelligent Signal Analysis for Contagious Virus Diseases.
IEEE J. Sel. Top. Signal Process., 2022

Studying Large-Scale Behavioral Differences in Auschwitz-Birkenau with Simulation of Gendered Narratives.
Digit. Humanit. Q., 2022

End-to-end neural systems for automatic children speech recognition: An empirical study.
Comput. Speech Lang., 2022

A review of speaker diarization: Recent advances with deep learning.
Comput. Speech Lang., 2022

Causal indicators for assessing the truthfulness of child speech in forensic interviews.
Comput. Speech Lang., 2022

An automated quality evaluation framework of psychotherapy conversations with local quality estimates.
Comput. Speech Lang., 2022

Exploring Workplace Behaviors through Speaking Patterns using Large-scale Multimodal Wearable Recordings: A Study of Healthcare Providers.
CoRR, 2022

A Review of Speech-centric Trustworthy Machine Learning: Privacy, Safety, and Fairness.
CoRR, 2022

Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection.
CoRR, 2022

Multimodal Estimation of Change Points of Physiological Arousal in Drivers.
CoRR, 2022

Unsupervised active speaker detection in media content using cross-modal information.
CoRR, 2022

VAuLT: Augmenting the Vision-and-Language Transformer with the Propagation of Deep Language Representations.
CoRR, 2022

Local dynamic mode of Cognitive Behavioral Therapy.
CoRR, 2022

Using Active Speaker Faces for Diarization in TV shows.
CoRR, 2022

Mel Frequency Spectral Domain Defenses against Adversarial Attacks on Speech Recognition Systems.
CoRR, 2022

Audio visual character profiles for detecting background characters in entertainment media.
CoRR, 2022

Multimodal Clustering with Role Induced Constraints for Speaker Diarization.
Proceedings of the Interspeech 2022, 2022

User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition on Federated Learning.
Proceedings of the Interspeech 2022, 2022

Semi-FedSER: Semi-supervised Learning for Speech Emotion Recognition On Federated Learning using Multiview Pseudo-Labeling.
Proceedings of the Interspeech 2022, 2022

Automating Detection of Papilledema in Pediatric Fundus Images with Explainable Machine Learning.
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

Enhancing Privacy Through Domain Adaptive Noise Injection For Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Leveraging Open Data and Task Augmentation to Automated Behavioral Coding of Psychotherapy Conversations in Low-Resource Scenarios.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Audio and ASR-based Filled Pause Detection.
Proceedings of the 10th International Conference on Affective Computing and Intelligent Interaction, 2022

2021
Generalized Multiview Shared Subspace Learning Using View Bootstrapping.
IEEE Trans. Signal Process., 2021

Evidence of Task-Independent Person-Specific Signatures in EEG Using Subspace Techniques.
IEEE Trans. Inf. Forensics Secur., 2021

Meta-Learning With Latent Space Clustering in Generative Adversarial Network for Speaker Diarization.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Temporal Dynamics of Workplace Acoustic Scenes: Egocentric Analysis and Prediction.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Multimodal Embeddings From Language Models for Emotion Recognition in the Wild.
IEEE Signal Process. Lett., 2021

Extending the Beta divergence to complex values.
Pattern Recognit. Lett., 2021

Computational Media Intelligence: Human-Centered Machine Analysis of Media.
Proc. IEEE, 2021

Deep multiple instance learning for foreground speech localization in ambient audio from wearable devices.
EURASIP J. Audio Speech Music. Process., 2021

Unsupervised speech representation learning for behavior modeling using triplet enhanced contextualized networks.
Comput. Speech Lang., 2021

Adversarial attack and defense strategies for deep speaker recognition systems.
Comput. Speech Lang., 2021

An analysis of observation length requirements for machine understanding of human behaviors from spoken language.
Comput. Speech Lang., 2021

Attribute Inference Attack of Speech Emotion Recognition in Federated Learning Settings.
CoRR, 2021

Understanding of Emotion Perception from Art.
CoRR, 2021

Cross Domain Emotion Recognition using Few Shot Knowledge Transfer.
CoRR, 2021

Representation of professions in entertainment media: Insights into frequency and sentiment trends through computational text analysis.
CoRR, 2021

Phone Duration Modeling for Speaker Age Estimation in Children.
CoRR, 2021

Perceptual-based deep-learning denoiser as a defense against adversarial attacks on ASR systems.
CoRR, 2021

Front-end Diarization for Percussion Separation in Taniavartanam of Carnatic Music Concerts.
CoRR, 2021

Automated Quality Assessment of Cognitive Behavioral Therapy Sessions Through Highly Contextualized Language Representations.
CoRR, 2021

"Am I A Good Therapist?" Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies.
CoRR, 2021

A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images.
CoRR, 2021

Attention-gated convolutional neural networks for off-resonance correction of spiral real-time MRI.
CoRR, 2021

Confusion2vec 2.0: Enriching Ambiguous Spoken Language Representations with Subwords.
CoRR, 2021

RNN Based Incremental Online Spoken Language Understanding.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Developing Neural Representations for Robust Child-Adult Diarization.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Having a Bad Day? Detecting the Impact of Atypical Events Using Wearable Sensors.
Proceedings of the Social, Cultural, and Behavioral Modeling, 2021

Leveraging Real-Time MRI for Illuminating Linguistic Velum Action.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Acted vs. Improvised: Domain Adaptation for Elicitation Approaches in Audio-Visual Emotion Recognition.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Analyzing Short Term Dynamic Speech Features for Understanding Behavioral Traits of Children with Autism Spectrum Disorder.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Multi-Scale Speaker Diarization with Neural Affinity Score Fusion.
Proceedings of the IEEE International Conference on Acoustics, 2021

Adversarial Defense for Deep Speaker Recognition Using Hybrid Adversarial Training.
Proceedings of the IEEE International Conference on Acoustics, 2021

Context-Aware Speech Stress Detection in Hospital Workers Using Bi-LSTM Classifiers.
Proceedings of the IEEE International Conference on Acoustics, 2021

Feature Fusion Strategies for End-to-End Evaluation of Cognitive Behavior Therapy Sessions.
Proceedings of the 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2021

Loss Function Approaches for Multi-label Music Tagging.
Proceedings of the 18th International Conference on Content-Based Multimedia Indexing, 2021

A Computational Tool to Study Vocal Participation of Women in UN-ITU Meetings.
Proceedings of the 18th International Conference on Content-Based Multimedia Indexing, 2021

Annotation and Evaluation of Coreference Resolution in Screenplays.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

Mitigating the Bias of Heterogeneous Human Behavior in Affective Computing.
Proceedings of the 9th International Conference on Affective Computing and Intelligent Interaction, 2021

Privacy and Utility Preserving Data Transformation for Speech Emotion Recognition.
Proceedings of the 9th International Conference on Affective Computing and Intelligent Interaction, 2021

2020
Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap.
IEEE Signal Process. Lett., 2020

Affect Estimation with Wearable Sensors.
J. Heal. Informatics Res., 2020

User-Based Collaborative Filtering Mobile Health System.
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 2020

Investigating Group-Specific Models of Hospital Workers' Well-Being: Implications for Algorithmic Bias.
Int. J. Semantic Comput., 2020

Leveraging Linguistic Context in Dyadic Interactions to Improve Automatic Speech Recognition for Children.
Comput. Speech Lang., 2020

Vocal tract shaping of emotional speech.
Comput. Speech Lang., 2020

Multi-Face: Self-supervised Multiview Adaptation for Robust Face Clustering in Videos.
CoRR, 2020

Victim or Perpetrator? Analysis of Violent Characters Portrayals from Movie Scripts.
CoRR, 2020

Having a Bad Day? Detecting the Impact of Atypical Life Events Using Wearable Sensors.
CoRR, 2020

Designing Neural Speaker Embeddings with Meta Learning.
CoRR, 2020

Affective Conditioning on Hierarchical Networks applied to Depression Detection from Transcribed Clinical Interviews.
CoRR, 2020

Generalized Multi-view Shared Subspace Learning using View Bootstrapping.
CoRR, 2020

TILES-2018: A longitudinal physiologic and behavioral data set of hospital workers.
CoRR, 2020

A Label Proportions Estimation Technique for Adversarial Domain Adaptation in Text Classification.
CoRR, 2020

Crossmodal learning for audio-visual speech event localization.
CoRR, 2020

The FFSVC 2020 Evaluation Plan.
CoRR, 2020

Neural Speech Decoding During Audition, Imagination and Production.
IEEE Access, 2020

Learning Behavioral Representations from Wearable Sensors.
Proceedings of the Social, Cultural, and Behavioral Modeling, 2020

An Empirical Analysis of Information Encoded in Disentangled Neural Speaker Representations.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Linguistically Aided Speaker Diarization Using Speaker Role Information.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

ATQAM/MAST'20: Joint Workshop on Aesthetic and Technical Quality Assessment of Multimedia and Media Analytics for Societal Trends.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

MediaEval 2020 Emotion and Theme Recognition in Music Task: Loss Function Approaches for Multi-label Music Tagging.
Proceedings of the Working Notes Proceedings of the MediaEval 2020 Workshop, 2020

Affective Conditioning on Hierarchical Attention Networks Applied to Depression Detection from Transcribed Clinical Interviews.
Proceedings of the Interspeech 2020, 2020

Sentence Level Estimation of Psycholinguistic Norms Using Joint Multidimensional Annotations.
Proceedings of the Interspeech 2020, 2020

The INTERSPEECH 2020 Far-Field Speaker Verification Challenge.
Proceedings of the Interspeech 2020, 2020

Exploiting Conic Affinity Measures to Design Speech Enhancement Systems Operating in Unseen Noise Conditions.
Proceedings of the Interspeech 2020, 2020

Human-centered Multimodal Machine Intelligence.
Proceedings of the ICMI '20: International Conference on Multimodal Interaction, 2020

Fifty Shades of Green: Towards a Robust Measure of Inter-annotator Agreement for Continuous Signals.
Proceedings of the ICMI '20: International Conference on Multimodal Interaction, 2020

Automated Empathy Detection for Oncology Encounters.
Proceedings of the 8th IEEE International Conference on Healthcare Informatics, 2020

Bringing in the Outliers: A Sparse Subspace Clustering Approach to Learn a Dictionary of Mouse Ultrasonic Vocalizations.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Multitask Learning for Darpa Lorelei's Situation Frame Extraction Task.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Robust Speaker Recognition Using Unsupervised Adversarial Invariance.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Speaker Diarization Using Latent Space Clustering in Generative Adversarial Network.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Speaker-Invariant Affective Representation Learning via Adversarial Training.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Learning Domain Invariant Representations for Child-Adult Classification from Speech.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Meta-Learning for Robust Child-Adult Classification from Speech.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Vocal Tract Articulatory Contour Detection in Real-Time Magnetic Resonance Images Using Spatio-Temporal Context.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

The Role of Annotation Fusion Methods in the Study of Human-Reported Emotion Experience During Music Listening.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Modeling Behavioral Consistency in Large-Scale Wearable Recordings of Human Bio-Behavioral Signals.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Modeling Behavior as Mutual Dependency between Physiological Signals and Indoor Location in Large-Scale Wearable Sensor Study.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Automatic Prediction of Suicidal Risk in Military Couples Using Multimodal Interaction Cues from Couples Conversations.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Trapezoidal Segment Sequencing: A Novel Approach for Fusion of Human-Produced Continuous Annotations.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Identifying Truthful Language in Child Interviews.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Joint Estimation and Analysis of Risk Behavior Ratings in Movie Scripts.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Dynamical systems modeling of day-to-day signal-based patterns of emotional self-regulation and stress spillover in highly-demanding health professions.
Proceedings of the 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2020

Modeling Human Movement Behavior Among Nursing Profession.
Proceedings of the 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2020

Towards end-2-end learning for predicting behavior codes from spoken utterances in psychotherapy conversations.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Screenplay Quality Assessment: Can We Predict Who Gets Nominated?
Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events, 2020

2019
Total Variability Layer in Deep Neural Network Embeddings for Speaker Verification.
IEEE Signal Process. Lett., 2019

Generating labels for regression of subjective constructs using triplet embeddings.
Pattern Recognit. Lett., 2019

Articulatory characterization of English liquid-final rimes.
J. Phonetics, 2019

Efficient estimation and model generalization for the total variability model.
Comput. Speech Lang., 2019

An analysis of observation length requirements in spoken language for machine understanding of human behaviors.
CoRR, 2019

Language Aided Speaker Diarization Using Speaker Role Information.
CoRR, 2019

Characterizing dynamically varying acoustic scenes from egocentric audio recordings in workplace setting.
CoRR, 2019

A study of semi-supervised speaker diarization system using gan mixture model.
CoRR, 2019

Incremental Online Spoken Language Understanding.
CoRR, 2019

Multimodal Embeddings from Language Models.
CoRR, 2019

The Ambiguous World of Emotion Representation.
CoRR, 2019

Behavior Gated Language Models.
CoRR, 2019

Report of 2017 NSF Workshop on Multimedia Challenges, Opportunities and Research Roadmaps.
CoRR, 2019

Towards Adapting NMF Dictionaries Using Total Variability Modeling for Noise-Robust Acoustic Features.
CoRR, 2019

A system for the 2019 Sentiment, Emotion and Cognitive State Task of DARPAs LORELEI project.
CoRR, 2019

Multimodal Representation Learning using Deep Multiset Canonical Correlation.
CoRR, 2019

A Multimodal View into Music's Effect on Human Neural, Physiological, and Emotional Experience.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Multiview Shared Subspace Learning Across Speakers and Speech Commands.
Proceedings of the Interspeech 2019, 2019

Speaker Diarization with Lexical Information.
Proceedings of the Interspeech 2019, 2019

The Second DIHARD Challenge: System Description for USC-SAIL Team.
Proceedings of the Interspeech 2019, 2019

Modeling Interpersonal Linguistic Coordination in Conversations Using Word Mover's Distance.
Proceedings of the Interspeech 2019, 2019

Identifying Therapist and Client Personae for Therapeutic Alliance Estimation.
Proceedings of the Interspeech 2019, 2019

Multi-Task Discriminative Training of Hybrid DNN-TVM Model for Speaker Verification with Noisy and Far-Field Speech.
Proceedings of the Interspeech 2019, 2019

Toward Visual Voice Activity Detection for Unconstrained Videos.
Proceedings of the 2019 IEEE International Conference on Image Processing, 2019

Estimating Individualized Daily Self-Reported Affect with Wearable Sensors.
Proceedings of the 2019 IEEE International Conference on Healthcare Informatics, 2019

Reinforcing Self-expressive Representation with Constraint Propagation for Face Clustering in Movies.
Proceedings of the IEEE International Conference on Acoustics, 2019

An Empirical Study of Speech Processing in the Brain by Analyzing the Temporal Syllable Structure in Speech-input Induced EEG.
Proceedings of the IEEE International Conference on Acoustics, 2019

Speaker Agnostic Foreground Speech Detection from Audio Recordings in Workplace Settings from Wearable Recorders.
Proceedings of the IEEE International Conference on Acoustics, 2019

Bluetooth Based Indoor Localization Using Triplet Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2019

On Role and Location of Normalization before Model-based Data Augmentation in Residual Blocks for Classification Tasks.
Proceedings of the IEEE International Conference on Acoustics, 2019

Robust Speech Activity Detection in Movie Audio: Data Resources and Experimental Evaluation.
Proceedings of the IEEE International Conference on Acoustics, 2019

Learning Shared Vector Representations of Lyrics and Chords in Music.
Proceedings of the IEEE International Conference on Acoustics, 2019

Role Specific Lattice Rescoring for Speaker Role Recognition from Speech Recognition Outputs.
Proceedings of the IEEE International Conference on Acoustics, 2019

Discovering Optimal Variable-length Time Series Motifs in Large-scale Wearable Recordings of Human Bio-behavioral Signals.
Proceedings of the IEEE International Conference on Acoustics, 2019

Improving the Prediction of Therapist Behaviors in Addiction Counseling by Exploiting Class Confusions.
Proceedings of the IEEE International Conference on Acoustics, 2019

Toward Robust Interpretable Human Movement Pattern Analysis in a Workplace Setting.
Proceedings of the IEEE International Conference on Acoustics, 2019

On Evaluating CNN Representations for Low Resource Medical Image Classification.
Proceedings of the IEEE International Conference on Acoustics, 2019

Breathing Rate Complexity Features for "In-the-Wild" Stress and Anxiety Measurement.
Proceedings of the 27th European Signal Processing Conference, 2019

Stress and Anxiety Measurement "In-the-Wild" Using Quality-aware Multi-scale HRV Features.
Proceedings of the 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2019

A Comparative Study of Stress and Anxiety Estimation in Ecological Settings Using a Smart-shirt and a Smart-bracelet.
Proceedings of the 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2019

Subspace techniques for task-independent EEG person identification.
Proceedings of the 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2019

Imputing Missing Data In Large-Scale Multivariate Biomedical Wearable Recordings Using Bidirectional Recurrent Neural Networks With Temporal Activation Regularization.
Proceedings of the 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2019

Using Oliver API for emotion-aware movie content characterization.
Proceedings of the 2019 International Conference on Content-Based Multimedia Indexing, 2019

Prediction of Psychological Flexibility with multi-scale Heart Rate Variability and Breathing Features in an "in-the-wild" Setting.
Proceedings of the 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos, 2019

A system for the 2019 Sentiment, Emotion and Cognitive State Task of DARPA's LORELEI project.
Proceedings of the 8th International Conference on Affective Computing and Intelligent Interaction, 2019

Predicting Human-Reported Enjoyment Responses in Happy and Sad Music.
Proceedings of the 8th International Conference on Affective Computing and Intelligent Interaction, 2019

Trapezoidal Segmented Regression: A Novel Continuous-scale Real-time Annotation Approximation Algorithm.
Proceedings of the 8th International Conference on Affective Computing and Intelligent Interaction, 2019

Violence Rating Prediction from Movie Scripts.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Irregularity-Aware Graph Fourier Transforms.
IEEE Trans. Signal Process., 2018

Unsupervised Discovery of Character Dictionaries in Animation Movies.
IEEE Trans. Multim., 2018

Acoustic Denoising Using Dictionary Learning With Spectral and Temporal Regularization.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Modeling Multiple Time Series Annotations as Noisy Distortions of the Ground Truth: An Expectation-Maximization Approach.
IEEE Trans. Affect. Comput., 2018

A Computational Study of Expressive Facial Dynamics in Children with Autism.
IEEE Trans. Affect. Comput., 2018

Explaining Coronal Reduction: Prosodic Structure and Articulatory Posture.
Phonetica, 2018

The ELISA Situation Frame extraction for low resource languages pipeline for LoReHLT'2016.
Mach. Transl., 2018

Analysis of speech production real-time MRI.
Comput. Speech Lang., 2018

Normalization Before Shaking Toward Learning Symmetrically Distributed Representation Without Margin in Speech Emotion Recognition.
CoRR, 2018

Measuring Conversational Productivity in Child Forensic Interviews.
CoRR, 2018

Shaking Acoustic Spectral Sub-bands Can Better Regularize Learning in Affective Computing.
CoRR, 2018

Role Annotated Speech Recognition for Conversational Interactions.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

NTUA-SLP at SemEval-2018 Task 1: Predicting Affective Content in Tweets with Deep Attentive RNNs and Transfer Learning.
Proceedings of The 12th International Workshop on Semantic Evaluation, 2018

TILES audio recorder: an unobtrusive wearable solution to track audio activity.
Proceedings of the 4th ACM Workshop on Wearable Systems and Applications, 2018

Fusing Annotations with Majority Vote Triplet Embeddings.
Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop, 2018

Using Prosodic and Lexical Information for Learning Utterance-level Behaviors in Psychotherapy.
Proceedings of the Interspeech 2018, 2018

Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech.
Proceedings of the Interspeech 2018, 2018

Computational Modeling of Conversational Humor in Psychotherapy.
Proceedings of the Interspeech 2018, 2018

Exploring the Relationship between Conic Affinity of NMF Dictionaries and Speech Enhancement Metrics.
Proceedings of the Interspeech 2018, 2018

Towards an Unsupervised Entrainment Distance in Conversational Speech Using Deep Neural Networks.
Proceedings of the Interspeech 2018, 2018

A Knowledge Driven Structural Segmentation Approach for Play-Talk Classification During Autism Assessment.
Proceedings of the Interspeech 2018, 2018

Stochastic Shake-Shake Regularization for Affective Learning from Speech.
Proceedings of the Interspeech 2018, 2018

Improving Gender Identification in Movie Audio Using Cross-Domain Data.
Proceedings of the Interspeech 2018, 2018

Combined Speaker Clustering and Role Recognition in Conversational Speech.
Proceedings of the Interspeech 2018, 2018

Language Features for Automated Evaluation of Cognitive Behavior Psychotherapy Sessions.
Proceedings of the Interspeech 2018, 2018

Multimodal Representation of Advertisements Using Segment-level Autoencoders.
Proceedings of the 2018 on International Conference on Multimodal Interaction, 2018

A Multimodal Approach to Understanding Human Vocal Expressions and Beyond.
Proceedings of the 2018 on International Conference on Multimodal Interaction, 2018

Multimodal Interaction Modeling of Child Forensic Interviewing.
Proceedings of the 2018 on International Conference on Multimodal Interaction, 2018

Improving Semi-Supervised Classification for Low-Resource Speech Interaction Applications.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Shaking Acoustic Spectral Sub-Bands can Better Regularize Learning in Affective Computing.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Semi-Supervised and Transfer Learning Approaches for Low Resource Sentiment Classification.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Pykaldi: A Python Wrapper for Kaldi.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Novel Method for Human Bias Correction of Continuous-Time Annotations.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

An Evaluation of EEG-based Metrics for Engagement Assessment of Distance Learners.
Proceedings of the 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2018

Discovering Latent Psychological Structures from Self-Report Assessments of Hospital Workers.
Proceedings of the 5th International Conference on Behavioral, 2018

A Multi-task Approach to Learning Multilingual Representations.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
Modeling Dynamics of Expressive Body Gestures In Dyadic Interactions.
IEEE Trans. Affect. Comput., 2017

Multiple Instance Learning for Behavioral Coding.
IEEE Trans. Affect. Comput., 2017

Signal Processing and Machine Learning for Mental Health Research and Clinical Applications [Perspectives].
IEEE Signal Process. Mag., 2017

Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition.
CoRR, 2017

Using Multimodal Wearable Technology to Detect Conflict among Couples.
Computer, 2017

Tweester at SemEval-2017 Task 4: Fusion of Semantic-Affective and pairwise classification models for sentiment analysis in Twitter.
Proceedings of the 11th International Workshop on Semantic Evaluation, 2017

A Distribution Free Formulation of the Total Variability Model.
Proceedings of the Interspeech 2017, 2017

Test-Retest Repeatability of Articulatory Strategies Using Real-Time Magnetic Resonance Imaging.
Proceedings of the Interspeech 2017, 2017

Database of Volumetric and Real-Time Vocal Tract MRI for Speech Science.
Proceedings of the Interspeech 2017, 2017

Semantic Edge Detection for Tracking Vocal Tract Air-Tissue Boundaries in Real-Time Magnetic Resonance Images.
Proceedings of the Interspeech 2017, 2017

Comparison of Basic Beatboxing Articulations Between Expert and Novice Artists Using Real-Time Magnetic Resonance Imaging.
Proceedings of the Interspeech 2017, 2017


Global SNR Estimation of Speech Signals for Unknown Noise Conditions Using Noise Adapted Non-Linear Regression.
Proceedings of the Interspeech 2017, 2017

Complexity in Speech and its Relation to Emotional Bond in Therapist-Patient Interactions During Suicide Risk Assessment Interviews.
Proceedings of the Interspeech 2017, 2017

Exploiting Intra-Annotator Rating Consistency Through Copeland's Method for Estimation of Ground Truth Labels in Couples' Therapy.
Proceedings of the Interspeech 2017, 2017

Extracting Situation Frames from Non-English Speech: Evaluation Framework and Pilot Results.
Proceedings of the Interspeech 2017, 2017

Transfer Learning Between Concepts for Human Behavior Modeling: An Application to Sincerity and Deception Prediction.
Proceedings of the Interspeech 2017, 2017

Multi-Scale Context Adaptation for Improving Child Automatic Speech Recognition in Child-Adult Spoken Interactions.
Proceedings of the Interspeech 2017, 2017

An Affect Prediction Approach Through Depression Severity Parameter Incorporation in Neural Networks.
Proceedings of the Interspeech 2017, 2017

Attention Networks for Modeling Behaviors in Addiction Counseling.
Proceedings of the Interspeech 2017, 2017

Acoustic-Prosodic and Physiological Response to Stressful Interactions in Children with Autism Spectrum Disorder.
Proceedings of the Interspeech 2017, 2017

Sounds of the Human Vocal Tract.
Proceedings of the Interspeech 2017, 2017

VCV Synthesis Using Task Dynamics to Animate a Factor-Based Articulatory Model.
Proceedings of the Interspeech 2017, 2017

Deep convolutional recurrent neural network with attention mechanism for robust speech emotion recognition.
Proceedings of the 2017 IEEE International Conference on Multimedia and Expo, 2017

Estimation of vocal tract area function from volumetric Magnetic Resonance Imaging.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

GraSP: A Matlab toolbox for graph signal processing.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Towards a definition of local stationarity for graph signals.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Quantifying regulation mechanisms in dating couples through a dynamical systems model of acoustic and physiological arousal.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

A knowledge transfer and boosting approach to the prediction of affect in movies.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

A knowledge-driven framework for ECG representation and interpretation for wearable applications.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Multimodal detection of fake social media use through a fusion of classification and pairwise ranking systems.
Proceedings of the 25th European Signal Processing Conference, 2017

Linguistic analysis of differences in portrayal of movie characters.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

Weighted geodesic flow kernel for interpersonal mutual influence modeling and emotion recognition in dyadic interactions.
Proceedings of the Seventh International Conference on Affective Computing and Intelligent Interaction, 2017

Exploring sparse representation measures of physiological synchrony for romantic couples.
Proceedings of the Seventh International Conference on Affective Computing and Intelligent Interaction, 2017

Toward active and unobtrusive engagement assessment of distance learners.
Proceedings of the Seventh International Conference on Affective Computing and Intelligent Interaction, 2017

Designing Contestability: Interaction Design, Machine Learning, and Mental Health.
Proceedings of the 2017 Conference on Designing Interactive Systems, 2017

2016
Markov Chain Monte Carlo Inference of Parametric Dictionaries for Sparse Bayesian Approximations.
IEEE Trans. Signal Process., 2016

Long-Term SNR Estimation of Speech Signals in Known and Unknown Channel Conditions.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing.
IEEE Trans. Affect. Comput., 2016

A technology prototype system for rating therapist empathy from audio recordings in addiction counseling.
PeerJ Comput. Sci., 2016

A Socratic epistemology for verbal emotional intelligence.
PeerJ Comput. Sci., 2016

The USC CreativeIT database of multimodal dyadic interactions: from speech and full body motion capture to continuous emotional annotations.
Lang. Resour. Evaluation, 2016

Online rate adjustment for adaptive random access compressed sensing of time-varying fields.
EURASIP J. Adv. Signal Process., 2016

Directly data-derived articulatory gesture-like representations retain discriminatory information about phone categories.
Comput. Speech Lang., 2016

Speaker verification based on the fusion of speech acoustics and inverted articulatory signals.
Comput. Speech Lang., 2016

Analysis of engagement behavior in children during dyadic interactions using prosodic cues.
Comput. Speech Lang., 2016

Detecting paralinguistic events in audio stream using context in features and probabilistic decisions.
Comput. Speech Lang., 2016

Inferring object rankings based on noisy pairwise comparisons from multiple annotators.
CoRR, 2016

Tweester at SemEval-2016 Task 4: Sentiment Analysis in Twitter Using Semantic-Affective Model Adaptation.
Proceedings of the 10th International Workshop on Semantic Evaluation, 2016

Understanding individual-level speech variability: From novel speech production data to robust speaker recognition.
Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

Novel affective features for multiscale prediction of emotion in music.
Proceedings of the 18th IEEE International Workshop on Multimedia Signal Processing, 2016

Comparison of feature-level and kernel-level data fusion methods in multi-sensory fall detection.
Proceedings of the 18th IEEE International Workshop on Multimedia Signal Processing, 2016

Online Affect Tracking with Multimodal Kalman Filters.
Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge, 2016

Flow of Renyi information in deep neural networks.
Proceedings of the 26th IEEE International Workshop on Machine Learning for Signal Processing, 2016

Analyzing Temporal Dynamics of Dyadic Synchrony in Affective Interactions.
Proceedings of the Interspeech 2016, 2016

Behavioral Coding of Therapist Language in Addiction Counseling Using Recurrent Neural Networks.
Proceedings of the Interspeech 2016, 2016

Convex Hull Convolutive Non-Negative Matrix Factorization for Uncovering Temporal Patterns in Multivariate Time-Series Data.
Proceedings of the Interspeech 2016, 2016

Non-Iterative Parameter Estimation for Total Variability Model Using Randomized Singular Value Decomposition.
Proceedings of the Interspeech 2016, 2016

Articulatory Synthesis Based on Real-Time Magnetic Resonance Imaging Data.
Proceedings of the Interspeech 2016, 2016

Illustrating the Production of the International Phonetic Alphabet Sounds Using Fast Real-Time Magnetic Resonance Imaging.
Proceedings of the Interspeech 2016, 2016

Sensitivity of Quantitative RT-MRI Metrics of Vocal Tract Dynamics to Image Reconstruction Settings.
Proceedings of the Interspeech 2016, 2016

Characterizing Vocal Tract Dynamics Across Speakers Using Real-Time MRI.
Proceedings of the Interspeech 2016, 2016

An Expectation Maximization Approach to Joint Modeling of Multidimensional Ratings Derived from Multiple Annotators.
Proceedings of the Interspeech 2016, 2016

Noise Aware and Combined Noise Models for Speech Denoising in Unknown Noise Conditions.
Proceedings of the Interspeech 2016, 2016

Complexity in Prosody: A Nonlinear Dynamical Systems Approach for Dyadic Conversations; Behavior and Outcomes in Couples Therapy.
Proceedings of the Interspeech 2016, 2016

Perceptual Lateralization of Coda Rhotic Production in Puerto Rican Spanish.
Proceedings of the Interspeech 2016, 2016

State-of-the-Art MRI Protocol for Comprehensive Assessment of Vocal Tract Structure and Function.
Proceedings of the Interspeech 2016, 2016

Improved Depiction of Tissue Boundaries in Vocal Tract Real-Time MRI Using Automatic Off-Resonance Correction.
Proceedings of the Interspeech 2016, 2016

Investigation of Speed-Accuracy Tradeoffs in Speech Production Using Real-Time Magnetic Resonance Imaging.
Proceedings of the Interspeech 2016, 2016

Robust Multichannel Gender Classification from Speech in Movie Audio.
Proceedings of the Interspeech 2016, 2016

Objective Language Feature Analysis in Children with Neurodevelopmental Disorders During Autism Assessment.
Proceedings of the Interspeech 2016, 2016

Attention Assisted Discovery of Sub-Utterance Structure in Speech Emotion Recognition.
Proceedings of the Interspeech 2016, 2016

L2 Acquisition and Production of the English Rhotic Pharyngeal Gesture.
Proceedings of the Interspeech 2016, 2016

Laughter Valence Prediction in Motivational Interviewing Based on Lexical and Acoustic Cues.
Proceedings of the Interspeech 2016, 2016

Predicting Affective Dimensions Based on Self Assessed Depression Severity.
Proceedings of the Interspeech 2016, 2016

A Deep Learning Approach to Modeling Empathy in Addiction Counseling.
Proceedings of the Interspeech 2016, 2016

Automatic Estimation of Perceived Sincerity from Spoken Language.
Proceedings of the Interspeech 2016, 2016

Acoustic-Prosodic and Turn-Taking Features in Interactions with Children with Neurodevelopmental Disorders.
Proceedings of the Interspeech 2016, 2016

Velum Control for Oral Sounds.
Proceedings of the Interspeech 2016, 2016

Lightly-supervised utterance-level emotion identification using latent topic modeling of multimodal words.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

CNMF-based acoustic features for noise-robust ASR.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Opening big in box office? Trailer content can help.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Pathological speech processing: State-of-the-art, current challenges, and future directions.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

A multimodal mixture-of-experts model for dynamic emotion prediction in movies.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Localization bounds for the graph translation.
Proceedings of the 2016 IEEE Global Conference on Signal and Information Processing, 2016

EDA-gram: Designing electrodermal activity fingerprints for visualization and feature extraction.
Proceedings of the 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2016

Developing an Automated Report Card for Addiction Counseling: The Counselor Observer Ratings Expert for MI (CORE-MI).
Proceedings of the AMIA 2016, 2016

Speech and language processing for mental health research and care.
Proceedings of the 50th Asilomar Conference on Signals, Systems and Computers, 2016

Dynamical Systems Modeling of Acoustic and Physiological Arousal in Young Couples.
Proceedings of the 2016 AAAI Spring Symposia, 2016

2015
Head Motion Modeling for Human Behavior Analysis in Dyadic Interaction.
IEEE Trans. Multim., 2015

Sparse Representation of Electrodermal Activity With Knowledge-Driven Dictionaries.
IEEE Trans. Biomed. Eng., 2015

Rapid Language Identification.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Automatic intelligibility classification of sentence-level pathological speech.
Comput. Speech Lang., 2015

Structured sparse methods for active ocean observation systems with communication constraints.
IEEE Commun. Mag., 2015

Keynote speech 4: Extraction of linguistic and paralinguistic information from audio-visual data.
Proceedings of the 2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2015

Predicting Affect in Music Using Regression Methods on Low Level Features.
Proceedings of the Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

Retrieving Social Images using Relevance Filtering and Diverse Selection.
Proceedings of the Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

Analyzing speech rate entrainment and its relation to therapist empathy in drug addiction counseling.
Proceedings of the INTERSPEECH 2015, 2015

Learning a speech manifold for signal subspace speech denoising.
Proceedings of the INTERSPEECH 2015, 2015

Ensemble of Gaussian mixture localized neural networks with application to phone recognition.
Proceedings of the INTERSPEECH 2015, 2015

Experimental assessment of the tongue incompressibility hypothesis during speech production.
Proceedings of the INTERSPEECH 2015, 2015

Still together?: the role of acoustic features in predicting marital outcome.
Proceedings of the INTERSPEECH 2015, 2015

Therapy language analysis using automatically generated psycholinguistic norms.
Proceedings of the INTERSPEECH 2015, 2015

An analysis of the relationship between signal-derived vocal arousal score and human emotion production and perception.
Proceedings of the INTERSPEECH 2015, 2015

Automatic estimation of Parkinson's disease severity from diverse speech tasks.
Proceedings of the INTERSPEECH 2015, 2015

Analysis and modeling of the role of laughter in motivational interviewing based psychotherapy conversations.
Proceedings of the INTERSPEECH 2015, 2015

Predicting therapist empathy in motivational interviews using language features inspired by psycholinguistic norms.
Proceedings of the INTERSPEECH 2015, 2015

A dialog act tagging approach to behavioral coding: a case study of addiction counseling conversations.
Proceedings of the INTERSPEECH 2015, 2015

Acoustic-prosodic correlates of 'awkward' prosody in story retellings from adolescents with autism.
Proceedings of the INTERSPEECH 2015, 2015

Automated evaluation of non-native English pronunciation quality: combining knowledge- and data-driven features at multiple time scales.
Proceedings of the INTERSPEECH 2015, 2015

A discriminative reliability-aware classification model with applications to intelligibility classification in pathological speech.
Proceedings of the INTERSPEECH 2015, 2015

Factor analysis of vocal-tract outlines derived from real-time magnetic resonance imaging data.
Proceedings of the 18th International Congress of Phonetic Sciences, 2015

Articulation of English vowels in running speech: A real-time MRI study.
Proceedings of the 18th International Congress of Phonetic Sciences, 2015

Gestural coordination of Brazilian Portuguese nasal vowels in CV syllables: A real-time MRI study.
Proceedings of the 18th International Congress of Phonetic Sciences, 2015

Systematic variation in the articulation of the Korean liquid across prosodic positions.
Proceedings of the 18th International Congress of Phonetic Sciences, 2015

Gender Representation in Cinematic Content: A Multimodal Approach.
Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, WA, USA, November 09, 2015

Modeling mutual influence of multimodal behavior in affective dyadic interactions.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Improvements to the IBM speech activity detection system for the DARPA RATS program.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Redundancy analysis of behavioral coding for couples therapy and improved estimation of behavior from noisy annotations.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A mixture of experts approach towards intelligibility classification of pathological speech.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

On quantifying facial expression-related atypicality of children with Autism Spectrum Disorder.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Computationally deconstructing movie narratives: An informatics approach.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Quantifying EDA synchrony through joint sparse representation: A case-study of couples' interactions.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Affect prediction in music using boosted ensemble of filters.
Proceedings of the 23rd European Signal Processing Conference, 2015

A quantitative analysis of gender differences in movies using psycholinguistic normatives.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

A Dynamic Programming Algorithm for Computing N-gram Posteriors from Lattices.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Modeling head motion entrainment for prediction of couples' behavioral characteristics.
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015

Context-sensitive learning for enhanced audiovisual emotion classification (Extended abstract).
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015

2014
Analysis and Predictive Modeling of Body Language Behavior in Dyadic Interactions From Multimodal Interlocutor Cues.
IEEE Trans. Multim., 2014

Theoretical Analysis of Diversity in an Ensemble of Automatic Speech Recognition Systems.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Robust Unsupervised Arousal Rating: A Rule-Based Framework with Knowledge-Inspired Vocal Features.
IEEE Trans. Affect. Comput., 2014

Gestural Control in the English Past-Tense Suffix: An Articulatory Study Using Real-Time MRI.
Phonetica, 2014

Simplified supervised i-vector modeling with application to robust and efficient language identification and speaker verification.
Comput. Speech Lang., 2014

Computing vocal entrainment: A signal-derived PCA-based quantification scheme with application to affect analysis in married couple interactions.
Comput. Speech Lang., 2014

Intoxicated speech detection: A fusion framework with speaker-normalized hierarchical functionals and GMM supervectors.
Comput. Speech Lang., 2014

Improving speech recognition for children using acoustic adaptation and pronunciation modeling.
Proceedings of the 4th Workshop on Child, Computer and Interaction, 2014

SAIL-GRS: Grammar Induction for Spoken Dialogue Systems using CF-IRF Rule Similarity.
Proceedings of the 8th International Workshop on Semantic Evaluation, 2014

SAIL: Sentiment Analysis using Semantic Similarity and Contrast Features.
Proceedings of the 8th International Workshop on Semantic Evaluation, 2014

Multimodal Prediction of Affective Dimensions and Depression in Human-Computer Interactions.
Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge, 2014

Detection of Musical Event Drop from Crowdsourced Annotations Using a Noisy Channel Model.
Proceedings of the Working Notes Proceedings of the MediaEval 2014 Workshop, 2014

Affective Feature Design and Predicting Continuous Affective Dimensions from Music.
Proceedings of the Working Notes Proceedings of the MediaEval 2014 Workshop, 2014

Analysis of emotional effect on speech-body gesture interplay.
Proceedings of the INTERSPEECH 2014, 2014

Modeling therapist empathy through prosody in drug addiction counseling.
Proceedings of the INTERSPEECH 2014, 2014

Joint filtering and factorization for recovering latent structure from noisy speech data.
Proceedings of the INTERSPEECH 2014, 2014

Enhancing audio source separability using spectro-temporal regularization with NMF.
Proceedings of the INTERSPEECH 2014, 2014

Modified-prior i-vector estimation for language identification of short duration utterances.
Proceedings of the INTERSPEECH 2014, 2014

Classification of cognitive load from speech using an i-vector framework.
Proceedings of the INTERSPEECH 2014, 2014

UBM fused total variability modeling for language identification.
Proceedings of the INTERSPEECH 2014, 2014

Motor control primitives arising from a learned dynamical systems model of speech articulation.
Proceedings of the INTERSPEECH 2014, 2014

Selection of optimal vocal tract regions using real-time magnetic resonance imaging for robust voice activity detection.
Proceedings of the INTERSPEECH 2014, 2014

Behavioral informatics from multimodal human interaction cues.
Proceedings of the 2nd International Workshop on Speech, Language and Audio in Multimedia, 2014

Estimation of the movement trajectories of non-crucial articulators based on the detection of crucial moments and physiological constraints.
Proceedings of the INTERSPEECH 2014, 2014

A study of invariant properties and variation patterns in the converter/distributor model for emotional speech.
Proceedings of the INTERSPEECH 2014, 2014

Unsupervised speaker diarization using Riemannian manifold clustering.
Proceedings of the INTERSPEECH 2014, 2014

Predicting client's inclination towards target behavior change in motivational interviewing and investigating the role of laughter.
Proceedings of the INTERSPEECH 2014, 2014

Variable Span disfluency detection in ASR transcripts.
Proceedings of the INTERSPEECH 2014, 2014

Comparing time-frequency representations for directional derivative features.
Proceedings of the INTERSPEECH 2014, 2014

Robust language identification using convolutional neural network features.
Proceedings of the INTERSPEECH 2014, 2014

An investigation of vocal arousal dynamics in child-psychologist interactions using synchrony measures and a conversation-based model.
Proceedings of the INTERSPEECH 2014, 2014

A real-time MRI study of articulatory setting in second language speech.
Proceedings of the INTERSPEECH 2014, 2014

Gesture dynamics modeling for attitude analysis using graph based transform.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Graph-based approach for motion capture data representation and analysis.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Integration and Automation of Data Preparation and Data Mining.
Proceedings of the 2014 IEEE International Conference on Data Mining Workshops, 2014

Analysis of interaction attitudes using data-driven hand gesture phrases.
Proceedings of the IEEE International Conference on Acoustics, 2014

Power-spectral analysis of head motion signal for behavioral modeling in human interaction.
Proceedings of the IEEE International Conference on Acoustics, 2014

Energy-constrained minimum variance response filter for robust vowel spectral estimation.
Proceedings of the IEEE International Conference on Acoustics, 2014

Classification of clean and noisy bilingual movie audio for speech-to-speech translation corpora design.
Proceedings of the IEEE International Conference on Acoustics, 2014

Simplified and supervised i-vector modeling for speaker age regression.
Proceedings of the IEEE International Conference on Acoustics, 2014

A supervised signal-to-noise ratio estimation of speech signals.
Proceedings of the IEEE International Conference on Acoustics, 2014

Affective language model adaptation via corpus selection.
Proceedings of the IEEE International Conference on Acoustics, 2014

Training ensemble of diverse classifiers on feature subsets.
Proceedings of the IEEE International Conference on Acoustics, 2014

Learning multiple concepts with incremental diverse density.
Proceedings of the IEEE International Conference on Acoustics, 2014

A non-homogeneous Poisson process model of Skin Conductance Responses integrated with observed regulatory behaviors for Autism intervention.
Proceedings of the IEEE International Conference on Acoustics, 2014

Barista: A framework for concurrent speech processing by USC-SAIL.
Proceedings of the IEEE International Conference on Acoustics, 2014

Semi-supervised term-weighted value rescoring for keyword search.
Proceedings of the IEEE International Conference on Acoustics, 2014

Fusion of diverse denoising systems for robust automatic speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

Hull detection based on largest empty sector angle with application to analysis of realtime MR images.
Proceedings of the IEEE International Conference on Acoustics, 2014

2013
Dynamic 3-D Visualization of Vocal Tract Shaping During Speech.
IEEE Trans. Medical Imaging, 2013

Toward the Automatic Extraction of Policy Networks Using Web Links and Documents.
IEEE Trans. Knowl. Data Eng., 2013

Distributional Semantic Models for Affective Text Analysis.
IEEE Trans. Speech Audio Process., 2013

Iterative Feature Normalization Scheme for Automatic Emotion Detection from Speech.
IEEE Trans. Affect. Comput., 2013

Statistical methods for estimation of direct and differential kinematics of the vocal tract.
Speech Commun., 2013

Toward automating a human behavioral coding system for married couples' interactions using speech acoustic features.
Speech Commun., 2013

An Overview on Perceptually Motivated Audio Indexing and Classification.
Proc. IEEE, 2013

Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language.
Proc. IEEE, 2013

A Globally-Variant Locally-Constant Model for Fusion of Labels from Multiple Diverse Experts without Using Reference Labels.
IEEE Trans. Pattern Anal. Mach. Intell., 2013

Tracking continuous emotional trends of participants during affective dyadic interactions using body language and speech information.
Image Vis. Comput., 2013

Computational Audio Analysis (Dagstuhl Seminar 13451).
Dagstuhl Reports, 2013

High-quality bilingual subtitle document alignments with application to spontaneous speech translation.
Comput. Speech Lang., 2013

Enriching machine-mediated speech-to-speech translation using contextual information.
Comput. Speech Lang., 2013

Enabling effective design of multimodal interfaces for speech-to-speech translation system: An empirical study of longitudinal user behaviors over time and user strategies for coping with errors.
Comput. Speech Lang., 2013

Paralinguistics in speech and language - State-of-the-art and the challenge.
Comput. Speech Lang., 2013

Automatic speaker age and gender recognition using acoustic and prosodic level information fusion.
Comput. Speech Lang., 2013

Unsupervised data processing for classifier-based speech translator.
Comput. Speech Lang., 2013

Generalized Ambiguity Decomposition for Understanding Ensemble Diversity.
CoRR, 2013

Fuzzy Logic Models for the Meaning of Emotion Words.
IEEE Comput. Intell. Mag., 2013

DeepPurple: Lexical, String and Affective Feature Fusion for Sentence-Level Semantic Similarity Estimation.
Proceedings of the Second Joint Conference on Lexical and Computational Semantics, 2013

Which ASR should I choose for my dialogue system?
Proceedings of the SIGDIAL 2013 Conference, 2013

SAIL: A hybrid approach to sentiment analysis.
Proceedings of the 7th International Workshop on Semantic Evaluation, 2013

Faster 3D vocal tract real-time MRI using constrained reconstruction.
Proceedings of the INTERSPEECH 2013, 2013

The effect of word frequency and lexical class on articulatory-acoustic coupling.
Proceedings of the INTERSPEECH 2013, 2013

Modeling therapist empathy and vocal entrainment in drug addiction counseling.
Proceedings of the INTERSPEECH 2013, 2013

A two-step technique for MRI audio enhancement using dictionary learning and wavelet packet analysis.
Proceedings of the INTERSPEECH 2013, 2013

Toward transfer of acoustic cues of emphasis across languages.
Proceedings of the INTERSPEECH 2013, 2013

Multi-band long-term signal variability features for robust voice activity detection.
Proceedings of the INTERSPEECH 2013, 2013

Articulatory synthesis of French connected speech from EMA data.
Proceedings of the INTERSPEECH 2013, 2013

Stable articulatory tasks and their variable formation: Tamil retroflex consonants.
Proceedings of the INTERSPEECH 2013, 2013

A robust frontend for VAD: exploiting contextual, discriminative and spectral cues of human voice.
Proceedings of the INTERSPEECH 2013, 2013

Articulatory settings facilitate mechanically advantageous motor control of vocal tract articulators.
Proceedings of the INTERSPEECH 2013, 2013

Velic coordination in French nasals: a real-time magnetic resonance imaging study.
Proceedings of the INTERSPEECH 2013, 2013

Speaker verification based on fusion of acoustic and articulatory information.
Proceedings of the INTERSPEECH 2013, 2013

Vocal tract cross-distance estimation from real-time MRI using region-of-interest analysis.
Proceedings of the INTERSPEECH 2013, 2013

Annotation and classification of political advertisements.
Proceedings of the INTERSPEECH 2013, 2013

Truncation of pharyngeal gesture in English diphthong [aɪ].
Proceedings of the INTERSPEECH 2013, 2013

TRAP language identification system for RATS phase II evaluation.
Proceedings of the INTERSPEECH 2013, 2013

Paralinguistic event detection from speech using probabilistic time-series smoothing and masking.
Proceedings of the INTERSPEECH 2013, 2013

Spectro-temporal directional derivative features for automatic speech recognition.
Proceedings of the INTERSPEECH 2013, 2013

Information theoretic acoustic feature selection for acoustic-to-articulatory inversion.
Proceedings of the INTERSPEECH 2013, 2013

Analyzing the structure of parent-moderated narratives from children with ASD using an entity-based approach.
Proceedings of the INTERSPEECH 2013, 2013

On the computation of document frequency statistics from spoken corpora using factor automata.
Proceedings of the INTERSPEECH 2013, 2013

Analyzing eye-voice coordination in rapid automatized naming.
Proceedings of the INTERSPEECH 2013, 2013

Acoustic-prosodic, turn-taking, and language cues in child-psychologist interactions for varying social demand.
Proceedings of the INTERSPEECH 2013, 2013

Classifying language-related developmental disorders from speech cues: the promise and the potential confounds.
Proceedings of the INTERSPEECH 2013, 2013

Empirical link between hypothesis diversity and fusion performance in an ensemble of automatic speech recognition systems.
Proceedings of the INTERSPEECH 2013, 2013

Head motion synchrony and its correlation to affectivity in dyadic interactions.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo, 2013

Using emotional noise to uncloud audio-visual emotion perceptual evaluation.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo, 2013

Quantifying atypicality in affective facial expressions of children with autism spectrum disorders.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo, 2013

An audio-visual approach to learning salient behaviors in couples' problem solving discussions.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo Workshops, 2013

Toward body language generation in dyadic interaction settings from interlocutor multimodal cues.
Proceedings of the IEEE International Conference on Acoustics, 2013

Data driven modeling of head motion towards analysis of behaviors in couple interactions.
Proceedings of the IEEE International Conference on Acoustics, 2013

A study on the effect of prosodic emphasis transfer on overall speech translation quality.
Proceedings of the IEEE International Conference on Acoustics, 2013

Combining window predictions efficiently - A new imputation approach for noise robust automatic speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

A robust frontend for ASR: Combining denoising, noise masking and feature normalization.
Proceedings of the IEEE International Conference on Acoustics, 2013

Continuous models of affect from text using n-grams.
Proceedings of the IEEE International Conference on Acoustics, 2013

Speaker verification using simplified and supervised i-vector modeling.
Proceedings of the IEEE International Conference on Acoustics, 2013

Spatial and temporal alignment of multimodal human speech production data: Real time imaging, flesh point tracking and audio.
Proceedings of the IEEE International Conference on Acoustics, 2013

On-line genre classification of TV programs using audio content.
Proceedings of the IEEE International Conference on Acoustics, 2013

Using physiology and language cues for modeling verbal response latencies of children with ASD.
Proceedings of the IEEE International Conference on Acoustics, 2013

Annotation and processing of continuous emotional attributes: Challenges and opportunities.
Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, 2013

Joint training of interpolated exponential n-gram models.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012
Complexity-Regularized Tree-Structured Partition for Mutual Information Estimation.
IEEE Trans. Inf. Theory, 2012

KNOWME: An Energy-Efficient Multimodal Body Area Network for Physical Activity Monitoring.
ACM Trans. Embed. Comput. Syst., 2012

Novel Variations of Group Sparse Regularization Techniques With Applications to Noise Robust Automatic Speech Recognition.
IEEE Trans. Speech Audio Process., 2012

Context-Sensitive Learning for Enhanced Audiovisual Emotion Classification.
IEEE Trans. Affect. Comput., 2012

On signal representations within the Bayes decision framework.
Pattern Recognit., 2012

Emotion and mental state recognition from speech.
EURASIP J. Adv. Signal Process., 2012

KNOWME: a case study in wireless body area sensor network design.
IEEE Commun. Mag., 2012

Acoustical analysis of engagement behavior in children.
Proceedings of the Third Workshop on Child, Computer and Interaction, 2012

A reranking approach for recognition and classification of speech input in conversational dialogue systems.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Features for comparing tune similarity of songs across different languages.
Proceedings of the 14th IEEE International Workshop on Multimedia Signal Processing, 2012

The Twins Corpus of Museum Visitor Questions.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Based on Isolated Saliency or Causal Integration? Toward a Better Understanding of Human Annotation Process using Multiple Instance Learning and Sequential Probability Ratio Test.
Proceedings of the INTERSPEECH 2012, 2012

Intelligibility classification of pathological speech using fusion of multiple high level descriptors.
Proceedings of the INTERSPEECH 2012, 2012

A Sequential Bayesian Dialog Agent for Computational Ethnography.
Proceedings of the INTERSPEECH 2012, 2012

Emphatic segments and emphasis spread in Lebanese Arabic: a Real-time Magnetic Resonance Imaging Study.
Proceedings of the INTERSPEECH 2012, 2012

Characterizing Covert Articulation in Apraxic Speech Using real-time MRI.
Proceedings of the INTERSPEECH 2012, 2012

Interplay between verbal response latency and physiology of children with autism during ECA interactions.
Proceedings of the INTERSPEECH 2012, 2012

A Case Study: Detecting Counselor Reflections in Psychotherapy for Addictions using Linguistic Features.
Proceedings of the INTERSPEECH 2012, 2012

A Robust Unsupervised Arousal Rating Framework using Prosody with Cross-Corpora Evaluation.
Proceedings of the INTERSPEECH 2012, 2012

Spontaneous-Speech Acoustic-Prosodic Features of Children with Autism and the Interacting Psychologist.
Proceedings of the INTERSPEECH 2012, 2012

Speaker Personality Classification Using Systems Based on Acoustic-Lexical Cues and an Optimal Tree-Structured Bayesian Network.
Proceedings of the INTERSPEECH 2012, 2012

Multimodal detection of salient behaviors of approach-avoidance in dyadic interactions.
Proceedings of the International Conference on Multimodal Interaction, 2012

Analyzing the memory of BLSTM Neural Networks for enhanced emotion classification in dyadic spoken interactions.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Automatic recognition of emotion evoked by general sound events.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

A hierarchical framework for modeling multimodality and emotional evolution in affective dialogs.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Speaker states recognition using latent factor analysis based Eigenchannel factor vector modeling.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Object classification in sidescan sonar images with sparse representation techniques.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Classification of emotional content of sighs in dyadic human interactions.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

An acoustic analysis of shared enjoyment in ECA interactions of children with autism.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Improvements in predicting children's overall reading ability by modeling variability in evaluators' subjective judgments.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Creating ensemble of diverse maximum entropy models.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Analyzing quality of crowd-sourced speech transcriptions of noisy audio for acoustic model adaptation.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Supervised acoustic topic model with a consequent classifier for unstructured audio classification.
Proceedings of the 10th International Workshop on Content-Based Multimedia Indexing, 2012

Analyzing the language of therapist empathy in Motivational Interview based psychotherapy.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

Exploiting speech production information for automatic speech and speaker modeling and recognition - possibilities and new opportunities.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

Simplifying emotion classification through emotion distillation.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

Speaker verification using Lasso based sparse total variability supervector with PLDA modeling.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

Using measures of vocal entrainment to inform outcome-related behaviors in marital conflicts.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

A study of emotional information present in articulatory movements estimated using acoustic-to-articulatory inversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

Composite-DBN for recognition of environmental contexts.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

Using interval type-2 fuzzy logic to analyze Turkish emotion words.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle.
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, 2012

2011
Optimal Time-Resource Allocation for Energy-Efficient Physical Activity Detection.
IEEE Trans. Signal Process., 2011

Introduction to the special issue on speech and language processing of children's speech for child-machine interaction applications.
ACM Trans. Speech Lang. Process., 2011

Automatically assessing the ABCs: Verification of children's spoken letter-names and letter-sounds.
ACM Trans. Speech Lang. Process., 2011

A Generative Student Model for Scoring Word Reading Skills.
IEEE Trans. Speech Audio Process., 2011

Enhanced Sparse Imputation Techniques for a Robust Speech Recognition Front-End.
IEEE ACM Trans. Audio Speech Lang. Process., 2011

A Framework for Automatic Human Emotion Classification Using Emotion Profiles.
IEEE Trans. Speech Audio Process., 2011

Robust Voice Activity Detection Using Long-Term Signal Variability.
IEEE Trans. Speech Audio Process., 2011

Automatic Prediction of Children's Reading Ability for High-Level Literacy Assessment.
IEEE ACM Trans. Audio Speech Lang. Process., 2011

Emotion recognition using a hierarchical binary decision tree approach.
Speech Commun., 2011

Joint source-filter optimization for robust glottal source estimation in the presence of shimmer and jitter.
Speech Commun., 2011

Detecting emotional state of a child in a conversational computer game.
Comput. Speech Lang., 2011

EmotiWord: Affective Lexicon Creation with Application to Interaction and Multimedia Data.
Proceedings of the Computational Intelligence for Multimedia Understanding, 2011

Behavioral signal processing for understanding (distressed) dyadic interactions: some recent developments.
Proceedings of the 2011 joint ACM workshop on Human gesture and behavior understanding, 2011

A Perplexity Based Cover Song Matching System for Short Length Queries.
Proceedings of the 12th International Society for Music Information Retrieval Conference, 2011

Acoustic and Visual Cues of Turn-Taking Dynamics in Dyadic Interactions.
Proceedings of the INTERSPEECH 2011, 2011

Automatic Data-Driven Learning of Articulatory Primitives from Real-Time MRI Data Using Convolutive NMF with Sparseness Constraints.
Proceedings of the INTERSPEECH 2011, 2011

Direct Estimation of Articulatory Kinematics from Real-Time Magnetic Resonance Image Sequences.
Proceedings of the INTERSPEECH 2011, 2011

A Multimodal Real-Time MRI Articulatory Corpus for Speech Research.
Proceedings of the INTERSPEECH 2011, 2011

Analyzing the Nature of ECA Interactions in Children with Autism.
Proceedings of the INTERSPEECH 2011, 2011

A Study of the Effectiveness of Articulatory Strokes for Phonemic Recognition.
Proceedings of the INTERSPEECH 2011, 2011

Kernel Models for Affective Lexicon Creation.
Proceedings of the INTERSPEECH 2011, 2011

Speaker Verification Using Sparse Representations on Total Variability i-vectors.
Proceedings of the INTERSPEECH 2011, 2011

An Analysis of PCA-Based Vocal Entrainment Measures in Married Couples' Affective Spoken Interactions.
Proceedings of the INTERSPEECH 2011, 2011

Morphological Variation in the Adult Vocal Tract: A Modeling Study of its Potential Acoustic Impact.
Proceedings of the INTERSPEECH 2011, 2011

Visualization of Vocal Tract Shape Using Interleaved Real-Time MRI of Multiple Scan Planes.
Proceedings of the INTERSPEECH 2011, 2011

An Exploratory Study of the Relations Between Perceived Emotion Strength and Articulatory Kinematics.
Proceedings of the INTERSPEECH 2011, 2011

Determining what Questions to Ask, with the Help of Spectral Graph Theory.
Proceedings of the INTERSPEECH 2011, 2011

Validating rt-MRI Based Articulatory Representations via Articulatory Recognition.
Proceedings of the INTERSPEECH 2011, 2011

Automatic Identification of Salient Acoustic Instances in Couples' Behavioral Interactions Using Diverse Density Support Vector Machines.
Proceedings of the INTERSPEECH 2011, 2011

Analysis of Inter-Articulator Correlation in Acoustic-to-Articulatory Inversion Using Generalized Smoothness Criterion.
Proceedings of the INTERSPEECH 2011, 2011

Enhancements to the Training Process of Classifier-Based Speech Translator via Topic Modeling.
Proceedings of the INTERSPEECH 2011, 2011

Intoxicated Speech Detection by Fusion of Speaker Normalized Hierarchical Features and GMM Supervectors.
Proceedings of the INTERSPEECH 2011, 2011

"You made me do it": Classification of Blame in Married Couples' Interactions by Fusing Automatically Derived Speech and Language Information.
Proceedings of the INTERSPEECH 2011, 2011

The USC CARE Corpus: Child-Psychologist Interactions of Children with Autism Spectrum Disorders.
Proceedings of the INTERSPEECH 2011, 2011

Reliability-Weighted Acoustic Model Adaptation Using Crowd Sourced Transcriptions.
Proceedings of the INTERSPEECH 2011, 2011

Rachel: Design of an emotionally targeted interactive agent for children with autism.
Proceedings of the 2011 IEEE International Conference on Multimedia and Expo, 2011

Overlapped speech detection using long-term spectro-temporal similarity in stereo recording.
Proceedings of the IEEE International Conference on Acoustics, 2011

Bilingual audio-subtitle extraction using automatic segmentation of movie audio.
Proceedings of the IEEE International Conference on Acoustics, 2011

Estimation of ordinal approach-avoidance labels in dyadic interactions: Ordinal logistic regression approach.
Proceedings of the IEEE International Conference on Acoustics, 2011

A hierarchical static-dynamic framework for emotion classification.
Proceedings of the IEEE International Conference on Acoustics, 2011

Tracking changes in continuous emotion states using body language and prosodic cues.
Proceedings of the IEEE International Conference on Acoustics, 2011

Robust talking face video verification using joint factor analysis and sparse representation on GMM mean shifted supervectors.
Proceedings of the IEEE International Conference on Acoustics, 2011

Directional descriptors using Zernike moment phases for object orientation estimation in underwater sonar images.
Proceedings of the IEEE International Conference on Acoustics, 2011

A subject-independent acoustic-to-articulatory inversion.
Proceedings of the IEEE International Conference on Acoustics, 2011

Iterative feature normalization for emotional speech detection.
Proceedings of the IEEE International Conference on Acoustics, 2011

Emotion classification from speech using evaluator reliability-weighted combination of ranked lists.
Proceedings of the IEEE International Conference on Acoustics, 2011

Accurate transcription of broadcast news speech using multiple noisy transcribers and unsupervised reliability metrics.
Proceedings of the IEEE International Conference on Acoustics, 2011

Modeling high-level descriptions of real-life physical activities using latent topic modeling of multimodal sensor signals.
Proceedings of the 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2011

Affective State Recognition in Married Couples' Interactions Using PCA-Based Vocal Entrainment Measures with Multiple Instance Learning.
Proceedings of the Affective Computing and Intelligent Interaction, 2011

Emotion Twenty Questions: Toward a Crowd-Sourced Theory of Emotions.
Proceedings of the Affective Computing and Intelligent Interaction, 2011

EMO20Q Questioner Agent.
Proceedings of the Affective Computing and Intelligent Interaction, 2011

Multiple Instance Learning for Classification of Human Behavior Observations.
Proceedings of the Affective Computing and Intelligent Interaction, 2011

"That's Aggravating, Very Aggravating": Is It Possible to Classify Behaviors in Couple Interactions Using Automatically Derived Lexical Features?
Proceedings of the Affective Computing and Intelligent Interaction, 2011

2010
Stochastic Networked Computation.
IEEE Trans. Very Large Scale Integr. Syst., 2010

Nonproduct data-dependent partitions for mutual information estimation: strong consistency and applications.
IEEE Trans. Signal Process., 2010

Optimal Arousal Identification and Classification for Affective Computing Using Physiological Signals: Virtual Reality Stroop Task.
IEEE Trans. Affect. Comput., 2010

Bark Frequency Transform Using an Arbitrary Order Allpass Filter.
IEEE Signal Process. Lett., 2010

Multimodal Speaker Segmentation and Identification in Presence of Overlapped Speech Segments.
J. Multim., 2010

Robust Multimodal Person Recognition Using Low-Complexity Audio-Visual Feature Fusion Approaches.
Int. J. Semantic Comput., 2010

Towards modeling user behavior in interactions mediated through an automated bidirectional speech translation system.
Comput. Speech Lang., 2010

Robust representations for out-of-domain emotions using Emotion Profiles.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

An N-gram model for unstructured audio signals toward information retrieval.
Proceedings of the 2010 IEEE International Workshop on Multimedia Signal Processing, 2010

Ada and Grace: Toward Realistic and Engaging Virtual Museum Guides.
Proceedings of the Intelligent Virtual Agents, 10th International Conference, 2010

On data-driven histogram-based estimation for mutual information.
Proceedings of the IEEE International Symposium on Information Theory, 2010

A near-optimal (minimax) tree-structured partition for mutual information estimation.
Proceedings of the IEEE International Symposium on Information Theory, 2010

Acoustic feature analysis in speech emotion primitives estimation.
Proceedings of the INTERSPEECH 2010, 2010

Context-sensitive multimodal emotion recognition from speech and facial expression using bidirectional LSTM modeling.
Proceedings of the INTERSPEECH 2010, 2010

Automatic speech recognition system channel modeling.
Proceedings of the INTERSPEECH 2010, 2010

The INTERSPEECH 2010 paralinguistic challenge.
Proceedings of the INTERSPEECH 2010, 2010

A new multichannel multi modal dyadic interaction database.
Proceedings of the INTERSPEECH 2010, 2010

Investigating articulatory setting - pauses, ready position, and rest - using real-time MRI.
Proceedings of the INTERSPEECH 2010, 2010

Rapid semi-automatic segmentation of real-time magnetic resonance images for parametric vocal tract analysis.
Proceedings of the INTERSPEECH 2010, 2010

A cluster-profile representation of emotion using agglomerative hierarchical clustering.
Proceedings of the INTERSPEECH 2010, 2010

Vocal tract contour analysis of emotional speech by the functional data curve representation.
Proceedings of the INTERSPEECH 2010, 2010

Quantification of prosodic entrainment in affective spontaneous spoken interactions of married couples.
Proceedings of the INTERSPEECH 2010, 2010

Data-driven analysis of realtime vocal tract MRI using correlated image regions.
Proceedings of the INTERSPEECH 2010, 2010

Improved real-time MRI of oral-velar coordination using a golden-ratio spiral view order.
Proceedings of the INTERSPEECH 2010, 2010

A study of interplay between articulatory movement and prosodic characteristics in emotional speech production.
Proceedings of the INTERSPEECH 2010, 2010

A variable frame length and rate algorithm based on the spectral kurtosis measure for speaker verification.
Proceedings of the INTERSPEECH 2010, 2010

An improved cluster model selection method for agglomerative hierarchical speaker clustering using incremental Gaussian mixture models.
Proceedings of the INTERSPEECH 2010, 2010

Robust voice activity detection in stereo recording with crosstalk.
Proceedings of the INTERSPEECH 2010, 2010

Hierarchical classification for speech-to-speech translation.
Proceedings of the INTERSPEECH 2010, 2010

Statistical multi-stream modeling of real-time MRI articulatory speech data.
Proceedings of the INTERSPEECH 2010, 2010

A study of intra-speaker and inter-speaker affective variability using electroglottograph and inverse filtered glottal waveforms.
Proceedings of the INTERSPEECH 2010, 2010

Automatic classification of married couples' behavior using audio features.
Proceedings of the INTERSPEECH 2010, 2010

Data-dependent evaluator modeling and its application to emotional valence classification from speech.
Proceedings of the INTERSPEECH 2010, 2010

Robust ECG Biometrics by Fusing Temporal and Cepstral Information.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

Speech emotion estimation in 3D space.
Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010

Language model adaptation using WWW documents obtained by utterance-based queries.
Proceedings of the IEEE International Conference on Acoustics, 2010

Decision level combination of multiple modalities for recognition and analysis of emotional expression.
Proceedings of the IEEE International Conference on Acoustics, 2010

Visual emotion recognition using compact facial representations and viseme information.
Proceedings of the IEEE International Conference on Acoustics, 2010

Predicting interruptions in dyadic spoken interactions.
Proceedings of the IEEE International Conference on Acoustics, 2010

An exploratory study of manifolds of emotional speech.
Proceedings of the IEEE International Conference on Acoustics, 2010

Using naïve text queries for robust audio information retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2010

Acoustic stopwords for unstructured audio information retrieval.
Proceedings of the 18th European Signal Processing Conference, 2010

2009
Discriminative wavelet packet filter bank selection for pattern recognition.
IEEE Trans. Signal Process., 2009

Human Perception of Audio-Visual Synthetic Character Emotion Expression in the Presence of Ambiguous and Conflicting Information.
IEEE Trans. Multim., 2009

Region Segmentation in the Frequency Domain Applied to Upper Airway Real-Time Magnetic Resonance Images.
IEEE Trans. Medical Imaging, 2009

Automatic Detection of Disfluency Boundaries in Spontaneous Speech of Children Using Audio-Visual Information.
IEEE Trans. Speech Audio Process., 2009

An Iterative Relative Entropy Minimization-Based Data Selection Approach for n-Gram Model Adaptation.
IEEE Trans. Speech Audio Process., 2009

Prominence Detection Using Auditory Attention Cues and Task-Dependent High Level Information.
IEEE Trans. Speech Audio Process., 2009

Environmental Sound Recognition With Time-Frequency Audio Features.
IEEE Trans. Speech Audio Process., 2009

Analysis of Emotionally Salient Aspects of Fundamental Frequency for Emotion Detection.
IEEE Trans. Speech Audio Process., 2009

Unsupervised Adaptation of Categorical Prosody Models for Prosody Labeling and Speech Recognition.
IEEE Trans. Speech Audio Process., 2009

Pitch Contour Stylization Using an Optimal Piecewise Polynomial Approximation.
IEEE Signal Process. Lett., 2009

Assessment of emerging reading skills in young native speakers and language learners.
Speech Commun., 2009

Timing effects of syllable structure and stress on nasals: A real-time MRI examination.
J. Phonetics, 2009

Combining lexical, syntactic and prosodic cues for improved online dialog act tagging.
Comput. Speech Lang., 2009

Recognizing child's emotional state in problem-solving child-machine interactions.
Proceedings of the Second Workshop on Child, Computer and Interaction, 2009

A review of ASR technologies for children's speech.
Proceedings of the Second Workshop on Child, Computer and Interaction, 2009

Comparison of child-human and child-computer interactions based on manual annotations.
Proceedings of the Second Workshop on Child, Computer and Interaction, 2009

Acoustic topic model for audio information retrieval.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009

Saliency-driven unstructured acoustic scene classification using latent perceptual indexing.
Proceedings of the 2009 IEEE International Workshop on Multimedia Signal Processing, 2009

A Low-Complexity Dynamic Face-Voice Feature Fusion Approach to Multimodal Person Recognition.
Proceedings of the 11th IEEE International Symposium on Multimedia, 2009

Histogram-based estimation for the divergence revisited.
Proceedings of the IEEE International Symposium on Information Theory, 2009

Context-driven automatic bilingual movie subtitle alignment.
Proceedings of the INTERSPEECH 2009, 2009

Automatically rating pronunciation through articulatory phonology.
Proceedings of the INTERSPEECH 2009, 2009

An articulatory analysis of phonological transfer using real-time MRI.
Proceedings of the INTERSPEECH 2009, 2009

Connecting rhythm and prominence in automatic ESL pronunciation scoring.
Proceedings of the INTERSPEECH 2009, 2009

Evaluating evaluators: a case study in understanding the benefits and pitfalls of multi-evaluator modeling.
Proceedings of the INTERSPEECH 2009, 2009

Modeling mutual influence of interlocutor emotion states in dyadic spoken interactions.
Proceedings of the INTERSPEECH 2009, 2009

A detailed study of word-position effects on emotion expression in speech.
Proceedings of the INTERSPEECH 2009, 2009

Continuous speech recognition using attention shift decoding with soft decision.
Proceedings of the INTERSPEECH 2009, 2009

Signature cluster model selection for incremental Gaussian mixture cluster modeling in agglomerative hierarchical speaker clustering.
Proceedings of the INTERSPEECH 2009, 2009

Improved speaker diarization of meeting speech with recurrent selection of representative speech segments and participant interaction pattern modeling.
Proceedings of the INTERSPEECH 2009, 2009

Estimation of articulatory gesture patterns from speech acoustics.
Proceedings of the INTERSPEECH 2009, 2009

Predicting children's reading ability using evaluator-informed features.
Proceedings of the INTERSPEECH 2009, 2009

A divide-and-conquer approach to Latent Perceptual Indexing of audio for large Web 2.0 applications.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

Robust word boundary detection in spontaneous speech using acoustic and lexical cues.
Proceedings of the IEEE International Conference on Acoustics, 2009

Accelerated 3D MRI of vocal tract shaping using compressed sensing and parallel imaging.
Proceedings of the IEEE International Conference on Acoustics, 2009

A robust harmony structure modeling scheme for classical music opus identification.
Proceedings of the IEEE International Conference on Acoustics, 2009

An analysis of articulatory-acoustic data based on articulatory strokes.
Proceedings of the IEEE International Conference on Acoustics, 2009

A semi-supervised learning approach to online audio background detection.
Proceedings of the IEEE International Conference on Acoustics, 2009

Automatic pronunciation verification of English letter-names for early literacy assessment of preliterate children.
Proceedings of the IEEE International Conference on Acoustics, 2009

Optimal Allocation of Time-Resources for Multihypothesis Activity-Level Detection.
Proceedings of the Distributed Computing in Sensor Systems, 2009

Optimal time-resource allocation for activity-detection via multimodal sensing.
Proceedings of the 4th International ICST Conference on Body Area Networks, 2009

Lattice-based lexical cues for word fragment detection in conversational speech.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

Interpreting ambiguous emotional expressions.
Proceedings of the Affective Computing and Intelligent Interaction, 2009

2008
Upper Bound Kullback-Leibler Divergence for Transient Hidden Markov Models.
IEEE Trans. Signal Process., 2008

On Energy-Based Acoustic Source Localization for Sensor Networks.
IEEE Trans. Signal Process., 2008

Challenging Uncertainty in Query by Humming Systems: A Fingerprinting Approach.
IEEE Trans. Speech Audio Process., 2008

Using Articulatory Representations to Detect Segmental Errors in Nonnative Pronunciation.
IEEE Trans. Speech Audio Process., 2008

Exploiting Acoustic and Syntactic Features for Automatic Prosody Labeling in a Maximum Entropy Framework.
IEEE Trans. Speech Audio Process., 2008

Strategies to Improve the Robustness of Agglomerative Hierarchical Clustering Under Data Source Variation for Speaker Diarization.
IEEE Trans. Speech Audio Process., 2008

Automatic Prosodic Event Detection Using Acoustic, Lexical, and Syntactic Evidence.
IEEE Trans. Speech Audio Process., 2008

Seeing speech: Capturing vocal tract shaping using real-time magnetic resonance imaging [Exploratory DSP].
IEEE Signal Process. Mag., 2008

IEMOCAP: interactive emotional dyadic motion capture database.
Lang. Resour. Evaluation, 2008

A generative model for scoring children's reading comprehension.
Proceedings of the 1st Workshop on Child, Computer and Interaction, 2008

Linguistic analysis of spontaneous children speech.
Proceedings of the 1st Workshop on Child, Computer and Interaction, 2008

An empirical analysis of user uncertainty in problem-solving child-machine interactions.
Proceedings of the 1st Workshop on Child, Computer and Interaction, 2008

Incorporating discourse context in spoken language translation through dialog acts.
Proceedings of the 2008 IEEE Spoken Language Technology Workshop, 2008

Dynamic chroma feature vectors with applications to cover song identification.
Proceedings of the International Workshop on Multimedia Signal Processing, 2008

The SAIL speaker diarization system for analysis of spontaneous meetings.
Proceedings of the International Workshop on Multimedia Signal Processing, 2008

Multimodal Speaker Segmentation in Presence of Overlapped Speech Segments.
Proceedings of the Tenth IEEE International Symposium on Multimedia (ISM2008), 2008

Selection of Emotionally Salient Audio-Visual Features for Modeling Human Evaluations of Synthetic Character Emotion Displays.
Proceedings of the Tenth IEEE International Symposium on Multimedia (ISM2008), 2008

Audio-Visual Emotion Recognition Using Gaussian Mixture Models for Face and Voice.
Proceedings of the Tenth IEEE International Symposium on Multimedia (ISM2008), 2008

Tree grammars as models of prosodic structure.
Proceedings of the INTERSPEECH 2008, 2008

Better nonnative intonation scores through prosodic theory.
Proceedings of the INTERSPEECH 2008, 2008

Factored translation models for enriching spoken language translation with prosody.
Proceedings of the INTERSPEECH 2008, 2008

An analysis of multimodal cues of interruption in dyadic spoken interactions.
Proceedings of the INTERSPEECH 2008, 2008

Relation between geometry and kinematics of articulatory trajectory associated with emotional speech production.
Proceedings of the INTERSPEECH 2008, 2008

An interval type-2 fuzzy logic system to translate between emotion-related vocabularies.
Proceedings of the INTERSPEECH 2008, 2008

Combining task-dependent information with auditory attention cues for prominence detection in speech.
Proceedings of the INTERSPEECH 2008, 2008

Agglomerative hierarchical speaker clustering using incremental Gaussian mixture cluster modeling.
Proceedings of the INTERSPEECH 2008, 2008

Towards unsupervised training of the classifier-based speech translator.
Proceedings of the INTERSPEECH 2008, 2008

Scripted dialogs versus improvisation: lessons learned about emotional elicitation techniques from the IEMOCAP database.
Proceedings of the INTERSPEECH 2008, 2008

The expression and perception of emotions: comparing assessments of self versus others.
Proceedings of the INTERSPEECH 2008, 2008

An analysis of vocal tract shaping in English sibilant fricatives using real-time magnetic resonance imaging.
Proceedings of the INTERSPEECH 2008, 2008

Estimation of children's reading ability by fusion of automatic pronunciation verification and fluency detection.
Proceedings of the INTERSPEECH 2008, 2008

Pronunciation verification of English letter-sounds in preliterate children.
Proceedings of the INTERSPEECH 2008, 2008

Classification of sound clips by two schemes: Using onomatopoeia and semantic labels.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

Joint-processing of audio-visual signals in human perception of conflicting synthetic character emotions.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

Music fingerprint extraction for classical music cover song identification.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

The Vera am Mittag German audio-visual emotional speech database.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

Audio retrieval by latent perceptual indexing.
Proceedings of the IEEE International Conference on Acoustics, 2008

Modeling the intonation of discourse segments for improved online dialog act tagging.
Proceedings of the IEEE International Conference on Acoustics, 2008

Computation as estimation: Estimation-theoretic IC design improves robustness and reduces power consumption.
Proceedings of the IEEE International Conference on Acoustics, 2008

Human perception of synthetic character emotions in the presence of conflicting and congruent vocal and facial expressions.
Proceedings of the IEEE International Conference on Acoustics, 2008

A top-down auditory attention model for learning task dependent influences on prominence detection in speech.
Proceedings of the IEEE International Conference on Acoustics, 2008

Novel inter-cluster distance measure combining GLR and ICR for improved agglomerative hierarchical speaker clustering.
Proceedings of the IEEE International Conference on Acoustics, 2008

Investigating automatic assessment of reading comprehension in young children.
Proceedings of the IEEE International Conference on Acoustics, 2008

Environmental sound recognition using MP-based features.
Proceedings of the IEEE International Conference on Acoustics, 2008

Recognition for synthesis: Automatic parameter selection for resynthesis of emotional speech from neutral speech.
Proceedings of the IEEE International Conference on Acoustics, 2008

Fine-grained pitch accent and boundary tone labeling with parametric F0 features.
Proceedings of the IEEE International Conference on Acoustics, 2008

A novel algorithm for unsupervised prosodic language model adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2008

Automatic classification of question turns in spontaneous speech using lexical and prosodic evidence.
Proceedings of the IEEE International Conference on Acoustics, 2008

Mitigation of Data Sparsity in Classifier-Based Translation.
Proceedings of the workshop on Speech Processing for Safety Critical Translation and Pervasive Applications@COLING 2008, 2008

Enriching Spoken Language Translation with Dialog Acts.
Proceedings of the ACL 2008, 2008

2007
Robust Speech Rate Estimation for Spontaneous Speech.
IEEE Trans. Speech Audio Process., 2007

An Acoustic Measure for Word Prominence in Spontaneous Speech.
IEEE Trans. Speech Audio Process., 2007

Interrelation Between Speech and Facial Gestures in Emotional Utterances: A Single Subject Study.
IEEE Trans. Speech Audio Process., 2007

Rigid Head Motion in Expressive Speech Animation: Analysis and Synthesis.
IEEE Trans. Speech Audio Process., 2007

Primitives-based evaluation and estimation of emotions in speech.
Speech Commun., 2007

Robust speaker identification based on selective use of feature vectors.
Pattern Recognit. Lett., 2007

Hassan: A Virtual Human for Tactical Questioning.
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue, 2007

Investigating Implicit Cues for User State Estimation in Human-Robot Interaction Using Physiological Measurements.
Proceedings of the IEEE RO-MAN 2007, 2007

Exploiting Acoustic and Syntactic Features for Prosody Labeling in a Maximum Entropy Framework.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2007

Statistical Modeling and Retrieval of Polyphonic Music.
Proceedings of the IEEE 9th Workshop on Multimedia Signal Processing, 2007

Experiments in Automatic Genre Classification of Full-length Music Tracks using Audio Activity Rate.
Proceedings of the IEEE 9th Workshop on Multimedia Signal Processing, 2007

Analyzing the Multimodal Behaviors of Users of a Speech-to-Speech Translation Device by using Concept Matching Scores.
Proceedings of the IEEE 9th Workshop on Multimedia Signal Processing, 2007

Multimodal Meeting Monitoring: Improvements on Speaker Tracking and Segmentation through a Modified Mixture Particle Filter.
Proceedings of the IEEE 9th Workshop on Multimedia Signal Processing, 2007

A review of the acoustic and linguistic properties of children's speech.
Proceedings of the IEEE 9th Workshop on Multimedia Signal Processing, 2007

Real-time Emotion Detection System using Speech: Multi-modal Fusion of Different Timescale Features.
Proceedings of the IEEE 9th Workshop on Multimedia Signal Processing, 2007

Joint Analysis of the Emotional Fingerprint in the Face and Speech: A single subject study.
Proceedings of the IEEE 9th Workshop on Multimedia Signal Processing, 2007

A System for Technology Based Assessment of Language and Literacy in Young Children: the Role of Multiple Information Sources.
Proceedings of the IEEE 9th Workshop on Multimedia Signal Processing, 2007

Universal Consistency of Data-Driven Partitions for Divergence Estimation.
Proceedings of the IEEE International Symposium on Information Theory, 2007

A text-free approach to assessing nonnative intonation.
Proceedings of the INTERSPEECH 2007, 2007

A Bayesian network classifier for word-level reading assessment.
Proceedings of the INTERSPEECH 2007, 2007

Exploiting prosodic features for dialog act tagging in a discriminative modeling framework.
Proceedings of the INTERSPEECH 2007, 2007

A saliency-based auditory attention model with applications to unsupervised prominent syllable detection in speech.
Proceedings of the INTERSPEECH 2007, 2007

A robust stopping criterion for agglomerative hierarchical clustering in a speaker diarization system.
Proceedings of the INTERSPEECH 2007, 2007

Pitch period estimation using multipulse model and wavelet transform.
Proceedings of the INTERSPEECH 2007, 2007

Using neutral speech models for emotional speech analysis.
Proceedings of the INTERSPEECH 2007, 2007

Analysis of emotional speech prosody in terms of part of speech tags.
Proceedings of the INTERSPEECH 2007, 2007

Automatic detection and classification of disfluent reading miscues in young children's speech for the purpose of assessment.
Proceedings of the INTERSPEECH 2007, 2007

Prosody-enriched lattices for improved syllable recognition.
Proceedings of the INTERSPEECH 2007, 2007

Analysis of Audio Clustering using Word Descriptions.
Proceedings of the IEEE International Conference on Acoustics, 2007

Discriminating Two Types of Noise Sources using Cortical Representation and Dimension Reduction Technique.
Proceedings of the IEEE International Conference on Acoustics, 2007

Information Theoretic Analysis of Direct Articulatory Measurements for Phonetic Discrimination.
Proceedings of the IEEE International Conference on Acoustics, 2007

Optimal Wavelet Packets Decomposition Based on a Rate-Distortion Optimality Criterion.
Proceedings of the IEEE International Conference on Acoustics, 2007

Data Driven Approach for Language Model Adaptation using Stepwise Relative Entropy Minimization.
Proceedings of the IEEE International Conference on Acoustics, 2007

Support Vector Regression for Automatic Recognition of Spontaneous Emotions in Speech.
Proceedings of the IEEE International Conference on Acoustics, 2007

Real-Time Monitoring of Participants' Interaction in a Meeting using Audio-Visual Sensors.
Proceedings of the IEEE International Conference on Acoustics, 2007

A Statistical Approach for Modeling Prosody Features using POS Tags for Emotional Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2007

Improved Speech Recognition using Acoustic and Lexical Correlates of Pitch Accent in a N-Best Rescoring Framework.
Proceedings of the IEEE International Conference on Acoustics, 2007

Early auditory processing inspired features for robust automatic speech recognition.
Proceedings of the 15th European Signal Processing Conference, 2007

Robust speaker clustering strategies to data source variation for improved speaker diarization.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006
Expressive Facial Animation Synthesis by Learning Speech Coarticulation and Expression Spaces.
IEEE Trans. Vis. Comput. Graph., 2006

Average divergence distance as a statistical discrimination measure for hidden Markov models.
IEEE Trans. Speech Audio Process., 2006

Efficient scalable encoding for distributed speech recognition.
Speech Commun., 2006

A split lexicon approach for improved recognition of spoken names.
Speech Commun., 2006

Acoustic-Syntactic Maximum Entropy Model for Automatic Prosody Labeling.
Proceedings of the 2006 IEEE ACL Spoken Language Technology Workshop, 2006

Selecting relevant text subsets from web-data for building topic specific language models.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2006

An attribute-based approach to audio description applied to segmenting vocal sections in popular music songs.
Proceedings of the IEEE 8th Workshop on Multimedia Signal Processing, 2006

User modeling in a speech translation driven mediated interaction setting.
Proceedings of the 1st ACM international workshop on Human-centered multimedia, 2006

Using model trees for evaluating dialog error conditions based on acoustic information.
Proceedings of the 1st ACM international workshop on Human-centered multimedia, 2006

Upper Bound Kullback-Leibler Divergence for Hidden Markov Models with Application as Discrimination Measure for Speech Recognition.
Proceedings of the 2006 IEEE International Symposium on Information Theory, 2006

"yeah right": sarcasm recognition for spoken dialogue systems.
Proceedings of the INTERSPEECH 2006, 2006

Pronunciation verification of children's speech for automatic literacy assessment.
Proceedings of the INTERSPEECH 2006, 2006

Radiobot-CFF: a spoken dialogue system for military training.
Proceedings of the INTERSPEECH 2006, 2006

A study of emotional speech articulation using a fast magnetic resonance imaging technique.
Proceedings of the INTERSPEECH 2006, 2006

Automatic detection of voice onset time contrasts for use in pronunciation assessment.
Proceedings of the INTERSPEECH 2006, 2006

Acoustic analysis and automatic recognition of spontaneous children's speech.
Proceedings of the INTERSPEECH 2006, 2006

Cross-lingual dialog model for speech to speech translation.
Proceedings of the INTERSPEECH 2006, 2006

Combining acoustic, lexical, and syntactic evidence for automatic unsupervised prosody labeling.
Proceedings of the INTERSPEECH 2006, 2006

Where am I? Scene Recognition for Mobile Robots using Audio Features.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

Text-Independent Voice Conversion Based on Unit Selection.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Speech Recognition Engineering Issues in Speech to Speech Translation System Design for Low Resource Languages and Domains.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Smooth GMM Based Multi-Talker Spectral Conversion for Spectrally Degraded Speech.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Analyzing Children's Speech: An Acoustic Study of Consonants and Consonant-Vowel Transition.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Analysis of disfluent repetitions in spontaneous speech recognition.
Proceedings of the 14th European Signal Processing Conference, 2006

Combining categorical and primitives-based emotion recognition.
Proceedings of the 14th European Signal Processing Conference, 2006

Text data acquisition for domain-specific language models.
Proceedings of the EMNLP 2006, 2006

Pathological Voice Assessment.
Proceedings of the 28th International Conference of the IEEE Engineering in Medicine and Biology Society, 2006

Efficient Rotation Invariant Retrieval of Shapes with Applications in Medical Databases.
Proceedings of the 19th IEEE International Symposium on Computer-Based Medical Systems (CBMS 2006), 2006

Vector-based Representation and Clustering of Audio Using Onomatopoeia Words.
Proceedings of the Aurally Informed Performance: Integrating Machine Listening and Auditory Presentation in Robotic Systems, 2006

Content Analysis for Acoustic Environment Classification in Mobile Robots.
Proceedings of the Aurally Informed Performance: Integrating Machine Listening and Auditory Presentation in Robotic Systems, 2006

2005
Adaptive categorical understanding for spoken dialogue systems.
IEEE Trans. Speech Audio Process., 2005

Multichannel audio synthesis by subband-based spectral conversion and parameter adaptation.
IEEE Trans. Speech Audio Process., 2005

Toward detecting emotions in spoken dialogs.
IEEE Trans. Speech Audio Process., 2005

Unsupervised Speaker Indexing Using Generic Models.
IEEE Trans. Speech Audio Process., 2005

Creating data resources for designing user-centric frontends for query-by-humming systems.
Multim. Syst., 2005

Natural head motion synthesis driven by acoustic prosodic features.
Comput. Animat. Virtual Worlds, 2005

Dealing with Doctors: A Virtual Human for Non-team Interaction.
Proceedings of the 6th SIGdial Workshop on Discourse and Dialogue, 2005

Pronunciation variations of Spanish-accented English spoken by young children.
Proceedings of the INTERSPEECH 2005, 2005

Detecting Politeness and frustration state of a child in a conversational computer game.
Proceedings of the INTERSPEECH 2005, 2005

Piecewise linear stylization of pitch via wavelet analysis.
Proceedings of the INTERSPEECH 2005, 2005

Modeling and automating detection of errors in Arabic language learner speech.
Proceedings of the INTERSPEECH 2005, 2005

Building topic specific language models from webdata using competitive models.
Proceedings of the INTERSPEECH 2005, 2005

An articulatory study of emotional speech production.
Proceedings of the INTERSPEECH 2005, 2005

TBALL data collection: the making of a young children's speech corpus.
Proceedings of the INTERSPEECH 2005, 2005

Investigating the role of phoneme-level modifications in emotional speech resynthesis.
Proceedings of the INTERSPEECH 2005, 2005

An Unsupervised Quantitative Measure for Word Prominence in Spontaneous Speech.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Automatic Syllable Stress Detection Using Prosodic Features for Pronunciation Evaluation of Language Learners.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Speech Rate Estimation via Temporal Correlation and Selected Sub-Band Correlation.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Smart room: participant and speaker localization and identification.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

An Automatic Prosody Recognizer using a Coupled Multi-Stream Acoustic Model and a Syntactic-Prosodic Language Model.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Transonics: A Practical Speech-to-Speech Translator for English-Farsi Medical Dialogs.
Proceedings of the ACL 2005, 2005

2004
Content-based movie analysis and indexing based on audiovisual cues.
IEEE Trans. Circuits Syst. Video Technol., 2004

Introduction to the Special Issue on Spontaneous Speech Processing.
IEEE Trans. Speech Audio Process., 2004

Adaptive speaker identification with audiovisual cues for movie content analysis.
Pattern Recognit. Lett., 2004

Audio-based head motion synthesis for Avatar-based telepresence systems.
Proceedings of the 2004 ACM SIGMM Workshop on Effective Telepresence, 2004

A statistical approach to retrieval under user-dependent uncertainty in query-by-humming systems.
Proceedings of the 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2004

Creation of a Doctor-Patient Dialogue Corpus Using Standardized Patients.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

Tactical Language Training System: An Interim Report.
Proceedings of the Intelligent Tutoring Systems, 7th International Conference, 2004

Constructing emotional speech synthesizers with limited speech database.
Proceedings of the INTERSPEECH 2004, 2004

An acoustic study of emotions expressed in speech.
Proceedings of the INTERSPEECH 2004, 2004

Robust speech recognition over packet networks: an overview.
Proceedings of the INTERSPEECH 2004, 2004

A statistical discrimination measure for hidden Markov models based on divergence.
Proceedings of the INTERSPEECH 2004, 2004

Measuring convergence in language model estimation using relative entropy.
Proceedings of the INTERSPEECH 2004, 2004

Reference marking in children's computer-directed speech: an integrated analysis of discourse and gestures.
Proceedings of the INTERSPEECH 2004, 2004

Emotion recognition based on phoneme classes.
Proceedings of the INTERSPEECH 2004, 2004

Speaker model quantization for unsupervised speaker indexing.
Proceedings of the INTERSPEECH 2004, 2004

A distributed speech recognition system in multi-user environments.
Proceedings of the INTERSPEECH 2004, 2004

Context dependent statistical augmentation of Persian transcripts.
Proceedings of the INTERSPEECH 2004, 2004

Analysis of emotion recognition using facial expressions, speech and multimodal information.
Proceedings of the 6th International Conference on Multimodal Interfaces, 2004

A multi-pass linear fold algorithm for sentence boundary detection using prosodic cues.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Enhanced standard compliant distributed speech recognition (Aurora encoder) using rate allocation.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Speaker identification using supra-segmental pitch pattern dynamics.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

The Transonics Spoken Dialogue Translator: An Aid for English-Persian Doctor-Patient Interviews.
Proceedings of the Dialogue Systems for Health Communication, 2004

2003
Robust recognition of children's speech.
IEEE Trans. Speech Audio Process., 2003

Virtual Microphones for Multichannel Audio Resynthesis.
EURASIP J. Adv. Signal Process., 2003

Handling real-time scheduling exceptions using decision support systems.
Proceedings of the IEEE International Conference on Systems, 2003

Creating data resources for designing user-centric frontends for query by humming systems.
Proceedings of the 5th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2003

An empirical text transformation method for spontaneous speech synthesizers.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Towards optimal encoding for classification with applications to distributed speech recognition.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Language-adaptive Persian speech recognition.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Emotion recognition using a data-driven fuzzy inference system.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

A method for on-line speaker indexing using generic reference models.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

A statistical multidimensional humming transcription using phone level hidden Markov models for query by humming systems.
Proceedings of the 2003 IEEE International Conference on Multimedia and Expo, 2003

An information-theoretic analysis of developmental changes in speech.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Multidimensional humming transcription using a statistical approach for query by humming systems.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Split-lexicon based hierarchical recognition of speech using syllable and word level acoustic units.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Audiovisual-based adaptive speaker identification.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002
Creating conversational interfaces for children.
IEEE Trans. Speech Audio Process., 2002

Comparison of dictionary-based approaches to automatic repeating melody extraction.
Proceedings of the Storage and Retrieval for Media Databases 2002, 2002

Analysis of user behavior under error conditions in spoken dialogs.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Refined speech segmentation for concatenative speech synthesis.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Combining acoustic and language information for emotion recognition.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Speaker change detection using a new weighted distance measure.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Expressive speech synthesis using a concatenative synthesizer.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

An HMM-based approach to humming transcription.
Proceedings of the 2002 IEEE International Conference on Multimedia and Expo, 2002

Multiresolution spectral conversion for multichannel audio resynthesis.
Proceedings of the 2002 IEEE International Conference on Multimedia and Expo, 2002

Classifying emotions in human-machine spoken dialogs.
Proceedings of the 2002 IEEE International Conference on Multimedia and Expo, 2002

A statistical approach to humming recognition.
Proceedings of the IEEE International Conference on Acoustics, 2002

Identification of speakers in movie dialogs using audiovisual cues.
Proceedings of the IEEE International Conference on Acoustics, 2002

Efficient multichannel audio resynthesis by subband-based spectral conversion.
Proceedings of the 11th European Signal Processing Conference, 2002

2001
Amount of Information Presented in a Complex List: Effects on User Performance.
Proceedings of the First International Conference on Human Language Technology Research, 2001


Efficient scalable speech compression for scalable speech recognition.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Politeness and frustration language in child-machine interactions.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

A Dictionary Approach To Repetitive Pattern Finding In Music.
Proceedings of the 2001 IEEE International Conference on Multimedia and Expo, 2001

On the implementation of ASR algorithms for hand-held wireless mobile devices.
Proceedings of the IEEE International Conference on Acoustics, 2001

Just (all) the facts, ma'am.
Proceedings of the CHI 2001 Extended Abstracts on Human Factors in Computing Systems, 2001

2000
Noise source models for fricative consonants.
IEEE Trans. Speech Audio Process., 2000

A spoken dialogue system for conference/workshop services.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Effects of dialog initiative and multi-modal presentation strategies on large directory information access.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

The AT&T-DARPA communicator mixed-initiative spoken dialog system.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Web-based monitoring, logging and reporting tools for multi-service multi-modal systems.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Unifying Conversational Multimedia Interfaces for Accessing Network Services Across Communication Devices.
Proceedings of the 2000 IEEE International Conference on Multimedia and Expo, 2000

1999
Categorical understanding using statistical ngram models.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Multimodal systems for children: building a prototype.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

1998
Language model adaptation for spoken language systems.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

VPQ: a spoken language interface to large scale directory information.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Spoken dialog systems for children.
Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

Probing the relationship between qualitative and quantitative performance measures for voice-enabled telecommunication services.
Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for Email.
Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, 1998

1997
Unsupervised HMM adaptation based on speech-silence discrimination.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Novel filler acoustic models for connected digit recognition.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Automatic speech recognition for children.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

New results in vowel production: MRI, EPG, and acoustic data.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Database management and analysis for spoken dialog systems: methodology and tools.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Analysis of children's speech: duration, pitch and formants.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Evaluating spoken dialog systems for telecommunication services.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Acoustic modelling of American English /r/.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

1996
Improved HMM phone and triphone models for realtime ASR telephony applications.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

Liquids in Tamil.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

From MRI and acoustic data to articulatory synthesis: a case study of the lateral approximants in American English.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

Parametric hybrid source models for voiced and voiceless fricative consonants.
Proceedings of the 1996 IEEE International Conference on Acoustics, 1996

1994
An MRI study of fricative consonants.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

Fast and Efficient Techniques for Motion Estimation Using Subband Analysis.
Proceedings of the 1994 International Conference on Image Processing, 1994

1993
Strange attractors and chaotic dynamics in the production of voiced and voiceless fricatives.
Proceedings of the Third European Conference on Speech Communication and Technology, 1993

