Mark Hasegawa-Johnson

ORCID: 0000-0002-5631-2893

According to our database, Mark Hasegawa-Johnson authored at least 279 papers between 1996 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Preliminary Technical Validation of LittleBeats™: A Multimodal Sensing Platform to Capture Cardiac Physiology, Motion, and Vocalizations.
Sensors, February, 2024

Towards Unsupervised Speech Recognition Without Pronunciation Models.
CoRR, 2024

C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion.
CoRR, 2024

Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations.
CoRR, 2024

Finding Spoken Identifications: Using GPT-4 Annotation for an Efficient and Fast Dataset Creation Pipeline.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

2023
Automated morphological phenotyping using learned shape descriptors and functional maps: A novel approach to geometric morphometrics.
PLoS Comput. Biol., January, 2023

HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models.
CoRR, 2023

Unsupervised Speech Recognition with N-Skipgram and Positional Unigram Matching.
CoRR, 2023

Enhancing Child Vocalization Classification in Multi-Channel Child-Adult Conversations Through Wav2vec2 Children ASR Features.
CoRR, 2023

One-Shot Exemplification Modeling via Latent Sense Representations.
Proceedings of the 8th Workshop on Representation Learning for NLP, 2023

Wav2ToBI: a new approach to automatic ToBI transcription.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Towards Robust Family-Infant Audio Analysis Based on Unsupervised Pretraining of Wav2vec 2.0 on Large-Scale Unlabeled Family Audio.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Dual-Path Cross-Modal Attention for Better Audio-Visual Speech Extraction.
Proceedings of the IEEE International Conference on Acoustics, 2023

Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech.
Proceedings of the IEEE International Conference on Acoustics, 2023

Classification of Infant Sleep/Wake States: Cross-Attention among Large Scale Pretrained Transformer Networks using Audio, ECG, and IMU Data.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Listen, Decipher and Sign: Toward Unsupervised Speech-to-Sign Language Recognition.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

A Theory of Unsupervised Speech Recognition.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Autosegmental Neural Nets 2.0: An Extensive Study of Training Synchronous and Asynchronous Phones and Tones for Under-Resourced Tonal Languages.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Seamless equal accuracy ratio for inclusive CTC speech recognition.
Speech Commun., 2022

Domain Generalization for Language-Independent Automatic Speech Recognition.
Frontiers Artif. Intell., 2022

Discovering phonetic inventories with crosslingual automatic speech recognition.
Comput. Speech Lang., 2022

Dual-path Attention is All You Need for Audio-Visual Speech Extraction.
CoRR, 2022

Improving Self-Supervised Speech Representations by Disentangling Speakers.
CoRR, 2022

Visualizations of Complex Sequences of Family-Infant Vocalizations Using Bag-of-Audio-Words Approach Based on Wav2vec 2.0 Features.
CoRR, 2022

SpeechSplit 2.0: Unsupervised speech disentanglement for voice conversion Without tuning autoencoder Bottlenecks.
CoRR, 2022

Syn2Vec: Synset Colexification Graphs for Lexical Semantic Similarity.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Cross-lingual articulatory feature information transfer for speech recognition using recurrent progressive neural networks.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Frame-Level Stutter Detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

WavPrompt: Towards Few-Shot Spoken Language Understanding with Frozen Language Models.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers.
Proceedings of the International Conference on Machine Learning, 2022

Forget-free Continual Learning with Winning Subnetworks.
Proceedings of the International Conference on Machine Learning, 2022

Detection of Covid-19 from Joint Time and Frequency Analysis of Speech, Breathing and Cough Audio.
Proceedings of the IEEE International Conference on Acoustics, 2022

SpeechSplit2.0: Unsupervised Speech Disentanglement for Voice Conversion without Tuning Autoencoder Bottlenecks.
Proceedings of the IEEE International Conference on Acoustics, 2022

SMSMix: Sense-Maintained Sentence Mixup for Word Sense Disambiguation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Estimation of Respiratory Rate from Breathing Audio.
Proceedings of the 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2022

Equivariance Discovery by Learned Parameter-Sharing.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

Self-supervised Semantic-driven Phoneme Discovery for Zero-resource Speech Recognition.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Fast and Efficient MMD-Based Fair PCA via Optimization over Stiefel Manifold.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Synthesizing Spoken Descriptions of Images.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Counterfactually Fair Automatic Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Auxiliary Networks for Joint Speaker Adaptation and Speaker Change Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Analysis of acoustic and voice quality features for the classification of infant and mother vocalizations.
Speech Commun., 2021

Fast and Efficient MMD-based Fair PCA via Optimization over Stiefel Manifold.
CoRR, 2021

Global Rhythm Style Transfer Without Text Transcriptions.
CoRR, 2021

Worldly Wise (WoW) - Cross-Lingual Knowledge Fusion for Fact-based Visual Spoken-Question Answering.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Classification of COVID-19 from Cough Using Autoregressive Predictive Coding Pretraining and Spectral Data Augmentation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Zero-Shot Cross-Lingual Phonetic Recognition with External Language Embedding.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Global Prosody Style Transfer Without Text Transcriptions.
Proceedings of the 38th International Conference on Machine Learning, 2021

Interpretable Visual Reasoning via Induced Symbolic Space.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Multi-Decoder DPRNN: Source Separation for Variable Number of Speakers.
Proceedings of the IEEE International Conference on Acoustics, 2021

A Comparison Study on Infant-Parent Voice Diarization.
Proceedings of the IEEE International Conference on Acoustics, 2021

Align or attend? Toward More Efficient and Accurate Spoken Word Discovery Using Speech-to-Image Retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2021

Show and Speak: Directly Synthesize Spoken Description of Images.
Proceedings of the IEEE International Conference on Acoustics, 2021

Continuous CNN for Nonuniform Time Series.
Proceedings of the IEEE International Conference on Acoustics, 2021

Synthesis of New Words for Improved Dysarthric Speech Recognition on an Expanded Vocabulary.
Proceedings of the IEEE International Conference on Acoustics, 2021

How Phonotactics Affect Multilingual and Zero-Shot ASR Performance.
Proceedings of the IEEE International Conference on Acoustics, 2021

A Translation Framework for Visually Grounded Spoken Unit Discovery.
Proceedings of the 55th Asilomar Conference on Signals, Systems, and Computers, 2021

2020
Multimodal Word Discovery and Retrieval With Spoken Descriptions and Visual Concepts.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Speech Technology for Unwritten Languages.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Seeing is Knowing! Fact-based Visual Question Answering using Knowledge Graph Embeddings.
CoRR, 2020

Multi-Decoder DPRNN: High Accuracy Source Counting and Separation.
CoRR, 2020

Utterance-level Intent Recognition from Keywords.
CoRR, 2020

Automatic Estimation of Inteligibility Measure for Consonants in Speech.
CoRR, 2020

Grapheme-to-Phoneme Transduction for Cross-Language ASR.
Proceedings of the Statistical Language and Speech Processing, 2020

Identify Speakers in Cocktail Parties with End-to-End Attention.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

That Sounds Familiar: An Analysis of Phonetic Representations Transfer Across Languages.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A DNN-HMM-DNN Hybrid Model for Discovering Word-Like Units from Spoken Captions and Image Regions.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Deep F-Measure Maximization for End-to-End Speech Understanding.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Autosegmental Neural Nets: Should Phones and Tones be Synchronous or Asynchronous?
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Evaluating Automatically Generated Phoneme Captions for Images.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Automatic Estimation of Intelligibility Measure for Consonants in Speech.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Unsupervised Speech Decomposition via Triple Information Bottleneck.
Proceedings of the 37th International Conference on Machine Learning, 2020

Training Spoken Language Understanding Systems with Non-Parallel Speech and Text.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

F0-Consistent Many-To-Many Non-Parallel Voice Conversion Via Conditional Autoencoder.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Context-Aware Automatic Text Simplification of Health Materials in Low-Resource Domains.
Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis, 2020

2019
Fast transcription of speech in low-resource languages.
CoRR, 2019

The role of cue enhancement and frequency fine-tuning in hearing impaired phone recognition.
CoRR, 2019

Zero-Shot Voice Style Transfer with Only Autoencoder Loss.
CoRR, 2019

The Time-Course of Phoneme Category Adaptation in Deep Neural Networks.
Proceedings of the Statistical Language and Speech Processing, 2019

Position Paper: Brain Signal-Based Dialogue Systems.
Proceedings of the Increasing Naturalness and Flexibility in Spoken Dialogue Interaction, 2019

Multimodal Word Discovery and Retrieval with Phone Sequence and Image Concepts.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

The Neural Correlates Underlying Lexically-Guided Perceptual Learning.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Learning Speaker Aware Offsets for Speaker Adaptation of Neural Networks.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Study of the Performance of Automatic Speech Recognition Systems in Speakers with Parkinson's Disease.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss.
Proceedings of the 36th International Conference on Machine Learning, 2019

Pre-training of Speaker Embeddings for Low-latency Speaker Change Detection in Broadcast News.
Proceedings of the IEEE International Conference on Acoustics, 2019

Dimensional Analysis of Laughter in Female Conversational Speech.
Proceedings of the IEEE International Conference on Acoustics, 2019

When CTC Training Meets Acoustic Landmarks.
Proceedings of the IEEE International Conference on Acoustics, 2019

Monaural Singing Voice Separation Using Fusion-Net with Time-Frequency Masking.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Multitask Learning for Phone Recognition of Underresourced Languages Using Mismatched Transcription.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Building an ASR System for Mboshi Using A Cross-Language Definition of Acoustic Units Approach.
Proceedings of the 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, 2018

Infant Emotional Outbursts Detection in Infant-parent Spoken Interactions.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Visualizing Phoneme Category Adaptation in Deep Neural Networks.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Speaker Adaptive Audio-Visual Fusion for the Open-Vocabulary Section of AVICAR.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Improved ASR for Under-resourced Languages through Multi-task Learning with Acoustic Landmarks.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Improving DNNs Trained with Non-Native Transcriptions Using Knowledge Distillation and Target Interpolation.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Topic and Keyword Identification for Low-resourced Speech Using Cross-Language Transfer Learning.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Image Restoration with Deep Generative Models.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Linguistic Unit Discovery from Multi-Modal Inputs in Unwritten Languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Deep Learning Based Speech Beamforming.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Bayesian Models for Unit Discovery on a Very Low Resource Language.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Time-Frequency Networks for Audio Super-Resolution.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Recognizing Zero-Resourced Languages Based on Mismatched Machine Transcriptions.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Using Conversational Agents to Explain Medication Instructions to Older Adults.
Proceedings of the AMIA 2018, 2018

2017
ASR for Under-Resourced Languages From Probabilistic Transcription.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

A multidisciplinary approach to designing and evaluating Electronic Medical Record portal messages that support patient self-care.
J. Biomed. Informatics, 2017

Acoustic Landmarks Contain More Information About the Phone String than Other Frames.
CoRR, 2017

Streaming Recommender Systems.
Proceedings of the 26th International Conference on World Wide Web, 2017

Dilated Recurrent Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Glottal Model Based Speech Beamforming for ad-hoc Microphone Arrays.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Speech Enhancement Using Bayesian Wavenet.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Team ELISA System for DARPA LORELEI Speech Evaluation 2016.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Using Approximated Auditory Roughness as a Pre-Filtering Feature for Human Screaming and Affective Speech AED.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Multi-Task Learning Using Mismatched Transcription for Under-Resourced Speech Recognition.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Deep Auto-Encoder Based Multi-Task Learning Using Probabilistic Transcriptions.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Mismatched Crowdsourcing from Multiple Annotator Languages for Recognizing Zero-Resourced Languages: A Nullspace Clustering Approach.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Fast Generation for Convolutional Autoregressive Models.
Proceedings of the 5th International Conference on Learning Representations, 2017

Discovering dimensions of perceived vocal expression in semi-structured, unscripted oral history accounts.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Low-resource grapheme-to-phoneme conversion using recurrent neural networks.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Semantic Image Inpainting with Deep Generative Models.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Low-resource spoken keyword search strategies in Georgian inspired by distinctive feature theory.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017



Mismatched crowdsourcing: Mining latent skills to acquire speech transcriptions.
Proceedings of the 51st Asilomar Conference on Signals, Systems, and Computers, 2017

2016
Speech Production in Speech Technologies: Introduction to the CSL Special Issue.
Comput. Speech Lang., 2016

Semantic Image Inpainting with Perceptual and Contextual Losses.
CoRR, 2016

Fast Wavenet Generation Algorithm.
CoRR, 2016

Landmark-based consonant voicing detection on multilingual corpora.
CoRR, 2016

Performance Improvements of Probabilistic Transcript-adapted ASR with Recurrent Neural Network and Language-specific Constraints.
CoRR, 2016

Clustering-based Phonetic Projection in Mismatched Crowdsourcing Channels for Low-resourced ASR.
Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing, 2016

Use of particle filtering and MCMC for inference in Probabilistic Acoustic Tube model.
Proceedings of the IEEE Statistical Signal Processing Workshop, 2016

Performance Improvement of Probabilistic Transcriptions with Language-specific Constraints.
Proceedings of the SLTU-2016, 2016

Mismatched Crowdsourcing based Language Perception for Under-resourced Languages.
Proceedings of the SLTU-2016, 2016

A many-to-one phone mapping approach for cross-lingual speech recognition.
Proceedings of the 2016 IEEE RIVF International Conference on Computing & Communication Technologies, 2016

Positive-Unlabeled Learning in Streaming Networks.
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016

Language coverage for mismatched crowdsourcing.
Proceedings of the 2016 Information Theory and Applications Workshop, 2016

Analysis of Mismatched Transcriptions Generated by Humans and Machines for Under-Resourced Languages.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Automatic Speech Recognition Using Probabilistic Transcriptions in Swahili, Amharic, and Dinka.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

An Investigation on Training Deep Neural Networks Using Probabilistic Transcriptions.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Stable and symmetric filter convolutional neural network.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Landmark of Mandarin nasal codas and its application in pronunciation error detection.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Adapting ASR for under-resourced languages using mismatched transcriptions.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Speech recognition of under-resourced languages using mismatched transcriptions.
Proceedings of the 2016 International Conference on Asian Language Processing, 2016

2015
Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Incorporating AM-FM effect in voiced speech for probabilistic acoustic tube model.
Proceedings of the 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2015

ClassTranscribe: a new tool with new educational opportunities for student crowdsourced college lecture transcription.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2015

Acoustic correlates for perceived effort levels in expressive speech.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Improved Hindi broadcast ASR by adapting the language model and pronunciation model using a priori syntactic and morphophonemic knowledge.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Transcribing continuous speech using mismatched crowdsourcing.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Cross-lingual transfer learning during supervised training in low resource scenarios.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Multichannel transient acoustic signal classification using task-driven dictionary with joint sparsity and beamforming.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Acquiring Speech Transcriptions Using Mismatched Crowdsourcing.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014
Mixed stereo audio classification using a stereo-input mixed-to-panned level feature.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Automatic detection of auditory salience with optimized linear filters derived from human annotation.
Pattern Recognit. Lett., 2014

Automatic Long Audio Alignment and Confidence Scoring for Conversational Arabic Speech.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Development of a TV Broadcasts Speech Recognition System for Qatari Arabic.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Singing-Voice Separation from Monaural Recordings using Deep Recurrent Neural Networks.
Proceedings of the 15th International Society for Music Information Retrieval Conference, 2014

Detecting articulatory compensation in acoustic data through linear regression modeling.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

An iterative approach to decision tree training for context dependent speech synthesis.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Foreground object detection in highly dynamic scenes using saliency.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Improvement of Probabilistic Acoustic Tube model for speech decomposition.
Proceedings of the IEEE International Conference on Acoustics, 2014

Deep learning for monaural speech separation.
Proceedings of the IEEE International Conference on Acoustics, 2014

Active Planning, Sensing, and Recognition Using a Resource-Constrained Discriminant POMDP.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014

A PAC-Bayesian Approach to Minimum Perplexity Language Modeling.
Proceedings of the COLING 2014, 2014

2013
Saliency-maximized audio visualization and efficient audio-visual browsing for faster-than-real-time human acoustic event detection.
ACM Trans. Appl. Percept., 2013

Acoustic model adaptation using in-domain background models for dysarthric speech recognition.
Comput. Speech Lang., 2013

Accurate speech segmentation by mimicking human auditory processing.
Proceedings of the IEEE International Conference on Acoustics, 2013

Random features for Kernel Deep Convex Network.
Proceedings of the IEEE International Conference on Acoustics, 2013

Sparse hidden Markov models for purer clusters.
Proceedings of the IEEE International Conference on Acoustics, 2013

2012
On Improving Dynamic State Space Approaches to Articulatory Inversion With MAP-Based Parameter Estimation.
IEEE Trans. Speech Audio Process., 2012

Detecting interaction links in a collaborating group using manually annotated data.
Soc. Networks, 2012

Partially Supervised Speaker Clustering.
IEEE Trans. Pattern Anal. Mach. Intell., 2012

On the Applicability of Speaker Diarization to Audio Indexing of Non-Speech and Mixed Non-Speech/Speech Video Soundtracks.
Int. J. Multim. Data Eng. Manag., 2012

Opportunistic sensing: Unattended acoustic sensor selection using crowdsourcing models.
Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, 2012

F0 and the Perception of Prominence.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Pooling Robust Shift-Invariant Sparse Representations of Acoustic Signals.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Improving faster-than-real-time human acoustic event detection by saliency-maximized audio visualization.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

How to put it into words - using random forests to extract symbol level descriptions from audio content for concept detection.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Singing-voice separation from monaural recordings using robust principal component analysis.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Detection of Acoustic-Phonetic Landmarks in Mismatched Conditions using a Biomimetic Model of Human Auditory Processing.
Proceedings of the COLING 2012, 2012

2011
Efficient Object Localization with Variation-Normalized Gaussianized Vectors.
Proceedings of the Intelligent Video Event Analysis and Understanding, 2011

Estimation of Articulatory Trajectories Based on Gaussian Mixture Model (GMM) With Audio-Visual Information Fusion and Dynamic Kalman Smoothing.
IEEE Trans. Speech Audio Process., 2011

Intelligibility predictors and neural representation of speech.
Speech Commun., 2011

Open-loop multi-channel inversion of room impulse response.
CoRR, 2011

Unlabeled data and other marginals.
Proceedings of the 2011 Symposium on Machine Learning in Speech and Language Processing, 2011

Optimal Models of Prosodic Prominence Using the Bayesian Information Criterion.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Improving acoustic event detection using generalizable visual features and multi-modality modeling.
Proceedings of the IEEE International Conference on Acoustics, 2011

Multi-sensory features for personnel detection at border crossings.
Proceedings of the 14th International Conference on Information Fusion, 2011

2010
A Novel Vector Representation of Stochastic Signals Based on Adapted Ergodic HMMs.
IEEE Signal Process. Lett., 2010

Real-world acoustic event detection.
Pattern Recognit. Lett., 2010

Novel Gaussianized vector representation for improved natural scene categorization.
Pattern Recognit. Lett., 2010

State-Transition Interpolation and MAP Adaptation for HMM-based Dysarthric Speech Recognition.
Proceedings of the Workshop on Speech and Language Processing for Assistive Technologies, 2010

A minimum converted trajectory error (MCTE) approach to high quality speech-to-lips conversion.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Landmark-based automated pronunciation error detection.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A procedure for estimating gestural scores from natural speech.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Kinematic analysis of tongue movement control in spastic dysarthria.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Robust automatic speech recognition with decoder oriented ideal binary mask estimation.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Semi-supervised training of Gaussian mixture models by conditional entropy minimization.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

FSM-based pronunciation modeling using articulatory phonological code.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Non-frontal view facial expression recognition based on ergodic hidden Markov model supervectors.
Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010

Toward robust learning of the Gaussian mixture state emission densities for hidden Markov models.
Proceedings of the IEEE International Conference on Acoustics, 2010

Joint estimation of DOA and speech based on EM beamforming.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
Sensitive Talking Heads [Applications Corner].
IEEE Signal Process. Mag., 2009

Articulatory phonological code for word classification.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Automated pronunciation scoring using confidence scoring and landmark-based SVM.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Universal access: speech recognition for talkers with spastic dysarthria.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Formant trajectories for acoustic-to-articulatory inversion.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Prosodic effects on vowel production: evidence from formant structure.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Emotion recognition from speech via boosted Gaussian mixture models.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

Acoustic fall detection using Gaussian mixture models and GMM supervectors.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

Kernel metric learning for phonetic classification.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008
Humanoid Audio-Visual Avatar With Emotive Text-to-Speech Synthesis.
IEEE Trans. Multim., 2008

Brain anatomy differences in childhood stuttering.
NeuroImage, 2008

EAVA: A 3D Emotive Audio-Visual Avatar.
Proceedings of the 9th IEEE Workshop on Applications of Computer Vision (WACV 2008), 2008

SIFT-Bag kernel for video event analysis.
Proceedings of the 16th International Conference on Multimedia 2008, 2008

The entropy of the articulatory phonological code: recognizing gestures from tract variables.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Two-stage prosody prediction for emotional text-to-speech synthesis.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Human speech perception and feature extraction.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Dysarthric speech database for universal access research.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Maximum mutual information estimation with unlabeled data for phonetic classification.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Face age estimation using patch-based hidden Markov model supervectors.
Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), 2008

A novel Gaussianized vector representation for natural scene categorization.
Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), 2008

Real-time conversion from a single 2D face image to a 3D text-driven emotive audio-visual avatar.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

Feature analysis and selection for acoustic event detection.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

Optimal speech estimator considering room response as well as additive noise: Different approaches in low and high frequency range.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

Regression from patch-kernel.
Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008

2007
Prosodic effects on acoustic cues to stop voicing and place of articulation: Evidence from Radio News speech.
J. Phonetics, 2007

A Multi-Stream Approach to Audiovisual Automatic Speech Recognition.
Proceedings of the IEEE 9th Workshop on Multimedia Signal Processing, 2007

Frequency domain correspondence for speaker normalization.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Robust Analysis and Weighting on MFCC Components for Speech Recognition and Speaker Identification.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

Exploring Discriminative Learning for Text-Independent Speaker Recognition.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

Lipreading by Locality Discriminant Graph.
Proceedings of the International Conference on Image Processing, 2007

Articulatory Feature-Based Methods for Acoustic and Audio-Visual Speech Recognition: Summary from the 2006 JHU Summer workshop.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

HMM-Based Acoustic Event Detection with AdaBoost Feature Selection.
Proceedings of the Multimodal Technologies for Perception of Humans, 2007

Multichannel and Multimodality Person Identification.
Proceedings of the Multimodal Technologies for Perception of Humans, 2007

2006
Prosody dependent speech recognition on radio news corpus of American English.
IEEE Trans. Speech Audio Process., 2006

Cognitive state classification in a spoken tutorial dialogue system.
Speech Commun., 2006

Extraction of pragmatic and semantic salience from spontaneous spoken English.
Speech Commun., 2006

Novel entropy based moving average refiners for HMM landmarks.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Novel time domain multi-class SVMs for landmark detection.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Generalized Optimal Multi-Microphone Speech Enhancement Using Sequential Minimum Variance Distortionless Response (MVDR) Beamforming and Postfiltering.
Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006

HMM-Based and SVM-Based Recognition of the Speech of Talkers With Spastic Dysarthria.
Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006

2005
Simultaneous recognition of words and prosody in the Boston University Radio Speech Corpus.
Speech Commun., 2005

Distinctive feature based SVM discriminant features for improvements to phone recognition on telephone band speech.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Landmark-Based Speech Recognition: Report of the 2004 Johns Hopkins Summer Workshop.
Proceedings of the 2005 IEEE International Conference on Acoustics, Speech and Signal Processing, 2005

Prosodic parallelism as a cue to repetition and error correction disfluency.
Proceedings of the ISCA Tutorial and Research Workshop (ITRW) on Disfluency in Spontaneous Speech, 2005

2004
Model enforcement: a unified feature transformation framework for classification and recognition.
IEEE Trans. Signal Process., 2004

Automatic recognition of pitch movements using multilayer perceptron and time-delay recursive neural network.
IEEE Signal Process. Lett., 2004

Semantic analysis for a speech user interface in an intelligent tutoring system.
Proceedings of the 9th International Conference on Intelligent User Interfaces, 2004

Stop consonant classification by dynamic formant trajectory.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Intertranscriber reliability of prosodic labeling on telephone conversation using ToBI.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

AVICAR: audio-visual speech corpus in a car environment.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Children's emotion recognition in an intelligent tutoring scenario.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Automatic detection of contrast for speech understanding.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

A factorial HMM approach to robust isolated digit recognition in background music.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Source separation using particle filters.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Modeling pronunciation variation using artificial neural networks for English spontaneous speech.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Modeling and recognition of phonetic and prosodic factors for improvements to acoustic speech recognition models.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Formant tracking by mixture state particle filter.
Proceedings of the 2004 IEEE International Conference on Acoustics, Speech and Signal Processing, 2004

A factorial HMM approach to simultaneous recognition of isolated digits spoken by multiple talkers on one audio channel.
Proceedings of the 2004 IEEE International Conference on Acoustics, Speech and Signal Processing, 2004

An automatic prosody labeling system using ANN-based syntactic-prosodic model and GMM-based acoustic-prosodic model.
Proceedings of the 2004 IEEE International Conference on Acoustics, Speech and Signal Processing, 2004

2003
Approximately independent factors of speech using nonlinear symplectic transformation.
IEEE Trans. Speech Audio Process., 2003

Non-linear maximum likelihood feature transformation for speech recognition.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Maximum conditional mutual information projection for speech recognition.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Prosody dependent speech recognition with explicit duration modelling at intonational phrase boundaries.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Acoustic segmentation using switching state Kalman filter.
Proceedings of the 2003 IEEE International Conference on Acoustics, Speech and Signal Processing, 2003

2002
An evaluation of using mutual information for selection of acoustic-features representation of phonemes for speech recognition.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Maximum mutual information based acoustic-features representation of phonological features for speech recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2002

Auditory-modeling inspired methods of feature extraction for robust automatic speech recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2002

2001
PLP coefficients can be quantized at 400 bps.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2001

2000
Signal approximation in Hilbert space and its application on articulatory speech synthesis.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Time-frequency distribution of partial phonetic information measured using mutual information.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Multivariate-state hidden Markov models for simultaneous transcription of phones and formants.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2000

1996
Formant and burst spectral measurements with quantitative error models for speech sound classification.
PhD thesis, 1996

