Jim Glass

ORCID: 0000-0002-0148-0224

Affiliations:
  • Massachusetts Institute of Technology (MIT), CSAIL, Cambridge, MA, USA


According to our database, Jim Glass authored at least 409 papers between 1985 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Awards

IEEE Fellow

IEEE Fellow 2014, "For contributions to probabilistic segment-based speech recognition and spoken dialogue interfaces".

Bibliography

2024
Curiosity-driven Red-teaming for Large Language Models.
CoRR, 2024

Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective.
CoRR, 2024

Joint Inference of Retrieval and Generation for Passage Re-ranking.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024

2023
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces.
CoRR, 2023

Self-Specialization: Uncovering Latent Expertise within Large Language Models.
CoRR, 2023

Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning.
CoRR, 2023

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models.
CoRR, 2023

Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers.
CoRR, 2023

Improved Cross-Lingual Transfer Learning For Automatic Speech Translation.
CoRR, 2023

SAIL: Search-Augmented Instruction Learning.
CoRR, 2023

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages.
CoRR, 2023

Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering.
CoRR, 2023

Listen, Think, and Understand.
CoRR, 2023

Interpretable Unified Language Checking.
CoRR, 2023

What, when, and where? - Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions.
CoRR, 2023

PCFG-Based Natural Language Interface Improves Generalization for Controlled Text Generation.
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics, 2023

Revealing the Blind Spot of Sentence Encoder Evaluation by HEROS.
Proceedings of the 8th Workshop on Representation Learning for NLP, 2023

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Flexible Boundary Design for a Chattanooga Microgrid Powered by Landfill Solar Photovoltaic and Battery Storage.
Proceedings of the IEEE International Smart Cities Conference, 2023

Contrastive Audio-Visual Masked Autoencoder.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

On Unsupervised Uncertainty-Driven Speech Pseudo-Label Filtering and Model Calibration.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Search Augmented Instruction Learning.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Logic Against Bias: Textual Entailment Mitigates Stereotypical Sentence Reasoning.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Audio-Visual Neural Syntax Acquisition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Joint Audio and Speech Understanding.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

On the Blind Spots of Model-Based Evaluation Metrics for Text Generation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Entailment as Robust Self-Learner.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
UAVM: Towards Unifying Audio and Visual Models.
IEEE Signal Process. Lett., 2022

Autoregressive Predictive Coding: A Comprehensive Study.
IEEE J. Sel. Top. Signal Process., 2022

SAMU-XLSR: Semantically-Aligned Multimodal Utterance-Level Cross-Lingual Speech Representation.
IEEE J. Sel. Top. Signal Process., 2022

UAVM: A Unified Model for Audio-Visual Learning.
CoRR, 2022

Developing a Series of AI Challenges for the United States Department of the Air Force.
CoRR, 2022

CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification.
CoRR, 2022

A Framework for Coordinated Self-Assembly of Networked Microgrids Using Consensus Algorithms.
IEEE Access, 2022

Cooperative Self-training of Machine Reading Comprehension.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Speak: A Toolkit Using Amazon Mechanical Turk to Collect and Validate Speech Audio Recordings.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Simple and Effective Unsupervised Speech Synthesis.
Proceedings of the Interspeech 2022, 2022

On the Interplay between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Magic Dust for Cross-Lingual Adaptation of Monolingual Wav2vec-2.0.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Repetition Assessment for Speech and Language Disorders: A Study of the Logopenic Variant of Primary Progressive Aphasia.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Transformer-Based Multi-Aspect Multi-Granularity Non-Native English Speaker Pronunciation Assessment.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Detecting Dementia from Long Neuropsychological Interviews.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Everything at Once - Multi-modal Fusion Transformer for Video Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Cross-Modal Discrete Representation Learning.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Controlling the Focus of Pretrained Language Generation Models.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

SSAST: Self-Supervised Audio Spectrogram Transformer.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Routing with Self-Attention for Multimodal Capsule Networks.
CoRR, 2021

An Empirical Study on Few-shot Knowledge Probing for Pretrained Language Models.
CoRR, 2021

Mitigating Biases in Toxic Language Detection through Invariant Rationalization.
CoRR, 2021

Cooperative Learning of Zero-Shot Machine Reading Comprehension.
CoRR, 2021

PSLA: Improving Audio Event Classification with Pretraining, Sampling, Labeling, and Aggregation.
CoRR, 2021

A Smart and Flexible Microgrid With a Low-Cost Scalable Open-Source Controller.
IEEE Access, 2021

Interpretable Propaganda Detection in News Articles.
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), 2021

PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Cascaded Multilingual Audio-Visual Learning from Videos.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Joint Retrieval-Extraction Training for Evidence-Aware Dialog Response Selection.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

CLAC: A Speech Corpus of Healthy English Speakers.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

AST: Audio Spectrogram Transformer.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Similarity Analysis of Self-Supervised Speech Representations.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Exposure Bias versus Self-Recovery: Are Distortions Really Incremental for Autoregressive Text Generation?
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Analyzing the Forgetting Problem in Pretrain-Finetuning of Open-domain Dialogue Response Models.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

Spoken Moments: Learning Joint Audio-Visual Representations From Video Descriptions.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Text-Free Image-to-Speech Synthesis Using Learned Segmental Units.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input.
Int. J. Comput. Vis., 2020

Text-Free Image-to-Speech Synthesis Using Learned Segmental Units.
CoRR, 2020

Constructing a Knowledge Graph from Unstructured Documents without External Alignment.
CoRR, 2020

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.
CoRR, 2020

CSTNet: Contrastive Speech Translation Network for Self-Supervised Speech Representation Learning.
CoRR, 2020

On the Linguistic Representational Power of Neural Machine Translation Models.
Comput. Linguistics, 2020

Multimodal Association for Speaker Verification.
Proceedings of the Interspeech 2020, 2020

Pair Expansion for Learning Multilingual Semantic Embeddings Using Disjoint Visually-Grounded Speech Audio Datasets.
Proceedings of the Interspeech 2020, 2020

Prototypical Q Networks for Automatic Conversational Diagnosis and Few-Shot New Disease Adaption.
Proceedings of the Interspeech 2020, 2020

A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning.
Proceedings of the Interspeech 2020, 2020

Unsupervised Methods for Evaluating Speech Representations.
Proceedings of the Interspeech 2020, 2020

Vector-Quantized Autoregressive Predictive Coding.
Proceedings of the Interspeech 2020, 2020

What Does an End-to-End Dialect Identification Model Learn About Non-Dialectal Information?
Proceedings of the Interspeech 2020, 2020

A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation.
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 2020

Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech.
Proceedings of the 8th International Conference on Learning Representations, 2020

ADI17: A Fine-Grained Arabic Dialect Identification Dataset.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Trilingual Semantic Embeddings of Visually Grounded Speech with Self-Attention Mechanisms.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Audio-Visual Calibration with Polynomial Regression for 2-D Projection Using SVD-PHAT.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Learning a Subword Inventory Jointly with End-to-End Automatic Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Generative Pre-Training for Speech with Autoregressive Predictive Coding.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

We Can Detect Your Bias: Predicting the Political Ideology of News Articles.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Similarity Analysis of Contextual Word Representation Models.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Negative Training for Neural Dialogue Response Generation.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Improved Speech Representations with Multi-Target Autoregressive Predictive Coding.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

What Was Written vs. Who Read It: News Media Profiling Using Text Analysis and Social Media Context.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Knowledge Grounded Conversational Symptom Detection with Graph Memory Networks.
Proceedings of the 3rd Clinical Natural Language Processing Workshop, 2020

2019
Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Deep Learning for Database Mapping and Asking Clarification Questions in Dialogue Systems.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Analysis Methods in Neural Language Processing: A Survey.
Trans. Assoc. Comput. Linguistics, 2019

Automatic Fact-Checking Using Context and Discourse Information.
ACM J. Data Inf. Qual., 2019

Language processing and learning models for community question answering in Arabic.
Inf. Process. Manag., 2019

Mix-review: Alleviate Forgetting in the Pretrain-Finetune Framework for Neural Language Generation Models.
CoRR, 2019

DARTS: Dialectal Arabic Transcription System.
CoRR, 2019

Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models.
CoRR, 2019

Quantifying Exposure Bias for Neural Language Generation.
CoRR, 2019

Adversarial Domain Adaptation for Stance Detection.
CoRR, 2019

Team QCRI-MIT at SemEval-2019 Task 4: Propaganda Analysis Meets Hyperpartisan News Detection.
Proceedings of the 13th International Workshop on Semantic Evaluation, 2019

FAKTA: An Automatic End-to-End Fact Checking System.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Multi-Task Ordinal Regression for Jointly Predicting the Trustworthiness and the Leading Political Ideology of News Media.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Fast and Robust 3-D Sound Source Localization with DSVD-PHAT.
Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019

VoiceID Loss: Speech Enhancement for Speaker Verification.
Proceedings of the Interspeech 2019, 2019

MCE 2018: The 1st Multi-Target Speaker Detection and Identification Challenge Evaluation.
Proceedings of the Interspeech 2019, 2019

Integrating Video Retrieval and Moment Detection in a Unified Corpus for Video Question Answering.
Proceedings of the Interspeech 2019, 2019

A Comparison of Deep Learning Methods for Language Understanding.
Proceedings of the Interspeech 2019, 2019

Transfer Learning from Audio-Visual Grounding to Speech Recognition.
Proceedings of the Interspeech 2019, 2019

Multiple Sound Source Localization with SVD-PHAT.
Proceedings of the Interspeech 2019, 2019

A Deep Residual Network for Large-Scale Acoustic Scene Analysis.
Proceedings of the Interspeech 2019, 2019

An Unsupervised Autoregressive Model for Speech Representation Learning.
Proceedings of the Interspeech 2019, 2019

Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition.
Proceedings of the Interspeech 2019, 2019

Towards Bilingual Lexicon Discovery From Visually Grounded Speech Audio.
Proceedings of the Interspeech 2019, 2019

Detecting Egregious Responses in Neural Sequence-to-sequence Models.
Proceedings of the 7th International Conference on Learning Representations, 2019

Identifying and Controlling Important Neurons in Neural Machine Translation.
Proceedings of the 7th International Conference on Learning Representations, 2019

Noise-tolerant Audio-visual Online Person Verification Using an Attention-based Neural Network Fusion.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Domain Attentive Fusion for End-to-end Dialect Identification with Unknown Target Domain.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Dialogue State Tracking with Convolutional Semantic Taggers.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

A Factorial Deep Markov Model for Unsupervised Disentangled Representation Learning from Speech.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Towards Visually Grounded Sub-word Speech Unit Discovery.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

SVD-PHAT: A Fast Sound Source Localization Method.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Subword Regularization and Beam Search Decoding for End-to-end Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Towards Unsupervised Speech-to-text Translation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Tanbih: Get To Know What You Are Reading.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Contrastive Language Adaptation for Cross-Lingual Stance Detection.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Sound Event Localization and Detection Using CRNN on Pairs of Microphones.
Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events 2019 (DCASE 2019), 2019

Learning Words by Drawing Images.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Grounding Spoken Words in Unlabeled Video.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Explicit Alignment of Text and Speech Encodings for Attention-Based End-to-End Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

The MGB-5 Challenge: Recognition and Dialect Identification of Dialectal Arabic Speech.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Improving Neural Language Models by Segmenting, Attending, and Predicting the Future.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

NeuroX: A Toolkit for Analyzing Individual Neurons in Neural Networks.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
A Low-Power Speech Recognizer and Voice Activity Detector Using Deep Neural Networks.
IEEE J. Solid State Circuits, 2018

A Study of the Complexity and Accuracy of Direction of Arrival Estimation Methods Based on GCC-PHAT for a Pair of Close Microphones.
CoRR, 2018

On The Inductive Bias of Words in Acoustics-to-Word Models.
CoRR, 2018

MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation (MCE) Plan, Dataset and Baseline System.
CoRR, 2018

Disentangling by Partitioning: A Representation Learning Framework for Multimodal Sensory Data.
CoRR, 2018

Convolutional Neural Networks and Language Embeddings for End-to-End Dialect Recognition.
CoRR, 2018

Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign.
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018

On Training Recurrent Networks with Truncated Backpropagation Through time in Speech Recognition.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Frame-Level Speaker Embeddings for Text-Independent Speaker Recognition and Analysis of End-to-End Model.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Unsupervised Representation Learning of Speech for Dialect Identification.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Convolutional Neural Networks for Dialogue State Tracking without Pre-Trained Word Vectors or Semantic Dictionaries.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Combining End-to-End and Adversarial Training for Low-Resource Speech Recognition.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Convolutional Neural Network and Language Embeddings for End-to-End Dialect Recognition.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

Automatic Stance Detection Using End-to-End Memory Networks.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

Role-specific Language Models for Processing Recorded Neuropsychological Exams.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

Supervised and Unsupervised Transfer Learning for Question Answering.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

Integrating Stance Detection and Fact Checking in a Unified Corpus.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

A Study of Enhancement, Augmentation and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition.
Proceedings of the Interspeech 2018, 2018

Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition.
Proceedings of the Interspeech 2018, 2018

Scalable Factorized Hierarchical Variational Autoencoder Training.
Proceedings of the Interspeech 2018, 2018

Detecting Depression with Audio/Text Sequence Modeling of Interviews.
Proceedings of the Interspeech 2018, 2018

Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech.
Proceedings of the Interspeech 2018, 2018

A Noise-Robust Self-Adaptive Multitarget Speaker Detection System.
Proceedings of the 24th International Conference on Pattern Recognition, 2018

Exploiting Convolutional Neural Networks for Phonotactic Based Dialect Identification.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Convolutional Neural Networks and Multitask Strategies for Semantic Mapping of Natural Language Input to a Structured Database.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Energy-Efficient Speaker Identification with Low-Precision Networks.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Learning Word Representations with Cross-Sentence Dependency for End-to-End Co-reference Resolution.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Predicting Factuality of Reporting and Bias of News Media Sources.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Fact Checking in Community Forums.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Spoken Language Understanding for a Nutrition Dialogue System.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Learning Word Embeddings from Speech.
CoRR, 2017

Bidirectional Backpropagation: Towards Biologically Plausible Error Signal Transmission in Neural Networks.
CoRR, 2017

Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

14.4 A scalable speech recognizer with deep-neural-network acoustic models and voice-activated power gating.
Proceedings of the 2017 IEEE International Solid-State Circuits Conference, 2017

Character-Based Embedding Models and Reranking Strategies for Understanding Natural Language Meal Descriptions.
Proceedings of the Interspeech 2017, 2017

QMDIS: QCRI-MIT Advanced Dialect Identification System.
Proceedings of the Interspeech 2017, 2017

Learning Latent Representations for Speech Generation and Transformation.
Proceedings of the Interspeech 2017, 2017

An Environmental Feature Representation for Robust Speech Recognition and for Environment Identification.
Proceedings of the Interspeech 2017, 2017

Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks.
Proceedings of the Eighth International Joint Conference on Natural Language Processing, 2017

Semantic mapping of natural language input to database entries via convolutional neural networks.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

MIT-QCRI Arabic dialect identification system for the 2017 multi-genre broadcast challenge.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Automatic speech recognition of Arabic multi-genre broadcast media.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Learning modality-invariant representations for speech and images.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Spoken language biomarkers for detecting cognitive impairment.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Learning Word-Like Units from Joint Audio-Visual Analysis.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

What do Neural Machine Translation Models Learn about Morphology?
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

2016
On the Use of Acoustic Unit Discovery for Language Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Recurrent Neural Network Encoder with Attention for Community Question Answering.
CoRR, 2016

Large-Scale Machine Translation between Arabic and Hebrew: Available Corpora and Initial Results.
CoRR, 2016

A Character-level Convolutional Neural Network for Distinguishing Similar Languages and Dialects.
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects, 2016

Look, listen, and decode: Multimodal speech recognition with images.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

A prioritized grid long short-term memory RNN for speech recognition.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Development of the MIT ASR system for the 2016 Arabic Multi-genre Broadcast Challenge.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

The MGB-2 challenge: Arabic multi-dialect broadcast media recognition.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

SemEval-2016 Task 3: Community Question Answering.
Proceedings of the 10th International Workshop on Semantic Evaluation, 2016

SLS at SemEval-2016 Task 3: Neural-based Approaches for Ranking in Community Question Answering.
Proceedings of the 10th International Workshop on Semantic Evaluation, 2016

Learning Semantic Relatedness in Community Question Answering Using Neural Models.
Proceedings of the 1st Workshop on Representation Learning for NLP, 2016

Unsupervised Learning of Spoken Language with Visual Context.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Memory-Efficient Modeling and Search Techniques for Hardware ASR Decoders.
Proceedings of the Interspeech 2016, 2016

Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition.
Proceedings of the Interspeech 2016, 2016

Automatic Dialect Detection in Arabic Broadcast Speech.
Proceedings of the Interspeech 2016, 2016

Highway long short-term memory RNNs for distant speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Prediction-adaptation-correction recurrent neural networks for low-resource language speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Personalized mispronunciation detection and diagnosis based on unsupervised error pattern discovery.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Distributional semantics for understanding spoken meal descriptions.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Multilingual data selection for training stacked bottleneck features.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Neural Attention for Learning to Rank Questions in Community Question Answering.
Proceedings of the COLING 2016, 2016

2015
Spoken Content Retrieval - Beyond Cascading Speech Recognition with Text Retrieval.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Unsupervised Lexicon Discovery from Acoustic Input.
Trans. Assoc. Comput. Linguistics, 2015

A 6 mW, 5,000-Word Real-Time Speech Recognizer Using WFST Models.
IEEE J. Solid State Circuits, 2015

A Situationally Aware Voice-commandable Robotic Forklift Working Alongside People in Unstructured Outdoor Environments.
J. Field Robotics, 2015

SemEval-2015 Task 3: Answer Selection in Community Question Answering.
Proceedings of the 9th International Workshop on Semantic Evaluation, 2015

VectorSLU: A Continuous Word Vector Approach to Answer Selection in Community Question Answering Systems.
Proceedings of the 9th International Workshop on Semantic Evaluation, 2015

A Vector Space Approach for Aspect Based Sentiment Analysis.
Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, 2015

Mispronunciation detection without nonnative training data.
Proceedings of the INTERSPEECH 2015, 2015

Speaker adaptation using the i-vector technique for bottleneck features.
Proceedings of the INTERSPEECH 2015, 2015

On using heterogeneous data for vehicle-based speech recognition: A DNN-based approach.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Arabic Diacritization with Recurrent Neural Networks.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Wait-Learning: Leveraging Wait Time for Second Language Education.
Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 2015

Deep multimodal semantic embeddings for speech and images.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
Non-Negative Factor Analysis of Gaussian Mixture Model Weight Adaptation for Language and Dialect Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Data collection and language understanding of food descriptions.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

A complete KALDI recipe for building Arabic speech recognition systems.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

27.2 A 6mW 5K-Word real-time speech recognizer using WFST models.
Proceedings of the 2014 IEEE International Conference on Solid-State Circuits Conference, 2014

Limited labels for unlimited data: active learning for speaker recognition.
Proceedings of the INTERSPEECH 2014, 2014

Graph-based re-ranking using acoustic feature similarity between search results for spoken term detection on low-resource languages.
Proceedings of the INTERSPEECH 2014, 2014

Context-dependent pronunciation error pattern discovery with limited annotations.
Proceedings of the INTERSPEECH 2014, 2014

Speech recognition without a lexicon - bridging the gap between graphemic and phonetic systems.
Proceedings of the INTERSPEECH 2014, 2014

Lexical modeling for Arabic ASR: a systematic approach.
Proceedings of the INTERSPEECH 2014, 2014

Language ID-based training of multilingual stacked bottleneck features.
Proceedings of the INTERSPEECH 2014, 2014

Recent advances in ASR applied to an Arabic transcription system for Al-Jazeera.
Proceedings of the INTERSPEECH 2014, 2014

Extracting deep neural network bottleneck features using low-rank matrix factorization.
Proceedings of the IEEE International Conference on Acoustics, 2014

Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

A Study of using Syntactic and Semantic Structures for Concept Segmentation and Labeling.
Proceedings of the COLING 2014, 2014

One-shot learning of generative speech concepts.
Proceedings of the 36th Annual Meeting of the Cognitive Science Society, 2014

Wait-learning: leveraging conversational dead time for second language education.
Proceedings of the CHI Conference on Human Factors in Computing Systems, 2014

2013
Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach.
IEEE Trans. Speech Audio Process., 2013

Learning Lexicons From Speech Using a Pronunciation Mixture Model.
IEEE Trans. Speech Audio Process., 2013

Pronunciation assessment via a comparison-based system.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2013

Probabilistic Dialogue Modeling for Speech-Enabled Assistive Technology.
Proceedings of the Fourth Workshop on Speech and Language Processing for Assistive Technologies, 2013

Bayesian distance metric learning on i-vector for speaker verification.
Proceedings of the INTERSPEECH 2013, 2013

Asgard: A portable architecture for multilingual dialogue systems.
Proceedings of the IEEE International Conference on Acoustics, 2013

Mispronunciation detection via dynamic time warping on deep belief network-based posteriorgrams.
Proceedings of the IEEE International Conference on Acoustics, 2013

Zero resource spoken audio corpus analysis.
Proceedings of the IEEE International Conference on Acoustics, 2013

Joint Learning of Phonetic Units and Word Pronunciations for ASR.
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013

Query understanding enhanced by hierarchical parsing structures.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012
A comparison-based approach to mispronunciation detection.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Towards unsupervised speech processing.
Proceedings of the 11th International Conference on Information Science, 2012

On the Use of Spectral and Iterative Methods for Speaker Diarization.
Proceedings of the INTERSPEECH 2012, 2012

Automating Crowd-supervised Learning for Spoken Language Systems.
Proceedings of the INTERSPEECH 2012, 2012

A Conversational Movie Search System Based on Conditional Random Fields.
Proceedings of the INTERSPEECH 2012, 2012

Sentence Detection Using Multiple Annotations.
Proceedings of the INTERSPEECH 2012, 2012

Resource configurable spoken query detection using Deep Boltzmann Machines.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Fast spoken query detection using lower-bound Dynamic Time Warping on Graphical Processing Units.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Handling uncertain observations in unsupervised topic-mixture language model adaptation.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Evaluation of multi-level context-dependent acoustic model for large vocabulary speaker adaptation tasks.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

A Nonparametric Bayesian Approach to Acoustic Model Discovery.
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, July 8-14, 2012, Jeju Island, Korea, 2012

2011
A Piecewise Aggregate Approximation Lower-Bound Estimate for Posteriorgram-Based Dynamic Time Warping.
Proceedings of the INTERSPEECH 2011, 2011

Exploiting Intra-Conversation Variability for Speaker Diarization.
Proceedings of the INTERSPEECH 2011, 2011

Growing a Spoken Language Interface on Amazon Mechanical Turk.
Proceedings of the INTERSPEECH 2011, 2011

An Efferent-Inspired Auditory Model Front-End for Speech Recognition.
Proceedings of the INTERSPEECH 2011, 2011

A Transcription Task for Crowdsourcing with Automatic Quality Control.
Proceedings of the INTERSPEECH 2011, 2011

Robust Voice Activity Detector for Real World Applications Using Harmonicity and Modulation Frequency.
Proceedings of the INTERSPEECH 2011, 2011

Pronunciation Learning from Continuous Speech.
Proceedings of the INTERSPEECH 2011, 2011

An inner-product lower-bound estimate for dynamic time warping.
Proceedings of the IEEE International Conference on Acoustics, 2011

A channel-blind system for speaker verification.
Proceedings of the IEEE International Conference on Acoustics, 2011

Multi-level context-dependent acoustic modeling for automatic speech recognition.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010
Introduction to the Issue on Speech Processing for Natural Interaction With Intelligent Environments.
IEEE J. Sel. Top. Signal Process., 2010

Combining missing-feature theory, speech enhancement, and speaker-dependent/-independent modeling for speech separation.
Comput. Speech Lang., 2010

A collective data generation method for speech language models.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Spoken command of large mobile robots in outdoor environments.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Unsupervised Speaker Adaptation based on the Cosine Similarity for Text-Independent Speaker Verification.
Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

Cosine Similarity Scoring without Score Normalization Techniques.
Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

Collecting Voices from the Cloud.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

Learning new word pronunciations from spoken examples.
Proceedings of the INTERSPEECH 2010, 2010

A voice-commandable robotic forklift working alongside humans in minimally-prepared outdoor environments.
Proceedings of the IEEE International Conference on Robotics and Automation, 2010

Towards multi-speaker unsupervised speech pattern discovery.
Proceedings of the IEEE International Conference on Acoustics, 2010

Multimodal interaction with an autonomous forklift.
Proceedings of the 5th ACM/IEEE International Conference on Human Robot Interaction, 2010

2009
Updated MINDS report on speech recognition and understanding, Part 2 [DSP Education].
IEEE Signal Process. Mag., 2009

Developments and directions in speech recognition and understanding, Part 1 [DSP Education].
IEEE Signal Process. Mag., 2009

Multistream Articulatory Feature-Based Models for Visual Speech Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2009

A back-off discriminative acoustic model for automatic speech recognition.
Proceedings of the INTERSPEECH 2009, 2009

Speech rhythm guided syllable nuclei detection.
Proceedings of the IEEE International Conference on Acoustics, 2009

On the phonetic information in ultrasonic microphone signals.
Proceedings of the IEEE International Conference on Acoustics, 2009

Language model parameter estimation using user transcriptions.
Proceedings of the IEEE International Conference on Acoustics, 2009

Discriminative training of hierarchical acoustic models for large vocabulary continuous speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2009

Syntactic Phrase Reordering for English-to-Arabic Statistical Machine Translation.
Proceedings of the EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, Athens, Greece, March 30, 2009

City browser: developing a conversational automotive HMI.
Proceedings of the 27th International Conference on Human Factors in Computing Systems, 2009

Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008
Unsupervised Pattern Discovery in Speech.
IEEE Trans. Speech Audio Process., 2008

Iterative language model estimation: efficient data structure & algorithms.
Proceedings of the INTERSPEECH 2008, 2008

A turbo-style algorithm for lexical baseforms estimation.
Proceedings of the IEEE International Conference on Acoustics, 2008

N-gram Weighting: Reducing Training Data Mismatch in Cross-Domain Language Model Estimation.
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, 2008

Segmentation for English-to-Arabic Statistical Machine Translation.
Proceedings of the ACL 2008, 2008

2007
Robust Speaker Recognition in Noisy Conditions.
IEEE Trans. Speech Audio Process., 2007

An Implementation of Rational Wavelets and Filter Design for Phonetic Classification.
IEEE Trans. Speech Audio Process., 2007

Multimodal speech recognition with ultrasonic sensors.
Proceedings of the INTERSPEECH 2007, 2007

Recent progress in the MIT spoken lecture processing project.
Proceedings of the INTERSPEECH 2007, 2007

New word acquisition using subword modeling.
Proceedings of the INTERSPEECH 2007, 2007

Noise Robust Phonetic Classification with Linear Regularized Least Squares and Second-Order Features.
Proceedings of the IEEE International Conference on Acoustics, 2007

Open-Vocabulary Spoken Utterance Retrieval using Confusion Networks.
Proceedings of the IEEE International Conference on Acoustics, 2007

Speech recognition with localized time-frequency pattern detectors.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

Automatic lexical pronunciations generation and update.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

Hierarchical large-margin Gaussian mixture models for phonetic classification.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

Making Sense of Sound: Unsupervised Topic Segmentation over Acoustic Input.
Proceedings of the ACL 2007, 2007

2006
A Novel DTW-Based Distance Measure for Speaker Segmentation.
Proceedings of the 2006 IEEE ACL Spoken Language Technology Workshop, 2006

A Comparative Study of Methods for Handheld Speaker Verification in Realistic Noisy Conditions.
Proceedings of the Odyssey 2006, 2006

Spoken Correction for Chinese Text Entry.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

Unsupervised Word Acquisition from Speech using Pattern Discovery.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Speaker Verification Over Handheld Devices with Realistic Noisy Speech Data.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Flexible Multi-Stream Framework for Speech Recognition using Multi-Tape Finite-State Transducers.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Style & Topic Language Model Adaptation Using HMM-LDA.
Proceedings of the EMNLP 2006, 2006

2005
The MIT Spoken Lecture Processing Project.
Proceedings of the HLT/EMNLP 2005, 2005

Robust detection of sonorant landmarks.
Proceedings of the INTERSPEECH 2005, 2005

Morphing spectral envelopes using audio flow.
Proceedings of the INTERSPEECH 2005, 2005

Visual Speech Recognition with Loosely Synchronized Feature Streams.
Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV 2005), 2005

Production domain modeling of pronunciation for visual speech recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Automatic Processing of Audio Lectures for Information Retrieval: Vocabulary Selection and Language Modeling.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

A Wavelet and Filter Bank Framework For Phonetic Classification.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004
Feature-based Pronunciation Modeling for Speech Recognition.
Proceedings of HLT-NAACL 2004: Short Papers, Boston, Massachusetts, USA, May 2-7, 2004, 2004

Feature-based pronunciation modeling with trainable asynchrony probabilities.
Proceedings of the INTERSPEECH 2004, 2004

Articulatory features for robust visual speech recognition.
Proceedings of the 6th International Conference on Multimodal Interfaces, 2004

A segment-based audio-visual speech recognizer: data collection, development, and initial experiments.
Proceedings of the 6th International Conference on Multimodal Interfaces, 2004

A Framework for Developing Conversational User Interfaces.
Proceedings of the Computer-Aided Design of User Interfaces IV, 2004

2003
A probabilistic framework for segment-based speech recognition.
Comput. Speech Lang., 2003

Hidden feature models for speech recognition using dynamic Bayesian networks.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

2002
Information-theoretic criteria for unit selection synthesis.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

A multi-class approach for modelling out-of-vocabulary words.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

2001
Mokusei: a telephone-based Japanese conversational system in the weather domain.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Segment-based recognition on the phonebook task: initial results and observations on duration modeling.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Speechbuilder: facilitating spoken dialogue system development.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Learning units for domain-independent out-of-vocabulary word modelling.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

2000
JUPITER: a telephone-based conversational interface for weather information.
IEEE Trans. Speech Audio Process., 2000

Guest editorial introduction to the special issue on language modeling and dialogue systems.
IEEE Trans. Speech Audio Process., 2000

Conversational interfaces: advances and challenges.
Proc. IEEE, 2000

A flexible, scalable finite-state transducer architecture for corpus-based concatenative speech synthesis.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Data collection and performance evaluation of spoken dialogue systems: the MIT experience.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Modeling out-of-vocabulary words for robust speech recognition.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Lexical modeling of non-native speech for automatic speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2000

Heterogeneous lexical units for automatic speech recognition: preliminary investigations.
Proceedings of the IEEE International Conference on Acoustics, 2000

1999
Real-time telephone-based speech recognition in the Jupiter domain.
Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

1998
Evaluation methodology for a telephone-based conversational system.
Proceedings of the First International Conference on Language Resources and Evaluation, 1998

Natural-sounding speech synthesis using variable-length units.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Confidence scoring for speech understanding systems.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Real-time probabilistic segmentation for segment-based speech recognition.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Heterogeneous measurements and multiple classifiers for speech recognition.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Telephone-based conversational speech recognition in the JUPITER domain.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

1997
From interface to content: translingual access and delivery of on-line information.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

YINHE: a Mandarin Chinese version of the GALAXY system.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

MUSE: a scripting language for the development of interactive speech analysis and recognition tools.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

A comparison of novel techniques for instantaneous speaker adaptation.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Heterogeneous acoustic measurements for phonetic classification.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Segmentation and modeling in segment-based recognition.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

1996
Multilingual human-computer interactions: from information access to language learning.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

WHEELS: a conversational system in the automobile classifieds domain.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

Telephone data collection using the world wide web.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

A probabilistic framework for feature-based speech recognition.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

1995
Multilingual spoken-language understanding in the MIT Voyager system.
Speech Commun., 1995

1994
PEGASUS: A spoken dialogue interface for on-line air travel planning.
Speech Commun., 1994

PEGASUS: A Spoken Language Interface for On-Line Air Travel Planning I.
Proceedings of the Human Language Technology, 1994

Empirical acquisition of language models for speech recognition.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

Statistical trajectory models for phonetic recognition.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

GALAXY: a human-language interface to on-line travel information.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

Multilingual language generation across multiple domains.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

Porting the bilingual Voyager system to Italian.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

1993
Empirical acquisition of word and phrase classes in the ATIS domain.
Proceedings of the Third European Conference on Speech Communication and Technology, 1993

A* word network search for continuous speech recognition.
Proceedings of the Third European Conference on Speech Communication and Technology, 1993

Modelling spectral dynamics for vowel classification.
Proceedings of the Third European Conference on Speech Communication and Technology, 1993

A bilingual Voyager system.
Proceedings of the Third European Conference on Speech Communication and Technology, 1993

A comparative study of signal representations and classification techniques for speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 1993

1992
The MIT ATIS System: February 1992 Progress Report.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Harriman, 1992

Collection and Analyses of WSJ-CSR Data at MIT.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Harriman, 1992

Collection and analyses of WSJ-CSR corpus at MIT.
Proceedings of the Second International Conference on Spoken Language Processing, 1992

Vowel classification based on analysis-by-synthesis.
Proceedings of the Second International Conference on Spoken Language Processing, 1992

1991
Spoken language systems for human/machine interfaces.
Proceedings of the Computer-Assisted Information Retrieval (Recherche d'Information et ses Applications) - RIAO 1991, 3rd International Conference, Universitad Autonoma de Barcelona, Spain, April 2, 1991

Development and Preliminary Evaluation of the MIT ATIS System.
Proceedings of the Speech and Natural Language, 1991

Modelling Context Dependency in Acoustic-Phonetic and Lexical Representations.
Proceedings of the Speech and Natural Language, 1991

The MIT ATIS system; preliminary development, spontaneous speech data collection, and performance evaluation.
Proceedings of the Second European Conference on Speech Communication and Technology, 1991

Automatic learning of lexical representations for sub-word unit based speech recognition systems.
Proceedings of the Second European Conference on Speech Communication and Technology, 1991

Integration of speech recognition and natural language processing in the MIT VOYAGER system.
Proceedings of the 1991 International Conference on Acoustics, 1991

1990
Speech database development at MIT: TIMIT and beyond.
Speech Commun., 1990

From Speech Recognition to Spoken Language Understanding.
Proceedings of the Advances in Neural Information Processing Systems 3, 1990

Phonetic Classification and Recognition Using the Multi-Layer Perceptron.
Proceedings of the Advances in Neural Information Processing Systems 3, 1990

Recent Progress on the SUMMIT System.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, 1990

Preliminary ATIS Development at MIT.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, 1990

Recent Progress on the VOYAGER System.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, 1990

Recent progress on the MIT VOYAGER spoken language system.
Proceedings of the First International Conference on Spoken Language Processing, 1990

Detection and classification of phonemes using context-independent error back-propagation.
Proceedings of the First International Conference on Spoken Language Processing, 1990

The SUMMIT speech recognition system: phonological modelling and lexical access.
Proceedings of the 1990 International Conference on Acoustics, 1990

The VOYAGER speech understanding system: preliminary development and evaluation.
Proceedings of the 1990 International Conference on Acoustics, 1990

1989
The MIT Summit Speech Recognition System: a Progress Report.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Philadelphia, 1989

Preliminary Evaluation of the Voyager Spoken Language System.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Cape Cod, 1989

The Voyager Speech Understanding System: A Progress Report.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Cape Cod, 1989

The Collection and Preliminary Analysis of a Spontaneous Speech Database.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Cape Cod, 1989

Acoustic segmentation and phonetic classification in the SUMMIT system.
Proceedings of the IEEE International Conference on Acoustics, 1989

1988
Finding acoustic regularities in speech: applications to phonetic recognition.
PhD thesis, 1988

Multi-level acoustic segmentation of continuous speech.
Proceedings of the IEEE International Conference on Acoustics, 1988

1986
Detection and recognition of nasal consonants in American English.
Proceedings of the IEEE International Conference on Acoustics, 1986

1985
Detection of nasalized vowels in American English.
Proceedings of the IEEE International Conference on Acoustics, 1985

