Gerasimos Potamianos

CoRR, March, 2026

2025

Monocular 3D Hand Pose Estimation with Implicit Camera Alignment.

Christos P. Antonopoulos

CoRR, June, 2025

Unsupervised Transcript-assisted Video Summarization and Highlight Detection.

Spyros Barbakos

Charalampos Antoniadis

Gianluca Setti

CoRR, May, 2025

Greek sign language recognition for an education platform.

Univers. Access Inf. Soc., March, 2025

A Multi-Stream Framework Utilizing 3D Human Reconstruction for Cued Speech Recognition.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Controllable Single-Shot Animation Blending with Temporal Conditioning.

Eleni Tselepi

Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

Seeing in 2D, Thinking in 3D: 3D Hand Mesh-Guided Feature Learning for Continuous Fingerspelling.

Panayiotis Paraskevas Filntisis

George Retsinas

Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

Resource-Efficient and Noise-Robust Modality Fusion for Audio-Visual Speech Recognition.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Conformer-Based Multi-Modal Learning for Cued Speech Recognition from Videos.

Proceedings of the 33rd European Signal Processing Conference, 2025

2024

A large corpus for the recognition of Greek Sign Language gestures.

Galini Sapountzaki

Kyriaki Vasilaki

Comput. Vis. Image Underst., 2024

Multimodal Continuous Fingerspelling Recognition via Visual Alignment Learning.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

2023

Multimodal Locally Enhanced Transformer for Continuous Sign Language Recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

SL-REDU GSL: A Large Greek Sign Language Recognition Corpus.

Galini Sapountzaki

Kyriaki Vasilaki

Proceedings of the IEEE International Conference on Acoustics, 2023

Sign Language Recognition via Deformable 3D Convolutions and Modulated Graph Convolutional Networks.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

ChildBot: Multi-robot perception and interaction with children.

Robotics Auton. Syst., 2022

Spatio-Temporal Graph Convolutional Networks for Continuous Sign Language Recognition.

Maria Parelli

Georgios Pavlakos

Proceedings of the IEEE International Conference on Acoustics, 2022

Accurate and Resource-Efficient Lipreading with Efficientnetv2 and Transformers.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Joint Object Affordance Reasoning and Segmentation in RGB-D Videos.

IEEE Access, 2021

A robotic edutainment framework for designing child-robot interaction scenarios.

Proceedings of the PETRA '21: The 14th PErvasive Technologies Related to Assistive Environments Conference, Virtual Event, Greece, 29 June, 2021

Multimodal Fusion and Sequence Learning for Cued Speech Recognition from Videos.

Proceedings of the Universal Access in Human-Computer Interaction. Access to Media, Learning and Assistive Environments, 2021

The SL-ReDu Environment for Self-monitoring and Objective Learner Assessment in Greek Sign Language.

Proceedings of the Universal Access in Human-Computer Interaction. Access to Media, Learning and Assistive Environments, 2021

Resource-efficient TDNN Architectures for Audio-visual Speech Recognition.

Samuel Thomas

Edmilson da Silva Morais

Proceedings of the 29th European Signal Processing Conference, 2021

Overlapped Sound Event Classification via Multi-Channel Sound Separation Network.

Proceedings of the 29th European Signal Processing Conference, 2021

An Audiovisual Child Emotion Recognition System for Child-Robot Interaction Applications.

Proceedings of the 29th European Signal Processing Conference, 2021

2020

Deep sensorimotor learning for RGB-D object recognition.

Georgios Th. Papadopoulos

Comput. Vis. Image Underst., 2020

SL-ReDu: Greek sign language recognition for educational applications. Project description and early results.

Galini Sapountzaki

Proceedings of the PETRA '20: The 13th PErvasive Technologies Related to Assistive Environments Conference, Corfu, Greece, June 30, 2020

Multimodal Sign Language Recognition via Temporal Deformable Convolutional Sequence Learning.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Resource-Adaptive Deep Learning for Visual Speech Recognition.

Samuel Thomas

Edmilson da Silva Morais

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A Deep Learning Approach to Object Affordance Segmentation.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Audio-Assisted Image Inpainting for Talking Faces.

Samuel Thomas

Edmilson da Silva Morais

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Fully Convolutional Sequence Learning Approach for Cued Speech Recognition from Videos.

Proceedings of the 28th European Signal Processing Conference, 2020

Exploiting 3D Hand Pose Estimation in Deep Learning-Based Sign Language Recognition from RGB Videos.

Maria Parelli

Georgios Pavlakos

Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

Emotion Understanding in Videos Through Body, Context, and Visual-Semantic Embedding Loss.

Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

2019

Fusing Body Posture With Facial Expressions for Joint Recognition of Affect in Child-Robot Interaction.

IEEE Robotics Autom. Lett., 2019

Room-localized speech activity detection in multi-microphone smart homes.

EURASIP J. Audio Speech Music. Process., 2019

End-to-End Convolutional Sequence Learning for ASL Fingerspelling Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

MobiLipNet: Resource-Efficient Deep Learning Based Lipreading.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Fingerspelled Alphabet Sign Recognition in Upper-Body Videos.

Sotirios Panagiotis Chytas

Proceedings of the 27th European Signal Processing Conference, 2019

Hierarchical Detection of Sound Events and their Localization Using Convolutional Neural Networks with Adaptive Thresholds.

Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events 2019 (DCASE 2019), 2019

2018

Deep View2View Mapping for View-Invariant Lipreading.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Object Assembly Guidance in Child-Robot Interaction using RGB-D based 3D Tracking.

Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018

Multi3: Multi-Sensory Perception System for Multi-Modal Child Interaction with Multiple Robots.

Petros Koutras

Proceedings of the 2018 IEEE International Conference on Robotics and Automation, 2018

Attention-Enhanced Sensorimotor Object Recognition.

Georgios Th. Papadopoulos

Proceedings of the 2018 IEEE International Conference on Image Processing, 2018

Multi-View Fusion for Action Recognition in Child-Robot Interaction.

Petros Koutras

Proceedings of the 2018 IEEE International Conference on Image Processing, 2018

Far-Field Audio-Visual Scene Perception of Multi-Party Human-Robot Interaction for Children and Adults.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Hybrid Approach to Hand Detection and Type Classification in Upper-Body Videos.

Proceedings of the 7th European Workshop on Visual Information Processing, 2018

Multi-Channel Non-Negative Matrix Factorization for Overlapped Acoustic Event Detection.

Proceedings of the 26th European Signal Processing Conference, 2018

2017

Room-localized spoken command recognition in multi-room, multi-microphone environments.

Comput. Speech Lang., 2017

On the Joint Use of NMF and Classification for Overlapping Acoustic Event Detection.

Proceedings of the IWCIM 2017, 2017

Deep Affordance-Grounded Sensorimotor Object Recognition.

Georgios Th. Papadopoulos

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Exploring ROI size in deep learning based lipreading.

Youssef Mroueh

Steven J. Rennie

Proceedings of the 14th International Conference on Auditory-Visual Speech Processing, 2017

Audio and visual modality combination in speech processing applications.

Alexandros Koumbaroulis

Argyrios Vartholomaios

Proceedings of the Handbook of Multimodal-Multisensor Interfaces: Foundations, User Modeling, and Common Modality Combinations, 2017

2016

Audio-visual speech activity detection in a two-speaker scenario incorporating depth information from a profile or frontal view.

Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Improved Dictionary Selection and Detection Schemes in Sparse-CNMF-Based Overlapping Acoustic Event Detection.

Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2016

2015

Detecting audio-visual synchrony using deep neural networks.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Multichannel speech enhancement using MEMS microphones.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Multi-room speech activity detection using a distributed microphone network in domestic environments.

Alessio Brutti

Marco Matassoni

Alberto Abad

Miguel Matos

Proceedings of the 23rd European Signal Processing Conference, 2015

Scattering vs. discrete cosine transform features in visual speech processing.

Proceedings of the Auditory-Visual Speech Processing, 2015

2014

ATHENA: a Greek multi-sensory database for home automation control (author: Isidoros Rodomagoulakis, NTUA, Greece).

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Robust far-field spoken command recognition for home automation combining adaptation and multichannel processing.

Proceedings of the IEEE International Conference on Acoustics, 2014

The Athena-RC system for speech activity detection and speaker localization in the DIRHA smart home.

Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2014

Database and baseline system for detecting degraded traffic signs in urban environments.

Georgios Floros

Konstantinos Kyritsis

Proceedings of the 5th European Workshop on Visual Information Processing, 2014

Experiments in acoustic source localization using sparse arrays in adverse indoors environments.

Proceedings of the 22nd European Signal Processing Conference, 2014

Multi-microphone fusion for detection of speech and acoustic events in smart spaces.

Proceedings of the 22nd European Signal Processing Conference, 2014

2013

Experiments on far-field multichannel speech processing in smart homes.

Z.-I. Skordilis

Proceedings of the 18th International Conference on Digital Signal Processing, 2013

Robust Multi-Modal Speech Recognition in Two Languages Utilizing Video and Distance Information from the Kinect.

Proceedings of the Human-Computer Interaction. Interaction Modalities and Techniques, 2013

Advances in Large Vocabulary Continuous Speech Recognition in Greek: Modeling and nonlinear features.

Proceedings of the 21st European Signal Processing Conference, 2013

2012

Audio-visual speech recognition using depth information from the Kinect in noisy video conditions.

Proceedings of the 5th International Conference on PErvasive Technologies Related to Assistive Environments, 2012

A hierarchical approach with feature selection for emotion recognition from speech.

Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Audio-visual speech recognition incorporating facial depth information captured by the Kinect.

Proceedings of the 20th European Signal Processing Conference, 2012

2011

Special Section on Interactive Multimedia.

IEEE Trans. Multim., 2011

Audio visual speech recognition in noisy visual environments.

Alexandros Papangelis

Proceedings of the PETRA 2011, 2011

Bilingual corpus for AVASR using multiple sensors and depth information.

Dimitrios I. Kosmopoulos

Christopher McMurrough

Proceedings of the Auditory-Visual Speech Processing, 2011

2010

Joint estimation of DOA and speech based on EM beamforming.

Lae-Hoon Kim

Mark Hasegawa-Johnson

Aristodemos Pnevmatikakis

Vit Libal

Proceedings of the IEEE International Conference on Acoustics, 2010

Computers in the Human Interaction Loop.

Proceedings of the Handbook of Ambient Intelligence and Smart Environments, 2010

2009

Automatic Speech Recognition.

Proceedings of the Computers in the Human Interaction Loop, 2009

Person Tracking.

Keni Bernardin

Rainer Stiefelhagen

Proceedings of the Computers in the Human Interaction Loop, 2009

Multimodal Classification of Activities of Daily Living Inside Smart Homes.

Proceedings of the Distributed Computing, 2009

Robust audio-visual speech synchrony detection by generalized bimodal linear prediction.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Acoustic fall detection using Gaussian mixture models and GMM supervectors.

Xiaodan Zhuang

Jing Huang

Mark Hasegawa-Johnson

Proceedings of the IEEE International Conference on Acoustics, 2009

Long-time span acoustic activity analysis from far-field sensors in smart homes.

Proceedings of the IEEE International Conference on Acoustics, 2009

Audio-visual speech synchronization detection using a bimodal linear prediction model.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009

Audio-visual automatic speech recognition and related bimodal speech technologies: A review of the state-of-the-art and open problems.

Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008

A multi-modal spoken dialog system for interactive TV.

Proceedings of the 10th International Conference on Multimodal Interfaces, 2008

Patch-based analysis of visual speech from multiple views.

Aristodemos Pnevmatikakis

Sridha Sridharan

Proceedings of the International Conference on Auditory-Visual Speech Processing 2008, 2008

2007

Joint face and head tracking inside multi-camera smart rooms.

Signal Image Video Process., 2007

The CHIL audiovisual corpus for lecture and meeting analysis inside smart rooms.

Lang. Resour. Evaluation, 2007

An Embedded System for In-Vehicle Visual Speech Activity Detection.

Proceedings of the IEEE 9th Workshop on Multimedia Signal Processing, 2007

A unified approach to multi-pose audio-visual ASR.

Sridha Sridharan

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Detection, diarization, and transcription of far-field lecture speech.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Dynamic Stream Weight Modeling for Audio-Visual Speech Recognition.

Etienne Marcheret

Vit Libal

Proceedings of the IEEE International Conference on Acoustics, 2007

Kernel-Based 3D Tracking.

Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007

The IBM RT07 Evaluation Systems for Speaker Diarization on Lecture Meetings.

Proceedings of the Multimodal Technologies for Perception of Humans, 2007

The IBM Rich Transcription 2007 Speech-to-Text Systems for Lecture Meetings.

Proceedings of the Multimodal Technologies for Perception of Humans, 2007

An extended pose-invariant lipreading system.

Sridha Sridharan

Proceedings of the Auditory-Visual Speech Processing 2007, 2007

2006

Lipreading Using Profile Versus Frontal Views.

Proceedings of the IEEE 8th Workshop on Multimedia Signal Processing, 2006

The IBM RT06s Evaluation System for Speech Activity Detection in CHIL Seminars.

Proceedings of the Machine Learning for Multimodal Interaction, 2006

The IBM Rich Transcription Spring 2006 Speech-to-Text System for Lecture Meetings.

Proceedings of the Machine Learning for Multimodal Interaction, 2006

Audio-Visual ASR from Multiple Views inside Smart Rooms.

Proceedings of the 2006 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, 2006

Person Tracking in Smart Rooms using Dynamic Programming and Adaptive Subspace Learning.

Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

Robust Multi-View Multi-Camera Face Detection inside Smart Rooms Using Spatio-Temporal Dynamic Programming.

Proceedings of the Seventh IEEE International Conference on Automatic Face and Gesture Recognition (FGR 2006), 2006

A Joint System for Single-Person 2D-Face and 3D-Head Tracking in CHIL Seminars.

ZhenQiu Zhang

Proceedings of the Multimodal Technologies for Perception of Humans, 2006

2005

Automatic Speech Recognition and Speech Activity Detection in the CHIL Smart Room.

Stephen M. Chu

Etienne Marcheret

Proceedings of the Machine Learning for Multimodal Interaction, 2005

Speech activity detection fusing acoustic phonetic and energy features.

Etienne Marcheret

Karthik Visweswariah

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Automatic Speech Activity Detection, Source Localization, and Speech Recognition on the CHIL Seminar Corpus.

Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, 2005

Improved face finding in visually challenging environments.

Jintao Jiang

Giridharan Iyengar

Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, 2005

A Joint System for Person Tracking and Face Detection.

Proceedings of the Computer Vision in Human-Computer Interaction, 2005

Exploiting lower face symmetry in appearance-based automatic speechreading.

Patricia Scanlon

Proceedings of the Auditory-Visual Speech Processing 2005, 2005

2004

Audio-visual speech recognition using an infrared headset.

Speech Commun., 2004

Mutual information based visual feature selection for lipreading.

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Efficient likelihood computation in multi-stream HMM based audio-visual speech recognition.

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Multistage information fusion for audio-visual speech recognition.

Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004

Towards practical deployment of audio-visual speech recognition.

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Improved face and feature finding for audio-visual speech recognition in visually challenging environments.

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003

Recent advances in the automatic recognition of audiovisual speech.

Proc. IEEE, 2003

Audio-visual speech recognition in challenging environments.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

A real-time prototype for small-vocabulary audio-visual ASR.

Proceedings of the 2003 IEEE International Conference on Multimedia and Expo, 2003

Information fusion and decision cascading for audio-visual speaker recognition based on time-varying stream reliability prediction.

Proceedings of the 2003 IEEE International Conference on Multimedia and Expo, 2003

Frame-dependent multi-stream reliability indicators for audio-visual speech recognition.

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Audio-visual speaker recognition using time-varying stream reliability prediction.

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Joint audio-visual speech processing for recognition and enhancement.

Sabine Deligne

Proceedings of the AVSP 2003, 2003

Improving audio-visual speech recognition with an infrared headset.

Jing Huang

Proceedings of the AVSP 2003, 2003

2002

Editorial.

Juergen Luettin

Eric Vatikiotis-Bateson

EURASIP J. Adv. Signal Process., 2002

Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization).

Sabine Deligne

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Maximum entropy and MCE based HMM stream weight estimation for audio-visual ASR.

Proceedings of the IEEE International Conference on Acoustics, 2002

Noisy audio feature enhancement using audio-visual speech data.

Roland Goecke

Proceedings of the IEEE International Conference on Acoustics, 2002

2001

A Cascade Visual Front End for Speaker Independent Automatic Speechreading.

Int. J. Speech Technol., 2001

Large-vocabulary audio-visual speech recognition: a summary of the Johns Hopkins Summer 2000 Workshop.

Proceedings of the Fourth IEEE Workshop on Multimedia Signal Processing, 2001

Robust detection of visual ROI for automatic speechreading.

Proceedings of the Fourth IEEE Workshop on Multimedia Signal Processing, 2001

Large-vocabulary audio-visual speech recognition by machines and humans.

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

A Comparison Of Model And Transform-Based Visual Features For Audio-Visual LVCSR.

Proceedings of the 2001 IEEE International Conference on Multimedia and Expo, 2001

Improved ROI and within frame discriminant features for lipreading.

Proceedings of the 2001 International Conference on Image Processing, 2001

Hierarchical discriminant features for audio-visual LVCSR.

Juergen Luettin

Proceedings of the IEEE International Conference on Acoustics, 2001

Asynchronous stream modeling for large vocabulary audio-visual speech recognition.

Juergen Luettin

Proceedings of the IEEE International Conference on Acoustics, 2001

Weighting schemes for audio-visual fusion in speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2001

Automatic speechreading of impaired speech.

Proceedings of the Auditory-Visual Speech Processing, 2001

2000

Stream confidence estimation for audio-visual speech recognition.

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Perceptual interfaces for information interaction: joint processing of audio and visual information for human-computer interaction.

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

A Cascade Image Transform for Speaker Independent Automatic Speech Reading.

Proceedings of the 2000 IEEE International Conference on Multimedia and Expo, 2000

Audio-Visual Unit Selection for the Synthesis of Photo-Realistic Talking-Heads.

Eric Cosatto

Proceedings of the 2000 IEEE International Conference on Multimedia and Expo, 2000

1999

Speaker adaptation for audio-visual speech recognition.

Alexandros Potamianos

Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

1998

A study of n-gram and decision tree letter language modeling methods.

Frederick Jelinek

Speech Commun., 1998

Linear discriminant analysis for speechreading.

Proceedings of the Second IEEE Workshop on Multimedia Signal Processing, 1998

An Image Transform Approach for HMM based Automatic Lipreading.

Eric Cosatto

Proceedings of the 1998 IEEE International Conference on Image Processing, 1998

Discriminative training of HMM stream exponents for audio-visual speech recognition.

Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

1997

Stochastic approximation algorithms for partition function estimation of Gibbs random fields.

John K. Goutsias

IEEE Trans. Inf. Theory, 1997

Speaker independent audio-visual database for bimodal ASR.

Proceedings of the ESCA Workshop on Audio-Visual Speech Processing, 1997

1993

Partition function estimation of Gibbs random field images using Monte Carlo simulations.

John K. Goutsias

IEEE Trans. Inf. Theory, 1993

An analysis of Monte Carlo methods for likelihood estimation of Gibbsian images.

John Goutsias

Proceedings of the IEEE International Conference on Acoustics, 1993

1991

A novel method for computing the partition function of Markov random field images using Monte Carlo simulations.
