Gerasimos Potamianos

Orcid: 0000-0002-9833-7124

According to our database, Gerasimos Potamianos authored at least 138 papers between 1991 and 2023.

Bibliography

2023
SL-ReDu GSL: A Large Greek Sign Language Recognition Corpus.
Proceedings of the IEEE International Conference on Acoustics, 2023

Sign Language Recognition via Deformable 3D Convolutions and Modulated Graph Convolutional Networks.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
ChildBot: Multi-robot perception and interaction with children.
Robotics Auton. Syst., 2022

Spatio-Temporal Graph Convolutional Networks for Continuous Sign Language Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Accurate and Resource-Efficient Lipreading with EfficientNetV2 and Transformers.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Joint Object Affordance Reasoning and Segmentation in RGB-D Videos.
IEEE Access, 2021

A robotic edutainment framework for designing child-robot interaction scenarios.
Proceedings of the PETRA '21: The 14th PErvasive Technologies Related to Assistive Environments Conference, Virtual Event, Greece, 29 June, 2021

Multimodal Fusion and Sequence Learning for Cued Speech Recognition from Videos.
Proceedings of the Universal Access in Human-Computer Interaction. Access to Media, Learning and Assistive Environments, 2021

The SL-ReDu Environment for Self-monitoring and Objective Learner Assessment in Greek Sign Language.
Proceedings of the Universal Access in Human-Computer Interaction. Access to Media, Learning and Assistive Environments, 2021

Resource-efficient TDNN Architectures for Audio-visual Speech Recognition.
Proceedings of the 29th European Signal Processing Conference, 2021

Overlapped Sound Event Classification via Multi-Channel Sound Separation Network.
Proceedings of the 29th European Signal Processing Conference, 2021

An Audiovisual Child Emotion Recognition System for Child-Robot Interaction Applications.
Proceedings of the 29th European Signal Processing Conference, 2021

2020
Deep sensorimotor learning for RGB-D object recognition.
Comput. Vis. Image Underst., 2020

SL-ReDu: Greek sign language recognition for educational applications. Project description and early results.
Proceedings of the PETRA '20: The 13th PErvasive Technologies Related to Assistive Environments Conference, Corfu, Greece, June 30, 2020

Multimodal Sign Language Recognition via Temporal Deformable Convolutional Sequence Learning.
Proceedings of the Interspeech 2020, 2020

Resource-Adaptive Deep Learning for Visual Speech Recognition.
Proceedings of the Interspeech 2020, 2020

A Deep Learning Approach to Object Affordance Segmentation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Audio-Assisted Image Inpainting for Talking Faces.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Fully Convolutional Sequence Learning Approach for Cued Speech Recognition from Videos.
Proceedings of the 28th European Signal Processing Conference, 2020

Exploiting 3D Hand Pose Estimation in Deep Learning-Based Sign Language Recognition from RGB Videos.
Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

Emotion Understanding in Videos Through Body, Context, and Visual-Semantic Embedding Loss.
Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

2019
Fusing Body Posture With Facial Expressions for Joint Recognition of Affect in Child-Robot Interaction.
IEEE Robotics Autom. Lett., 2019

Room-localized speech activity detection in multi-microphone smart homes.
EURASIP J. Audio Speech Music. Process., 2019

End-to-End Convolutional Sequence Learning for ASL Fingerspelling Recognition.
Proceedings of the Interspeech 2019, 2019

MobiLipNet: Resource-Efficient Deep Learning Based Lipreading.
Proceedings of the Interspeech 2019, 2019

Fingerspelled Alphabet Sign Recognition in Upper-Body Videos.
Proceedings of the 27th European Signal Processing Conference, 2019

Hierarchical Detection of Sound Events and their Localization Using Convolutional Neural Networks with Adaptive Thresholds.
Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events 2019 (DCASE 2019), 2019

2018
Deep View2View Mapping for View-Invariant Lipreading.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Object Assembly Guidance in Child-Robot Interaction using RGB-D based 3D Tracking.
Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018

Multi3: Multi-Sensory Perception System for Multi-Modal Child Interaction with Multiple Robots.
Proceedings of the 2018 IEEE International Conference on Robotics and Automation, 2018

Attention-Enhanced Sensorimotor Object Recognition.
Proceedings of the 2018 IEEE International Conference on Image Processing, 2018

Multi-View Fusion for Action Recognition in Child-Robot Interaction.
Proceedings of the 2018 IEEE International Conference on Image Processing, 2018

Far-Field Audio-Visual Scene Perception of Multi-Party Human-Robot Interaction for Children and Adults.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Hybrid Approach to Hand Detection and Type Classification in Upper-Body Videos.
Proceedings of the 7th European Workshop on Visual Information Processing, 2018

Multi-Channel Non-Negative Matrix Factorization for Overlapped Acoustic Event Detection.
Proceedings of the 26th European Signal Processing Conference, 2018

2017
Room-localized spoken command recognition in multi-room, multi-microphone environments.
Comput. Speech Lang., 2017

On the Joint Use of NMF and Classification for Overlapping Acoustic Event Detection.
Proceedings of the IWCIM 2017, 2017

Deep Affordance-Grounded Sensorimotor Object Recognition.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Exploring ROI size in deep learning based lipreading.
Proceedings of the Auditory-Visual Speech Processing, 2017

Audio and visual modality combination in speech processing applications.
Proceedings of the Handbook of Multimodal-Multisensor Interfaces: Foundations, User Modeling, and Common Modality Combinations, 2017

2016
Audio-visual speech activity detection in a two-speaker scenario incorporating depth information from a profile or frontal view.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Improved Dictionary Selection and Detection Schemes in Sparse-CNMF-Based Overlapping Acoustic Event Detection.
Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2016

2015
Detecting audio-visual synchrony using deep neural networks.
Proceedings of the INTERSPEECH 2015, 2015

Multichannel speech enhancement using MEMS microphones.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Multi-room speech activity detection using a distributed microphone network in domestic environments.
Proceedings of the 23rd European Signal Processing Conference, 2015

Scattering vs. discrete cosine transform features in visual speech processing.
Proceedings of the Auditory-Visual Speech Processing, 2015

2014
ATHENA: a Greek multi-sensory database for home automation control.
Proceedings of the INTERSPEECH 2014, 2014

Robust far-field spoken command recognition for home automation combining adaptation and multichannel processing.
Proceedings of the IEEE International Conference on Acoustics, 2014

The Athena-RC system for speech activity detection and speaker localization in the DIRHA smart home.
Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2014

Database and baseline system for detecting degraded traffic signs in urban environments.
Proceedings of the 5th European Workshop on Visual Information Processing, 2014

Experiments in acoustic source localization using sparse arrays in adverse indoors environments.
Proceedings of the 22nd European Signal Processing Conference, 2014

Multi-microphone fusion for detection of speech and acoustic events in smart spaces.
Proceedings of the 22nd European Signal Processing Conference, 2014

2013
Experiments on far-field multichannel speech processing in smart homes.
Proceedings of the 18th International Conference on Digital Signal Processing, 2013

Robust Multi-Modal Speech Recognition in Two Languages Utilizing Video and Distance Information from the Kinect.
Proceedings of the Human-Computer Interaction. Interaction Modalities and Techniques, 2013

Advances in Large Vocabulary Continuous Speech Recognition in Greek: Modeling and nonlinear features.
Proceedings of the 21st European Signal Processing Conference, 2013

2012
Audio-visual speech recognition using depth information from the Kinect in noisy video conditions.
Proceedings of the 5th International Conference on PErvasive Technologies Related to Assistive Environments, 2012

A hierarchical approach with feature selection for emotion recognition from speech.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Audio-visual speech recognition incorporating facial depth information captured by the Kinect.
Proceedings of the 20th European Signal Processing Conference, 2012

2011
Special Section on Interactive Multimedia.
IEEE Trans. Multim., 2011

Audio visual speech recognition in noisy visual environments.
Proceedings of the PETRA 2011, 2011

Bilingual corpus for AVASR using multiple sensors and depth information.
Proceedings of the Auditory-Visual Speech Processing, 2011

2010
Joint estimation of DOA and speech based on EM beamforming.
Proceedings of the IEEE International Conference on Acoustics, 2010


2009
Automatic Speech Recognition.
Proceedings of the Computers in the Human Interaction Loop, 2009

Person Tracking.
Proceedings of the Computers in the Human Interaction Loop, 2009

Multimodal Classification of Activities of Daily Living Inside Smart Homes.
Proceedings of the Distributed Computing, 2009

Robust audio-visual speech synchrony detection by generalized bimodal linear prediction.
Proceedings of the INTERSPEECH 2009, 2009

Acoustic fall detection using Gaussian mixture models and GMM supervectors.
Proceedings of the IEEE International Conference on Acoustics, 2009

Long-time span acoustic activity analysis from far-field sensors in smart homes.
Proceedings of the IEEE International Conference on Acoustics, 2009

Audio-visual speech synchronization detection using a bimodal linear prediction model.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009

Audio-visual automatic speech recognition and related bimodal speech technologies: A review of the state-of-the-art and open problems.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008
A multi-modal spoken dialog system for interactive TV.
Proceedings of the 10th International Conference on Multimodal Interfaces, 2008

Patch-based analysis of visual speech from multiple views.
Proceedings of the International Conference on Auditory-Visual Speech Processing 2008, 2008

2007
Joint face and head tracking inside multi-camera smart rooms.
Signal Image Video Process., 2007

The CHIL audiovisual corpus for lecture and meeting analysis inside smart rooms.
Lang. Resour. Evaluation, 2007

An Embedded System for In-Vehicle Visual Speech Activity Detection.
Proceedings of the IEEE 9th Workshop on Multimedia Signal Processing, 2007

A unified approach to multi-pose audio-visual ASR.
Proceedings of the INTERSPEECH 2007, 2007

Detection, diarization, and transcription of far-field lecture speech.
Proceedings of the INTERSPEECH 2007, 2007

Dynamic Stream Weight Modeling for Audio-Visual Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2007

Kernel-Based 3D Tracking.
Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007

The IBM RT07 Evaluation Systems for Speaker Diarization on Lecture Meetings.
Proceedings of the Multimodal Technologies for Perception of Humans, 2007

The IBM Rich Transcription 2007 Speech-to-Text Systems for Lecture Meetings.
Proceedings of the Multimodal Technologies for Perception of Humans, 2007

An extended pose-invariant lipreading system.
Proceedings of the Auditory-Visual Speech Processing 2007, 2007

2006
Lipreading Using Profile Versus Frontal Views.
Proceedings of the IEEE 8th Workshop on Multimedia Signal Processing, 2006

The IBM RT06s Evaluation System for Speech Activity Detection in CHIL Seminars.
Proceedings of the Machine Learning for Multimodal Interaction, 2006

The IBM Rich Transcription Spring 2006 Speech-to-Text System for Lecture Meetings.
Proceedings of the Machine Learning for Multimodal Interaction, 2006

Audio-Visual ASR from Multiple Views inside Smart Rooms.
Proceedings of the 2006 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, 2006

Person Tracking in Smart Rooms using Dynamic Programming and Adaptive Subspace Learning.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

Robust Multi-View Multi-Camera Face Detection inside Smart Rooms Using Spatio-Temporal Dynamic Programming.
Proceedings of the Seventh IEEE International Conference on Automatic Face and Gesture Recognition (FGR 2006), 2006

A Joint System for Single-Person 2D-Face and 3D-Head Tracking in CHIL Seminars.
Proceedings of the Multimodal Technologies for Perception of Humans, 2006

2005
Automatic Speech Recognition and Speech Activity Detection in the CHIL Smart Room.
Proceedings of the Machine Learning for Multimodal Interaction, 2005

Speech activity detection fusing acoustic phonetic and energy features.
Proceedings of the INTERSPEECH 2005, 2005

Automatic Speech Activity Detection, Source Localization, and Speech Recognition on the CHIL Seminar Corpus.
Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, 2005

Improved face finding in visually challenging environments.
Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, 2005

A Joint System for Person Tracking and Face Detection.
Proceedings of the Computer Vision in Human-Computer Interaction, 2005

Exploiting lower face symmetry in appearance-based automatic speechreading.
Proceedings of the Auditory-Visual Speech Processing 2005, 2005

2004
Audio-visual speech recognition using an infrared headset.
Speech Commun., 2004

Mutual information based visual feature selection for lipreading.
Proceedings of the INTERSPEECH 2004, 2004

Efficient likelihood computation in multi-stream HMM based audio-visual speech recognition.
Proceedings of the INTERSPEECH 2004, 2004

Multistage information fusion for audio-visual speech recognition.
Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004

Towards practical deployment of audio-visual speech recognition.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Improved face and feature finding for audio-visual speech recognition in visually challenging environments.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003
Recent advances in the automatic recognition of audiovisual speech.
Proc. IEEE, 2003

Audio-visual speech recognition in challenging environments.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

A real-time prototype for small-vocabulary audio-visual ASR.
Proceedings of the 2003 IEEE International Conference on Multimedia and Expo, 2003

Information fusion and decision cascading for audio-visual speaker recognition based on time-varying stream reliability prediction.
Proceedings of the 2003 IEEE International Conference on Multimedia and Expo, 2003

Frame-dependent multi-stream reliability indicators for audio-visual speech recognition.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Audio-visual speaker recognition using time-varying stream reliability prediction.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Joint audio-visual speech processing for recognition and enhancement.
Proceedings of the AVSP 2003, 2003

Improving audio-visual speech recognition with an infrared headset.
Proceedings of the AVSP 2003, 2003

2002
Editorial.
EURASIP J. Adv. Signal Process., 2002

Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization).
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Maximum entropy and MCE based HMM stream weight estimation for audio-visual ASR.
Proceedings of the IEEE International Conference on Acoustics, 2002

Noisy audio feature enhancement using audio-visual speech data.
Proceedings of the IEEE International Conference on Acoustics, 2002

2001
A Cascade Visual Front End for Speaker Independent Automatic Speechreading.
Int. J. Speech Technol., 2001

Large-vocabulary audio-visual speech recognition: a summary of the Johns Hopkins Summer 2000 Workshop.
Proceedings of the Fourth IEEE Workshop on Multimedia Signal Processing, 2001

Robust detection of visual ROI for automatic speechreading.
Proceedings of the Fourth IEEE Workshop on Multimedia Signal Processing, 2001

Large-vocabulary audio-visual speech recognition by machines and humans.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

A Comparison Of Model And Transform-Based Visual Features For Audio-Visual LVCSR.
Proceedings of the 2001 IEEE International Conference on Multimedia and Expo, 2001

Improved ROI and within frame discriminant features for lipreading.
Proceedings of the 2001 International Conference on Image Processing, 2001

Hierarchical discriminant features for audio-visual LVCSR.
Proceedings of the IEEE International Conference on Acoustics, 2001

Asynchronous stream modeling for large vocabulary audio-visual speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2001

Weighting schemes for audio-visual fusion in speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2001

Automatic speechreading of impaired speech.
Proceedings of the Auditory-Visual Speech Processing, 2001

2000
Stream confidence estimation for audio-visual speech recognition.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Perceptual interfaces for information interaction: joint processing of audio and visual information for human-computer interaction.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

A Cascade Image Transform for Speaker Independent Automatic Speech Reading.
Proceedings of the 2000 IEEE International Conference on Multimedia and Expo, 2000

Audio-Visual Unit Selection for the Synthesis of Photo-Realistic Talking-Heads.
Proceedings of the 2000 IEEE International Conference on Multimedia and Expo, 2000

1999
Speaker adaptation for audio-visual speech recognition.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

1998
A study of n-gram and decision tree letter language modeling methods.
Speech Commun., 1998

Linear discriminant analysis for speechreading.
Proceedings of the Second IEEE Workshop on Multimedia Signal Processing, 1998

An Image Transform Approach for HMM based Automatic Lipreading.
Proceedings of the 1998 IEEE International Conference on Image Processing, 1998

Discriminative training of HMM stream exponents for audio-visual speech recognition.
Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

1997
Stochastic approximation algorithms for partition function estimation of Gibbs random fields.
IEEE Trans. Inf. Theory, 1997

Speaker independent audio-visual database for bimodal ASR.
Proceedings of the ESCA Workshop on Audio-Visual Speech Processing, 1997

1993
Partition function estimation of Gibbs random field images using Monte Carlo simulations.
IEEE Trans. Inf. Theory, 1993

An analysis of Monte Carlo methods for likelihood estimation of Gibbsian images.
Proceedings of the IEEE International Conference on Acoustics, 1993

1991
A novel method for computing the partition function of Markov random field images using Monte Carlo simulations.
Proceedings of the 1991 International Conference on Acoustics, 1991

