Gérard Bailly

Orcid: 0000-0002-6053-0818

Affiliations:
  • CNRS, Grenoble, France


According to our database, Gérard Bailly authored at least 168 papers between 1986 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Probing the Inductive Biases of a Gaze Model for Multi-party Interaction.
Proceedings of the Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, 2024

2023
Data-Driven Generation of Eyes and Head Movements of a Social Robot in Multiparty Conversation.
Proceedings of the Social Robotics - 15th International Conference, 2023

On the Benefit of Independent Control of Head and Eye Movements of a Social Robot for Multiparty Human-Robot Interaction.
Proceedings of the Human-Computer Interaction, 2023

2022
Automatic assessment of oral readings of young pupils.
Speech Commun., 2022

Comparing NLP Solutions for the Disambiguation of French Heterophonic Homographs for End-to-End TTS Systems.
Proceedings of the Speech and Computer - 24th International Conference, 2022

Automatic Verbal Depiction of a Brick Assembly for a Robot Instructing Humans.
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2022

Speaking Rate Control of end-to-end TTS Models by Direct Manipulation of the Encoder's Output Embeddings.
Proceedings of the Interspeech 2022, 2022

2021
Impact of Social Presence of Humanoid Robots: Does Competence Matter?
Proceedings of the Social Robotics - 13th International Conference, 2021

Evaluating the Extrapolation Capabilities of Neural Vocoders to Extreme Pitch Values.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Characterizing and Assessing the Oral Reading Fluency of Young Readers.
Proceedings of the Fifth International Conference, 2021


2020
Predicting Multidimensional Subjective Ratings of Children' Readings from the Speech Signals for the Automatic Assessment of Fluency.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

2019
Reading Prosody Development: Automatic Assessment for a Longitudinal Study.
Proceedings of the 8th ISCA International Workshop on Speech and Language Technology in Education, 2019

Transfer and Extraction of the Style of Handwritten Letters using Deep Learning.
Proceedings of the 11th International Conference on Agents and Artificial Intelligence, 2019

2018
Introduction to the special issue on auditory-visual expressive speech and gesture in humans and machines.
Speech Commun., 2018

Audio-visual synchronization in reading while listening to texts: Effects on visual behavior and verbal learning.
Comput. Speech Lang., 2018

Style Transfer and Extraction for the Handwritten Letters Using Deep Learning.
CoRR, 2018

A Variational Prosody Model for the decomposition and synthesis of speech prosody.
CoRR, 2018

Handwriting Styles: Benchmarks and Evaluation Metrics.
Proceedings of the Fifth International Conference on Social Networks Analysis, 2018

A Weighted Superposition of Functional Contours Model for Modelling Contextual Prominence of Elementary Prosodic Contours.
Proceedings of the Interspeech 2018, 2018

Comparing Cascaded LSTM Architectures for Generating Head Motion from Speech in Task-Oriented Dialogs.
Proceedings of the Human-Computer Interaction. Interaction Technologies, 2018

2017
Which prosodic features contribute to the recognition of dramatic attitudes?
Speech Commun., 2017

Learning off-line vs. on-line models of interactive multimodal behaviors with recurrent neural networks.
Pattern Recognit. Lett., 2017

Critical review of the book "Gaze in Human-Robot Communication".
J. Multimodal User Interfaces, 2017

A Generative Audio-Visual Prosodic Model for Virtual Actors.
IEEE Computer Graphics and Applications, 2017

Evaluation of reading performance of primary school children: Objective measurements vs. subjective ratings.
Proceedings of the WOCCI 2017: 6th International Workshop on Child Computer Interaction, 2017

Improving fluency of young readers: introducing a Karaoke to learn how to breathe during a Reading-while-Listening task.
Proceedings of the 7th ISCA International Workshop on Speech and Language Technology in Education, 2017

Acquiring Human-Robot Interaction skills with Transfer Learning Techniques.
Proceedings of the Companion of the 2017 ACM/IEEE International Conference on Human-Robot Interaction, 2017

2016
Graphical models for social behavior modeling in face-to-face interaction.
Pattern Recognit. Lett., 2016

Statistical conversion of silent articulation into audible speech using full-covariance HMM.
Comput. Speech Lang., 2016

Adaptive Latency for Part-of-Speech Tagging in Incremental Text-to-Speech Synthesis.
Proceedings of the Interspeech 2016, 2016

Introduction to Poster Presentation of Part II.
Proceedings of the Interspeech 2016, 2016

Characterization of Audiovisual Dramatic Attitudes.
Proceedings of the Interspeech 2016, 2016

Quantitative Analysis of Backchannels Uttered by an Interviewer During Neuropsychological Tests.
Proceedings of the Interspeech 2016, 2016

Conducting neuropsychological tests with a humanoid robot: Design and evaluation.
Proceedings of the 7th IEEE International Conference on Cognitive Infocommunications, 2016

2015
Speaker-Adaptive Acoustic-Articulatory Inversion Using Cascaded Gaussian Mixture Regression.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Learning multimodal behavioral models for face-to-face social interaction.
J. Multimodal User Interfaces, 2015

Design and Validation of a Talking Face for the iCub.
Int. J. Humanoid Robotics, 2015

Using Karaoke to enhance reading while listening: impact on word memorization and eye movements.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2015

HMM training strategy for incremental speech synthesis.
Proceedings of the INTERSPEECH 2015, 2015

Impact of iris size and eyelids coupling on the estimation of the gaze direction of a robotic talking head by human viewers.
Proceedings of the 15th IEEE-RAS International Conference on Humanoid Robots, 2015

Beaming the Gaze of a Humanoid Robot.
Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction, 2015

Audiovisual generation of social attitudes from neutral stimuli.
Proceedings of the Auditory-Visual Speech Processing, 2015

2014
Beyond basic emotions: expressive virtual actors with social attitudes.
Proceedings of the Seventh International Conference on Motion in Games, Playa Vista, CA, USA, November 06, 2014

Assessing objective characterizations of phonetic convergence.
Proceedings of the INTERSPEECH 2014, 2014

An articulated talking face for the iCub.
Proceedings of the 14th IEEE-RAS International Conference on Humanoid Robots, 2014

Modeling perception-action loops: comparing sequential models with frame-based classifiers.
Proceedings of the second international conference on Human-agent interaction, 2014

2013
Vizart3d - real-time system of visual articulatory feedback.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2013

Speaker adaptation of an acoustic-articulatory inversion model using cascaded Gaussian mixture regressions.
Proceedings of the INTERSPEECH 2013, 2013

Adaptation of respiratory patterns in collaborative reading.
Proceedings of the INTERSPEECH 2013, 2013

Social Behavior Modeling Based on Incremental Discrete Hidden Markov Models.
Proceedings of the Human Behavior Understanding - 4th International Workshop, 2013

Audio-visual speaker conversion using prosody features.
Proceedings of the Auditory-Visual Speech Processing, 2013

2012
I Reach Faster When I See You Look: Gaze Effects in Human-Human and Human-Robot Face-to-Face Cooperation.
Frontiers Neurorobotics, 2012

Vizart3D : Retour Articulatoire Visuel pour l'Aide à la Prononciation (Vizart3D: Visual Articulatory Feedback for Computer-Assisted Pronunciation Training) [in French].
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, 2012

Cross-speaker Acoustic-to-Articulatory Inversion using Phone-based Trajectory HMM for Pronunciation Training.
Proceedings of the INTERSPEECH 2012, 2012

Continuous Articulatory-to-Acoustic Mapping using Phone-based Trajectory HMM for a Silent Speech Interface.
Proceedings of the INTERSPEECH 2012, 2012

Pauses and respiratory markers of the structure of book reading.
Proceedings of the INTERSPEECH 2012, 2012

2011
A pilot study on augmented speech communication based on Electro-Magnetic Articulography.
Pattern Recognit. Lett., 2011

Toward a Multi-Speaker Visual Articulatory Feedback System.
Proceedings of the INTERSPEECH 2011, 2011

Synchronous Reading: Learning French Orthography by Audiovisual Training.
Proceedings of the INTERSPEECH 2011, 2011

2010
Improvement to a NAM-captured whisper-to-speech system.
Speech Commun., 2010

Speech and face-to-face communication - An introduction.
Speech Commun., 2010

Gaze, conversational agents and face-to-face communication.
Speech Commun., 2010

Can you 'read' tongue movements? Evaluation of the contribution of tongue display to speech understanding.
Speech Commun., 2010

On the importance of eye gaze in a face-to-face collaborative task.
Proceedings of the 3rd international workshop on Affective interaction in natural environments, 2010

Facilitative effects of communicative gaze and speech in human-robot cooperation.
Proceedings of the 3rd international workshop on Affective interaction in natural environments, 2010

Can tongue be recovered from face? the answer of data-driven statistical models.
Proceedings of the INTERSPEECH 2010, 2010

Speech dominoes and phonetic convergence.
Proceedings of the INTERSPEECH 2010, 2010

Exploiting multimodal data fusion in robust speech recognition.
Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010

Speech, Gaze and Head Motion in a Face-to-Face Collaborative Task.
Proceedings of the Electronic Speech Signal Processing, 2010

Study of the Phenomenon of Phonetic Convergence Thanks to Speech Dominoes.
Proceedings of the Analysis of Verbal and Nonverbal Communication and Enactment. The Processing Issues, 2010

Acoustic-to-articulatory inversion in speech based on statistical models.
Proceedings of the Auditory-Visual Speech Processing, 2010

2009
Exploiting visual information for NAM recognition.
IEICE Electron. Express, 2009

Animating Virtual Speakers or Singers from Audio: Lip-Synching Facial Animation.
EURASIP J. Audio Speech Music. Process., 2009

Lip-Synching Using Speaker-Specific Articulation, Shape and Appearance Models.
EURASIP J. Audio Speech Music. Process., 2009

Acoustic-to-articulatory inversion using speech recognition and trajectory formation based on phoneme hidden Markov models.
Proceedings of the INTERSPEECH 2009, 2009

Multimodal HMM-based NAM-to-speech conversion.
Proceedings of the INTERSPEECH 2009, 2009

2008
Improvement to a NAM captured whisper-to-speech system.
Proceedings of the INTERSPEECH 2008, 2008

LIPS2008: visual speech synthesis challenge.
Proceedings of the INTERSPEECH 2008, 2008

From 3-d speaker cloning to text-to-audiovisual-speech.
Proceedings of the INTERSPEECH 2008, 2008

A trainable trajectory formation model TD-HMM parameterized for the LIPS 2008 challenge.
Proceedings of the INTERSPEECH 2008, 2008

Can you "read tongue movements"?
Proceedings of the INTERSPEECH 2008, 2008

German text-to-audiovisual-speech by 3-d speaker cloning.
Proceedings of the International Conference on Auditory-Visual Speech Processing 2008, 2008

Retargeting cued speech hand gestures for different talking heads and speakers.
Proceedings of the International Conference on Auditory-Visual Speech Processing 2008, 2008

Speaking with smile or disgust: data and models.
Proceedings of the International Conference on Auditory-Visual Speech Processing 2008, 2008

An Audiovisual Talking Head for Augmented Speech Generation: Models and Animations Based on a Real Speaker's Articulatory Data.
Proceedings of the Articulated Motion and Deformable Objects, 5th International Conference, 2008

2007
Image and Video for Hearing Impaired People.
EURASIP J. Image Video Process., 2007

Learning optimal audiovisual phasing for an HMM-based control model for facial animation.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Analyzing Gaze During Face-to-Face Interaction.
Proceedings of the Intelligent Virtual Agents, 7th International Conference, 2007

Scrutinizing Natural Scenes: Controlling the Gaze of an Embodied Conversational Agent.
Proceedings of the Intelligent Virtual Agents, 7th International Conference, 2007

Gaze Patterns during Face-to-Face Interaction.
Proceedings of the 2007 IEEE/WIC/ACM International Conference on Web Intelligence and International Conference on Intelligent Agent Technology, 2007

Analyzing and modeling gaze during face-to-face interaction.
Proceedings of the Auditory-Visual Speech Processing 2007, 2007

Intelligibility of natural and 3d-cloned German speech.
Proceedings of the Auditory-Visual Speech Processing 2007, 2007

Towards eye gaze aware analysis and synthesis of audiovisual speech.
Proceedings of the Auditory-Visual Speech Processing 2007, 2007

2006
3D Semi-Landmarks Based Statistical Face Reconstruction.
J. Comput. Inf. Technol., 2006


Embodied Conversational Agents: Computing and Rendering Realistic Gaze Patterns.
Proceedings of the Advances in Multimedia Information Processing, 2006

Does a Virtual Talking Face Generate Proper Multimodal Cues to Draw User's Attention to Points of Interest?
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

A joint intelligibility evaluation of French text-to-speech synthesis systems: the EvaSy SUS/ACR campaign.
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

A joint prosody evaluation of French text-to-speech synthesis systems.
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

TDA: a new trainable trajectory formation system for facial animation.
Proceedings of the INTERSPEECH 2006, 2006

Evaluating a virtual speech cuer.
Proceedings of the INTERSPEECH 2006, 2006

Generating German intonation with a trainable prosodic model.
Proceedings of the INTERSPEECH 2006, 2006

A new trainable trajectory formation system for facial animation.
Proceedings of the ISCA Tutorial and Research Workshop on Experimental Linguistics, 2006

Evaluation of a virtual speech cuer.
Proceedings of the ISCA Tutorial and Research Workshop on Experimental Linguistics, 2006

Audiovisual speech enhancement experiments for mouth segmentation evaluation.
Proceedings of the 14th European Signal Processing Conference, 2006

Statistical 3D Cranio-Facial Models.
Proceedings of the Sixth International Conference on Computer and Information Technology (CIT 2006), 2006

2005
SFC: A trainable prosodic model.
Speech Commun., 2005

Evaluating the pronunciation of proper names by four French grapheme-to-phoneme converters.
Proceedings of the INTERSPEECH 2005, 2005

Statistical active model for mouth components segmentation.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Missing Data Estimation Using Polynomial Kernels.
Proceedings of the Pattern Recognition and Data Mining, 2005

Non-linear active model for mouth inner and outer contours detection.
Proceedings of the 13th European Signal Processing Conference, 2005

Basic components of a face-to-face interaction with a conversational agent: mutual attention and deixis.
Proceedings of the 2005 joint conference on Smart objects and ambient intelligence, 2005

Capturing data and realistic 3d models for cued speech analysis and audiovisual synthesis.
Proceedings of the Auditory-Visual Speech Processing 2005, 2005

2004
Tracking talking faces with shape and appearance models.
Speech Commun., 2004

Audiovisual text-to-cued speech synthesis.
Proceedings of the Fifth ISCA ITRW on Speech Synthesis, 2004

Evaluation of a Speech Cuer: From Motion Capture to a Concatenative Text-to-cued Speech System.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

A superposed prosodic model for Chinese text-to-speech synthesis.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

Audiovisual perceptual evaluation of resynthesised speech movements.
Proceedings of the INTERSPEECH 2004, 2004

A trainable prosodic model: learning the contours implementing communicative functions within a superpositional model of intonation.
Proceedings of the INTERSPEECH 2004, 2004

3D Meshes Registration: Application to Statistical Skull Model.
Proceedings of the Image Analysis and Recognition: International Conference, 2004

Audiovisual text-to-cued speech synthesis.
Proceedings of the 2004 12th European Signal Processing Conference, 2004

2003
Audiovisual Speech Synthesis.
Int. J. Speech Technol., 2003

Close Shadowing Natural Versus Synthetic Speech.
Int. J. Speech Technol., 2003

ISCA special session: hot topics in speech synthesis.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Shape and appearance models of talking faces for model-based tracking.
Proceedings of the 2003 IEEE International Workshop on Analysis and Modeling of Faces and Gestures (AMFG 2003), 2003

2002
Three-dimensional linear articulatory modeling of tongue, lips and face, based on MRI and video images.
J. Phonetics, 2002

Seeing tongue movements from outside.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Audiovisual speech synthesis. from ground truth to models.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

2001
Generating prosodic attitudes in French: Data, model and evaluation.
Speech Commun., 2001

Close shadowing natural vs. synthetic speech.
Proceedings of the 4th ITRW on Speech Synthesis, Perthshire, Scotland, UK, August 29, 2001

Visual synthesis.
Proceedings of the 4th ITRW on Speech Synthesis, Perthshire, Scotland, UK, August 29, 2001

Creating and controlling video-realistic talking heads.
Proceedings of the Auditory-Visual Speech Processing, 2001

2000
The Cost258 Signal Generation Test Array.
Proceedings of the Second International Conference on Language Resources and Evaluation, 2000

MOTHER: a new generation of talking heads providing a flexible articulatory control for video-realistic speech animation.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Generating prosody by superposing multi-parametric overlapping contours.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

1999
Training an application-dependent prosodic model: corpus, model and evaluation.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Accurate estimation of sinusoidal parameters in an harmonic+noise model for speech synthesis.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

1998
Objective evaluation of grapheme to phoneme conversion for text-to-speech synthesis in French.
Comput. Speech Lang., 1998

Evaluating the adequacy of synthetic prosody in signaling syntactic boundaries: methodology and first results.
Proceedings of the First International Conference on Language Resources and Evaluation, 1998

Evaluation of grapheme-to-phoneme conversion for text-to-speech synthesis in French.
Proceedings of the First International Conference on Language Resources and Evaluation, 1998

Cooperation and competition of burst and formant transitions for the perception and identification of French stops.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Synergy between jaw and lips/tongue movements : consequences in articulatory modelling.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

A three-dimensional linear articulatory model based on MRI data.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

1997
Learning to speak. Sensori-motor control of speech movements.
Speech Commun., 1997

Relative contributions of noise burst and vocalic transitions to the perceptual identification of stop consonants.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Synthesising attitudes with global rhythmic and intonation contours.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Synthesis of fricative consonants by audiovisual-to-articulatory inversion.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Introduction to Part III.
Proceedings of the Computing Prosody, 1997

1996
Generating intonation by superposing gestures.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

Building sensori-motor prototypes from audiovisual exemplars.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

1995
Synthesis and evaluation of intonation with a superposition model.
Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995

Articulatori-acoustic vowel prototypes for speech production.
Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995

Generation of intonation: a global approach.
Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995

1994
Characterisation of rhythmic patterns for text-to-speech synthesis.
Speech Commun., 1994

Generation of pauses within the z-score model.
Proceedings of the Second ESCA/IEEE Workshop on Speech Synthesis, 1994

Building prototypes for articulatory speech synthesis.
Proceedings of the Second ESCA/IEEE Workshop on Speech Synthesis, 1994

1993
Resonances as possible representation of speech in the auditory-to-articulatory transform.
Proceedings of the Third European Conference on Speech Communication and Technology, 1993

COMPOST: a client-server model for applications using text-to-speech systems.
Proceedings of the Third European Conference on Speech Communication and Technology, 1993

1991
Synthesis-by-rule using compost: modelling resonance trajectories.
Proceedings of the Second European Conference on Speech Communication and Technology, 1991

1990
Automatic labeling of large prosodic databases : tools, methodology and links with a text-to-speech system.
Proceedings of the ESCA Workshop on Speech Synthesis, 1990

Generation of articulatory trajectories using sequential networks.
Proceedings of the ESCA Workshop on Speech Synthesis, 1990

Automatic segmentation and alignment of continuous speech based on temporal decomposition model.
Proceedings of the First International Conference on Spoken Language Processing, 1990

1989
Integration of rhythmic and syntactic constraints in a model of generation of French prosody.
Speech Commun., 1989

Compost: a rule-compiler for speech synthesis.
Proceedings of the First European Conference on Speech Communication and Technology, 1989

A new algorithm for temporal decomposition of speech-application to a numerical model of coarticulation.
Proceedings of the IEEE International Conference on Acoustics, 1989

1988
Stochastic model of diphone-like segments based on trajectory concepts.
Proceedings of the IEEE International Conference on Acoustics, 1988

1986
Multiparametric generation of French prosody from unrestricted text.
Proceedings of the IEEE International Conference on Acoustics, 1986

