Jonas Beskow

Proceedings of the Social Robotics - 16th International Conference, 2024

Incorporating Spatial Awareness in Data-Driven Gesture Generation for Virtual Agents.

[BibT_eX]

[DOI]

Anna Deichler

Proceedings of the 24th ACM International Conference on Intelligent Virtual Agents, 2024

Beyond graphemes and phonemes: continuous phonological features in neural text-to-speech synthesis.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Gesture Evaluation in Virtual Reality.

[BibT_eX]

[DOI]

Axel Wiebe Werner

Anna Deichler

Proceedings of the Companion Proceedings of the 26th International Conference on Multimodal Interaction, 2024

Matcha-TTS: A Fast TTS Architecture with Conditional Flow Matching.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2024

Unified Speech and Gesture Synthesis Using Flow Matching.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2024

Fake it to make it: Using synthetic data to remedy the data shortage in joint multi-modal speech-and-gesture synthesis.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Learning to generate pointing gestures in situated embodied conversational agents.

[BibT_eX]

[DOI]

Frontiers Robotics AI, October, 2023

Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models.

[BibT_eX]

[DOI]

ACM Trans. Graph., August, 2023

Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis.

[BibT_eX]

[DOI]

Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

Hi robot, it's not what you say, it's how you say it.

[BibT_eX]

[DOI]

Proceedings of the 32nd IEEE International Conference on Robot and Human Interactive Communication, 2023

Generation of speech and facial animation with controllable articulatory effort for amusing conversational characters.

[BibT_eX]

[DOI]

Éva Székely

Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents, 2023

OverFlow: Putting flows on top of neural transducers for better TTS.

[BibT_eX]

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation.

[BibT_eX]

[DOI]

Proceedings of the 25th International Conference on Multimodal Interaction, 2023

Casual chatter or speaking up? Adjusting articulatory effort in generation of speech and animation for conversational characters.

[BibT_eX]

[DOI]

Proceedings of the 17th IEEE International Conference on Automatic Face and Gesture Recognition, 2023

2022

Neural HMMS Are All You Need (For High-Quality Attention-Free TTS).

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Transflower: probabilistic autoregressive dance generation with multimodal attention.

[BibT_eX]

[DOI]

Guillermo Valle Pérez

ACM Trans. Graph., 2021

Multimodal Capture of Patient Behaviour for Improved Detection of Early Dementia: Clinical Feasibility and Preliminary Results.

[BibT_eX]

[DOI]

Frontiers Comput. Sci., 2021

Mechanical Chameleons: Evaluating the effects of a social robot's non-verbal behavior on social influence.

[BibT_eX]

[DOI]

CoRR, 2021

Personality in the mix - investigating the contribution of fillers and speaking style to the perception of spontaneous speech synthesis.

[BibT_eX]

[DOI]

Éva Székely

Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Expressive Robot Performance Based on Facial Motion Capture.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Integrated Speech and Gesture Synthesis.

[BibT_eX]

[DOI]

Proceedings of the ICMI '21: International Conference on Multimodal Interaction, 2021

2020

MoGlow: probabilistic and controllable motion synthesis using normalising flows.

[BibT_eX]

[DOI]

Gustav Eje Henter

ACM Trans. Graph., 2020

Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially Aware Language Acquisition.

[BibT_eX]

[DOI]

IEEE Trans. Cogn. Dev. Syst., 2020

Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows.

[BibT_eX]

[DOI]

Comput. Graph. Forum, 2020

Let's Face It: Probabilistic Multi-modal Interlocutor-aware Generation of Facial Gestures in Dyadic Settings.

[BibT_eX]

[DOI]

Proceedings of the IVA '20: ACM International Conference on Intelligent Virtual Agents, 2020

Can we trust online crowdworkers?: Comparing online and offline participants in a preference test of virtual agents.

[BibT_eX]

[DOI]

Proceedings of the IVA '20: ACM International Conference on Intelligent Virtual Agents, 2020

Generating coherent spontaneous speech and gesture from text.

[BibT_eX]

[DOI]

Proceedings of the IVA '20: ACM International Conference on Intelligent Virtual Agents, 2020

Breathing and Speech Planning in Spontaneous Speech Synthesis.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Embodiment and gender interact in alignment to TTS voices.

[BibT_eX]

[DOI]

Proceedings of the 42th Annual Meeting of the Cognitive Science Society, 2020

2019

Modeling of Human Visual Attention in Multiparty Open-World Dialogues.

[BibT_eX]

[DOI]

Hedvig Kjellström

ACM Trans. Hum. Robot Interact., 2019

The effect of a physical robot on vocabulary learning.

[BibT_eX]

[DOI]

CoRR, 2019

Speech Synthesis Evaluation - State-of-the-Art Assessment and Suggestion for a Novel Research Program.

[BibT_eX]

[DOI]

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

How to train your fillers: uh and um in spontaneous speech synthesis.

[BibT_eX]

[DOI]

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

PROMIS: a statistical-parametric speech synthesis system with prominence control via a prominence network.

[BibT_eX]

[DOI]

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Spontaneous Conversational Speech Synthesis from Found Data.

[BibT_eX]

[DOI]

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Off the Cuff: Exploring Extemporaneous Speech Delivery with TTS.

[BibT_eX]

[DOI]

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Equipping social robots with culturally-sensitive facial expressions of emotion using data-driven methods.

[BibT_eX]

[DOI]

Proceedings of the 14th IEEE International Conference on Automatic Face & Gesture Recognition, 2019

Multimodal conversational interaction with robots.

[BibT_eX]

[DOI]

Proceedings of the Handbook of Multimodal-Multisensor Interfaces: Language Processing, Software, Commercialization, and Emerging Directions, 2019

2018

A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction.

[BibT_eX]

[DOI]

Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Crowdsourced Multimodal Corpora Collection Tool.

[BibT_eX]

[DOI]

Patrik Jonell

Catharine Oertel

Aravind Elanjimattathil Vijayan

Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Emotion-Awareness for Intelligent Vehicle Assistants: A Research Agenda.

[BibT_eX]

[DOI]

Proceedings of the 1st IEEE/ACM International Workshop on Software Engineering for AI in Autonomous Systems, 2018

Using Constrained Optimization for Real-Time Synchronization of Verbal and Nonverbal Robot Behavior.

[BibT_eX]

[DOI]

Iolanda Leite

Proceedings of the 2018 IEEE International Conference on Robotics and Automation, 2018

Reverse Engineering Psychologically Valid Facial Expressions of Emotion into Social Robots.

[BibT_eX]

[DOI]

Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition, 2018

2017

Mimebot - Investigating the Expressibility of Non-Verbal Communication Across Agent Embodiments.

[BibT_eX]

[DOI]

ACM Trans. Appl. Percept., 2017

Self-Supervised Vision-Based Detection of the Active Speaker as a Prerequisite for Socially-Aware Language Acquisition.

[BibT_eX]

[DOI]

CoRR, 2017

Machine Learning and Social Robotics for Detecting Early Signs of Dementia.

[BibT_eX]

[DOI]

CoRR, 2017

Real-time labeling of non-rigid motion capture marker sets.

[BibT_eX]

[DOI]

Carol O'Sullivan

Comput. Graph., 2017

Look but Don't Stare: Mutual Gaze Interaction in Social Robots.

[BibT_eX]

[DOI]

Yanxia Zhang

Muhammad Sikandar Lal Khan

Hedvig Kjellström

Proceedings of the Social Robotics - 9th International Conference, 2017

Moveable Facial Features in a Social Mediator.

[BibT_eX]

[DOI]

Proceedings of the Intelligent Virtual Agents - 17th International Conference, 2017

Crowd-Powered Design of Virtual Attentive Listeners.

[BibT_eX]

[DOI]

Patrik Jonell

Catharine Oertel

Proceedings of the Intelligent Virtual Agents - 17th International Conference, 2017

Crowd-Sourced Design of Artificial Attentive Listeners.

[BibT_eX]

[DOI]

Catharine Oertel

Patrik Jonell

Joseph Mendelson

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Controlling Prominence Realisation in Parametric DNN-Based Speech Synthesis.

[BibT_eX]

[DOI]

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016

A hybrid harmonics-and-bursts modelling approach to speech synthesis.

[BibT_eX]

[DOI]

Harald Berthelsen

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

WikiSpeech - enabling open source text-to-speech for Wikipedia.

[BibT_eX]

[DOI]

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Robust online motion capture labeling of finger markers.

[BibT_eX]

[DOI]

Carol O'Sullivan

Proceedings of the 9th International Conference on Motion in Games, 2016

A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction.

[BibT_eX]

[DOI]

Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Look who's talking: visual identification of the active speaker in multi-party human-robot interaction.

[BibT_eX]

[DOI]

Akihiro Sugimoto

Proceedings of the 2nd Workshop on Advancements in Social Signal Processing for Multimodal Interaction, 2016

Automatic annotation of gestural units in spontaneous face-to-face interaction.

[BibT_eX]

[DOI]

Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction, 2016

2015

Towards Fully Automated Motion Capture of Signs - Development and Evaluation of a Key Word Signing Avatar.

[BibT_eX]

[DOI]

ACM Trans. Access. Comput., 2015

Talking Heads, Signing Avatars and Social Robots.

[BibT_eX]

[DOI]

Proceedings of the 6th Workshop on Speech and Language Processing for Assistive Technologies, 2015

A Collaborative Human-Robot Game as a Test-bed for Modelling Multi-party, Situated Interaction.

[BibT_eX]

[DOI]

Proceedings of the Intelligent Virtual Agents - 15th International Conference, 2015

Exploring Turn-taking Cues in Multi-party Human-robot Discussions about Objects.

[BibT_eX]

[DOI]

Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, WA, USA, November 09, 2015

2014

Animated Lombard speech: Motion capture, facial animation and visual intelligibility of speech produced in adverse conditions.

[BibT_eX]

[DOI]

Comput. Speech Lang., 2014

Spontaneous spoken dialogues with the furhat human-like robot head.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, 2014

Human-robot collaborative tutoring using multiparty multimodal spoken dialogue.

[BibT_eX]

[DOI]

Ahmed Hussen Abdelaziz

Maria Koutsombogera

José David Águas Lopes

Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, 2014

2013

The furhat Back-Projected humanoid Head-Lip Reading, gaze and Multi-Party Interaction.

[BibT_eX]

[DOI]

Int. J. Humanoid Robotics, 2013

Face-to-Face with a Robot: What do we actually Talk about?

[BibT_eX]

[DOI]

Int. J. Humanoid Robotics, 2013

Non-linear Pitch Modification in Voice Conversion Using Artificial Neural Networks.

[BibT_eX]

[DOI]

Bajibabu Bollepalli

Proceedings of the Advances in Nonlinear Speech Processing - 6th International Conference, 2013

The furhat social companion talking head.

[BibT_eX]

[DOI]

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Tutoring Robots - Multiparty Multimodal Social Dialogue with an Embodied Tutor.

[BibT_eX]

[DOI]

Bajibabu Bollepalli

Ahmed Hussen Abdelaziz

Maria Koutsombogera

José David Águas Lopes

Proceedings of the Innovative and Creative Developments in Multimodal Interaction Systems, 2013

Aspects of co-occurring syllables and head nods in spontaneous dialogue.

[BibT_eX]

[DOI]

Proceedings of the Auditory-Visual Speech Processing, 2013

Co-present or Not?

[BibT_eX]

[DOI]

Proceedings of the Eye Gaze in Intelligent User Interfaces, 2013

2012

Taming Mona Lisa: Communicating gaze faithfully in 2D and 3D facial projections.

[BibT_eX]

[DOI]

ACM Trans. Interact. Intell. Syst., 2012

Visual Recognition of Isolated Swedish Sign Language Signs

[BibT_eX]

[DOI]

Saad Akram

Hedvig Kjellström

CoRR, 2012

Children and adults in dialogue with the robot head Furhat - corpus collection and initial analysis.

[BibT_eX]

[DOI]

Proceedings of the Third Workshop on Child, Computer and Interaction, 2012

3rd party observer gaze as a continuous measure of dialogue flow.

[BibT_eX]

[DOI]

Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Lip-Reading: Furhat Audio Visual Intelligibility of a Back Projected Animated Face.

[BibT_eX]

[DOI]

Proceedings of the Intelligent Virtual Agents - 12th International Conference, 2012

Multimodal multiparty social interaction with the furhat head.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Multimodal Interaction, 2012

2011

The Mona Lisa Gaze Effect as an Objective Metric for Perceived Cospatiality.

[BibT_eX]

[DOI]

Proceedings of the Intelligent Virtual Agents - 11th International Conference, 2011

Furhat: A Back-Projected Human-Like Robot Head for Multiparty Human-Machine Interaction.

[BibT_eX]

[DOI]

Proceedings of the Cognitive Behavioural Systems, 2011

A robotic head using projected animated faces.

[BibT_eX]

[DOI]

Proceedings of the Auditory-Visual Speech Processing, 2011

Kinetic data for large-scale analysis and modeling of face-to-face conversation.

[BibT_eX]

[DOI]

Proceedings of the Auditory-Visual Speech Processing, 2011

2010

Spontal: A Swedish Spontaneous Dialogue Corpus of Audio, Video and Motion Capture.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Language Resources and Evaluation, 2010

Prominence detection in Swedish using syllable correlates.

[BibT_eX]

[DOI]

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Perception of nonverbal gestures of prominence in visual speech animation.

[BibT_eX]

[DOI]

Proceedings of the ACM / SSPNET 2nd International Symposium on Facial Analysis and Animation, 2010

Perception of gaze direction in 2D and 3D facial projections.

[BibT_eX]

[DOI]

Proceedings of the ACM / SSPNET 2nd International Symposium on Facial Analysis and Animation, 2010

Audio-Visual Prosody: Perception, Detection, and Synthesis of Prominence.

[BibT_eX]

[DOI]

Proceedings of the Toward Autonomous, Adaptive, and Context-Aware Multimodal Interfaces. Theoretical and Practical Issues, 2010

Animated Faces for Robotic Heads: Gaze and Beyond.

[BibT_eX]

[DOI]

Proceedings of the Analysis of Verbal and Nonverbal Communication and Enactment. The Processing Issues, 2010

2009

Multimodal Interaction Control.

[BibT_eX]

[DOI]

Proceedings of the Computers in the Human Interaction Loop, 2009

Auditory visual prominence.

[BibT_eX]

[DOI]

J. Multimodal User Interfaces, 2009

SynFace - Speech-Driven Facial Animation for Virtual Speech-Reading Support.

[BibT_eX]

[DOI]

EURASIP J. Audio Speech Music. Process., 2009

Virtual speech reading support for hard of hearing in a domestic multi-media setting.

[BibT_eX]

[DOI]

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

The MonAMI reminder: a spoken dialogue system for face-to-face interaction.

[BibT_eX]

[DOI]

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Face-to-Face Interaction and the KTH Cooking Show.

[BibT_eX]

[DOI]

Proceedings of the Development of Multimodal Interfaces: Active Listening and Synchrony, 2009

Effects of visual prominence cues on speech intelligibility.

[BibT_eX]

[DOI]

Proceedings of the Auditory-Visual Speech Processing, 2009

Synface - verbal and non-verbal face animation from audio.

[BibT_eX]

[DOI]

Proceedings of the Auditory-Visual Speech Processing, 2009

2008

Innovative Interfaces in MonAMI: The Reminder.

[BibT_eX]

[DOI]

Proceedings of the Perception in Multimodal Dialogue Systems, 2008

Hearing at home - communication support in home environments for hearing impaired persons.

[BibT_eX]

[DOI]

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Recognizing and modelling regional varieties of Swedish.

[BibT_eX]

[DOI]

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Innovative interfaces in MonAMI: the reminder.

[BibT_eX]

[DOI]

Proceedings of the 10th International Conference on Multimodal Interfaces, 2008

2007

Pushy versus meek - using avatars to influence turn-taking behaviour.

[BibT_eX]

[DOI]

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Analysis and Synthesis of Multimodal Verbal and Non-verbal Interaction for Animated Interface Agents.

[BibT_eX]

[DOI]

Proceedings of the Verbal and Nonverbal Communication Behaviours, 2007

2006

Visual correlates to prominence in several expressive modes.

[BibT_eX]

[DOI]

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

User Evaluation of the SYNFACE Talking Head Telephone.

[BibT_eX]

[DOI]

Proceedings of the Computers Helping People with Special Needs, 2006

2005

Data-driven synthesis of expressive visual speech using an MPEG-4 talking head.

[BibT_eX]

[DOI]

Mikael Nordenberg

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

2004

Trainable Articulatory Control Models for Visual Speech Synthesis.

[BibT_eX]

[DOI]

Int. J. Speech Technol., 2004

Design strategies for a virtual language tutor.

[BibT_eX]

[DOI]

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

SYNFACE - A Talking Head Telephone for the Hearing-Impaired.

[BibT_eX]

[DOI]

Proceedings of the Computers Helping People with Special Needs, 2004

Expressive Animated Agents for Affective Dialogue Systems.

[BibT_eX]

[DOI]

Proceedings of the Affective Dialogue Systems, Tutorial and Research Workshop, 2004

Preliminary Cross-Cultural Evaluation of Expressiveness in Synthetic Faces.

[BibT_eX]

[DOI]

Proceedings of the Affective Dialogue Systems, Tutorial and Research Workshop, 2004

2003

Resynthesis of 3d tongue movements from facial data.

[BibT_eX]

[DOI]

Olov Engwall

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

2002

Specification and realisation of multimodal output in dialogue systems.

[BibT_eX]

[DOI]

Magnus Nordstrand

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

2001

Timing and interaction of visual cues for prominence in audiovisual speech perception.

[BibT_eX]

[DOI]

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

2000

Wavesurfer - an open source speech tool.

[BibT_eX]

[DOI]

Kåre Sjölander

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Adapt - a multimodal conversational dialogue system in an apartment domain.

[BibT_eX]

[DOI]

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

1999

Picture my voice: Audio to visual speech synthesis using artificial neural networks.

[BibT_eX]

[DOI]

Proceedings of the Auditory-Visual Speech Processing, 1999

Developing a 3D-agent for the august dialogue system.

[BibT_eX]

[DOI]

Magnus Lundeberg