Xavier Alameda-Pineda

ORCID: 0000-0002-5354-1084

According to our database, Xavier Alameda-Pineda authored at least 118 papers between 2008 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Unsupervised performance analysis of 3D face alignment with a statistically robust confidence test.
Neurocomputing, January, 2024

2023
Variational meta reinforcement learning for social robotics.
Appl. Intell., November, 2023

TransCenter: Transformers With Dense Representations for Multiple-Object Tracking.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2023

Expression-Preserving Face Frontalization Improves Visually Assisted Speech Processing.
Int. J. Comput. Vis., May, 2023

Learning and controlling the source-filter representation of speech with a variational autoencoder.
Speech Commun., March, 2023

Continual Attentive Fusion for Incremental Learning in Semantic Segmentation.
IEEE Trans. Multim., 2023

SocialInteractionGAN: Multi-Person Interaction Sequence Generation.
IEEE Trans. Affect. Comput., 2023

Uncertainty-Aware Contrastive Distillation for Incremental Semantic Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., 2023

VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space.
CoRR, 2023

Mixture of Dynamical Variational Autoencoders for Multi-Source Trajectory Modeling and Separation.
CoRR, 2023

Univariate Radial Basis Function Layers: Brain-inspired Deep Neural Layers for Low-Dimensional Inputs.
CoRR, 2023

A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation.
CoRR, 2023

Unsupervised speech enhancement with deep dynamical generative speech and noise models.
CoRR, 2023

A Multimodal Dynamical Variational Autoencoder for Audiovisual Speech Representation Learning.
CoRR, 2023

Back to MLP: A Simple Baseline for Human Motion Prediction.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

Motion-DVAE: Unsupervised learning for fast human motion denoising.
Proceedings of the 16th ACM SIGGRAPH Conference on Motion, Interaction and Games, 2023

On the Effectiveness of LayerNorm Tuning for Continual Learning in Vision Transformers.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Speech Modeling with a Hierarchical Transformer Dynamical VAE.
Proceedings of the IEEE International Conference on Acoustics, 2023

Semi-supervised learning made simple with self-supervised clustering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Variational Inference and Learning of Piecewise Linear Dynamical Systems.
IEEE Trans. Neural Networks Learn. Syst., 2022

Unsupervised Speech Enhancement Using Dynamical Variational Autoencoders.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Probabilistic Graph Attention Network With Conditional Kernels for Pixel-Wise Prediction.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Weighted variance variational autoencoder for speech enhancement.
CoRR, 2022

Autoregressive GAN for Semantic Unconditional Head Motion Generation.
CoRR, 2022

Robust Audio-Visual Instance Discrimination via Active Contrastive Set Mining.
CoRR, 2022

HiT-DVAE: Human Motion Generation via Hierarchical Transformer Dynamical VAE.
CoRR, 2022

Unsupervised Multiple-Object Tracking with a Dynamical Variational Autoencoder.
CoRR, 2022

M4MM '22: 1st International Workshop on Methodologies for Multimedia.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Active Contrastive Set Mining for Robust Audio-Visual Instance Discrimination.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

The Impact of Removing Head Movements on Audio-Visual Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2022

A Proposal-based Paradigm for Self-supervised Sound Source Localization in Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Multi-Person Extreme Motion Prediction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Self-Supervised Models are Continual Learners.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Mixture of Inference Networks for VAE-Based Audio-Visual Speech Enhancement.
IEEE Trans. Signal Process., 2021

Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Dynamical Variational Autoencoders: A Comprehensive Review.
Found. Trends Mach. Learn., 2021

Successor Feature Neural Episodic Control.
CoRR, 2021

Xi-Learning: Successor Feature Transfer Learning for General Reward Functions.
CoRR, 2021

Unsupervised Speech Enhancement using Dynamical Variational Auto-Encoders.
CoRR, 2021

Multi-Person Extreme Motion Prediction with Cross-Interaction Attention.
CoRR, 2021

TransCenter: Transformers with Dense Queries for Multiple-Object Tracking.
CoRR, 2021

Variational Structured Attention Networks for Deep Visual Representation Learning.
CoRR, 2021

PI-Net: Pose Interacting Network for Multi-Person Monocular 3D Pose Estimation.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Deep Variational Generative Models for Audio-Visual Speech Separation.
Proceedings of the 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP), 2021

A Benchmark of Dynamical Variational Autoencoders Applied to Speech Spectrogram Modeling.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Switching Variational Auto-Encoders for Noise-Agnostic Audio-Visual Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Learning How to Smile: Expression Video Generation With Conditional Adversarial Recurrent Nets.
IEEE Trans. Multim., 2020

Audio-Visual Speech Enhancement Using Conditional Variational Auto-Encoders.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

A Comprehensive Analysis of Deep Regression.
IEEE Trans. Pattern Anal. Mach. Intell., 2020

Special Issue on Generating Realistic Visual Data of Human Behavior.
Int. J. Comput. Vis., 2020

GMM-UNIT: Unsupervised Multi-Domain and Multi-Modal Image-to-Image Translation via Attribute Gaussian Mixture Modeling.
CoRR, 2020

Describe What to Change: A Text-guided Unsupervised Image-to-image Translation Approach.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

FATE/MM 20: 2nd International Workshop on Fairness, Accountability, Transparency and Ethics in MultiMedia.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

CANU-ReID: A Conditional Adversarial Network for Unsupervised person Re-IDentification.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

ODANet: Online Deep Appearance Network for Identity-Consistent Multi-person Tracking.
Proceedings of the Pattern Recognition. ICPR International Workshops and Challenges, 2020

Robust Unsupervised Audio-Visual Speech Enhancement Using a Mixture of Variational Autoencoders.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Recurrent Variational Autoencoder for Speech Enhancement.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

How to Train Your Deep Multi-Object Tracker.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Towards Probabilistic Generative Models for Socially Intelligent Robots.
2020

2019
Increasing Image Memorability with Neural Style Transfer.
ACM Trans. Multim. Comput. Commun. Appl., 2019

Special Section on Multimodal Understanding of Social, Affective, and Subjective Attributes.
ACM Trans. Multim. Comput. Commun. Appl., 2019

Tracking Multiple Audio Sources With the von Mises Distribution and Variational EM.
IEEE Signal Process. Lett., 2019

Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments.
IEEE J. Sel. Top. Signal Process., 2019

Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoder.
CoRR, 2019

DeepMOT: A Differentiable Framework for Training Multiple Object Trackers.
CoRR, 2019

Camera Adversarial Transfer for Unsupervised Person Re-Identification.
CoRR, 2019

FAT/MM'19: 1st International Workshop on Fairness, Accountability, and Transparency in MultiMedia.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Audio-Visual Variational Fusion for Multi-Person Tracking with Robots.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

The Predicting Media Memorability Task at MediaEval 2019.
Proceedings of the Working Notes Proceedings of the MediaEval 2019 Workshop, 2019

2018
Cross-Paced Representation Learning With Partial Curricula for Sketch-Based Image Retrieval.
IEEE Trans. Image Process., 2018

A cascaded multiple-speaker localization and tracking system.
CoRR, 2018

Every Smile is Unique: Landmark-Guided Diverse Smile Generation.
CoRR, 2018

EE-USAD: ACM MM 2018 Workshop on Understanding Subjective Attributes of Data focus on Evoked Emotions.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Accounting for Room Acoustics in Audio-Visual Multi-Speaker Tracking.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

DeepGUM: Learning Deep Robust Regression with a Gaussian-Uniform Mixture Model.
Proceedings of the Computer Vision - ECCV 2018, 2018

Every Smile Is Unique: Landmark-Guided Diverse Smile Generation.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Multimodal analysis of free-standing conversational groups.
Proceedings of the Frontiers of Multimedia Research, 2018

2017
Extending the Cascaded Gaussian Mixture Regression Framework for Cross-Speaker Acoustic-Articulatory Mapping.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Automatic animation of an articulatory tongue model from ultrasound images of the vocal tract.
Speech Commun., 2017

Exploiting the intermittency of speech for joint separation and diarization.
Proceedings of the 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017

Learning Deep Structured Multi-Scale Features using Attention-Gated CRFs for Contour Prediction.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

MUSA2: First ACM Workshop on Multimodal Understanding of Social, Affective and Subjective Attributes.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

How to Make an Image More Memorable?: A Deep Style Transfer Approach.
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, 2017

Tracking a varying number of people with a visually-controlled robotic head.
Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2017

Exploiting the Complementarity of Audio and Visual Data in Multi-speaker Tracking.
Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

An EM algorithm for joint source separation and diarisation of multichannel convolutive speech mixtures.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Adaptation of a Gaussian Mixture Regressor to a New Input Distribution: Extending the C-GMR Framework.
Proceedings of the Latent Variable Analysis and Signal Separation, 2017

Viraliency: Pooling Local Virality.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

SALSA: A Multimodal Dataset for the Automated Analysis of Free-Standing Social Interactions.
Proceedings of the Group and Crowd Behavior for Computer Vision, 1st Edition, 2017

2016
A Variational EM Algorithm for the Separation of Time-Varying Convolutive Audio Mixtures.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

EM Algorithms for Weighted-Data Clustering with Application to Audio-Visual Scene Analysis.
IEEE Trans. Pattern Anal. Mach. Intell., 2016

SALSA: A Novel Dataset for Multimodal Group Behavior Analysis.
IEEE Trans. Pattern Anal. Mach. Intell., 2016

An on-line variational Bayesian model for multi-person tracking from cluttered scenes.
Comput. Vis. Image Underst., 2016

Academic Coupled Dictionary Learning for Sketch-based Image Retrieval.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Emerging Topics in Learning from Noisy and Missing Data.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Multi-Paced Dictionary Learning for cross-domain retrieval and recognition.
Proceedings of the 23rd International Conference on Pattern Recognition, 2016

An inverse-gamma source variance prior with factorized parameterization for audio source separation.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Tracking Multiple Persons Based on a Variational Bayesian Model.
Proceedings of the Computer Vision - ECCV 2016 Workshops, 2016

Self-Adaptive Matrix Completion for Heart Rate Estimation from Face Videos under Realistic Conditions.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Recognizing Emotions from Abstract Paintings Using Non-Linear Matrix Completion.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Projective Unsupervised Flexible Embedding with Optimal Graph.
Proceedings of the British Machine Vision Conference 2016, 2016

2015
Speaker-Adaptive Acoustic-Articulatory Inversion Using Cascaded Gaussian Mixture Regression.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Vision-guided robot hearing.
Int. J. Robotics Res., 2015

A variational EM algorithm for the separation of moving sound sources.
Proceedings of the 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2015

Analyzing Free-standing Conversational Groups: A Multimodal Approach.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

2014
A Geometric Approach to Sound Source Localization from Time-Delay Estimates.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Audio-visual speaker localization via weighted clustering.
Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, 2014

Sound representation and classification benchmark for domestic robots.
Proceedings of the 2014 IEEE International Conference on Robotics and Automation, 2014

2013
Egocentric Audio-Visual Scene Analysis. A Machine Learning and Signal Processing Approach. (Analyse Égocentrique de Scènes Audio-Visuelles. Une approche par Apprentissage Automatique et Traitement du Signal).
PhD thesis, 2013

RAVEL: an annotated corpus for training robots with audiovisual abilities.
J. Multimodal User Interfaces, 2013

The geometry of sound-source localization using non-coplanar microphone arrays.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013

Benchmarking methods for audio-visual recognition using tiny training sets.
Proceedings of the IEEE International Conference on Acoustics, 2013

2012
Audio-visual robot command recognition: D-META'12 grand challenge.
Proceedings of the International Conference on Multimodal Interaction, 2012

Online multimodal speaker detection for humanoid robots.
Proceedings of the 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), Osaka, Japan, November 29, 2012

Sound-event recognition with a companion humanoid.
Proceedings of the 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), Osaka, Japan, November 29, 2012

Geometrically-constrained robust time delay estimation using non-coplanar microphone arrays.
Proceedings of the 20th European Signal Processing Conference, 2012

2011
Finding audio-visual events in informal social gatherings.
Proceedings of the 13th International Conference on Multimodal Interfaces, 2011

2008
Image compression with Generalized Lifting and partial knowledge of the signal pdf.
Proceedings of the International Conference on Image Processing, 2008

