Philip J. B. Jackson

CoRR, January, 2026

Reverberation-Based Features for Sound Event Localization and Detection With Distance Estimation.

IEEE Signal Process. Lett., 2026

2025

Integrating Spatial and Semantic Embeddings for Stereo Sound Event Localization in Videos.

CoRR, September, 2025

Spatial and Semantic Embedding Integration for Stereo Sound Event Localization and Detection in Regular Videos.

CoRR, July, 2025

PAL: Probing Audio Encoders via LLMs - A Study of Information Transfer from Audio Encoders to LLMs.

CoRR, June, 2025

Deconstruct Complexity (DeComplex): A Novel Perspective on Tackling Dense Action Detection.

CoRR, January, 2025

SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Leveraging Visual Supervision for Array-Based Active Speaker Detection and Localization.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Leveraging Reverberation and Visual Depth Cues for Sound Event Localization and Detection with Distance Estimation.

CoRR, 2024

Audio-Visual Talker Localization in Video for Spatial Sound Reproduction.

CoRR, 2024

ForecasterFlexOBM: A Multi-View Audio-Visual Dataset for Flexible Object-Based Media Production.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Fusion of Audio and Visual Embeddings for Sound Event Localization and Detection.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Max-AST: Combining Convolution, Local and Global Self-Attentions for Audio Event Classification.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

CoLeaF: A Contrastive-Collaborative Learning Framework for Weakly Supervised Audio-Visual Video Parsing.

Proceedings of the Computer Vision - ECCV 2024, 2024

DTF-AT: Decoupled Time-Frequency Audio Transformer for Event Classification.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Conditional trust: Citizens' council on data-driven media personalisation and public expectations of transparency and accountability.

Big Data Soc., July, 2023

Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions.

CoRR, 2023

Audio Inputs for Active Speaker Detection and Localization Via Microphone Array.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

Producing Personalised Object-Based Audio-Visual Experiences: an Ethnographic Study.

Craig Cieciura

Maxine Glancy

Proceedings of the 2023 ACM International Conference on Interactive Media Experiences, 2023

PAT: Position-Aware Transformer for Dense Multi-Label Action Detection.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022

Immersive audio-visual scene reproduction using semantic scene reconstruction from 360 cameras.

Virtual Real., 2022

Tragic Talkers: A Shakespearean Sound- and Light-Field Dataset for Audio-Visual Machine Learning Research.

Marco Volino

Proceedings of the European Conference on Visual Media Production, 2022

2021

Acoustic Room Modelling Using 360 Stereo Cameras.

IEEE Trans. Multim., 2021

Naturalistic audio-visual volumetric sequences dataset of sounding actions for six degree-of-freedom interaction.

Proceedings of the IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, 2021

Visually Supervised Speaker Detection and Localization via Microphone Array.

Adrian Hilton

Proceedings of the 23rd International Workshop on Multimedia Signal Processing, 2021

2020

Immersive Virtual Reality Audio Rendering Adapted to the Listener and the Room.

Proceedings of the Adversarial and Uncertain Reasoning for Adaptive Cyber Defense, 2020

Audio-Visual Spatial Alignment Requirements of Central and Peripheral Object Events.

CoRR, 2020

Audio-Visual Spatial Alignment Requirements of Central and Peripheral Object Events.

Proceedings of the 2020 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, 2020

2019

Modeling the Comb Filter Effect and Interaural Coherence for Binaural Source Separation.

Luca Remaggi

IEEE ACM Trans. Audio Speech Lang. Process., 2019

A Speech Synthesis Approach for High Quality Speech Separation and Generation.

IEEE Signal Process. Lett., 2019

Immersive Spatial Audio Reproduction for VR/AR Using Room Acoustic Modelling from 360° Images.

Proceedings of the IEEE Conference on Virtual Reality and 3D User Interfaces, 2019

Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Generalisation in Environmental Sound Classification: The 'Making Sense of Sounds' Data Set and Challenge.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Robust Full-sphere Binaural Sound Source Localization Using Interaural and Spectral Cues.

Benjamin R. Hammond

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Six types of audio that DEFY reality!: A taxonomy of audio augmented reality with examples.

Michael Krzyzaniak

David M. Frohlich

Proceedings of the 14th International Audio Mostly Conference: A Journey in Sound, 2019

2018

Multiple Speaker Tracking in Spatial Audio via PHD Filtering and Depth-Audio Fusion.

IEEE Trans. Multim., 2018

An Audio-Visual System for Object-Based Audio: From Recording to Listening.

Marcos F. Simón Gálvez

IEEE Trans. Multim., 2018

An Audio-Visual Method for Room Boundary Estimation and Material Recognition.

Proceedings of the 2018 Workshop on Audio-Visual Scene Understanding for Immersive Multimedia, 2018

Acoustic Reflector Localization and Classification.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Iterative Deep Neural Networks for Speaker-Independent Binaural Blind Speech Separation.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Synthesis of Images by Two-Stage Generative Adversarial Networks.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Robust Full-Sphere Binaural Sound Source Localization.

Benjamin R. Hammond

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Perceptual Evaluation of Blind Source Separation in Object-Based Audio Production.

Proceedings of the Latent Variable Analysis and Signal Separation, 2018

Supporting Audiography: Design of a System for Sentimental Sound Recording, Classification and Playback.

Proceedings of the HCI International 2018, 2018

Robust median-plane binaural sound source localization.

Benjamin R. Hammond

Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2018

A Performance Evaluation of Several Deep Neural Networks for Reverberant Speech Separation.

Proceedings of the 52nd Asilomar Conference on Signals, Systems, and Computers, 2018

2017

Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging.

IEEE ACM Trans. Audio Speech Lang. Process., 2017

Acoustic Reflector Localization: Novel Image Source Reversion and Direct Localization Methods.

IEEE ACM Trans. Audio Speech Lang. Process., 2017

Object-Based Audio Rendering.

Marcos F. Simón Gálvez

Teofilo de Campos

Hansung Kim

Hanne Stenzel

CoRR, 2017

Speech reaction time measurements for the evaluation of audio-visual spatial coherence.

Hanne Stenzel

Jon Francombe

Proceedings of the Ninth International Conference on Quality of Multimedia Experience, 2017

Fast tagging of natural sounds using marginal co-regularization.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

A perceptually-weighted deep neural network for monaural speech enhancement in various background noise conditions.

Proceedings of the 25th European Signal Processing Conference, 2017

Media Device Orchestration for Immersive Spatial Audio Reproduction.

Proceedings of the 12th International Audio Mostly Conference on Augmented and Participatory Sound and Music Experiences, 2017

3D Room Geometry Reconstruction Using Audio-Visual Sensors.

Proceedings of the 2017 International Conference on 3D Vision, 2017

2016

Fully Deep Neural Networks Incorporating Unsupervised Feature Learning for Audio Tagging.

CoRR, 2016

Predicting Binaural Speech Intelligibility from Signals Estimated by a Blind Source Separation Algorithm.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Fully DNN-Based Multi-Label Regression for Audio Tagging.

Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2016

2015

Person Tracking Using Audio and Depth Cues.

Proceedings of the 2015 IEEE International Conference on Computer Vision Workshop, 2015

A 3D model for room boundary estimation.

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

IVA algorithms using a multivariate Student's t source prior for speech source separation in real room environments.

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

A source separation evaluation method in object-based spatial audio.

Proceedings of the 23rd European Signal Processing Conference, 2015

2014

Joint Mixing Vector and Binaural Model Based Stereo Source Separation.

IEEE ACM Trans. Audio Speech Lang. Process., 2014

2013

Source Separation of Convolutive and Noisy Mixtures Using Audio-Visual Dictionary Learning and Probabilistic Time-Frequency Masking.

IEEE Trans. Signal Process., 2013

Spatial and coherence cues based time-frequency masking for binaural reverberant speech separation.

Atiyeh Alinaghi

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

2012

Use of bimodal coherence to resolve the permutation problem in convolutive BSS.

Signal Process., 2012

Reverberant speech separation based on audio-visual dictionary learning and binaural cues.

Proceedings of the IEEE Statistical Signal Processing Workshop, 2012

2011

Source localization and separation using Random Sample Consensus with phase cues.

Lukasz Litwic

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2011

Integrating binaural cues and blind source separation method for separating reverberant speech mixtures.

Atiyeh Alinaghi

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Robust feature selection for scaling ambiguity reduction in audio-visual convolutive BSS.

Proceedings of the 19th European Signal Processing Conference, 2011

2010

Bimodal coherence based scale ambiguity cancellation for target speech extraction and enhancement.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Use of Bimodal Coherence to Resolve Spectral Indeterminacy in Convolutive BSS.

Proceedings of the Latent Variable Analysis and Signal Separation, 2010

2009

Statistical identification of articulation constraints in the production of speech.

Veena D. Singampalli

Speech Commun., 2009

Model-based synthesis of visual speech movements from 3D video.

James D. Edge

Adrian Hilton

Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2009

Speaker-dependent audio-visual emotion recognition.

Sanaul Haq

Proceedings of the Auditory-Visual Speech Processing, 2009

2008

Frication and Voicing Classification.

Luis M. T. Jesus

Proceedings of the Computational Processing of the Portuguese Language, 2008

Parallel model combination and word recognition in soccer audio.

Jack H. Longton

Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

Audio-visual feature selection and reduction for emotion classification.

Sanaul Haq

James D. Edge

Proceedings of the International Conference on Auditory-Visual Speech Processing 2008, 2008

Parameterisation of 3d speech lip movements.

James D. Edge

Adrian Hilton

Proceedings of the International Conference on Auditory-Visual Speech Processing 2008, 2008

2007

Visual analysis of lip coarticulation in VCV utterances.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Statistical identification of critical, dependent and redundant articulators.

Veena D. Singampalli

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Time-Frequency-Modulation Representation of Stochastic Signals.

Proceedings of the 15th International Conference on Digital Signal Processing, 2007

2006

Enhancement of harmonic content of speech based on a dynamic programming pitch tracking algorithm.

Mark R. Every

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

2005

A multiple-level linear/linear segmental HMM with a formant-based intermediate layer.

Martin J. Russell

Comput. Speech Lang., 2005

Amplitude modulation of frication noise by voicing saturates.

Jonathan Pincas

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

2004

Speech-Driven Face Synthesis from 3D Video.

Proceedings of the 2nd International Symposium on 3D Data Processing, Visualization and Transmission, 2004

2003

The effect of an intermediate articulatory layer on the performance of a segmental HMM.

Martin J. Russell

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Covariation and weighting of harmonically decomposed streams for ASR.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

2002

Models of speech dynamics in a segmental-HMM recognizer using intermediate linear representations.

Martin J. Russell

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

2001

Pitch-scaled estimation of simultaneous voiced and turbulence-noise components in speech.

Christine H. Shadle

IEEE Trans. Speech Audio Process., 2001

2000

Performance of the pitch-scaled harmonic filter and applications in speech analysis.
