Ruohan Gao

CoRR, June, 2025

ControlTac: Force- and Position-Controlled Tactile Data Augmentation with a Single Reference Image.

Dongyu Luo

Kelin Yu

Amir-Hossein Shahidzadeh

Cornelia Fermüller

Yiannis Aloimonos

CoRR, May, 2025

Differentiable Room Acoustic Rendering with Multi-View Vision Priors.

Derong Jin

CoRR, April, 2025

Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs.

CoRR, March, 2025

AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs.

CoRR, January, 2025

Predicting the Distribution of Ailanthus altissima Using Deep Learning-Based Analysis of Satellite Imagery.

Symmetry, 2025

Towards Perception-Informed Latent HRTF Representations.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2025

Scene-wide Acoustic Parameter Estimation.

Ricardo Falcón Pérez

Sebastià Vicenc Amengual Garí

Gregor Mueckl

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2025

Hearing Anywhere in Any Environment.

Xiulong Liu

Anurag Kumar

Paul Calamia

Calvin Murdock

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Learning to Highlight Audio by Watching Movies.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Multisensory Machine Intelligence.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Leveraging Cognitive Conflict and Organizational Unlearning for Digital Mastery in SMEs: Insights From Upper Echelons.

IEEE Trans. Engineering Management, 2024

DiffSound: Differentiable Modal Sound Rendering and Inverse Rendering for Diverse Inference Tasks.

Proceedings of the ACM SIGGRAPH 2024 Conference Papers, 2024

Spherical World-Locking for Audio-Visual Localization in Egocentric Videos.

Heeseung Yun

Proceedings of the Computer Vision - ECCV 2024, 2024

MEERKAT: Audio-Visual Large Language Model for Grounding in Space and Time.

Proceedings of the Computer Vision - ECCV 2024, 2024

VMFTransformer: An Angle-Preserving and Auto-Scaling Machine for Multi-Horizon Probabilistic Forecasting.

Proceedings of the ECAI 2024 - 27th European Conference on Artificial Intelligence, 19-24 October 2024, Santiago de Compostela, Spain, 2024

Hearing Anything Anywhere.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective.

Wenqi Jia

Miao Liu

Hao Jiang

James M. Rehg

Vamsi Krishna Ithapu

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Visually-Guided Audio Spatialization in Video with Geometry-Aware Multi-task Learning.

Rishabh Garg

Int. J. Comput. Vis., October, 2023

Differentiable Physics Simulation of Dynamics-Augmented Neural Objects.

IEEE Robotics Autom. Lett., May, 2023

Learning Object-Centric Neural Scattering Functions for Free-viewpoint Relighting and Scene Composition.

Trans. Mach. Learn. Res., 2023

The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects.

CoRR, 2023

An Extensible Multimodal Multi-task Object Dataset with Materials.

CoRR, 2023

SoundCam: A Dataset for Finding Humans Using Room Acoustics.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear.

Proceedings of the IEEE International Conference on Robotics and Automation, 2023

An Extensible Multi-modal Multi-task Object Dataset with Materials.

Trevor Scott Standley

Proceedings of the Eleventh International Conference on Learning Representations, 2023

The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

REALIMPACT: A Dataset of Impact Sound Fields for Real Objects.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities.

Proceedings of the Conference on Robot Learning, 2023

2022

Determining critical lung cancer subtypes from gigapixel multi-scale whole slide H&E stains images.

Santhosh Kumar Ramakrishnan

Proceedings of the 5th International Conference on Data Science and Information Technology, 2022

ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Visual Acoustic Matching.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation.

Proceedings of the Conference on Robot Learning, 2022

2021

Learning to Set Waypoints for Audio-Visual Navigation.

Proceedings of the 9th International Conference on Learning Representations, 2021

VisualVoice: Audio-Visual Speech Separation With Cross-Modal Consistency.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations.

Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021

DiffImpact: Differentiable Rendering and Identification of Impact Sounds.

Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021

Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video.

Rishabh Garg

Santhosh Kumar Ramakrishnan

Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020

Audio-Visual Waypoints for Navigation.

CoRR, 2020

VisualEchoes: Spatial Image Representation Learning Through Echolocation.

Proceedings of the Computer Vision - ECCV 2020, 2020

Listen to Look: Action Recognition by Previewing Audio.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Co-Separating Sounds of Visual Objects.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2.5D Visual Sound.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

ShapeCodes: Self-supervised Feature Learning by Lifting Views to Viewgrids.

Dinesh Jayaraman

Proceedings of the Computer Vision - ECCV 2018, 2018

Learning to Separate Object Sounds by Watching Unlabeled Video.

Rogério Schmidt Feris

Proceedings of the Computer Vision - ECCV 2018, 2018

Im2Flow: Motion Hallucination From Static Images for Action Recognition.

Bo Xiong

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Unsupervised learning through one-shot image-based shape reconstruction.

Dinesh Jayaraman

CoRR, 2017

On-demand Learning for Deep Image Restoration.

Proceedings of the IEEE International Conference on Computer Vision, 2017

2016

From One-Trick Ponies to All-Rounders: On-Demand Learning for Image Restoration.

CoRR, 2016

Accelerating graph mining algorithms via uniform random edge sampling.

Proceedings of the 2016 IEEE International Conference on Communications, 2016

Object-Centric Representation Learning from Unlabeled Videos.

Dinesh Jayaraman

Proceedings of the Computer Vision - ACCV 2016, 2016

2015

Graph Property Preservation under Community-Based Sampling.
