Paul Pu Liang

Proceedings of the Forty-second International Conference on Machine Learning, 2025

CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

TeaserGen: Generating Teasers for Long Documentaries.

Taylor Berg-Kirkpatrick

Hao-Wen Dong

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Progressive Compositionality in Text-to-Image Generative Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

OS-ATLAS: Foundation Action Model for Generalist GUI Agents.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

OpenFace 3.0: A Lightweight Multitask System for Comprehensive Facial Behavior Analysis.

Jiewen Hu

Leena Mathur

Proceedings of the 19th IEEE International Conference on Automatic Face and Gesture Recognition, 2025

Deriving Strategic Market Insights with Large Language Models: A Benchmark for Forward Counterfactual Generation.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Social Genome: Grounded Social Reasoning Abilities of Multimodal Models.

Leena Mathur

Marian Qian

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Fits like a Flex-Glove: Automatic Design of Personalized FPCB-Based Tactile Sensing Gloves.

Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 2025

Multimodal AI for Human Sensing and Interaction.

Karan Ahuja

Yiyue Luo

Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 2025

Interactive Sketchpad: A Multimodal Tutoring System for Collaborative, Visual Problem-Solving.

Jimin Lee

Steven-Shine Chen

Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 2025

TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models.

Findings of the Association for Computational Linguistics, 2025

VLM2-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Foundations & Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions.

ACM Comput. Surv., October, 2024

Foundations of Multisensory Artificial Intelligence

PhD thesis, 2024

WiReSens Toolkit: An Open-source Platform towards Accessible Wireless Tactile Sensing.

CoRR, 2024

Multimodal Fusion Balancing Through Game-Theoretic Regularization.

Konstantinos Kontras

Thomas Strypsteen

Christos Chatzichristos

Matthew B. Blaschko

Maarten De Vos

CoRR, 2024

Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks.

CoRR, 2024

Quantitative Insights into Language Model Usage and Trust in Academia: An Empirical Study.

CoRR, 2024

MultiMed: Massively Multimodal and Multitask Medical Understanding.

Shentong Mo

CoRR, 2024

IoT-LM: Large Multisensory Language Models for the Internet of Things.

Shentong Mo

CoRR, 2024

Foundations of Multisensory Artificial Intelligence.

CoRR, 2024

Semantically Corrected Amharic Automatic Speech Recognition.

Samuael Adnew

CoRR, 2024

HEMM: Holistic Evaluation of Multimodal Foundation Models.

Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions.

Leena Mathur

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

FLHetBench: Benchmarking Device and State Heterogeneity in Federated Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction.

Guillaume Jaume

Anurag Vaidya

Richard J. Chen

Drew F. K. Williamson

Faisal Mahmood

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities.

Alex Wilf

Sihyun Shawn Lee

Giambattista Parascandolo

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.

Bartlomiej Bojanowski

Christopher D. Manning

Daniel Moseguí González

Eunice Engefu Manyasi

Evgenii Zheltonozhskii

Fanyue Xia

Fatemeh Siar

Fernando Martínez-Plumed

Giorgio Mariani

Gloria Wang

Gonzalo Jaimovitch-López

Jaime Fernández Fisac

Jascha Sohl-Dickstein

José Hernández-Orallo

Karthik Gopalakrishnan

Lidia Contreras Ochando

María José Ramírez-Quintana

Michael I. Ivanitskiy

Neta Gur-Ari Krakover

Nitish Shirish Keskar

Pablo Antonio Moreno Casares

Pegah Alipoormolabashi

Shyamolima (Shammie) Debnath

Sneha Priscilla Makini

Yadollah Yaghoobzadeh

Trans. Mach. Learn. Res., 2023

High-Modality Multimodal Transformer: Quantifying Modality & Interaction Heterogeneity for High-Modality Representation Learning.

Trans. Mach. Learn. Res., 2023

MultiZoo and MultiBench: A Standardized Toolkit for Multimodal Deep Learning.

J. Mach. Learn. Res., 2023

MMOE: Mixture of Multimodal Interaction Experts.

Haofei Yu

CoRR, 2023

MultiIoT: Towards Large-scale Multisensory Learning for the Internet of Things.

Shentong Mo

CoRR, 2023

Comparative Knowledge Distillation.

CoRR, 2023

MultiZoo & MultiBench: A Standardized Toolkit for Multimodal Deep Learning.

CoRR, 2023

Quantifying & Modeling Feature Interactions: An Information Decomposition Framework.

CoRR, 2023

Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals.

Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Localized Symbolic Knowledge Distillation for Visual Commonsense Models.

Jae Sung Park

Jack Hessel

Khyathi Raghavi Chandu

Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Factorized Contrastive Learning: Going Beyond Multi-view Redundancy.

Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework.

Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Tutorial on Multimodal Machine Learning: Principles, Challenges, and Open Questions.

Proceedings of the International Conference on Multimodal Interaction, 2023

Multimodal Fusion Interactions: A Study of Human and Automatic Quantification.

Yun Cheng

Proceedings of the 25th International Conference on Multimodal Interaction, 2023

HIINT: Historical, Intra- and Inter- personal Dynamics Modeling with Cross-person Memory Transformer.

Proceedings of the 25th International Conference on Multimodal Interaction, 2023

MultiViz: Towards Visualizing and Understanding Multimodal Models.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Lecture Presentations Multimodal Dataset: Towards Understanding Multimodality in Educational Videos.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Face-to-Face Contrastive Learning for Social Intelligence Question-Answering.

Proceedings of the 17th IEEE International Conference on Automatic Face and Gesture Recognition, 2023

Difference-Masking: Choosing What to Mask in Continued Pretraining.

Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

MultiViz: Towards User-Centric Visualizations and Interpretations of Multimodal Models.

Proceedings of the Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 2023

Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

Cross-modal Attention Congruence Regularization for Vision-Language Relation Alignment.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Nano: Nested Human-in-the-Loop Reward Learning for Few-shot Language Model Control.

Findings of the Association for Computational Linguistics: ACL 2023, 2023

Demystify the Gravity Well in the Optimization Landscape (Student Abstract).

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions.

CoRR, 2022

Multimodal Lecture Presentations Dataset: Understanding Multimodality in Educational Slides.

CoRR, 2022

Face-to-Face Contrastive Learning for Social Intelligence Question-Answering.

CoRR, 2022

MultiViz: An Analysis Benchmark for Visualizing and Understanding Multimodal Models.

Pawan Sasanka Ammanamanchi

CoRR, 2022

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code.

Alexandros Papangelis

Aman Madaan

Angelina McMillan-Major

Khyathi Raghavi Chandu

Laura Perez-Beltrachini

Leonardo F. R. Ribeiro

CoRR, 2022

Brainish: Formalizing A Multimodal Language for Intelligence and Consciousness.

CoRR, 2022

HighMMT: Towards Modality and Task Generalization for High-Modality Representation Learning.

CoRR, 2022

Uncertainty Quantification with Pre-trained Language Models: A Large-Scale Empirical Analysis.

Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

PACS: A Dataset for Physical Audiovisual CommonSense Reasoning.

Computer Vision - ECCV 2022, 2022

Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations.

Proceedings of the AIES '22: AAAI/ACM Conference on AI, Ethics, and Society, Oxford, United Kingdom, May 19, 2022

2021

Ask & Explore: Grounded Question Answering for Curiosity-Driven Exploration.

Jivat Neet Kaur

Yiding Jiang

CoRR, 2021

Understanding the Tradeoffs in Client-Side Privacy for Speech Recognition.

Peter Wu

CoRR, 2021

MultiBench: Multiscale Benchmarks for Multimodal Representation Learning.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

StylePTB: A Compositional Benchmark for Fine-grained Controllable Text Style Transfer.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment.

Peter Wu

Liu Ziyin

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Towards Understanding and Mitigating Social Biases in Language Models.

Chiyu Wu

Proceedings of the 38th International Conference on Machine Learning, 2021

Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies.

Proceedings of the 9th International Conference on Learning Representations, 2021

Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Learning Language and Multimodal Privacy-Preserving Markers of Mood from Mobile Data.

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020

Deep Neural Network for Robust Modulation Classification Under Uncertain Noise Conditions.

IEEE Trans. Veh. Technol., 2020

Foundations of Multimodal Co-learning.

Inf. Fusion, 2020

Multimodal Privacy-preserving Mood Prediction from Mobile Data: A Preliminary Study.

CoRR, 2020

An Investigation of how Label Smoothing Affects Generalization.

CoRR, 2020

Anchor & Transform: Learning Sparse Representations of Discrete Objects.

CoRR, 2020

Learning Not to Learn in the Presence of Noisy Labels.

Masahito Ueda

CoRR, 2020

Think Locally, Act Globally: Federated Learning with Local and Global Representations.

CoRR, 2020

CMU-MOSEAS: A Multimodal Language Dataset for Spanish, Portuguese, German and French.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Diverse and Admissible Trajectory Forecasting Through Multimodal Context Understanding.

Computer Vision - ECCV 2020, 2020

On Emergent Communication in Competitive Multi-Agent Teams.

Jeffrey Chen

Satwik Kottur

Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, 2020

Towards Debiasing Sentence Representations.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019

Factorized Multimodal Transformer for Multimodal Sequential Learning.

CoRR, 2019

Variational Auto-Decoder.

Yao Chong Lim

CoRR, 2019

Deep Gamblers: Learning to Abstain with Portfolio Theory.

Masahito Ueda

Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Strong and Simple Baselines for Multimodal Utterance Embeddings.

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Learning Factorized Multimodal Representations.

Yao-Hung Hubert Tsai

Proceedings of the 7th International Conference on Learning Representations, 2019

Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Multimodal Transformer for Unaligned Multimodal Language Sequences.

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Learning Representations from Imperfect Time Series Data via Tensor Rank Regularization.

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Found in Translation: Learning Robust Joint Representations by Cyclic Translations between Modalities.

Hai Pham

Thomas Manzini

Barnabás Póczos

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Label-Assisted Transmission for Short Packet Communications: A Machine Learning Approach.

IEEE Trans. Veh. Technol., 2018

Seq2Seq2Sentiment: Multimodal Sequence to Sequence Models for Sentiment Analysis.

CoRR, 2018

Multimodal Local-Global Ranking Fusion for Emotion Recognition.

Proceedings of the 2018 International Conference on Multimodal Interaction, 2018

A Machine Learning Approach to MIMO Communications.

Proceedings of the 2018 IEEE International Conference on Communications, 2018

Robust Modulation Classification under Uncertain Noise Condition Using Recurrent Neural Network.

Proceedings of the IEEE Global Communications Conference, 2018

Multimodal Language Analysis with Recurrent Multistage Fusion.

Ziyin Liu

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

An Empirical Evaluation of Sketched SVD and its Application to Leverage Score Ordering.

Hui Han Chin

Varun Bharadhwaj Lakshminarasimhan

Proceedings of the 10th Asian Conference on Machine Learning, 2018

Efficient Low-rank Multimodal Fusion With Modality-Specific Factors.

Zhun Liu

Ying Shen

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph.

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

Multi-attention Recurrent Network for Human Communication Comprehension.

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Memory Fusion Network for Multi-view Sequential Learning.

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

Multimodal sentiment analysis with word-level fusion and reinforcement learning.
