Po-Yao Huang

Affiliations:

Facebook AI
Carnegie Mellon University, School of Computer Science, Pittsburgh, PA, USA

According to our database, Po-Yao Huang authored at least 62 papers between 2014 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2025

Pushing the Frontier of Audiovisual Perception with Large-Scale Multimodal Correspondence Learning.

Authors include: Christoph Feichtenhofer

CoRR, December, 2025

Perception Encoder: The best visual embeddings are not at the output of the network.

Authors include: Jathushan Rajasegaran, Hanoona Rasheed, Christoph Feichtenhofer

CoRR, April, 2025

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding.

CoRR, April, 2025

Text Quality-Based Pruning for Efficient Training of Language Models.

Authors include: Newsha Ardalani, Kushal Tirumala, Armen Aghajanyan, Luke Zettlemoyer

J. Data-centric Mach. Learn. Res., 2025

2024

DINOv2: Learning Robust Visual Features without Supervision.

Trans. Mach. Learn. Res., 2024

Text Quality-Based Pruning for Efficient Training of Language Models.

Authors include: Newsha Ardalani, Kushal Tirumala, Armen Aghajanyan, Luke Zettlemoyer

CoRR, 2024

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild.

Authors include: Abdelrahman Mohamed

CoRR, 2024

Adversarially Masked Video Consistency for Unsupervised Domain Adaptation.

CoRR, 2024

Demystifying CLIP Data.

Authors include: Xiaoqing Ellen Tan, Luke Zettlemoyer, Christoph Feichtenhofer

Proceedings of the Twelfth International Conference on Learning Representations, 2024

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models.

Authors include: Abdelrahman Mohamed

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Altogether: Image Captioning via Re-aligning Alt-text.

Authors include: Xiaoqing Ellen Tan, Luke Zettlemoyer, Christoph Feichtenhofer

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Self-Supervised Audio-Visual Soundscape Stylization.

Authors include: Gopala Anumanchipalli

Proceedings of the Computer Vision - ECCV 2024, 2024

MoDE: CLIP Data Experts via Clustering.

Authors include: Luke Zettlemoyer

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild.

Authors include: Abdelrahman Mohamed

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Video Pivoting Unsupervised Multi-Modal Machine Translation.

IEEE Trans. Pattern Anal. Mach. Intell., March, 2023

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models.

Authors include: Shinji Watanabe, Abdelrahman Mohamed

CoRR, 2023

SeamlessM4T-Massively Multilingual & Multimodal Machine Translation.

Authors include: Seamless Communication, Mariano Coria Meglioli, Paul-Ambroise Duquenne, Kevin Heffernan, Christopher Klaiber, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Gabriel Mejia Gonzalez, Prangthip Hansanti, Hirofumi Inaguma, Ruslan Mavlyutov, Benjamin N. Peloquin, Mohamed Ramadan, Abinesh Ramakrishnan, Marta R. Costa-jussà, Francisco Guzmán, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Paden Tomasello

CoRR, 2023

MAViL: Masked Audio-Video Learners.

Authors include: Chaitanya Ryali, Christoph Feichtenhofer

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles.

Authors include: Chaitanya Ryali, Vaibhav Aggarwal, Arkabandhu Chowdhury, Christoph Feichtenhofer

Proceedings of the International Conference on Machine Learning, 2023

Diffusion Models as Masked Autoencoders.

Authors include: Karttikeya Mangalam, Christoph Feichtenhofer

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

CiT: Curation in Training for Effective Vision-Language Data.

Authors include: Luke Zettlemoyer, Christoph Feichtenhofer

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition.

Authors include: Celso M. de Melo, Alexander G. Hauptmann

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Flap: Fast Language-Audio Pre-Training.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Generating Hashtags for Short-form Videos with Guided Signals.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

Towards Multilingual Vision-Language Models.

PhD thesis, 2022

A Survey of Deep Active Learning.

ACM Comput. Surv., 2022

A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions.

ACM Comput. Surv., 2022

CM3: A Causal Masked Multimodal Model of the Internet.

Authors include: Armen Aghajanyan, Vladimir Karpukhin, Luke Zettlemoyer

CoRR, 2022

Masked Autoencoders that Listen.

Authors include: Wojciech Galuba, Christoph Feichtenhofer

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

On Adversarial Robustness Of Large-Scale Audio Visual Learning.

Authors include: Bernie Po-Yao Huang

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Self-Supervised Deep Correlation Tracking.

IEEE Trans. Image Process., 2021

Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models.

Authors include: Mandela Patrick

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Support-set bottlenecks for video-text representation learning.

Authors include: Mandela Patrick, Yuki Markus Asano, Alexander G. Hauptmann, João F. Henriques

Proceedings of the 9th International Conference on Learning Representations, 2021

Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning.

Authors include: Mandela Patrick, João F. Henriques

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Audio-Visual Event Recognition Through the Lens of Adversary.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding.

Authors include: Armen Aghajanyan, Luke Zettlemoyer, Christoph Feichtenhofer

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding.

Authors include: Masoumeh Aminzadeh, Christoph Feichtenhofer, Luke Zettlemoyer

Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020

A Survey of Deep Active Learning.

CoRR, 2020

Argus: Efficient Activity Detection System for Extended Video Analysis.

Authors include: Alexander G. Hauptmann

Proceedings of the IEEE Winter Applications of Computer Vision Workshops, 2020

Forward and Backward Multimodal NMT for Improved Monolingual and Multilingual Cross-Modal Retrieval.

Authors include: Alexander G. Hauptmann

Proceedings of the 2020 International Conference on Multimedia Retrieval, 2020

Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting.

Authors include: Alexander G. Hauptmann

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019

RWR-GAE: Random Walk Regularization for Graph Auto Encoders.

Authors include: Robert E. Frederking

CoRR, 2019

MMVG-INF-Etrol@TRECVID 2019: Activities in Extended Video.

Authors include: Alexander G. Hauptmann

Proceedings of the 2019 TREC Video Retrieval Evaluation, 2019

CMU-Informedia at TREC 2019 Incident Streams Track.

Authors include: Alexander G. Hauptmann

Proceedings of the Twenty-Eighth Text REtrieval Conference, 2019

OPERA: Operations-oriented Probabilistic Extraction, Reasoning, and Analysis.

Authors include: Jaime G. Carbonell, Anatole Gershman, Teruko Mitamura, Aditi Chaudhary, Salvador Medina

Proceedings of the 2019 Text Analysis Conference, 2019

Annotation Efficient Cross-Modal Retrieval with Adversarial Attentive Alignment.

Authors include: Alexander G. Hauptmann

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Improving What Cross-Modal Retrieval Models Learn through Object-Oriented Inter- and Intra-Modal Attention Networks.

Authors include: Alexander G. Hauptmann

Proceedings of the 2019 International Conference on Multimedia Retrieval, 2019

Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations.

Authors include: Alexander G. Hauptmann

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

2018

Informedia @ TRECVID 2018: Ad-hoc Video Search, Video to Text Description, Activities in Extended video.

Authors include: Alexander G. Hauptmann, Ruslan Salakhutdinov

Proceedings of the 2018 TREC Video Retrieval Evaluation, 2018

OPERA: Operations-oriented Probabilistic Extraction, Reasoning, and Analysis.

Authors include: Taylor Berg-Kirkpatrick, Jaime G. Carbonell, Anatole Gershman, Alexander G. Hauptmann, Teruko Mitamura, Aditi Chaudhary, Bernie Po-Yao Huang, Hector Zhengzhong Liu, Shruti Palaskar, Dheeraj Rajagopal

Proceedings of the 2018 Text Analysis Conference, 2018

Panoramic depth reconstruction within a single shot by optimizing global sphere radii.

Authors include: Hong Shiang Lin, Sun-Yu Gordon Chi

Proceedings of the SIGGRAPH Asia 2018 Posters, Tokyo, Japan, December 04-07, 2018

Multimodal Filtering of Social Media for Temporal Monitoring and Event Analysis.

Authors include: Jean-Baptiste Lamare, Alexander G. Hauptmann

Proceedings of the 2018 ACM International Conference on Multimedia Retrieval, 2018

RCAA: Relational Context-Aware Agents for Person Search.

Authors include: Alexander G. Hauptmann

Proceedings of the Computer Vision - ECCV 2018, 2018

2017

Video Representation Learning and Latent Concept Mining for Large-scale Multi-label Video Classification.

Authors include: Alexander G. Hauptmann

CoRR, 2017

Synchronization for multi-perspective videos in the wild.

Authors include: Alexander G. Hauptmann

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

An Event Reconstruction Tool for Conflict Monitoring Using Social Media.

Authors include: Alexander G. Hauptmann

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016

Attention-based Multimodal Neural Machine Translation.

Proceedings of the First Conference on Machine Translation, 2016

Informedia @ TRECVID 2016.

Authors include: Alexander G. Hauptmann

Proceedings of the 2016 TREC Video Retrieval Evaluation, 2016

2015

Cognitive vertical handover in heterogeneous networks.

Authors include: Shin-Ming Cheng

Proceedings of the 11th International Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness, 2015

Entity Hierarchy Embedding.

Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

2014

Cognitive access in multichannel wireless networks using two-dimension Markov chain.

Authors include: Shin-Ming Cheng

Proceedings of the International Wireless Communications and Mobile Computing Conference, 2014
