Po-Yao Huang

Affiliations:
  • Facebook AI
  • Carnegie Mellon University, School of Computer Science, Pittsburgh, PA, USA


According to our database, Po-Yao Huang authored at least 51 papers between 2014 and 2024.

Bibliography

2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild.
CoRR, 2024

Adversarially Masked Video Consistency for Unsupervised Domain Adaptation.
CoRR, 2024

2023
Video Pivoting Unsupervised Multi-Modal Machine Translation.
IEEE Trans. Pattern Anal. Mach. Intell., March 2023

Demystifying CLIP Data.
CoRR, 2023

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models.
CoRR, 2023

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation.
CoRR, 2023

DINOv2: Learning Robust Visual Features without Supervision.
CoRR, 2023

MAViL: Masked Audio-Video Learners.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles.
Proceedings of the International Conference on Machine Learning, 2023

Diffusion Models as Masked Autoencoders.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

CiT: Curation in Training for Effective Vision-Language Data.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Flap: Fast Language-Audio Pre-Training.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Generating Hashtags for Short-form Videos with Guided Signals.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
A Survey of Deep Active Learning.
ACM Comput. Surv., 2022

A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions.
ACM Comput. Surv., 2022

CM3: A Causal Masked Multimodal Model of the Internet.
CoRR, 2022

Masked Autoencoders that Listen.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification.
Proceedings of Interspeech 2022, 2022

On Adversarial Robustness of Large-Scale Audio Visual Learning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021
Self-Supervised Deep Correlation Tracking.
IEEE Trans. Image Process., 2021

Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Support-set bottlenecks for video-text representation learning.
Proceedings of the 9th International Conference on Learning Representations, 2021

Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Audio-Visual Event Recognition Through the Lens of Adversary.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding.
Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020
A Survey of Deep Active Learning.
CoRR, 2020

Argus: Efficient Activity Detection System for Extended Video Analysis.
Proceedings of the IEEE Winter Applications of Computer Vision Workshops, 2020

Forward and Backward Multimodal NMT for Improved Monolingual and Multilingual Cross-Modal Retrieval.
Proceedings of the 2020 on International Conference on Multimedia Retrieval, 2020

Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
RWR-GAE: Random Walk Regularization for Graph Auto Encoders.
CoRR, 2019

MMVG-INF-Etrol@TRECVID 2019: Activities in Extended Video.
Proceedings of the 2019 TREC Video Retrieval Evaluation, 2019

CMU-Informedia at TREC 2019 Incident Streams Track.
Proceedings of the Twenty-Eighth Text REtrieval Conference, 2019


Annotation Efficient Cross-Modal Retrieval with Adversarial Attentive Alignment.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Improving What Cross-Modal Retrieval Models Learn through Object-Oriented Inter- and Intra-Modal Attention Networks.
Proceedings of the 2019 on International Conference on Multimedia Retrieval, 2019

Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

2018
Informedia @ TRECVID 2018: Ad-hoc Video Search, Video to Text Description, Activities in Extended video.
Proceedings of the 2018 TREC Video Retrieval Evaluation, 2018


Panoramic depth reconstruction within a single shot by optimizing global sphere radii.
Proceedings of the SIGGRAPH Asia 2018 Posters, Tokyo, Japan, December 04-07, 2018, 2018

Multimodal Filtering of Social Media for Temporal Monitoring and Event Analysis.
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, 2018

RCAA: Relational Context-Aware Agents for Person Search.
Proceedings of the European Conference on Computer Vision (ECCV 2018), 2018

2017
Video Representation Learning and Latent Concept Mining for Large-scale Multi-label Video Classification.
CoRR, 2017

Synchronization for multi-perspective videos in the wild.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

An Event Reconstruction Tool for Conflict Monitoring Using Social Media.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Attention-based Multimodal Neural Machine Translation.
Proceedings of the First Conference on Machine Translation, 2016

Informedia @ TRECVID 2016.
Proceedings of the 2016 TREC Video Retrieval Evaluation, 2016

2015
Cognitive vertical handover in heterogeneous networks.
Proceedings of the 11th International Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness, 2015

Entity Hierarchy Embedding.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

2014
Cognitive access in multichannel wireless networks using two-dimension Markov chain.
Proceedings of the International Wireless Communications and Mobile Computing Conference, 2014

