Xudong Lin

Affiliations:
  • Columbia University, New York, NY, USA
  • Tsinghua University, Department of Automation, Beijing, China (former)


According to our database, Xudong Lin authored at least 35 papers between 2018 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos.
CoRR, 2024

Beyond Grounding: Extracting Fine-Grained Event Hierarchies across Modalities.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Video Summarization: Towards Entity-Aware Captions.
CoRR, 2023

TempCLR: Temporal Alignment Representation with Contrastive Learning.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Learning to Decompose Visual Features with Latent Textual Prompts.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

All in One: Exploring Unified Video-Language Pre-Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

In Defense of Structural Symbolic Representation for Video Event-Relation Prediction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Supervised Masked Knowledge Distillation for Few-Shot Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Non-Sequential Graph Script Induction via Multimedia Grounding.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Video Event Extraction via Tracking Visual States of Arguments.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Video-Text Pre-training with Learned Regions for Retrieval.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Multimodal Event Graphs: Towards Event Centric Understanding of Multimodal World.
CoRR, 2022

Revitalize Region Feature for Democratizing Video-Language Pre-training.
CoRR, 2022

All in One: Exploring Unified Video-Language Pre-training.
CoRR, 2022

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Weakly-Supervised Temporal Article Grounding.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Object-aware Video-language Pre-training for Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

CLIP-Event: Connecting Text and Images with Event Structures.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Learning To Recognize Procedural Activities with Distant Supervision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Video-Text Pre-training with Learned Regions.
CoRR, 2021

RESIN: A Dockerized Schema-Guided Cross-document Cross-lingual Cross-media Information Extraction and Event Tracking System.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations, 2021

Joint Multimedia Event Extraction from Video and Article.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Co-Grounding Networks With Semantic Attention for Referring Expression Comprehension in Videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Context-Gated Convolution.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
Flow-Distilled IP Two-Stream Networks for Compressed Video Action Recognition.
CoRR, 2019

LPAT: Learning to Predict Adaptive Threshold for Weakly-supervised Temporal Action Localization.
CoRR, 2019

Unsupervised Rank-Preserving Hashing for Large-Scale Image Retrieval.
Proceedings of the 2019 on International Conference on Multimedia Retrieval, 2019

DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Deep Variational Metric Learning.
Proceedings of the Computer Vision - ECCV 2018, 2018

Deep Adversarial Metric Learning.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

GraphBit: Bitwise Interaction Mining via Deep Reinforcement Learning.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

