Xudong Lin

Affiliations:

Columbia University, New York, NY, USA
Tsinghua University, Department of Automation, Beijing, China (former)

According to our database, Xudong Lin authored at least 35 papers between 2018 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Links

Online presence:

on xudonglinthu.github.io
on scholar.google.com

Bibliography

2024

SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos.

CoRR, 2024

Beyond Grounding: Extracting Fine-Grained Event Hierarchies across Modalities.

Hammad A. Ayyubi, Christopher Thomas, ...

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Video Summarization: Towards Entity-Aware Captions.

Hammad A. Ayyubi, ...

CoRR, 2023

TempCLR: Temporal Alignment Representation with Contrastive Learning.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Learning to Decompose Visual Features with Latent Textual Prompts.

..., Alexander G. Schwing, ...

Proceedings of the Eleventh International Conference on Learning Representations, 2023

All in One: Exploring Unified Video-Language Pre-Training.

..., Kevin Qinghong Lin, Satoshi Tsutsui, ..., Mike Zheng Shou

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

In Defense of Structural Symbolic Representation for Video Event-Relation Prediction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Supervised Masked Knowledge Distillation for Few-Shot Transformers.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval.

..., Mike Zheng Shou, ...

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Non-Sequential Graph Script Induction via Multimedia Grounding.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Video Event Extraction via Tracking Visual States of Arguments.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Video-Text Pre-training with Learned Regions for Retrieval.

..., Mike Zheng Shou, ...

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Multimodal Event Graphs: Towards Event Centric Understanding of Multimodal World.

Hammad A. Ayyubi, Christopher Thomas, ...

CoRR, 2022

Revitalize Region Feature for Democratizing Video-Language Pre-training.

..., Alex Jinpeng Wang, ..., Mike Zheng Shou

CoRR, 2022

All in One: Exploring Unified Video-Language Pre-training.

Alex Jinpeng Wang, ..., Mike Zheng Shou

CoRR, 2022

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners.

Zhenhailong Wang, ...

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Weakly-Supervised Temporal Article Grounding.

..., Christopher Thomas, Hammad A. Ayyubi, ...

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Object-aware Video-language Pre-training for Retrieval.

Alex Jinpeng Wang, ..., Mike Zheng Shou

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

CLIP-Event: Connecting Text and Images with Event Structures.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Learning To Recognize Procedural Activities with Distant Supervision.

..., Gedas Bertasius, Marcus Rohrbach, ..., Lorenzo Torresani

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding.

Revanth Gangi Reddy, ..., Alexander G. Schwing, ...

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Video-Text Pre-training with Learned Regions.

..., Mike Zheng Shou, ..., Alex Jinpeng Wang, ...

CoRR, 2021

RESIN: A Dockerized Schema-Guided Cross-document Cross-lingual Cross-media Information Extraction and Event Tracking System.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations, 2021

Joint Multimedia Event Extraction from Video and Article.

..., Christopher Thomas, ...

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Co-Grounding Networks With Semantic Attention for Referring Expression Comprehension in Videos.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs.

..., Gedas Bertasius, ..., Lorenzo Torresani

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Context-Gated Convolution.

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

Flow-Distilled IP Two-Stream Networks for Compressed Video Action Recognition.

CoRR, 2019

LPAT: Learning to Predict Adaptive Threshold for Weakly-supervised Temporal Action Localization.

CoRR, 2019

Unsupervised Rank-Preserving Hashing for Large-Scale Image Retrieval.

Proceedings of the 2019 on International Conference on Multimedia Retrieval, 2019

DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition.

..., Yannis Kalantidis, Laura Sevilla-Lara, Marcus Rohrbach, ...

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Deep Variational Metric Learning.

Proceedings of the Computer Vision - ECCV 2018, 2018

Deep Adversarial Metric Learning.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

GraphBit: Bitwise Interaction Mining via Deep Reinforcement Learning.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018
