Chenliang Xu

CoRR, January, 2026

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models.

Trans. Mach. Learn. Res., 2026

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Tuning the Face: Modulating Facial Expressions for Realistic Self-Avatars in Virtual Reality.

Proceedings of the 2026 Designing Interactive Systems Conference, 2026

2025

Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination.

CoRR, November, 2025

When to Think and When to Look: Uncertainty-Guided Lookback.

CoRR, November, 2025

PromptReverb: Multimodal Room Impulse Response Generation Through Latent Rectified Flow Matching.

CoRR, October, 2025

Diagnosing Visual Reasoning: Challenges, Insights, and a Path Forward.

CoRR, October, 2025

Directional Reasoning Injection for Fine-Tuning MLLMs.

CoRR, October, 2025

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models.

CoRR, October, 2025

AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning.

CoRR, October, 2025

StreamME: Simplify 3D Gaussian Avatar within Live Stream.

CoRR, July, 2025

OPENXRD: A Comprehensive Benchmark and Enhancement Framework for LLM/MLLM XRD Question Answering.

CoRR, July, 2025

What to Do Next? Memorizing skills from Egocentric Instructional Video.

Jing Bi

CoRR, July, 2025

ACTLLM: Action Consistency Tuned Large Language Model.

CoRR, June, 2025

Can Sound Replace Vision in LLaVA With Token Substitution?

CoRR, June, 2025

I2G: Generating Instructional Illustrations via Text-Conditioned Diffusion.

CoRR, May, 2025

Intentional Gesture: Deliver Your Intentions with Gestures for Speech.

CoRR, May, 2025

The Sword of Damocles in ViTs: Computational Redundancy Amplifies Adversarial Transferability.

CoRR, April, 2025

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1).

CoRR, April, 2025

FreSca: Unveiling the Scaling Space in Diffusion Models.

CoRR, April, 2025

Forward Learning with Differential Privacy.

CoRR, April, 2025

VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity.

CoRR, March, 2025

From 16-Bit to 1-Bit: Visual KV Cache Quantization for Memory-Efficient Multimodal Large Language Models.

CoRR, February, 2025

GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling.

CoRR, January, 2025

StreamME: Simplify 3D Gaussian Avatar within Live Stream.

Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference, 2025

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Harnessing the Computation Redundancy in ViTs to Boost Adversarial Transferability.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

ZeroSep: Separate Anything in Audio with Zero Training.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Generative AI for Cel-Animation: A Survey.

Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

GestureLSM: Latent Shortcut Based Co-Speech Gesture Generation with Spatial-Temporal Modeling.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

$\pi$-AVAS: Can Physics-Integrated Audio-Visual Modeling Boost Neural Acoustic Synthesis?

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Targeted Forgetting of Image Subgroups in CLIP Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Learning to Highlight Audio by Watching Movies.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

GaussianStyle: Gaussian Head Avatar via StyleGAN.

Proceedings of the International Conference on 3D Vision, 2025

2024

Cross Modality Bias in Visual Question Answering: A Causal View With Possible Worlds VQA.

IEEE Trans. Multim., 2024

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach.

CoRR, 2024

Scaling Concept With Text-Guided Diffusion Models.

CoRR, 2024

Will the Inclusion of Generated Data Amplify Bias Across Generations in Future Image Classification Models?

CoRR, 2024

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models.

CoRR, 2024

Quadratic Is Not What You Need For Multimodal Large Language Models.

CoRR, 2024

Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?

CoRR, 2024

AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue.

CoRR, 2024

Approximated Likelihood Ratio: A Forward-Only and Parallel Framework for Boosting Neural Network Training.

CoRR, 2024

Efficiently Leveraging Linguistic Priors for Scene Text Spotting.

Nguyen Nguyen

CoRR, 2024

OSCaR: Object State Captioning and State Change Representation.

CoRR, 2024

Tri2-plane: Volumetric Avatar Reconstruction with Feature Pyramid.

CoRR, 2024

Bag of Tricks to Boost Adversarial Transferability.

CoRR, 2024

TextToon: Real-Time Text Toonify Head Avatar from Single Video.

Proceedings of the SIGGRAPH Asia 2024 Conference Papers, 2024

OSCaR: Object State Captioning and State Change Representation.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

EAGLE: Egocentric AGgregated Language-video Engine.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

One Forward is Enough for Neural Network Training via Likelihood Ratio Method.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Learning Audio Concepts from Counterfactual Natural Language.

Proceedings of the IEEE International Conference on Acoustics, 2024

Adaptive Super Resolution for One-Shot Talking-Head Generation.

Proceedings of the IEEE International Conference on Acoustics, 2024

Can CLIP Count Stars? An Empirical Study on Quantity Bias in CLIP.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Tri2-plane: Thinking Head Avatar via Feature Pyramid.

Proceedings of the Computer Vision - ECCV 2024, 2024

Modeling and Driving Human Body Soundfields Through Acoustic Primitives.

Proceedings of the Computer Vision - ECCV 2024, 2024

Random Smooth-based Certified Defense against Text Adversarial Attack.

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024

Learning to Transform Dynamically for Better Adversarial Transferability.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Discover and Mitigate Multiple Biased Subgroups in Image Classifiers.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation.

Proceedings of the Computer Vision - ACCV 2024, 2024

High-Quality Visually-Guided Sound Separation from Diverse Categories.

Proceedings of the Computer Vision - ACCV 2024, 2024

2023

Rapid runtime learning by curating small datasets of high-quality items obtained from memory.

PLoS Comput. Biol., October, 2023

Video Understanding with Large Language Models: A Survey.

CoRR, 2023

Scalable CP Decomposition for Tensor Learning using GPU Tensor Cores.

CoRR, 2023

Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation.

CoRR, 2023

MISAR: A Multimodal Instructional System with Augmented Reality.

CoRR, 2023

Emotional Listener Portrait: Neural Listener Head Generation with Emotion.

CoRR, 2023

Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields.

CoRR, 2023

DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models.

CoRR, 2023

Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA.

CoRR, 2023

Training Neural Networks without Backpropagation: A Deeper Dive into the Likelihood Ratio Method.

CoRR, 2023

Improving Adversarial Transferability with Scheduled Step Size and Dual Example.

CoRR, 2023

PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data.

Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023

AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others.

Cristian Canton-Ferrer

Mark Ibrahim

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Egocentric Audio-Visual Object Localization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Face Forgery Detection via Symmetric Transformer.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Cross-modal Contrastive Distillation for Instructional Activity Anticipation.

Proceedings of the 26th International Conference on Pattern Recognition, 2022

Discover and Mitigate Unknown Biases with Debiasing Alternate Networks.

Zhiheng Li

Anthony Hoogs

Proceedings of the Computer Vision - ECCV 2022, 2022

Learning to Answer Questions in Dynamic Audio-Visual Scenarios.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Pose Flow Learning From Person Images for Pose Guided Synthesis.

IEEE Trans. Image Process., 2021

Structured and Consistent Multi-Layer Multi-Kernel Subtask Correction Filter Tracker.

IEEE Trans. Circuits Syst. Video Technol., 2021

Anomaly Crossing: A New Method for Video Anomaly Detection as Cross-domain Few-shot Learning.

CoRR, 2021

SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Editing.

CoRR, 2021

Zooming SlowMo: An Efficient One-Stage Framework for Space-Time Video Super-Resolution.

CoRR, 2021

Animated 3D human avatars from a single image with GAN-based texture inference.

Comput. Graph., 2021

How to Make a BLT Sandwich? Learning VQA towards Understanding Web Instructional Videos.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Improve CAM with Auto-adapted Segmentation and Co-supervised Augmentation.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Learning to Generate Scene Graph from Natural Language Supervision.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Discover the Unknown Biased Attribute of an Image Classifier.

Zhiheng Li

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Explaining Local, Global, And Higher-Order Interactions In Deep Learning.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning.

Jing Bi

Jiebo Luo

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

A Simple Baseline for Weakly-Supervised Scene Graph Generation.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Can Audio-Visual Integration Strengthen Robustness Under Multimodal Attacks?

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation.

Di Hu

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

High-Fidelity Face Tracking for AR/VR via Deep Lighting Adaptation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Learning by Planning: Language-Guided Global Image Editing.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Space-Time Memory Network for Sounding Object Localization in Videos.

Sizhe Li

Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020

Noise-Resilient Training Method for Face Landmark Generation From Speech.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

A Weakly Supervised Multi-task Ranking Framework for Actor-Action Semantic Segmentation.

Int. J. Comput. Vis., 2020

Cubic Spline Smoothing Compensation for Irregularly Sampled Sequences.

CoRR, 2020

Actor-Action Video Classification CSC 249/449 Spring 2020 Challenge Report.

CoRR, 2020

Graph Neural Network Based Coarse-Grained Mapping Prediction.

Zhiheng Li

Geemi P. Wellawatte

Maghesree Chakraborty

Heta A. Gandhi

Andrew D. White

CoRR, 2020

What comprises a good talking-head video generation?: A Survey and Benchmark.

CoRR, 2020

Assembling Semantically-Disentangled Representations for Predictive-Generative Models via Adaptation from Synthetic Domain.

Burkay Donderici

Caleb New

CoRR, 2020

TailorGAN: Making User-Defined Fashion Designs.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

End-To-End Generation of Talking Faces from Noisy Speech.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing.

Dingzeyu Li

Proceedings of the Computer Vision - ECCV 2020, 2020

Talking-Head Generation with Rhythmic Head Motion.

Proceedings of the Computer Vision - ECCV 2020, 2020

Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Deep Grouping Model for Unified Perceptual Parsing.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning a Weakly-Supervised Video Actor-Action Segmentation Model With a Wise Selection.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

A Benchmark and Baseline for Language-Driven Image Editing.

Proceedings of the Computer Vision - ACCV 2020 - 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30, 2020

Learning from Interventions Using Hierarchical Policies for Safe Learning.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Online Audio-Visual Source Association for Chamber Music Performances.

Trans. Int. Soc. Music. Inf. Retr., 2019

Deep Audio Prior.

Dingzeyu Li

CoRR, 2019

Weakly Supervised Object Localization with Inter-Intra Regulated CAMs.

CoRR, 2019

Unsupervised Pose Flow Learning for Pose Guided Synthesis.

CoRR, 2019

Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss.

CoRR, 2019

3D Human Avatar Digitization from a Single Image.

Proceedings of the 17th International Conference on Virtual-Reality Continuum and its Applications in Industry, 2019

GAN-EM: GAN Based EM Learning Framework.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Single Image 3D Vehicle Pose Estimation for Augmented Reality.

Proceedings of the 2019 IEEE Global Conference on Signal and Information Processing, 2019

Audio-Visual Interpretable and Controllable Video Captioning.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Audio-Visual Event Localization in the Wild.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

NTIRE 2019 Challenge on Video Super-Resolution: Methods and Results.

Rudrabha Mukhopadhyay

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Sound to Visual: Hierarchical Cross-Modal Talking Face Generation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Not All Frames Are Equal: Weakly-Supervised Video Grounding With Contextual Similarity and Visual Clustering Losses.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Dynamic Graph Modules for Modeling Object-Object Interactions in Activity Recognition.

Proceedings of the 30th British Machine Vision Conference 2019, 2019

2018

Dynamic Graph Modules for Modeling Higher-Order Interactions in Activity Recognition.

CoRR, 2018

An Attempt towards Interpretable Audio-Visual Video Captioning.

CoRR, 2018

How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos.

CoRR, 2018

Navigation by Imitation in a Pedestrian-Rich Environment.

CoRR, 2018

Improving Text-Based Person Search by Spatial Matching and Adaptive Threshold.

Tianlang Chen

Jiebo Luo

Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, 2018

MRI tumor segmentation with densely connected 3D CNN.

Proceedings of the Medical Imaging 2018: Image Processing, 2018

Generating Talking Face Landmarks from Speech.

Proceedings of the Latent Variable Analysis and Signal Separation, 2018

Audio-Visual Event Localization in Unconstrained Videos.

Proceedings of the Computer Vision - ECCV 2018, 2018

Lip Movements Generation at a Glance.

Proceedings of the Computer Vision - ECCV 2018, 2018

Weakly-Supervised Action Segmentation With Iterative Soft Boundary Assignment.

Li Ding

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Towards Automatic Learning of Procedures From Web Instructional Videos.

Luowei Zhou

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

Dancelets Mining for Video Recommendation Based on Dance Styles.

IEEE Trans. Multim., 2017

ProcNets: Learning to Segment Procedures in Untrimmed and Unconstrained Videos.

Luowei Zhou

CoRR, 2017

Action Understanding with Multiple Classes of Actors.

Caiming Xiong

CoRR, 2017

TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation.

Li Ding

CoRR, 2017

Watch What You Just Said: Image Captioning with Text-Conditional Attention.

Proceedings of the Thematic Workshops of ACM Multimedia 2017, Mountain View, CA, USA, October 23, 2017

Deep Cross-Modal Audio-Visual Generation.

Proceedings of the Thematic Workshops of ACM Multimedia 2017, Mountain View, CA, USA, October 23, 2017

Weakly Supervised Actor-Action Segmentation via Robust Multi-task Ranking.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

Scale-Adaptive Video Understanding.

PhD thesis, 2016

LIBSVX: A Supervoxel Library and Benchmark for Early Video Processing.

Int. J. Comput. Vis., 2016

Image Caption Generation with Text-Conditional Semantic Attention.

CoRR, 2016

Actor-Action Semantic Segmentation with Grouping Process Models.

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015

Can humans fly? Action understanding with multiple classes of actors.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2013

A Study of Actor and Action Semantic Retention in Video Supervoxel Segmentation.

Int. J. Semantic Comput., 2013

TRECVID 2013 GENIE: Multimedia Event Detection and Recounting.

Proceedings of the 2013 TREC Video Retrieval Evaluation, 2013

Are Actor and Action Semantics Retained in Video Supervoxel Segmentation?

Proceedings of the 2013 IEEE Seventh International Conference on Semantic Computing, 2013

Flattening Supervoxel Hierarchies by the Uniform Entropy Slice.

Spencer Whitt

Proceedings of the IEEE International Conference on Computer Vision, 2013

A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching.

Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

2012

TRECVID 2012 GENIE: Multimedia Event Detection and Recounting.

Proceedings of the 2012 TREC Video Retrieval Evaluation, 2012

Streaming Hierarchical Video Segmentation.

Caiming Xiong

Proceedings of the Computer Vision - ECCV 2012, 2012

Evaluation of super-voxel methods for early video processing.
