Tsu-Jui Fu

According to our database¹, Tsu-Jui Fu authored at least 56 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

Taming Outlier Tokens in Diffusion Transformers.

CoRR, May, 2026

2025

ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models.

CoRR, December, 2025

CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching.

CoRR, September, 2025

GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing.

CoRR, May, 2025

UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing.

CoRR, March, 2025

DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation.

CoRR, March, 2025

CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

STIV: Scalable Text and Image Conditioned Video Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

TC-Bench: Benchmarking Temporal Compositionality in Conditional Video Generation.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners.

Trans. Mach. Learn. Res., 2024

STIV: Scalable Text and Image Conditioned Video Generation.

CoRR, 2024

TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation.

CoRR, 2024

From Text to Pixel: Advancing Long-Context Understanding in MLLMs.

CoRR, 2024

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models.

CoRR, 2024

T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Guiding Instruction-based Image Editing via Multimodal Large Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Text-guided 3D Human Generation from 2D Collections.

CoRR, 2023

Discriminative Diffusion Models as Few-shot Vision and Language Learners.

CoRR, 2023

PHOTOSWAP: Personalized Subject Swapping in Images.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

EDIS: Entity-Driven Image Search over Multimodal Web Content.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Text-guided 3D Human Generation from 2D Collections.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

CPL: Counterfactual Prompt Learning for Vision and Language Models.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

ULN: Towards Underspecified Vision-and-Language Navigation.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Language-Driven Artistic Style Transfer.

Tsu-Jui Fu

Xin Eric Wang

William Yang Wang

Proceedings of the Computer Vision - ECCV 2022, 2022

M³L: Language-based Video Editing via Multi-Modal Multi-Level Transformers.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling.

CoRR, 2021

Language-Driven Image Style Transfer.

Tsu-Jui Fu

Xin Eric Wang

William Yang Wang

CoRR, 2021

Language-based Video Editing via Multi-Modal Multi-Level Transformer.

CoRR, 2021

Semi-Supervised Policy Initialization for Playing Games with Language Hints.

Tsu-Jui Fu

William Yang Wang

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation.

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

L2C: Describing Visual Differences Needs Semantic Understanding of Individuals.

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

H-FND: Hierarchical False-Negative Denoising for Distant Supervision Relation Extraction.

Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020

SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Counterfactual Vision-and-Language Navigation via Adversarial Path Sampler.

Proceedings of the Computer Vision - ECCV 2020, 2020

Why Attention? Analyze BiLSTM Deficiency and Its Remedies in the Case of NER.

Peng-Hsuan Li

Tsu-Jui Fu

Wei-Yun Ma

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling.

CoRR, 2019

Remedying BiLSTM-CNN Deficiency in Modeling Cross-Context for NER.

Peng-Hsuan Li

Tsu-Jui Fu

Wei-Yun Ma

CoRR, 2019

Attentive and Adversarial Learning for Video Summarization.

Tsu-Jui Fu

Shao-Heng Tai

Hwann-Tzong Chen

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2019

Learning from Observation-Only Demonstration for Task-Oriented Language Grounding via Self-Examination.

Proceedings of the Visually Grounded Interaction and Language (ViGIL), 2019

A Distributed Scheme for Accelerating Semantic Video Segmentation on An Embedded Cluster.

Proceedings of the 37th IEEE International Conference on Computer Design, 2019

Adversarial Active Exploration for Inverse Dynamics Model Learning.

Proceedings of the 3rd Annual Conference on Robot Learning, 2019

GraphRel: Modeling Text as Relational Graphs for Joint Entity and Relation Extraction.

Tsu-Jui Fu

Peng-Hsuan Li

Wei-Yun Ma

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018

Adversarial Exploration Strategy for Self-Supervised Imitation Learning.

CoRR, 2018

Diversity-Driven Exploration Strategy for Deep Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Speed Reading: Learning to Read ForBackward via Shuttle.

Tsu-Jui Fu

Wei-Yun Ma

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Visual Relationship Prediction via Label Clustering and Incorporation of Depth Information.

Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

Dynamic Video Segmentation Network.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Region-Semantics Preserving Image Synthesis.

Kang-Jun Liu

Tsu-Jui Fu

Shan-Hung Wu

Proceedings of the Computer Vision - ACCV 2018, 2018
