Jiuxiang Gu

According to our database, Jiuxiang Gu authored at least 106 papers between 2015 and 2025.

Bibliography

2025
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation.
CoRR, July, 2025

MENTOR: Efficient Multimodal-Conditioned Tuning for Autoregressive Vision Generation Models.
CoRR, July, 2025

MS4UI: A Dataset for Multi-modal Summarization of User Interface Instructional Videos.
CoRR, June, 2025

Refer to Anything with Vision-Language Prompts.
CoRR, June, 2025

R-KV: Redundancy-aware KV Cache Compression for Training-Free Reasoning Models Acceleration.
CoRR, May, 2025

FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge.
CoRR, May, 2025

DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance.
CoRR, May, 2025

Towards Visual Text Grounding of Multimodal Large Language Model.
CoRR, April, 2025

Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis.
CoRR, March, 2025

Efficient Reasoning with Hidden Thinking.
CoRR, January, 2025

Personalization of Large Language Models: A Survey.
Trans. Mach. Learn. Res., 2025

ARTIST: Improving the Generation of Text-Rich Images with Disentangled Diffusion Models and Large Language Models.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

Differential Privacy Mechanisms in Neural Tangent Kernel Regression.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

ImageFolder: Autoregressive Image Generation with Folded Tokens.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Numerical Pruning for Efficient Autoregressive Models.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Numerical Pruning for Efficient Autoregressive Models.
CoRR, 2024

SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner.
CoRR, 2024

Personalized Multimodal Large Language Models: A Survey.
CoRR, 2024

XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation.
CoRR, 2024

LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding.
CoRR, 2024

A Survey of Small Language Models.
CoRR, 2024

VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use.
CoRR, 2024

A Multi-LLM Debiasing Framework.
CoRR, 2024

MMR: Evaluating Reading Ability of Large Multimodal Models.
CoRR, 2024

Fast John Ellipsoid Computation with Differential Privacy Optimization.
CoRR, 2024

CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models.
CoRR, 2024

LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models.
CoRR, 2024

Differential Privacy of Cross-Attention with Provable Guarantee.
CoRR, 2024

Toward Infinite-Long Prefix in Transformer.
CoRR, 2024

ARTIST: Improving the Generation of Text-rich Images by Disentanglement.
CoRR, 2024

Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation.
CoRR, 2024

DocSynthv2: A Practical Autoregressive Modeling for Document Generation.
CoRR, 2024

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective.
CoRR, 2024

Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers.
CoRR, 2024

Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers.
CoRR, 2024

Exploring the Frontiers of Softmax: Provable Optimization, Applications in Diffusion Model, and Beyond.
CoRR, 2024

Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Mathematical Reasoning and Modular Arithmetic.
CoRR, 2024

Self-Cleaning: Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

Category-Aware Active Domain Adaptation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

LRM: Large Reconstruction Model for Single Image to 3D.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ADOPD: A Large-Scale Document Page Decomposition Dataset.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

SOHES: Self-supervised Open-world Hierarchical Entity Segmentation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

TextLap: Customizing Language Models for Text-to-Layout Planning.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Advancing Vision-Language Models with Adapter Ensemble Strategies.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Customization Assistant for Text-to-image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

TRINS: Towards Multimodal Language Models that Can Read.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DocScript: Document-level Script Event Prediction.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
Open World Entity Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances.
CoRR, 2023

Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning.
CoRR, 2023

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding.
CoRR, 2023

AIMS: All-Inclusive Multi-Level Segmentation.
CoRR, 2023

LayerDoc: Layer-wise Extraction of Spatial Hierarchical Structure in Visually-Rich Documents.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

AIMS: All-Inclusive Multi-Level Segmentation for Anything.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

High Quality Entity Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning the Visualness of Text Using Large Vision-Language Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

A Critical Analysis of Document Out-of-Distribution Detection.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

DocEdit: Language-Guided Document Editing.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Fine-Grained Entity Segmentation.
CoRR, 2022

Unified Pretraining Framework for Document Understanding.
CoRR, 2022

FedKC: Federated Knowledge Composition for Multilingual Natural Language Understanding.
Proceedings of the WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25, 2022

Delving into Out-of-Distribution Detection with Vision-Language Representations.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DocTime: A Document-level Temporal Dependency Graph Parser.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

DocLayoutTTS: Dataset and Baselines for Layout-informed Document-level Neural Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Meta Spatio-Temporal Debiasing for Video Scene Graph Generation.
Proceedings of the Computer Vision - ECCV 2022, 2022

Improving the Reliability for Confidence Estimation.
Proceedings of the Computer Vision - ECCV 2022, 2022

CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation.
Proceedings of the Computer Vision - ECCV 2022, 2022

Towards Language-Free Training for Text-to-Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

EI-CLIP: Entity-aware Interventional Contrastive Learning for E-commerce Cross-modal Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

User-Entity Differential Privacy in Learning Natural Language Models.
Proceedings of the IEEE International Conference on Big Data, 2022

Learning Adaptive Axis Attentions in Fine-tuning: Beyond Fixed Sparse Attention Patterns.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

TiGAN: Text-Based Interactive Image Generation and Manipulation.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

UNISON: Unpaired Cross-Lingual Image Captioning.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
CaSP: Class-agnostic Semi-Supervised Pretraining for Detection and Segmentation.
CoRR, 2021

LAFITE: Towards Language-Free Training for Text-to-Image Generation.
CoRR, 2021

UniDoc: Unified Pretraining Framework for Document Understanding.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU models.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Exploiting Semantic Embedding and Visual Feature for Facial Action Unit Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Multi-Scale Aligned Distillation for Low-Resolution Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

SelfDoc: Self-Supervised Document Representation Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Video captioning with boundary-aware hierarchical language decoding and joint video prediction.
Neurocomputing, 2020

Unsupervised Cross-lingual Image Captioning.
CoRR, 2020

Self-Supervised Relationship Probing.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
Resilient Load Restoration in Microgrids Considering Mobile Energy Storage Fleets: A Deep Reinforcement Learning Approach.
CoRR, 2019

Watch It Twice: Video Captioning with a Refocused Video Encoder.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Unpaired Image Captioning via Scene Graph Alignments.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Scene Graph Generation With External Knowledge and Image Reconstruction.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Recent advances in convolutional neural networks.
Pattern Recognit., 2018

NTU ROSE Lab at TRECVID 2018: Ad-hoc Video Search and Video to Text.
Proceedings of the 2018 TREC Video Retrieval Evaluation, 2018

Unpaired Image Captioning by Language Pivoting.
Proceedings of the Computer Vision - ECCV 2018, 2018

Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval With Generative Models.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
An Empirical Study of Language CNN for Image Captioning.
Proceedings of the IEEE International Conference on Computer Vision, 2017

2016
Recurrent Highway Networks with Language CNN for Image Captioning.
CoRR, 2016

2015
Recent Advances in Convolutional Neural Networks.
CoRR, 2015
