Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding.

Jian Chen

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

ImageFolder: Autoregressive Image Generation with Folded Tokens.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

From Selection to Generation: A Survey of LLM-based Active Learning.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Numerical Pruning for Efficient Autoregressive Models.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Numerical Pruning for Efficient Autoregressive Models.

CoRR, 2024

SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner.

CoRR, 2024

Personalized Multimodal Large Language Models: A Survey.

CoRR, 2024

XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation.

CoRR, 2024

LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding.

CoRR, 2024

A Survey of Small Language Models.

CoRR, 2024

VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use.

CoRR, 2024

A Multi-LLM Debiasing Framework.

CoRR, 2024

MMR: Evaluating Reading Ability of Large Multimodal Models.

CoRR, 2024

Fast John Ellipsoid Computation with Differential Privacy Optimization.

CoRR, 2024

CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models.

CoRR, 2024

LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models.

CoRR, 2024

Differential Privacy of Cross-Attention with Provable Guarantee.

CoRR, 2024

Toward Infinite-Long Prefix in Transformer.

CoRR, 2024

ARTIST: Improving the Generation of Text-rich Images by Disentanglement.

CoRR, 2024

Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation.

CoRR, 2024

DocSynthv2: A Practical Autoregressive Modeling for Document Generation.

CoRR, 2024

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective.

CoRR, 2024

Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers.

CoRR, 2024

Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers.

CoRR, 2024

Exploring the Frontiers of Softmax: Provable Optimization, Applications in Diffusion Model, and Beyond.

CoRR, 2024

Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Mathematical Reasoning and Modular Arithmetic.

CoRR, 2024

Self-Cleaning: Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

Category-Aware Active Domain Adaptation.

Wenxiao Xiao

Jiuxiang Gu

Hongfu Liu

Proceedings of the Forty-first International Conference on Machine Learning, 2024

LRM: Large Reconstruction Model for Single Image to 3D.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

ADOPD: A Large-Scale Document Page Decomposition Dataset.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

SOHES: Self-supervised Open-world Hierarchical Entity Segmentation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

TextLap: Customizing Language Models for Text-to-Layout Planning.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Advancing Vision-Language Models with Adapter Ensemble Strategies.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Customization Assistant for Text-to-image Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

TRINS: Towards Multimodal Language Models that Can Read.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DocScript: Document-level Script Event Prediction.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

Open World Entity Segmentation.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances.

CoRR, 2023

Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning.

CoRR, 2023

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding.

CoRR, 2023

AIMS: All-Inclusive Multi-Level Segmentation.

CoRR, 2023

LayerDoc: Layer-wise Extraction of Spatial Hierarchical Structure in Visually-Rich Documents.

Anandhavelu Natarajan

Quan Hung Tran

Verena Kaynig-Fittkau

Ani Nenkova

Dinesh Manocha

Vlad I. Morariu

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

AIMS: All-Inclusive Multi-Level Segmentation for Anything.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

High Quality Entity Segmentation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning the Visualness of Text Using Large Vision-Language Models.

Gaurav Verma

Ryan A. Rossi

Christopher Tensmeyer

Jiuxiang Gu

Ani Nenkova

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

A Critical Analysis of Document Out-of-Distribution Detection.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

DocEdit: Language-Guided Document Editing.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Fine-Grained Entity Segmentation.

CoRR, 2022

Unified Pretraining Framework for Document Understanding.

CoRR, 2022

FedKC: Federated Knowledge Composition for Multilingual Natural Language Understanding.

Proceedings of the WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25, 2022

Delving into Out-of-Distribution Detection with Vision-Language Representations.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DocTime: A Document-level Temporal Dependency Graph Parser.

Puneet Mathur

Vlad I. Morariu

Verena Kaynig-Fittkau

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

DocLayoutTTS: Dataset and Baselines for Layout-informed Document-level Neural Speech Synthesis.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Meta Spatio-Temporal Debiasing for Video Scene Graph Generation.

Proceedings of the Computer Vision - ECCV 2022, 2022

Improving the Reliability for Confidence Estimation.

Proceedings of the Computer Vision - ECCV 2022, 2022

CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation.

Proceedings of the Computer Vision - ECCV 2022, 2022

Towards Language-Free Training for Text-to-Image Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

EI-CLIP: Entity-aware Interventional Contrastive Learning for E-commerce Cross-modal Retrieval.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

User-Entity Differential Privacy in Learning Natural Language Models.

Proceedings of the IEEE International Conference on Big Data, 2022

Learning Adaptive Axis Attentions in Fine-tuning: Beyond Fixed Sparse Attention Patterns.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

TiGAN: Text-Based Interactive Image Generation and Manipulation.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

UNISON: Unpaired Cross-Lingual Image Captioning.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

CaSP: Class-agnostic Semi-Supervised Pretraining for Detection and Segmentation.

CoRR, 2021

LAFITE: Towards Language-Free Training for Text-to-Image Generation.

CoRR, 2021

UniDoc: Unified Pretraining Framework for Document Understanding.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU models.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Exploiting Semantic Embedding and Visual Feature for Facial Action Unit Detection.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Multi-Scale Aligned Distillation for Low-Resolution Detection.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

SelfDoc: Self-Supervised Document Representation Learning.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Video captioning with boundary-aware hierarchical language decoding and joint video prediction.

Neurocomputing, 2020

Unsupervised Cross-lingual Image Captioning.

CoRR, 2020

Self-Supervised Relationship Probing.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning.

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

Resilient Load Restoration in Microgrids Considering Mobile Energy Storage Fleets: A Deep Reinforcement Learning Approach.

CoRR, 2019

Watch It Twice: Video Captioning with a Refocused Video Encoder.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Unpaired Image Captioning via Scene Graph Alignments.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Scene Graph Generation With External Knowledge and Image Reconstruction.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Recent advances in convolutional neural networks.

Pattern Recognit., 2018

NTU ROSE Lab at TRECVID 2018: Ad-hoc Video Search and Video to Text.

Proceedings of the 2018 TREC Video Retrieval Evaluation, 2018

Unpaired Image Captioning by Language Pivoting.

Proceedings of the Computer Vision - ECCV 2018, 2018

Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval With Generative Models.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning.

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

An Empirical Study of Language CNN for Image Captioning.

Proceedings of the IEEE International Conference on Computer Vision, 2017

2016

Recurrent Highway Networks with Language CNN for Image Captioning.

Jiuxiang Gu

Gang Wang

Tsuhan Chen

CoRR, 2016

2015

Recent Advances in Convolutional Neural Networks.

CoRR, 2015

Jiuxiang Gu
