Dongxu Li

Orcid: 0000-0001-8543-4761

Affiliations:
  • DATA61-CSIRO, Australia
  • Australian National University (ANU), College of Engineering and Computer Science, Australia
  • Salesforce AI Research, Palo Alto, CA, USA


According to our database, Dongxu Li authored at least 39 papers between 2018 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
GTA1: GUI Test-time Scaling Agent.
CoRR, July, 2025

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

EZSR: Event-based Zero-Shot Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Aria-UI: Visual Grounding for GUI Instructions.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
Aria: An Open Multimodal Native Mixture-of-Experts Model.
CoRR, 2024

LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-Modal Reasoning.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
EDFace-Celeb-1M: Benchmarking Face Hallucination With a Million-Scale Dataset.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2023

Linearized Relative Positional Encoding.
Trans. Mach. Learn. Res., 2023

Enhanced Spatio-Temporal Interaction Learning for Video Deraining: Faster and Better.
IEEE Trans. Pattern Anal. Mach. Intell., 2023

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning.
CoRR, 2023

BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models.
Proceedings of the International Conference on Machine Learning, 2023

Toeplitz Neural Network for Sequence Modeling.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

From Images to Textual Prompts: Zero-shot Visual Question Answering with Frozen Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LAVIS: A One-stop Library for Language-Vision Intelligence.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 2023

2022
Four-player GroupGAN for weak expression recognition via latent expression magnification.
Knowl. Based Syst., 2022

From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models.
CoRR, 2022

LAVIS: A Library for Language-Vision Intelligence.
CoRR, 2022

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation.
Proceedings of the International Conference on Machine Learning, 2022

cosFormer: Rethinking Softmax In Attention.
Proceedings of the Tenth International Conference on Learning Representations, 2022

The Devil in Linear Transformer.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Align and Prompt: Video-and-Language Pre-training with Entity Prompts.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Automatic Gloss Dictionary for Sign Language Learners.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022

Transcribing Natural Languages for the Deaf via Neural Editing Programs.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Dual Attention-in-Attention Model for Joint Rain Streak and Raindrop Removal.
IEEE Trans. Image Process., 2021

EDFace-Celeb-1M: Benchmarking Face Hallucination with a Million-scale Dataset.
CoRR, 2021

Enhanced Spatio-Temporal Interaction Learning for Video Deraining: A Faster and Better Framework.
CoRR, 2021

Dual Attention-in-Attention Model for Joint Rain Streak and Raindrop Removal.
CoRR, 2021

Benchmarking Ultra-High-Definition Image Super-resolution.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Reachability Analysis of Nonlinear Systems Using Hybridization and Dynamics Scaling.
Proceedings of the Formal Modeling and Analysis of Timed Systems, 2020

Transferring Cross-Domain Knowledge for Video Sign Language Recognition.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Falsification of hybrid systems using symbolic reachability and trajectory splicing.
Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, 2019

2018
Effect-Abstraction Based Relaxation for Linear Numeric Planning.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

