Dongxu Li
ORCID: 0000-0001-8543-4761
Affiliations:
- DATA61-CSIRO, Australia
- Australian National University (ANU), College of Engineering and Computer Science, Australia
- Salesforce AI Research, Palo Alto, CA, USA
According to our database, Dongxu Li authored at least 39 papers between 2018 and 2025.
Bibliography
2025
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the Findings of the Association for Computational Linguistics, 2025
ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks.
Proceedings of the Findings of the Association for Computational Linguistics, 2025
2024
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-Modal Reasoning.
Proceedings of the Computer Vision - ECCV 2024, 2024
2023
IEEE Trans. Pattern Anal. Mach. Intell., March, 2023
Enhanced Spatio-Temporal Interaction Learning for Video Deraining: Faster and Better.
IEEE Trans. Pattern Anal. Mach. Intell., 2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning.
CoRR, 2023
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models.
Proceedings of the International Conference on Machine Learning, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
From Images to Textual Prompts: Zero-shot Visual Question Answering with Frozen Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 2023
2022
Four-player GroupGAN for weak expression recognition via latent expression magnification.
Knowl. Based Syst., 2022
CoRR, 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation.
Proceedings of the International Conference on Machine Learning, 2022
Proceedings of the Tenth International Conference on Learning Representations, 2022
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
2021
IEEE Trans. Image Process., 2021
CoRR, 2021
Enhanced Spatio-Temporal Interaction Learning for Video Deraining: A Faster and Better Framework.
CoRR, 2021
CoRR, 2021
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
2020
Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020
TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Proceedings of the Formal Modeling and Analysis of Timed Systems, 2020
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
2019
Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, 2019
2018
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018