Boyi Li

ORCID: 0000-0002-6752-3223

Affiliations:
  • NVIDIA, Autonomous Vehicle Research Group, Santa Clara, CA, USA
  • University of California, Berkeley, CA, USA
  • Cornell University, Ithaca, NY, USA (PhD)


According to our database, Boyi Li authored at least 47 papers between 2017 and 2026.

Bibliography

2026
Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing.
CoRR, March, 2026

Accelerating Structured Chain-of-Thought in Autonomous Vehicles.
CoRR, February, 2026

2025
Counterfactual VLA: Self-Reflective Vision-Language-Action Model with Adaptive Reasoning.
CoRR, December, 2025

Towards Efficient and Effective Multi-Camera Encoding for End-to-End Driving.
CoRR, December, 2025

FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos.
CoRR, December, 2025

The Case for Negative Data: From Crash Reports to Counterfactuals for Reasonable Driving.
CoRR, September, 2025

MultiGen: Using Multimodal Generation in Simulation to Learn Multimodal Policies in Real.
CoRR, July, 2025

Atlas: Multi-Scale Attention Improves Long Context Image Modeling.
CoRR, March, 2025

Wolf: Dense Video Captioning with a World Summarization Framework.
Trans. Mach. Learn. Res., 2025

Interactive Task Planning with Language Models.
Trans. Mach. Learn. Res., 2025

DreamDrive: Generative 4D Scene Modeling from Street View Images.
Proceedings of the IEEE International Conference on Robotics and Automation, 2025

LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Language-Image Models with 3D Understanding.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

STORM: Spatio-TempOral Reconstruction Model For Large-Scale Outdoor Scenes.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Describe Anything: Detailed Localized Image and Video Captioning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Extrapolated Urban View Synthesis Benchmark.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Scaling Vision Pre-Training to 4K Resolution.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), 2025

2024
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models.
Trans. Mach. Learn. Res., 2024

Wolf: Captioning Everything with a World Summarization Framework.
CoRR, 2024

Synthesizing Moving People with 3D Control.
CoRR, 2024

DiffuBox: Refining 3D Object Detection with Point Diffusion.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Re-evaluating the Need for Visual Signals in Unsupervised Grammar Induction.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

LLM-grounded Video Diffusion Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Self-Correcting LLM-Controlled Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Driving Everywhere with Large Language Model Policy Adaptation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving.
Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

Promptable Closed-loop Traffic Simulation.
Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

2023
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022
Does unsupervised grammar induction need pixels?
CoRR, 2022

Language-driven Semantic Segmentation.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Fixed Neural Network Steganography: Train the images, not the network.
Proceedings of the Tenth International Conference on Learning Representations, 2022

SITTA: Single Image Texture Translation for Data Augmentation.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Neural Image Recolorization for Creative Domains.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

2021
Single Image Texture Translation for Data Augmentation.
CoRR, 2021

On Feature Normalization and Data Augmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2019
Benchmarking Single-Image Dehazing and Beyond.
IEEE Trans. Image Process., 2019

Integrated Triaging for Fast Reading Comprehension.
CoRR, 2019

FastFusionNet: New State-of-the-Art for DAWNBench SQuAD.
CoRR, 2019

Positional Normalization.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2018
End-to-End United Video Dehazing and Detection.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
RESIDE: A Benchmark for Single Image Dehazing.
CoRR, 2017

An All-in-One Network for Dehazing and Beyond.
CoRR, 2017

AOD-Net: All-in-One Dehazing Network.
Proceedings of the IEEE International Conference on Computer Vision, 2017

