Boyi Li

Orcid: 0000-0002-6752-3223

Affiliations:

NVIDIA, Autonomous Vehicle Research Group, Santa Clara, CA, USA
University of California, Berkeley, CA, USA
Cornell University, Ithaca, NY, USA (PhD)

According to our database, Boyi Li authored at least 47 papers between 2017 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing.

CoRR, March, 2026

Accelerating Structured Chain-of-Thought in Autonomous Vehicles.

CoRR, February, 2026

2025

Counterfactual VLA: Self-Reflective Vision-Language-Action Model with Adaptive Reasoning.

CoRR, December, 2025

Towards Efficient and Effective Multi-Camera Encoding for End-to-End Driving.

CoRR, December, 2025

FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos.

CoRR, December, 2025

The Case for Negative Data: From Crash Reports to Counterfactuals for Reasonable Driving.

CoRR, September, 2025

MultiGen: Using Multimodal Generation in Simulation to Learn Multimodal Policies in Real.

Gopala Anumanchipalli

CoRR, July, 2025

Atlas: Multi-Scale Attention Improves Long Context Image Modeling.

Kumar Krishna Agrawal

CoRR, March, 2025

Wolf: Dense Video Captioning with a World Summarization Framework.

Trans. Mach. Learn. Res., 2025

Interactive Task Planning with Language Models.

Trans. Mach. Learn. Res., 2025

DreamDrive: Generative 4D Scene Modeling from Street View Images.

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding.

Vitor Campagnolo Guizilini

Yue Wang

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Language-Image Models with 3D Understanding.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

STORM: Spatio-TempOral Reconstruction Model For Large-Scale Outdoor Scenes.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Describe Anything: Detailed Localized Image and Video Captioning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Extrapolated Urban View Synthesis Benchmark.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Scaling Vision Pre-Training to 4K Resolution.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), 2025

2024

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models.

Trans. Mach. Learn. Res., 2024

Wolf: Captioning Everything with a World Summarization Framework.

CoRR, 2024

Synthesizing Moving People with 3D Control.

Boyi Li

Jathushan Rajasegaran

Yossi Gandelsman

Alexei A. Efros

Jitendra Malik

CoRR, 2024

DiffuBox: Refining 3D Object Detection with Point Diffusion.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Re-evaluating the Need for Visual Signals in Unsupervised Grammar Induction.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

LLM-grounded Video Diffusion Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Self-Correcting LLM-Controlled Diffusion Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Driving Everywhere with Large Language Model Policy Adaptation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving.

Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

Promptable Closed-loop Traffic Simulation.

Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

2023

From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022

Does unsupervised grammar induction need pixels?

CoRR, 2022

Language-driven Semantic Segmentation.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Fixed Neural Network Steganography: Train the images, not the network.

Proceedings of the Tenth International Conference on Learning Representations, 2022

SITTA: Single Image Texture Translation for Data Augmentation.

Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Neural Image Recolorization for Creative Domains.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

2021

Single Image Texture Translation for Data Augmentation.

CoRR, 2021

On Feature Normalization and Data Augmentation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2019

Benchmarking Single-Image Dehazing and Beyond.

IEEE Trans. Image Process., 2019

Integrated Triaging for Fast Reading Comprehension.

CoRR, 2019

FastFusionNet: New State-of-the-Art for DAWNBench SQuAD.

CoRR, 2019

Positional Normalization.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2018

End-to-End United Video Dehazing and Detection.

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

RESIDE: A Benchmark for Single Image Dehazing.

CoRR, 2017

An All-in-One Network for Dehazing and Beyond.

CoRR, 2017

AOD-Net: All-in-One Dehazing Network.

Proceedings of the IEEE International Conference on Computer Vision, 2017
