Hao Tan

Orcid: 0000-0002-9755-9040

Affiliations:

Adobe Research
University of North Carolina, Chapel Hill, NC, USA (former)

According to our database, Hao Tan authored at least 70 papers between 2017 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

Softmax-GS: Generalized Gaussians Learning When to Blend or Bound.

[DOI]

,

,

,

,

CoRR, April, 2026

OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation.

[DOI]

,

,

,

,

,

Kalyan Sunkavalli

,

Yannick Hold-Geoffroy

,

,

,

,

,

CoRR, March, 2026

Anticipatory Planning for Multimodal AI Agents.

[DOI]

,

,

,

,

,

Franck Dernoncourt

,

,

,

CoRR, March, 2026

tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction.

[DOI]

,

,

,

,

,

Kalyan Sunkavalli

,

,

,

CoRR, February, 2026

2025

E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training.

[DOI]

,

,

,

,

,

Kalyan Sunkavalli

,

Shubham Tulsiani

,

CoRR, December, 2025

Long-LRM++: Preserving Fine Details in Feed-Forward Wide-Coverage Reconstruction.

[DOI]

,

,

,

,

CoRR, December, 2025

Rethinking Training Dynamics in Scale-wise Autoregressive Generation.

[DOI]

,

,

,

,

CoRR, December, 2025

SplatPainter: Interactive Authoring of 3D Gaussians from 2D Edits via Test-Time Training.

[DOI]

,

,

,

,

Leonidas J. Guibas

,

Gordon Wetzstein

,

CoRR, December, 2025

RELIC: Interactive Video World Model with Long-Horizon Memory.

[DOI]

,

,

,

,

,

,

Yannick Hold-Geoffroy

,

,

,

,

Kalyan Sunkavalli

,

,

,

CoRR, December, 2025

GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding.

[DOI]

,

,

,

,

,

,

CoRR, November, 2025

pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation.

[DOI]

,

,

,

Leonidas J. Guibas

,

Gordon Wetzstein

,

CoRR, October, 2025

Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models.

[DOI]

,

,

,

,

,

,

,

,

CoRR, September, 2025

RigAnything: Template-Free Autoregressive Rigging for Diverse 3D Assets.

[DOI]

,

,

,

,

,

,

,

ACM Trans. Graph., August, 2025

MS4UI: A Dataset for Multi-modal Summarization of User Interface Instructional Videos.

[DOI]

,

,

,

Franck Dernoncourt

,

,

,

,

CoRR, June, 2025

Test-Time Training Done Right.

[DOI]

,

,

,

,

,

,

Kalyan Sunkavalli

,

William T. Freeman

,

CoRR, May, 2025

RayZer: A Self-supervised Large View Synthesis Model.

[DOI]

,

,

,

,

,

,

,

,

Kalyan Sunkavalli

,

,

Georgios Pavlakos

CoRR, May, 2025

Pre-trained Vision-Language Models Learn Discoverable Visual Concepts.

[DOI]

,

,

,

,

Trans. Mach. Learn. Res., 2025

Neural BRDF Importance Sampling by Reparameterization.

[DOI]

,

,

,

,

,

,

,

Ravi Ramamoorthi

Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference, 2025

4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time.

[DOI]

,

,

,

,

,

,

,

,

,

Kalyan Sunkavalli

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Gaussian Mixture Flow Matching Models.

[DOI]

,

,

,

,

,

Leonidas J. Guibas

,

Gordon Wetzstein

,

Proceedings of the Forty-second International Conference on Machine Learning, 2025

RelitLRM: Generative Relightable Radiance for Large Reconstruction Models.

[DOI]

,

,

,

,

,

,

,

,

,

William T. Freeman

,

,

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias.

[DOI]

,

,

,

,

,

,

,

,

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation.

[DOI]

,

,

,

,

,

,

,

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Rayzer: a Self-Supervised Large View Synthesis Model.

[DOI]

,

,

,

,

,

,

,

,

Kalyan Sunkavalli

,

,

Georgios Pavlakos

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Long-LRM: Long-Sequence Large Reconstruction Model for Wide-Coverage Gaussian Splats.

[DOI]

,

,

,

,

,

,

,

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Progressive Autoregressive Video Diffusion Models.

[DOI]

,

,

,

,

,

,

Arie E. Kaufman

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders.

[DOI]

,

,

,

,

,

,

William T. Freeman

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors.

[DOI]

,

,

,

,

,

,

,

,

Gordon Wetzstein

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

Georgios Pavlakos

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Turbo3D: Ultra-fast Text-to-3D Generation.

[DOI]

,

,

,

,

,

,

,

Shubham Tulsiani

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Generating 3D-Consistent Videos from Unposed Internet Photos.

[DOI]

,

,

,

,

,

,

Bharath Hariharan

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

,

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Numerical Pruning for Efficient Autoregressive Models.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

,

CoRR, 2024

MeshLRM: Large Reconstruction Model for High-Quality Mesh.

[DOI]

,

,

,

,

,

Valentin Deschaintre

,

Kalyan Sunkavalli

,

,

CoRR, 2024

Single-View 3D Human Digitalization with Large Reconstruction Models.

[DOI]

,

,

,

,

,

Serena Yeung-Levy

,

CoRR, 2024

LRM-Zero: Training Large Reconstruction Models with Synthesized Data.

[DOI]

,

,

,

,

,

,

,

Arie E. Kaufman

,

,

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models.

[DOI]

,

Franck Dernoncourt

,

,

Hanieh Deilamsalehy

,

,

,

,

,

Thien Huu Nguyen

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

DMV3D: Denoising Multi-view Diffusion Using 3D Large Reconstruction Model.

[DOI]

,

,

,

,

,

,

,

Kalyan Sunkavalli

,

Gordon Wetzstein

,

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Instant3D: Fast Text-to-3D with Sparse-view Generation and Large Reconstruction Model.

[DOI]

,

,

,

,

,

,

,

Kalyan Sunkavalli

,

Greg Shakhnarovich

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

LRM: Large Reconstruction Model for Single Image to 3D.

[DOI]

,

,

,

,

,

,

,

Kalyan Sunkavalli

,

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

SOHES: Self-supervised Open-world Hierarchical Entity Segmentation.

[DOI]

,

,

,

,

,

,

,

,

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction.

[DOI]

,

,

,

,

,

Kalyan Sunkavalli

,

,

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting.

[DOI]

,

,

,

,

,

Kalyan Sunkavalli

,

Proceedings of the Computer Vision - ECCV 2024, 2024

Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning.

[DOI]

,

,

,

,

,

,

,

,

Arie E. Kaufman

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Building Vision-Language Models on Solid Foundations with Masked Distillation.

[DOI]

,

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Boosting Punctuation Restoration with Data Generation and Reinforcement Learning.

[DOI]

,

,

,

,

,

,

Hanieh Deilamsalehy

,

Franck Dernoncourt

,

Thien Huu Nguyen

CoRR, 2023

Boosting Punctuation Restoration with Data Generation and Reinforcement Learning.

[DOI]

,

,

,

,

,

,

Hanieh Deilamsalehy

,

Franck Dernoncourt

,

Thien Huu Nguyen

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Scaling Data Generation in Vision-and-Language Navigation.

[DOI]

,

,

,

,

,

,

,

,

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning Navigational Visual Representations with Semantic Map Supervision.

[DOI]

,

,

,

Franck Dernoncourt

,

,

,

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022

CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations.

[DOI]

,

,

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022, 2022

How Much Can CLIP Benefit Vision-and-Language Tasks?

[DOI]

,

Liunian Harold Li

,

,

,

,

,

,

Proceedings of the Tenth International Conference on Learning Representations, 2022

Envedit: Environment Editing for Vision-and-Language Navigation.

[DOI]

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Scientific Chart Summarization: Datasets and Improved Text Modeling.

[DOI]

,

,

,

Proceedings of the Workshop on Scientific Document Understanding co-located with 36th AAAI Conference on Artificial Intelligence, 2022

2021

VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning.

[DOI]

,

,

,

CoRR, 2021

VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer.

[DOI]

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information.

[DOI]

,

,

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Unifying Vision-and-Language Tasks via Text Generation.

[DOI]

,

,

,

Proceedings of the 38th International Conference on Machine Learning, 2021

2020

Diagnosing the Environment Bias in Vision-and-Language Navigation.

[DOI]

,

,

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Enabling Robots to Understand Incomplete Natural Language Instructions Using Commonsense Reasoning.

[DOI]

,

,

,

,

Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020

The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions.

[DOI]

,

,

,

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding.

[DOI]

,

,

,

Michael W. Mahoney

,

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision.

[DOI]

,

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments.

[DOI]

,

,

,

,

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Modality-Balanced Models for Visual Dialogue.

[DOI]

,

,

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout.

[DOI]

,

,

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

LXMERT: Learning Cross-Modality Encoder Representations from Transformers.

[DOI]

,

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Expressing Visual Relationships via Language.

[DOI]

,

Franck Dernoncourt

,

,

,

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018

Object Ordering with Bidirectional Matchings for Visual Reasoning.

[DOI]

,

Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

Source-Target Inference Models for Spatial Instruction Understanding.

[DOI]

,

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

A Joint Speaker-Listener-Reinforcer Model for Referring Expressions.

[DOI]

,

,

,

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
