Yansong Tang

Orcid: 0000-0002-1534-4549

According to our database¹, Yansong Tang authored at least 127 papers between 2017 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2025

FLAG3D++: A Benchmark for 3D Fitness Activity Comprehension With Language Instruction.

IEEE Trans. Pattern Anal. Mach. Intell., November, 2025

Efficient Text-Guided 3D-Aware Generation With Score Distillation on 3D Distribution.

IEEE Trans. Circuits Syst. Video Technol., October, 2025

Human-in-the-loop Online Rejection Sampling for Robotic Manipulation.

CoRR, October, 2025

RESample: A Robust Data Augmentation Framework via Exploratory Sampling for Robotic Manipulation.

CoRR, October, 2025

iMoWM: Taming Interactive Multi-Modal World Model for Robotic Manipulation.

CoRR, October, 2025

VLA-Reasoner: Empowering Vision-Language-Action Models with Reasoning via Online Monte Carlo Tree Search.

CoRR, September, 2025

DSPv2: Improved Dense Policy for Effective and Generalizable Whole-body Mobile Manipulation.

CoRR, September, 2025

ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion.

CoRR, September, 2025

Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference.

CoRR, September, 2025

GWM: Towards Scalable Gaussian World Models for Robotic Manipulation.

CoRR, August, 2025

Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline.

CoRR, August, 2025

Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning.

CoRR, August, 2025

Language-Aware Vision Transformer for Referring Segmentation.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2025

Flash-VStream: Efficient Real-Time Understanding for Long Video Streams.

CoRR, June, 2025

ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model.

CoRR, June, 2025

Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation.

CoRR, June, 2025

DreamLight: Towards Harmonious and Consistent Image Relighting.

CoRR, June, 2025

Learning High-Quality Dynamic Memory for Video Object Segmentation.

IEEE Trans. Pattern Anal. Mach. Intell., May, 2025

VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning.

CoRR, May, 2025

UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning.

CoRR, May, 2025

FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios.

CoRR, May, 2025

KV-Edit: Training-Free Image Editing for Precise Background Preservation.

CoRR, February, 2025

GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting.

CoRR, January, 2025

DEHand: Deformable Encoding for Photo-Realistic Free-View and Free-Pose Hand Rendering.

IEEE Trans. Multim., 2025

OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments.

IEEE Trans. Image Process., 2025

InstaRevive: One-Step Image Enhancement via Dynamic Score Matching.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

ThinkBot: Embodied Instruction Following with Thought Chain Reasoning.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

FADE: Frequency-Aware Diffusion Model Factorization for Video Editing.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

VoCo-LLaMA: Towards Vision Compression with Large Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Ponder & Press: Advancing Visual GUI Agent towards General Computer Control.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

StableSwap: Stable Face Swapping in a Shared and Controllable Latent Space.

IEEE Trans. Multim., 2024

DOVE: Doodled vessel enhancement for photoacoustic angiography super resolution.

Medical Image Anal., 2024

A Multitask Fourier Transformer Network for Seismic Source Characterization Estimation From a Single-Station Waveform.

IEEE Geosci. Remote. Sens. Lett., 2024

AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation.

CoRR, 2024

Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction.

CoRR, 2024

UVCG: Leveraging Temporal Consistency for Universal Video Protection.

CoRR, 2024

NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model.

CoRR, 2024

Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation.

CoRR, 2024

Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model.

CoRR, 2024

Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena.

CoRR, 2024

Hierarchical Memory for Long Video QA.

CoRR, 2024

LIPE: Learning Personalized Identity Prior for Non-rigid Image Editing.

CoRR, 2024

GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation.

CoRR, 2024

VoCo-LLaMA: Towards Vision Compression with Large Language Models.

CoRR, 2024

Localizing Events in Videos with Multimodal Queries.

CoRR, 2024

Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams.

CoRR, 2024

ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation.

CoRR, 2024

OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models.

CoRR, 2024

GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling.

CoRR, 2024

1st Place Solution for 5th LSVOS Challenge: Referring Video Object Segmentation.

CoRR, 2024

Localization-Aware Multi-Scale Representation Learning for Repetitive Action Counting.

Proceedings of the IEEE International Conference on Visual Communications and Image Processing, 2024

Fully Aligned Network for Referring Image Segmentation.

Yong Liu

Ruihao Xu

Yansong Tang

Proceedings of the IEEE International Conference on Visual Communications and Image Processing, 2024

GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Q-VLM: Post-training Quantization for Large Vision-Language Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

WizardArena: Post-training Large Language Models via Simulated Offline Chatbot Arena.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Language-Free Compositional Action Generation via Decoupling Refinement.

Proceedings of the IEEE International Conference on Acoustics, 2024

RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation.

Proceedings of the Computer Vision - ECCV 2024, 2024

Plan, Posture and Go: Towards Open-Vocabulary Text-to-Motion Generation.

Proceedings of the Computer Vision - ECCV 2024, 2024

Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution.

Proceedings of the Computer Vision - ECCV 2024, 2024

MotionLCM: Real-Time Controllable Motion Generation via Latent Consistency Model.

Proceedings of the Computer Vision - ECCV 2024, 2024

FlowIE: Efficient Image Enhancement via Rectified Flow.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Narrative Action Evaluation with Prompt-Guided Multimodal Interaction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Towards Accurate Post-Training Quantization for Diffusion Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Universal Segmentation at Arbitrary Granularity with Language Instruction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Open-Vocabulary Segmentation with Semantic-Assisted Calibration.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Segment and Caption Anything.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

CoSTA: End-to-End Comprehensive Space-Time Entanglement for Spatio-Temporal Video Grounding.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Learning Multi-Scale Video-Text Correspondence for Weakly Supervised Temporal Article Grounding.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Plan, Posture and Go: Towards Open-World Text-to-Motion Generation.

CoRR, 2023

OccNeRF: Self-Supervised Multi-Camera Occupancy Prediction with Neural Radiance Fields.

CoRR, 2023

ThinkBot: Embodied Instruction Following with Thought Chain Reasoning.

CoRR, 2023

Fine-tuning vision foundation model for crack segmentation in civil infrastructures.

CoRR, 2023

Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search.

CoRR, 2023

Language-free Compositional Action Generation via Decoupling Refinement.

CoRR, 2023

Efficient Text-Guided 3D-Aware Portrait Generation with Score Distillation Sampling on Distribution.

CoRR, 2023

Towards Accurate Data-free Quantization for Diffusion Models.

CoRR, 2023

Self-similarity-based super-resolution of photoacoustic angiography from hand-drawn doodles.

CoRR, 2023

Efficient Meshy Neural Fields for Animatable Human Avatars.

CoRR, 2023

SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Ada-DQA: Adaptive Diverse Quality-aware Feature Acquisition for Video Quality Assessment.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

LUNA: Language as Continuing Anchors for Referring Expression Comprehension.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

HOI-aware Adaptive Network for Weakly-supervised Action Segmentation.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

GAIN: On the Generalization of Instructional Action Understanding.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Context-Aware Inpainter-Refiner for Skeleton-Based Human Motion Completion.

Proceedings of the IEEE International Conference on Image Processing, 2023

FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Skip-Plan: Procedure Planning in Instructional Videos via Condensed Action Space Learning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Global Knowledge Calibration for Fast Open-Vocabulary Segmentation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

LOGO: A Long-Form Video Dataset for Group Action Quality Assessment.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

FLAG3D: A 3D Fitness Activity Dataset with Language Instruction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition.

ACM Trans. Multim. Comput. Commun. Appl., 2022

VideoABC: A Real-World Video Dataset for Abductive Visual Reasoning.

IEEE Trans. Image Process., 2022

HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer.

Proceedings of the Computer Vision - ECCV 2022, 2022

Global Spectral Filter Memory Network for Video Object Segmentation.

Proceedings of the Computer Vision - ECCV 2022, 2022

Semantic-Aware Auto-Encoders for Self-supervised Representation Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion.

Kejie Li

Yansong Tang

Victor Adrian Prisacariu

Philip H. S. Torr

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

YouMVOS: An Actor-centric Multi-shot Video Object Segmentation Dataset.

Anirudh Srinivasan Chakravarthy

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation.

Yansong Tang

Jiwen Lu

Jie Zhou

IEEE Trans. Pattern Anal. Mach. Intell., 2021

Unsupervised Embedding Learning from Uncertainty Momentum Modeling.

CoRR, 2021

Breaking Shortcut: Exploring Fully Convolutional Cycle-Consistency for Video Correspondence Learning.

CoRR, 2021

Hierarchical Interaction Network for Video Object Segmentation from Referring Expressions.

Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020

Graph Interaction Networks for Relation Transfer in Human Activity Videos.

IEEE Trans. Circuits Syst. Video Technol., 2020

Uncertainty-Aware Score Distribution Learning for Action Quality Assessment.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Learning Semantics-Preserving Attention and Contextual Interaction for Group Activity Recognition.

IEEE Trans. Image Process., 2019

Multi-Stream Deep Neural Networks for RGB-D Egocentric Action Recognition.

IEEE Trans. Circuits Syst. Video Technol., 2019

COIN: A Large-Scale Dataset for Comprehensive Instructional Video Analysis.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Mining Semantics-Preserving Attention for Group Activity Recognition.

Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Deep Progressive Reinforcement Learning for Skeleton-Based Action Recognition.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Action recognition in RGB-D egocentric videos.

Proceedings of the 2017 IEEE International Conference on Image Processing, 2017
