Yansong Tang

Orcid: 0000-0002-1534-4549

According to our database, Yansong Tang authored at least 116 papers between 2017 and 2025.

Bibliography

2025
Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline.
CoRR, August, 2025

Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning.
CoRR, August, 2025

Language-Aware Vision Transformer for Referring Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., July, 2025

Flash-VStream: Efficient Real-Time Understanding for Long Video Streams.
CoRR, June, 2025

ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model.
CoRR, June, 2025

Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation.
CoRR, June, 2025

DreamLight: Towards Harmonious and Consistent Image Relighting.
CoRR, June, 2025

Learning High-Quality Dynamic Memory for Video Object Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2025

VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning.
CoRR, May, 2025

UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning.
CoRR, May, 2025

FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios.
CoRR, May, 2025

KV-Edit: Training-Free Image Editing for Precise Background Preservation.
CoRR, February, 2025

GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting.
CoRR, January, 2025

OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments.
IEEE Trans. Image Process., 2025

InstaRevive: One-Step Image Enhancement via Dynamic Score Matching.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

ThinkBot: Embodied Instruction Following with Thought Chain Reasoning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

FADE: Frequency-Aware Diffusion Model Factorization for Video Editing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

VoCo-LLaMA: Towards Vision Compression with Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Ponder & Press: Advancing Visual GUI Agent towards General Computer Control.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
StableSwap: Stable Face Swapping in a Shared and Controllable Latent Space.
IEEE Trans. Multim., 2024

DOVE: Doodled vessel enhancement for photoacoustic angiography super resolution.
Medical Image Anal., 2024

A Multitask Fourier Transformer Network for Seismic Source Characterization Estimation From a Single-Station Waveform.
IEEE Geosci. Remote. Sens. Lett., 2024

AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation.
CoRR, 2024

Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction.
CoRR, 2024

UVCG: Leveraging Temporal Consistency for Universal Video Protection.
CoRR, 2024

NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model.
CoRR, 2024

Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation.
CoRR, 2024

Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model.
CoRR, 2024

Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena.
CoRR, 2024

Hierarchical Memory for Long Video QA.
CoRR, 2024

LIPE: Learning Personalized Identity Prior for Non-rigid Image Editing.
CoRR, 2024

GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation.
CoRR, 2024

VoCo-LLaMA: Towards Vision Compression with Large Language Models.
CoRR, 2024

Localizing Events in Videos with Multimodal Queries.
CoRR, 2024

Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams.
CoRR, 2024

ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation.
CoRR, 2024

OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models.
CoRR, 2024

GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling.
CoRR, 2024

1st Place Solution for 5th LSVOS Challenge: Referring Video Object Segmentation.
CoRR, 2024

Localization-Aware Multi-Scale Representation Learning for Repetitive Action Counting.
Proceedings of the IEEE International Conference on Visual Communications and Image Processing, 2024

Fully Aligned Network for Referring Image Segmentation.
Proceedings of the IEEE International Conference on Visual Communications and Image Processing, 2024

GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Q-VLM: Post-training Quantization for Large Vision-Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

WizardArena: Post-training Large Language Models via Simulated Offline Chatbot Arena.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Language-Free Compositional Action Generation via Decoupling Refinement.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation.
Proceedings of the Computer Vision - ECCV 2024, 2024

Plan, Posture and Go: Towards Open-Vocabulary Text-to-Motion Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024

Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution.
Proceedings of the Computer Vision - ECCV 2024, 2024

MotionLCM: Real-Time Controllable Motion Generation via Latent Consistency Model.
Proceedings of the Computer Vision - ECCV 2024, 2024

FlowIE: Efficient Image Enhancement via Rectified Flow.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Narrative Action Evaluation with Prompt-Guided Multimodal Interaction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Towards Accurate Post-Training Quantization for Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Universal Segmentation at Arbitrary Granularity with Language Instruction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Open-Vocabulary Segmentation with Semantic-Assisted Calibration.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Segment and Caption Anything.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

CoSTA: End-to-End Comprehensive Space-Time Entanglement for Spatio-Temporal Video Grounding.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Learning Multi-Scale Video-Text Correspondence for Weakly Supervised Temporal Article Grounding.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Plan, Posture and Go: Towards Open-World Text-to-Motion Generation.
CoRR, 2023

OccNeRF: Self-Supervised Multi-Camera Occupancy Prediction with Neural Radiance Fields.
CoRR, 2023

ThinkBot: Embodied Instruction Following with Thought Chain Reasoning.
CoRR, 2023

Fine-tuning vision foundation model for crack segmentation in civil infrastructures.
CoRR, 2023

Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search.
CoRR, 2023

Language-free Compositional Action Generation via Decoupling Refinement.
CoRR, 2023

Efficient Text-Guided 3D-Aware Portrait Generation with Score Distillation Sampling on Distribution.
CoRR, 2023

Towards Accurate Data-free Quantization for Diffusion Models.
CoRR, 2023

Self-similarity-based super-resolution of photoacoustic angiography from hand-drawn doodles.
CoRR, 2023

Efficient Meshy Neural Fields for Animatable Human Avatars.
CoRR, 2023

SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Ada-DQA: Adaptive Diverse Quality-aware Feature Acquisition for Video Quality Assessment.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

LUNA: Language as Continuing Anchors for Referring Expression Comprehension.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

HOI-aware Adaptive Network for Weakly-supervised Action Segmentation.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

GAIN: On the Generalization of Instructional Action Understanding.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Context-Aware Inpainter-Refiner for Skeleton-Based Human Motion Completion.
Proceedings of the IEEE International Conference on Image Processing, 2023

FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Skip-Plan: Procedure Planning in Instructional Videos via Condensed Action Space Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Global Knowledge Calibration for Fast Open-Vocabulary Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

LOGO: A Long-Form Video Dataset for Group Action Quality Assessment.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

FLAG3D: A 3D Fitness Activity Dataset with Language Instruction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition.
ACM Trans. Multim. Comput. Commun. Appl., 2022

VideoABC: A Real-World Video Dataset for Abductive Visual Reasoning.
IEEE Trans. Image Process., 2022

HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer.
Proceedings of the Computer Vision - ECCV 2022, 2022

Global Spectral Filter Memory Network for Video Object Segmentation.
Proceedings of the Computer Vision - ECCV 2022, 2022

Semantic-Aware Auto-Encoders for Self-supervised Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

YouMVOS: An Actor-centric Multi-shot Video Object Segmentation Dataset.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Unsupervised Embedding Learning from Uncertainty Momentum Modeling.
CoRR, 2021

Breaking Shortcut: Exploring Fully Convolutional Cycle-Consistency for Video Correspondence Learning.
CoRR, 2021

Hierarchical Interaction Network for Video Object Segmentation from Referring Expressions.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020
Graph Interaction Networks for Relation Transfer in Human Activity Videos.
IEEE Trans. Circuits Syst. Video Technol., 2020

Uncertainty-Aware Score Distribution Learning for Action Quality Assessment.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Learning Semantics-Preserving Attention and Contextual Interaction for Group Activity Recognition.
IEEE Trans. Image Process., 2019

Multi-Stream Deep Neural Networks for RGB-D Egocentric Action Recognition.
IEEE Trans. Circuits Syst. Video Technol., 2019

COIN: A Large-Scale Dataset for Comprehensive Instructional Video Analysis.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Mining Semantics-Preserving Attention for Group Activity Recognition.
Proceedings of the 2018 ACM Multimedia Conference, 2018

Deep Progressive Reinforcement Learning for Skeleton-Based Action Recognition.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Action recognition in RGB-D egocentric videos.
Proceedings of the 2017 IEEE International Conference on Image Processing, 2017

