Li Yuan

Orcid: 0000-0002-2120-5588

Affiliations:
  • Peking University, School of Electronic and Computer Engineering, Beijing, China
  • National University of Singapore, Singapore


According to our database, Li Yuan authored at least 155 papers between 2018 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
MagicTime: Time-Lapse Video Generation Models as Metamorphic Simulators.
IEEE Trans. Pattern Anal. Mach. Intell., September, 2025

E-4DGS: High-Fidelity Dynamic Reconstruction from the Multi-view Event Cameras.
CoRR, August, 2025

Point Tree Transformer for Point Cloud Registration.
IEEE Trans. Circuits Syst. Video Technol., July, 2025

CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step.
CoRR, July, 2025

Look-Back: Implicit Visual Re-focusing in MLLM Reasoning.
CoRR, July, 2025

Spatial-Temporal Spiking Feature Pruning in Spiking Transformer.
IEEE Trans. Cogn. Dev. Syst., June, 2025

LeanPO: Lean Preference Optimization for Likelihood Alignment in Video-LLMs.
CoRR, June, 2025

UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation.
CoRR, June, 2025

Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations.
CoRR, May, 2025

Sci-Fi: Symmetric Constraint for Frame Inbetweening.
CoRR, May, 2025

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation.
CoRR, May, 2025

ImgEdit: A Unified Image Editing Dataset and Benchmark.
CoRR, May, 2025

GS2E: Gaussian Splatting is an Effective Data Generator for Event Stream Generation.
CoRR, May, 2025

BioProBench: Comprehensive Dataset and Benchmark in Biological Protocol Understanding and Reasoning.
CoRR, May, 2025

HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation.
CoRR, April, 2025

How to Detect and Defeat Molecular Mirage: A Metric-Driven Benchmark for Hallucination in LLM-based Molecular Comprehension.
CoRR, April, 2025

GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation.
CoRR, April, 2025

Hierarchical Banzhaf Interaction for General Video-Language Representation Learning.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2025

Decoupled peak property learning for efficient and interpretable electronic circular dichroism spectrum prediction.
Nat. Comput. Sci., March, 2025

NeuralGS: Bridging Neural Fields and 3D Gaussian Splatting for Compact 3D Representations.
CoRR, March, 2025

SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video.
CoRR, March, 2025

WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation.
CoRR, March, 2025

UniGenX: Unified Generation of Sequence and Structure with Autoregressive Diffusion.
CoRR, March, 2025

Learning Box Regression and Mask Segmentation Under Long-Tailed Distribution with Gradient Transfusing.
Int. J. Comput. Vis., February, 2025

AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scene.
CoRR, January, 2025

TaxDiff: taxonomic-guided diffusion model for protein sequence generation.
Sci. China Inf. Sci., 2025

Multi-objective Aligned Bidword Generation Model for E-commerce Search Advertising.
Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

PiCO: Peer Review in LLMs based on Consistency Optimization.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Spiking Transformer with Spatial-Temporal Spiking Self-Attention.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Identity-Preserving Text-to-Video Generation by Frequency Decomposition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Is Parameter Collision Hindering Continual Learning in LLMs?
Proceedings of the 31st International Conference on Computational Linguistics, 2025

Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scenes.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Learnable Central Similarity Quantization for Efficient Image and Video Retrieval.
IEEE Trans. Neural Networks Learn. Syst., December, 2024

An Organ-Aware Diagnosis Framework for Radiology Report Generation.
IEEE Trans. Medical Imaging, December, 2024

Full Transformer Framework for Robust Point Cloud Registration With Deep Information Interaction.
IEEE Trans. Neural Networks Learn. Syst., October, 2024

Adversarial Attacks on Video Object Segmentation With Hard Region Discovery.
IEEE Trans. Circuits Syst. Video Technol., June, 2024

Efficient Long-Short Temporal Attention network for unsupervised Video Object Segmentation.
Pattern Recognit., February, 2024

Fully Transformer-Equipped Architecture for end-to-end Referring Video Object Segmentation.
Inf. Process. Manag., January, 2024

Masked Autoencoders for 3D Point Cloud Self-supervised Learning.
World Sci. Annu. Rev. Artif. Intell., 2024

Self-architectural knowledge distillation for spiking neural networks.
Neural Networks, 2024

Hierarchical Banzhaf Interaction for General Video-Language Representation Learning.
CoRR, 2024

Navigating Chemical-Linguistic Sharing Space with Heterogeneous Molecular Encoding.
CoRR, 2024

Next Patch Prediction for Autoregressive Visual Generation.
CoRR, 2024

DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses.
CoRR, 2024

Open-Sora Plan: Open-Source Large Video Generation Model.
CoRR, 2024

Identity-Preserving Text-to-Video Generation by Frequency Decomposition.
CoRR, 2024

Sparse Orthogonal Parameters Tuning for Continual Learning.
CoRR, 2024

ETTFS: An Efficient Training Framework for Time-to-First-Spike Neuron.
CoRR, 2024

Spatial-Temporal Search for Spiking Neural Networks.
CoRR, 2024

MoH: Multi-Head Attention as Mixture-of-Head Attention.
CoRR, 2024

Multi-granularity Score-based Generative Framework Enables Efficient Inverse Design of Complex Organics.
CoRR, 2024

ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis.
CoRR, 2024

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model.
CoRR, 2024

HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions.
CoRR, 2024

Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation.
CoRR, 2024

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions.
CoRR, 2024

EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images.
CoRR, 2024

UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark.
CoRR, 2024

Envision3D: One Image to 3D with Anchor Views Interpolation.
CoRR, 2024

TaxDiff: Taxonomic-Guided Diffusion Model for Protein Sequence Generation.
CoRR, 2024

ProLLaMA: A Protein Large Language Model for Multi-Task Protein Language Processing.
CoRR, 2024

LLMBind: A Unified Modality-Task Integration Framework.
CoRR, 2024

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models.
CoRR, 2024

Deep peak property learning for efficient chiral molecules ECD spectra prediction.
CoRR, 2024

Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket.
CoRR, 2024

QKFormer: Hierarchical Spiking Transformer using Q-K Attention.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Spiking Transformer with Experts Mixture.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

VLMimic: Vision Language Models are Visual Imitation Learner for Fine-grained Actions.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Prompt2Poster: Automatically Artistic Chinese Poster Creation from Prompt Only.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Fast and Robust Point Cloud Registration with Tree-based Transformer.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Optimal ANN-SNN Conversion with Group Neurons.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Temporal Contrastive Learning for Spiking Neural Networks.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2024, 2024

A Multi-modal Spiking Meta-learner with Brain-Inspired Task-Aware Modulation Scheme.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2024, 2024

LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Repaint123: Fast and High-Quality One Image to 3D Generation with Progressive Controllable Repainting.
Proceedings of the Computer Vision - ECCV 2024, 2024

HiFi-123: Towards High-Fidelity One Image to 3D Content Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024

FreestyleRet: Retrieving Images from Style-Diversified Queries.
Proceedings of the Computer Vision - ECCV 2024, 2024

Learning Pseudo 3D Guidance for View-Consistent Texturing with 2D Diffusion.
Proceedings of the Computer Vision - ECCV 2024, 2024

Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024

GraCo: Granularity-Controllable Interactive Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Regressor-Segmenter Mutual Prompt Learning for Crowd Counting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Towards Better Seach Query Classification with Distribution-Diverse Multi-Expert Knowledge Distillation in JD Ads Search.
Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024

RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Parallel Vertex Diffusion for Unified Visual Grounding.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
MADAv2: Advanced Multi-Anchor Based Active Domain Adaptation Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., November, 2023

Truncated attention-aware proposal networks with multi-scale dilation for temporal action detection.
Pattern Recognit., October, 2023

VOLO: Vision Outlooker for Visual Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2023

Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2023

Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting.
CoRR, 2023

Machine Mindset: An MBTI Exploration of Large Language Models.
CoRR, 2023

FreestyleRet: Retrieving Images from Style-Diversified Queries.
CoRR, 2023

Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models.
CoRR, 2023

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding.
CoRR, 2023

Act As You Wish: Fine-Grained Control of Motion Diffusion Model with Hierarchical Semantic Graphs.
CoRR, 2023

HiFi-123: Towards High-fidelity One Image to 3D Content Generation.
CoRR, 2023

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment.
CoRR, 2023

Triple-View Knowledge Distillation for Semi-Supervised Semantic Segmentation.
CoRR, 2023

ChatLaw: Open-Source Legal Large Language Model with Integrated External Knowledge Bases.
CoRR, 2023

Auto-Spikformer: Spikformer Architecture Search.
CoRR, 2023

ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation.
CoRR, 2023

Album Storytelling with Iterative Story-aware Captioning and Large Language Models.
CoRR, 2023

Parallel Vertex Diffusion for Unified Visual Grounding.
CoRR, 2023

Spike-driven Transformer.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Act As You Wish: Fine-Grained Control of Motion Diffusion Model with Hierarchical Semantic Graphs.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

PointGPT: Auto-regressively Generative Pre-training from Point Clouds.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Deep Interactive Full Transformer Framework for Point Cloud Registration.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Spikformer: When Spiking Neural Network Meets Transformer.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Learning Sparse Neural Networks with Identity Layers.
Proceedings of the Image and Graphics - 12th International Conference, 2023

Rethinking Point Cloud Registration as Masking and Reconstruction.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Multi-granularity Interaction Simulation for Unsupervised Interactive Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DiffusionRet: Generative Text-Video Retrieval with Diffusion Model.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning with Fantasy: Semantic-Aware Virtual Contrastive Constraint for Few-Shot Class-Incremental Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Out-of-Candidate Rectification for Weakly Supervised Semantic Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Dynamic Clustering Network for Unsupervised Semantic Segmentation.
CoRR, 2022

Masked Autoencoders for Point Cloud Self-supervised Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022

Locality Guidance for Improving Vision Transformers on Tiny Datasets.
Proceedings of the Computer Vision - ECCV 2022, 2022

Improving Vision Transformers by Revisiting High-Frequency Components.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
Exploring global diverse attention via pairwise temporal relation for video summarization.
Pattern Recognit., 2021

Refiner: Refining Self-attention for Vision Transformers.
CoRR, 2021

Token Labeling: Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet.
CoRR, 2021

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet.
CoRR, 2021

All Tokens Matter: Token Labeling for Training Better Vision Transformers.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

PnP-DETR: Towards Efficient Visual Analysis with Transformers.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Continual Learning via Bit-Level Information Preserving.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Unsupervised Video Summarization With Cycle-Consistent Adversarial LSTM Networks.
IEEE Trans. Multim., 2020

Adversarial images for the primate brain.
CoRR, 2020

A Simple Baseline for Pose Tracking in Videos of Crowded Scenes.
CoRR, 2020

Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes.
CoRR, 2020

Toward Accurate Person-level Action Recognition in Videos of Crowed Scenes.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Towards Accurate Human Pose Estimation in Videos of Crowded Scenes.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

A Simple Baseline for Pose Tracking in Videos of Crowed Scenes.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Central Similarity Quantization for Efficient Image and Video Retrieval.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Revisiting Knowledge Distillation via Label Smoothing Regularization.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Revisit Knowledge Distillation: a Teacher-free Framework.
CoRR, 2019

Central Similarity Hashing via Hadamard matrix.
CoRR, 2019

Few-Shot Adaptive Faster R-CNN.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Distilling Object Detectors With Fine-Grained Feature Imitation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Cycle-SUM: Cycle-Consistent Adversarial LSTM Networks for Unsupervised Video Summarization.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Object Relation Detection Based on One-shot Learning.
CoRR, 2018

