Ying Shan

ORCID: 0000-0001-7673-8325

According to our database, Ying Shan authored at least 259 papers between 2000 and 2024.

Bibliography

2024
DARTScore: DuAl-Reconstruction Transformer for Video Captioning Evaluation.
IEEE Trans. Circuits Syst. Video Technol., April, 2024

DropConn: Dropout Connection Based Random GNNs for Molecular Property Prediction.
IEEE Trans. Knowl. Data Eng., February, 2024

UV Gaussians: Joint Learning of Mesh Deformation and Gaussian Textures for Human Avatar Modeling.
CoRR, 2024

Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing.
CoRR, 2024

SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model.
CoRR, 2024

HeadEvolver: Text to Head Avatars via Locally Learnable Mesh Deformation.
CoRR, 2024

HRLAIF: Improvements in Helpfulness and Harmlessness in Open-domain Reinforcement Learning From AI Feedback.
CoRR, 2024

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion.
CoRR, 2024

DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos.
CoRR, 2024

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation.
CoRR, 2024

DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing.
CoRR, 2024

Advances in 3D Generation: A Survey.
CoRR, 2024

YOLO-World: Real-Time Open-Vocabulary Object Detection.
CoRR, 2024

RecDCL: Dual Contrastive Learning for Recommendation.
CoRR, 2024

TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts.
CoRR, 2024

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities.
CoRR, 2024

Supervised Fine-tuning in turn Improves Visual Foundation Models.
CoRR, 2024

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models.
CoRR, 2024

Towards A Better Metric for Text-to-Video Generation.
CoRR, 2024

LLaMA Pro: Progressive LLaMA with Block Expansion.
CoRR, 2024

Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

A Pre-convolved Representation for Plug-and-Play Neural Illumination Fields.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

SC-NeuS: Consistent Neural Surface Reconstruction from Sparse and Noisy Views.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

SparseGNV: Generating Novel Views of Indoor Scenes with Sparse RGB-D Images.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Task-Aware Dual-Representation Network for Few-Shot Action Recognition.
IEEE Trans. Circuits Syst. Video Technol., October, 2023

ACNet: Approaching-and-Centralizing Network for Zero-Shot Sketch-Based Image Retrieval.
IEEE Trans. Circuits Syst. Video Technol., September, 2023

VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation.
CoRR, 2023

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models.
CoRR, 2023

EgoPlan-Bench: Benchmarking Egocentric Embodied Planning with Multimodal Large Language Models.
CoRR, 2023

AFL-Net: Integrating Audio, Facial, and Lip Modalities with Cross-Attention for Robust Speaker Diarization in the Wild.
CoRR, 2023

Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion.
CoRR, 2023

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding.
CoRR, 2023

AnimateZero: Video Diffusion Models are Zero-Shot Image Animators.
CoRR, 2023

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation.
CoRR, 2023

MagicStick: Controllable Video Editing via Control Handle Transformations.
CoRR, 2023

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter.
CoRR, 2023

ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis.
CoRR, 2023

SEED-Bench-2: Benchmarking Multimodal Large Language Models.
CoRR, 2023

HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting.
CoRR, 2023

HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion.
CoRR, 2023

GS-IR: 3D Gaussian Splatting for Inverse Rendering.
CoRR, 2023

ViT-Lens-2: Gateway to Omni-modal Intelligence.
CoRR, 2023

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition.
CoRR, 2023

M²UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models.
CoRR, 2023

Vision-Language Instruction Tuning: A Review and Analysis.
CoRR, 2023

SemanticBoost: Elevating Motion Generation with Augmented Textual Cues.
CoRR, 2023

CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models.
CoRR, 2023

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation.
CoRR, 2023

FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling.
CoRR, 2023

TapMo: Shape-aware Motion Generation of Skeleton-free Characters.
CoRR, 2023

DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors.
CoRR, 2023

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models.
CoRR, 2023

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing.
CoRR, 2023

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models.
CoRR, 2023

HiFi-123: Towards High-fidelity One Image to 3D Content Generation.
CoRR, 2023

Making LLaMA SEE and Draw with SEED Tokenizer.
CoRR, 2023

One For All: Video Conversation is Feasible Without Video Instruction Tuning.
CoRR, 2023

Anti-Aliased Neural Implicit Surfaces with Encoding Level of Detail.
CoRR, 2023

HumTrans: A Novel Open-Source Dataset for Humming Melody Transcription and Beyond.
CoRR, 2023

Unified Pretraining Target Based Video-music Retrieval With Music Rhythm And Video Optical Flow Information.
CoRR, 2023

StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation.
CoRR, 2023

Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning.
CoRR, 2023

ViT-Lens: Towards Omni-modal Representations.
CoRR, 2023

Guide3D: Create 3D Avatars from Text and Image Guidance.
CoRR, 2023

SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension.
CoRR, 2023

GET3D-: Learning GET3D from Unconstrained Image Collections.
CoRR, 2023

Planting a SEED of Vision in Large Language Model.
CoRR, 2023

Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation.
CoRR, 2023

NOFA: NeRF-based One-shot Facial Avatar Reconstruction.
CoRR, 2023

DeSRA: Detect and Delete the Artifacts of GAN-based Real-World Super-Resolution Models.
CoRR, 2023

DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models.
CoRR, 2023

ID-Pose: Sparse-view Camera Pose Estimation by Inverting Diffusion Models.
CoRR, 2023

DreamDiffusion: Generating High-Quality Images from Brain EEG Signals.
CoRR, 2023

PTVD: A Large-Scale Plot-Oriented Multimodal Dataset Based on Television Dramas.
CoRR, 2023

Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation.
CoRR, 2023

TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter.
CoRR, 2023

InstructP2P: Learning to Edit 3D Point Clouds with Text Instructions.
CoRR, 2023

Sticker820K: Empowering Interactive Retrieval with Stickers.
CoRR, 2023

PanoGRF: Generalizable Spherical Radiance Fields for Wide-baseline Panoramas.
CoRR, 2023

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance.
CoRR, 2023

Inserting Anybody in Diffusion Models via Celeb Basis.
CoRR, 2023

Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models.
CoRR, 2023

TaleCrafter: Interactive Story Visualization with Multiple Characters.
CoRR, 2023

TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale.
CoRR, 2023

A Confidence-based Partial Label Learning Model for Crowd-Annotated Named Entity Recognition.
CoRR, 2023

What Makes for Good Visual Tokenizers for Large Language Models?
CoRR, 2023

SparseGNV: Generating Novel Views of Indoor Scenes with Sparse Input Views.
CoRR, 2023

HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video.
CoRR, 2023

NeAI: A Pre-convoluted Representation for Plug-and-Play Neural Ambient Illumination.
CoRR, 2023

TagGPT: Large Language Models are Zero-shot Multimodal Taggers.
CoRR, 2023

Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos.
CoRR, 2023

DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models.
CoRR, 2023

VMesh: Hybrid Volume-Mesh Representation for Efficient View Synthesis.
CoRR, 2023

BoPR: Body-aware Part Regressor for Human Shape and Pose Estimation.
CoRR, 2023

HMC: Hierarchical Mesh Coarsening for Skeleton-free Motion Retargeting.
CoRR, 2023

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing.
CoRR, 2023

T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models.
CoRR, 2023

Masked Visual Reconstruction in Language Semantic Space.
CoRR, 2023

Anti-Aliased Neural Implicit Surfaces with Encoding Level of Detail.
Proceedings of the SIGGRAPH Asia 2023 Conference Papers, 2023

Neural Point-based Volumetric Avatar: Surface-guided Neural Points for Efficient and Photorealistic Volumetric Head Avatar.
Proceedings of the SIGGRAPH Asia 2023 Conference Papers, 2023

VMesh: Hybrid Volume-Mesh Representation for Efficient View Synthesis.
Proceedings of the SIGGRAPH Asia 2023 Conference Papers, 2023

Interactive Story Visualization with Multiple Characters.
Proceedings of the SIGGRAPH Asia 2023 Conference Papers, 2023

NOFA: NeRF-based One-shot Facial Avatar Reconstruction.
Proceedings of the ACM SIGGRAPH 2023 Conference Proceedings, 2023

NeRF-Texture: Texture Synthesis with Neural Radiance Fields.
Proceedings of the ACM SIGGRAPH 2023 Conference Proceedings, 2023

Inserting Anybody in Diffusion Models via Celeb Basis.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Exploiting Contextual Objects and Relations for 3D Visual Grounding.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

CL-NeRF: Continual Learning of Neural Radiance Fields for Evolving Scene Representation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Meta-Adapter: An Online Few-shot Learner for Vision-Language Model.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

PanoGRF: Generalizable Spherical Radiance Fields for Wide-baseline Panoramas.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

VTLayout: A Multi-Modal Approach for Video Text Layout.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Toward Human Perception-Centric Video Thumbnail Generation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Binary Embedding-based Retrieval at Tencent.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

DeSRA: Detect and Delete the Artifacts of GAN-based Real-World Super-Resolution Models.
Proceedings of the International Conference on Machine Learning, 2023

π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation.
Proceedings of the International Conference on Machine Learning, 2023

Do We Really Need Temporal Convolutions in Action Segmentation?
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Masked Image Modeling with Denoising Contrast.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Order-Prompted Tag Sequence Generation for Video Tagging.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Exploring Model Transferability through the Lens of Potential Energy.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

OmniZoomer: Learning to Move and Zoom in on Sphere at High-Resolution.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Enhancing the Vocal Range of Single-Speaker Singing Voice Synthesis with Melody-Unsupervised Pre-Training.
Proceedings of the IEEE International Conference on Acoustics, 2023

ERBNet: An Effective Representation Based Network for Unbiased Scene Graph Generation.
Proceedings of the IEEE International Conference on Acoustics, 2023

LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning Anchor Transformations for 3D Garment Animation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Skinned Motion Retargeting with Residual Perception of Motion Semantics & Geometry.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

OSRT: Omnidirectional Image Super-Resolution with Distortion-aware Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

3D GAN Inversion with Facial Symmetry Prior.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

RILS: Masked Visual Reconstruction in Language Semantic Space.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Accelerating Vision-Language Pretraining with Free Language Modeling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

All in One: Exploring Unified Video-Language Pre-Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DPE: Disentanglement of Pose and Expression for General Video Portrait Editing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Improved Test-Time Adaptation for Domain Generalization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ViLEM: Visual-Language Error Modeling for Image-Text Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023


High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

HRDFuse: Monocular 360° Depth Estimation by Collaboratively Learning Holistic-with-Regional Depth Distributions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Characterizing the Impacts of Instances on Robustness.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

A Confidence-based Partial Label Learning Model for Crowd-Annotated Named Entity Recognition.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

DSRM: Boost Textual Adversarial Training with Distribution Shift Risk Minimization.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Darwinian Model Upgrades: Model Evolving with Selective Compatibility.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

What Does Your Face Sound Like? 3D Face Shape towards Voice.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Mitigating Artifacts in Real-World Video Super-resolution Models.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Accelerating the Training of Video Super-resolution Models.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Depth-Aware Shadow Removal.
Comput. Graph. Forum, October, 2022

Hybrid Warping Fusion for Video Frame Interpolation.
Int. J. Comput. Vis., 2022

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation.
CoRR, 2022

Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis.
CoRR, 2022

Latent Video Diffusion Models for High-Fidelity Video Generation with Arbitrary Lengths.
CoRR, 2022

Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields.
CoRR, 2022

Vis2Mus: Exploring Multimodal Representation Mapping for Controllable Music Generation.
CoRR, 2022

MonoNeuralFusion: Online Monocular Neural 3D Reconstruction with Geometric Priors.
CoRR, 2022

Music-driven Dance Regeneration with Controllable Key Pose Constraints.
CoRR, 2022

Self-Supervised Learning of Music-Dance Representation through Explicit-Implicit Rhythm Synchronization.
CoRR, 2022

Weakly-supervised Action Localization via Hierarchical Mining.
CoRR, 2022

Efficient U-Transformer with Boundary-Aware Loss for Action Segmentation.
CoRR, 2022

Privacy-Preserving Model Upgrades with Bidirectional Compatible Training in Image Retrieval.
CoRR, 2022

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval.
CoRR, 2022

CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation.
CoRR, 2022

Revitalize Region Feature for Democratizing Video-Language Pre-training.
CoRR, 2022

All in One: Exploring Unified Video-Language Pre-training.
CoRR, 2022

Hot-Refresh Model Upgrades with Regression-Alleviating Compatible Training in Image Retrieval.
CoRR, 2022

BridgeFormer: Bridging Video-text Retrieval with Multiple Choice Questions.
CoRR, 2022

AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

RepSR: Training Efficient VGG-style Super-Resolution Networks with Structural Re-Parameterization and Batch Normalization.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

PC-Dance: Posture-controllable Music-driven Dance Synthesis.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

A Hierarchical Speaker Representation Framework for One-shot Singing Voice Conversion.
Proceedings of the Interspeech 2022, 2022

Towards Universal Backward-Compatible Representation Learning.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Convolutional Transformer with Similarity-based Boundary Prediction for Action Segmentation.
Proceedings of the 34th IEEE International Conference on Tools with Artificial Intelligence, 2022

Hot-Refresh Model Upgrades with Regression-Free Compatible Training in Image Retrieval.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Dynamic Token Normalization improves Vision Transformers.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Uncertainty Modeling for Out-of-Distribution Generalization.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Audio-To-Symbolic Arrangement Via Cross-Modal Music Representation Learning.
Proceedings of the IEEE International Conference on Acoustics, 2022

Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space.
Proceedings of the Computer Vision - ECCV 2022, 2022

Metric Learning Based Interactive Modulation for Real-World Super-Resolution.
Proceedings of the Computer Vision - ECCV 2022, 2022

mc-BEiT: Multi-choice Discretization for Image BERT Pre-training.
Proceedings of the Computer Vision - ECCV 2022, 2022

VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder.
Proceedings of the Computer Vision - ECCV 2022, 2022

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval.
Proceedings of the Computer Vision - ECCV 2022, 2022

Temporally Efficient Vision Transformer for Video Instance Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

BTS: A Bi-lingual Benchmark for Text Segmentation in the Wild.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

VFHQ: A High-Quality Dataset and Benchmark for Video Face Super-Resolution.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Object-aware Video-language Pre-training for Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Bridging Video-text Retrieval with Multiple Choice Questions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Robust Human Matting via Semantic Guidance.
Proceedings of the Computer Vision - ACCV 2022, 2022

2021
High-Accuracy Guide Star Catalogue Generation with a Machine Learning Classification Algorithm.
Sensors, 2021

Dynamic Token Normalization Improves Vision Transformer.
CoRR, 2021

Tracking Instances as Queries.
CoRR, 2021

A Generic Object Re-identification System for Short Videos.
CoRR, 2021

Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Semantic-Guided Relation Propagation Network for Few-shot Action Recognition.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

TransFusion: Multi-Modal Fusion for Video Tag Inference via Translation-based Knowledge Embedding.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data.
Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

Enforcing Temporal Consistency in Video Depth Estimation.
Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

Crossover Learning for Fast Online Video Instance Segmentation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Towards Vivid and Diverse Image Colorization with Generative Color Prior.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Instances as Queries.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Open-Book Video Captioning With Retrieve-Copy-Generate Network.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Towards Real-World Blind Face Restoration With Generative Facial Prior.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Distilling Audio-Visual Knowledge by Compositional Contrastive Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Towards Interaction Detection Using Topological Analysis on Neural Networks.
CoRR, 2020

A Simple Yet Effective Method for Video Temporal Grounding with Cross-Modality Attention.
CoRR, 2020

Detecting Interactions from Neural Networks via Topological Analysis.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Dual Semantic Fusion Network for Video Object Detection.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Feature Augmented Memory with Global Attention Network for VideoQA.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Fast Video Object Segmentation Using the Global Context Module.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
Overview of the NLPCC 2019 Shared Task: Open Domain Conversation Evaluation.
Proceedings of the Natural Language Processing and Chinese Computing, 2019

2018
Recurrent Binary Embedding for GPU-Enabled Exhaustive Retrieval from Billion-Scale Semantic Vectors.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

2017
Deep Embedding Forest: Forest-based Serving with Deep Embedding Features.
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13, 2017

2016
Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features.
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016

2013
An Empirical Research on Designing and Promoting the Brand Logo of Yangshan Shuimi Peaches Based on the Theory of Brand Experience.
Proceedings of the Cross-Cultural Design. Methods, Practice, and Case Studies, 2013

2010
Internet Vision.
Proc. IEEE, 2010

2009
Kernel PCA Regression for Missing Data Estimation in DNA Microarray Analysis.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2009), 2009

NSM: A Security Mechanism for Object-Based Storage System.
Proceedings of the CSIE 2009, 2009 WRI World Congress on Computer Science and Information Engineering, March 31, 2009

Efficient Scale-Space Spatiotemporal Saliency Tracking for Distortion-Free Video Retargeting.
Proceedings of the Computer Vision, 2009

2008
Unsupervised Learning of Discriminative Edge Measures for Vehicle Matching between Nonoverlapping Cameras.
IEEE Trans. Pattern Anal. Mach. Intell., 2008

Solving partial differential equations on irregular domains with moving interfaces, with applications to superconformal electrodeposition in semiconductor manufacturing.
J. Comput. Phys., 2008

Discovering class specific composite features through discriminative sampling with Swendsen-Wang Cut.
Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008

2007
Robust Object Matching for Persistent Tracking with Heterogeneous Features.
IEEE Trans. Pattern Anal. Mach. Intell., 2007

PEET: Prototype Embedding and Embedding Transition for Matching Vehicles over Disparate Viewpoints.
Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007

2006
Shapeme Histogram Projection and Matching for Partial Object Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2006

Rapid Object Indexing Using Locality Sensitive Hashing and Joint 3D-Signature Space Estimation.
IEEE Trans. Pattern Anal. Mach. Intell., 2006

Learning Exemplar-Based Categorization for the Detection of Multi-View Multi-Pose Objects.
Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), 2006

2005
Clustering multiple image sequences with a sequence-to-sequence similarity measure.
Int. J. Pattern Recognit. Artif. Intell., 2005

Vehicle Identification between Non-Overlapping Cameras without Direct Feature Matching.
Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV 2005), 2005

Unsupervised Learning of Discriminative Edge Measures for Vehicle Matching between Non-Overlapping Cameras.
Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 2005

Vehicle Fingerprinting for Reacquisition and Tracking in Videos.
Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 2005

2004
Robust and Rapid Generation of Animated Faces from Video Images: A Model-Based Modeling Approach.
Int. J. Comput. Vis., 2004

Image-Based Surface Detail Transfer.
IEEE Computer Graphics and Applications, 2004

Partial Object Matching with Shapeme Histograms.
Proceedings of the Computer Vision, 2004

Linear Model Hashing and Batch RANSAC for Rapid and Accurate Object Recognition.
Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), with CD-ROM, 27 June, 2004

2003
Incremental motion estimation through modified bundle adjustment.
Proceedings of the 2003 International Conference on Image Processing, 2003

2002
New Measurements and Corner-Guidance for Curve Matching with Probabilistic Relaxation.
Int. J. Comput. Vis., 2002

2001
Expressive expression mapping with ratio images.
Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, 2001

Visual panel: virtual mouse, keyboard and 3D controller with an ordinary piece of paper.
Proceedings of the 2001 workshop on Perceptive user interfaces, 2001

Cloning Your Own Face with a Desktop Camera.
Proceedings of the Eighth International Conference On Computer Vision (ICCV-01), Vancouver, British Columbia, Canada, July 7-14, 2001, 2001

Model-Based Bundle Adjustment with Application to Face Modeling.
Proceedings of the Eighth International Conference On Computer Vision (ICCV-01), Vancouver, British Columbia, Canada, July 7-14, 2001, 2001

2000
A Progressive Scheme for Stereo Matching.
Proceedings of the 3D Structure from Images, 2000

Visual Screen: Transforming an Ordinary Screen into a Touch Screen.
Proceedings of the IAPR Conference on Machine Vision Applications (IAPR MVA 2000), 2000

Curve Matching with Probabilistic Relaxation.
Proceedings of the IAPR Conference on Machine Vision Applications (IAPR MVA 2000), 2000

Corner Guided Curve Matching and its Application to Scene Reconstruction.
Proceedings of the 2000 Conference on Computer Vision and Pattern Recognition (CVPR 2000), 2000

