Weidi Xie

ORCID: 0000-0003-3804-2639

According to our database, Weidi Xie authored at least 116 papers between 2017 and 2024.

Bibliography

2024
Towards Building Multilingual Language Model for Medicine.
CoRR, 2024

InstaGen: Enhancing Object Detection by Training on Synthetic Dataset.
CoRR, 2024

Synchformer: Efficient Synchronization from Sparse Cues.
CoRR, 2024

Retrieval-Augmented Egocentric Video Captioning.
CoRR, 2024

Annotation-free Audio-Visual Segmentation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

2023
Self-Supervised Tumor Segmentation With Sim2Real Adaptation.
IEEE J. Biomed. Health Informatics, September, 2023

Aerial Monocular 3D Object Detection.
IEEE Robotics Autom. Lett., April, 2023

Amodal Ground Truth and Completion in the Wild.
CoRR, 2023

One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts.
CoRR, 2023

Large-scale Long-tailed Disease Diagnosis on Radiology Images.
CoRR, 2023

A Strong Baseline for Temporal Video-Text Alignment.
CoRR, 2023

Appearance-based Refinement for Object-Centric Motion Segmentation.
CoRR, 2023

Grounded Question-Answering in Long Egocentric Videos.
CoRR, 2023

Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis.
CoRR, 2023

What Does Stable Diffusion Know about the 3D Scene?
CoRR, 2023

A Large-scale Dataset for Audio-Language Representation Learning.
CoRR, 2023

UniBrain: Universal Brain MRI Diagnosis with Hierarchical Knowledge-enhanced Pre-training.
CoRR, 2023

Diagnosing Human-object Interaction Detectors.
CoRR, 2023

Towards Generalist Foundation Model for Radiology.
CoRR, 2023

arXiVeri: Automatic table verification with GPT.
CoRR, 2023

Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models.
CoRR, 2023

Annotation-free Audio-Visual Segmentation.
CoRR, 2023

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering.
CoRR, 2023

PMC-LLaMA: Further Finetuning LLaMA on Medical Papers.
CoRR, 2023

Multi-modal Prompting for Low-Shot Temporal Action Localization.
CoRR, 2023

Knowledge-enhanced Pre-training for Auto-diagnosis of Chest Radiology Images.
CoRR, 2023

K-Diag: Knowledge-enhanced Disease Diagnosis in Radiographic Imaging.
CoRR, 2023

Guiding Text-to-Image Diffusion Model Towards Grounded Generation.
CoRR, 2023

MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training.
CoRR, 2023

Self-supervised Object-Centric Learning for Videos.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Deep Facial Phenotyping with Mixup Augmentation.
Proceedings of the Medical Image Understanding and Analysis - 27th Annual Conference, 2023

PMC-CLIP: Contrastive Language-Image Pre-training Using Biomedical Documents.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2023, 2023

Multi-Modal Classifiers for Open-Vocabulary Object Detection.
Proceedings of the International Conference on Machine Learning, 2023

Joint-Relation Transformer for Multi-Person Motion Prediction.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Towards Open-Vocabulary Video Instance Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Open-vocabulary Object Segmentation with Diffusion Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

The Making and Breaking of Camouflage.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

AutoAD II: The Sequel - Who, When, and What in Movie Audio Description.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Cali-NCE: Boosting Cross-modal Video Representation Learning with Calibrated Alignment.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

NamedMask: Distilling Segmenters from Complementary Foundation Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Zero-shot Unsupervised Transfer Instance Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Collaboration Helps Camera Overtake LiDAR in 3D Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

AutoAD: Movie Description in Context.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

OvarNet: Towards Open-Vocabulary Object Attribute Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Boost Video Frame Interpolation via Motion Adaptation.
Proceedings of the 34th British Machine Vision Conference 2023, 2023

Zero-shot Composed Text-Image Retrieval.
Proceedings of the 34th British Machine Vision Conference 2023, 2023

2022
Subcortical segmentation of the fetal brain in 3D ultrasound using deep learning.
NeuroImage, 2022

Motion-inductive Self-supervised Object Discovery in Videos.
CoRR, 2022

K-Space Transformer for Fast MRI Reconstruction with Implicit Representation.
CoRR, 2022

PromptDet: Expand Your Detector Vocabulary with Uncurated Images.
CoRR, 2022

Segmenting Moving Objects via an Object-Centric Layered Representation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

ReCo: Retrieve and Co-segment for Zero-shot Transfer.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Associating Objects and Their Effects in Video through Coordination Games.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Adaptive 3D Localization of 2D Freehand Ultrasound Brain Images.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2022, 2022

Transforming the Interactive Segmentation for Medical Imaging.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2022, 2022

Prompting Visual-Language Models for Efficient Video Understanding.
Proceedings of the Computer Vision - ECCV 2022, 2022

PromptDet: Towards Open-Vocabulary Detection Using Uncurated Images.
Proceedings of the Computer Vision - ECCV 2022, 2022

It's About Time: Analog Clock Reading in the Wild.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Unsupervised Salient Object Detection with Spectral Cluster Voting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Label, Verify, Correct: A Simple Few Shot Object Detection Method.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Temporal Alignment Networks for Long-term Video.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

A Simple Plugin for Transforming Images to Arbitrary Scales.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

K-Space Transformer for Undersampled MRI Reconstruction.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

A Tri-Layer Plugin to Improve Occluded Detection.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

CounTR: Transformer-based Generalised Visual Counting.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

Turbo Training with Token Dropout.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021
Learning to map 2D ultrasound images into 3D space with minimal human annotation.
Medical Image Anal., 2021

ImplicitVol: Sensorless 3D Ultrasound Reconstruction with Deep Implicit Representation.
CoRR, 2021

Self-supervised Tumor Segmentation through Layer Decomposition.
CoRR, 2021

Quantum Self-Supervised Learning.
CoRR, 2021

NeRF--: Neural Radiance Fields Without Known Camera Parameters.
CoRR, 2021

Sli2Vol: Annotate a 3D Volume from a Single Slice with Self-supervised Learning.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27, 2021

All you need are a few pixels: semantic segmentation with PixelPick.
Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

Self-supervised Video Object Segmentation by Motion Grouping.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Localizing Visual Sounds the Hard Way.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Segmenting Invisible Moving Objects.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

Audio-Visual Synchronisation in the wild.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020
Layered neural rendering for retiming people in video.
ACM Trans. Graph., 2020

Low-Memory CNNs Enabling Real-Time Ultrasound Segmentation Towards Mobile Deployment.
IEEE J. Biomed. Health Informatics, 2020

Voxceleb: Large-scale speaker verification in the wild.
Comput. Speech Lang., 2020

VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge.
CoRR, 2020

Inducing Predictive Uncertainty Estimation for Face Recognition.
CoRR, 2020

Self-supervised Video Object Segmentation.
CoRR, 2020

Self-supervised Co-Training for Video Representation Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Vggsound: A Large-Scale Audio-Visual Dataset.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Memory-Augmented Dense Predictive Coding for Video Representation Learning.
Proceedings of the Computer Vision - ECCV 2020, 2020

Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval.
Proceedings of the Computer Vision - ECCV 2020, 2020

MAST: A Memory-Augmented Self-Supervised Tracker.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Inducing Predictive Uncertainty Estimation for Face Verification.
Proceedings of the 31st British Machine Vision Conference 2020, 2020

Betrayed by Motion: Camouflaged Object Discovery via Motion Segmentation.
Proceedings of the Computer Vision - ACCV 2020 - 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30, 2020

2019
VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge.
CoRR, 2019

Self-supervised Learning for Video Correspondence Flow.
CoRR, 2019

Video Representation Learning by Dense Predictive Coding.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

Utterance-level Aggregation for Speaker Recognition in the Wild.
Proceedings of the IEEE International Conference on Acoustics, 2019

Geometry-Aware Video Object Detection for Static Cameras.
Proceedings of the 30th British Machine Vision Conference 2019, 2019

Self-supervised Video Representation Learning for Correspondence Flow.
Proceedings of the 30th British Machine Vision Conference 2019, 2019

AutoCorrect: Deep Inductive Alignment of Noisy Geometric Annotations.
Proceedings of the 30th British Machine Vision Conference 2019, 2019

2018
Ω-Net (Omega-Net): Fully automatic, multi-view cardiac MR detection, orientation, and segmentation with deep neural networks.
Medical Image Anal., 2018

Fully-automated alignment of 3D fetal brain ultrasound to a canonical reference space using multi-task learning.
Medical Image Anal., 2018

VP-Nets: Efficient automatic localization of key brain structures in 3D fetal neurosonography.
Medical Image Anal., 2018

Microscopy cell counting and detection with fully convolutional regression networks.
Comput. methods Biomech. Biomed. Eng. Imaging Vis., 2018

Can Dilated Convolutions Capture Ultrasound Video Dynamics?
Proceedings of the Machine Learning in Medical Imaging - 9th International Workshop, 2018

VGGFace2: A Dataset for Recognising Faces across Pose and Age.
Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition, 2018

Comparator Networks.
Proceedings of the Computer Vision - ECCV 2018, 2018

Multicolumn Networks for Face Recognition.
Proceedings of the British Machine Vision Conference 2018, 2018

Class-Agnostic Counting.
Proceedings of the Computer Vision - ACCV 2018, 2018

2017
Deep neural networks in computer vision and biomedical image analysis.
PhD thesis, 2017

Omega-Net: Fully Automatic, Multi-View Cardiac MR Detection, Orientation, and Segmentation with Deep Neural Networks.
CoRR, 2017

Robust Regression of Brain Maturation from 3D Fetal Neurosonography Using CRNs.
Proceedings of the Fetal, Infant and Ophthalmic Medical Image Analysis, 2017

Freehand Ultrasound Image Simulation with Spatially-Conditioned Generative Adversarial Networks.
Proceedings of the Molecular Imaging, Reconstruction and Analysis of Moving Body Organs, and Stroke Imaging and Treatment, 2017

Feature Tracking Cardiac Magnetic Resonance via Deep Learning and Spline Optimization.
Proceedings of the Functional Imaging and Modelling of the Heart, 2017

