Kai Chen

Orcid: 0000-0002-6820-2325

Affiliations:
  • Shanghai AI Laboratory, Guangzhou, China
  • SenseTime Research, Hong Kong
  • Chinese University of Hong Kong, MMLab, Hong Kong (PhD 2019)


According to our database, Kai Chen authored at least 186 papers between 2017 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis.
CoRR, August, 2025

Efficient Mixed-Precision Large Language Model Inference with TurboMind.
CoRR, August, 2025

InternBootcamp Technical Report: Boosting LLM Reasoning with Verifiable Task Scaling.
CoRR, August, 2025

CharacterShot: Controllable and Consistent 4D Character Animation.
CoRR, August, 2025

IFDECORATOR: Wrapping Instruction Following Reinforcement Learning with Verifiable Rewards.
CoRR, August, 2025

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward.
CoRR, August, 2025

Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning.
CoRR, August, 2025

Language-Aware Vision Transformer for Referring Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., July, 2025

Smooth Reading: Bridging the Gap of Recurrent LLM to Self-Attention LLM on Long-Context Tasks.
CoRR, July, 2025

Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning.
CoRR, July, 2025

MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation.
CoRR, July, 2025

The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner.
CoRR, July, 2025

CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards.
CoRR, July, 2025

Rethinking Verification for LLM Code Generation: From Generation to Testing.
CoRR, July, 2025

Coding Triangle: How Does Large Language Model Understand Code?
CoRR, July, 2025

Pre-Trained Policy Discriminators are General Reward Models.
CoRR, July, 2025

OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems.
CoRR, June, 2025

Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2025

Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2025

MedSentry: Understanding and Mitigating Safety Risks in Medical LLM Multi-Agent Systems.
CoRR, May, 2025

Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective.
CoRR, May, 2025

MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space.
CoRR, April, 2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
CoRR, April, 2025

RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy.
CoRR, March, 2025

LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?
CoRR, March, 2025

SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining.
CoRR, March, 2025

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM.
CoRR, March, 2025

MDTeamGPT: A Self-Evolving LLM-based Multi-Agent Framework for Multi-Disciplinary Team Medical Consultation.
CoRR, March, 2025

Information Density Principle for MLLM Benchmarks.
CoRR, March, 2025

CritiQ: Mining Data Quality Criteria from Human Preferences.
CoRR, February, 2025

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference.
CoRR, February, 2025

UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance.
CoRR, February, 2025

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning.
CoRR, February, 2025

Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation.
CoRR, January, 2025

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling.
CoRR, January, 2025

Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement.
CoRR, January, 2025

LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving.
CoRR, January, 2025

Quantum circuit mapping based on discrete particle swarm optimization and deep reinforcement learning.
Swarm Evol. Comput., 2025

Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MindSearch: Mimicking Human Minds Elicits Deep AI Searcher.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

FaceShot: Bring Any Character into Life.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

A Self-Evolving Framework for Multi-Agent Medical Consultation Based on Large Language Models.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Redundancy Principles for MLLMs Benchmarks.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Scaling up the State Size of RNN LLMs for Long-Context Scenarios.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

CritiQ: Mining Data Quality Criteria from Human Preferences.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Capability Salience Vector: Fine-grained Alignment of Loss and Capabilities for Downstream Task Scaling Law.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Transformer-Based Visual Segmentation: A Survey.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling.
Trans. Mach. Learn. Res., 2024

A Novel Contrastive Learning Model for Aerial Images.
IEEE Geosci. Remote. Sens. Lett., 2024

Are Your LLMs Capable of Stable Reasoning?
CoRR, 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.
CoRR, 2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.
CoRR, 2024

CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution.
CoRR, 2024

InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems.
CoRR, 2024

Training Language Models to Critique With Multi-agent Feedback.
CoRR, 2024

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models.
CoRR, 2024

What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices.
CoRR, 2024

LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover.
CoRR, 2024

NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?
CoRR, 2024

CIBench: Evaluating Your LLMs with a Code Interpreter Plugin.
CoRR, 2024

Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models.
CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
CoRR, 2024

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds.
CoRR, 2024

StyleShot: A Snapshot on Any Style.
CoRR, 2024

MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning.
CoRR, 2024

InternLM-Law: An Open Source Chinese Legal Large Language Model.
CoRR, 2024

Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior.
CoRR, 2024

An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models.
CoRR, 2024

The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition.
CoRR, 2024

Adapting LLaMA Decoder to Vision Transformer.
CoRR, 2024

InternLM2 Technical Report.
CoRR, 2024

DevBench: A Comprehensive Benchmark for Software Development.
CoRR, 2024

CriticBench: Evaluating Large Language Models as Critic.
CoRR, 2024

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning.
CoRR, 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.
CoRR, 2024

OMG-Seg: Is One Model Good Enough For All Segmentation?
CoRR, 2024

RAP-SAM: Towards Real-Time All-Purpose Segment Anything.
CoRR, 2024

HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance.
CoRR, 2024

Amphion: an Open-Source Audio, Music, and Speech Generation Toolkit.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Emilia: An Extensive, Multilingual, and Diverse Speech Dataset For Large-Scale Speech Generation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Lean Workbook: A large-scale Lean problem set formalized from natural language math problems.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

MotionBooth: Motion-Aware Customized Text-to-Video Generation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

GTA: A Benchmark for General Tool Agents.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

CriticEval: Evaluating Large-scale Language Model as Critic.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

VLMEvalKit: An Open-Source ToolKit for Evaluating Large Multi-Modality Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Differentiable Model Scaling using Differentiable Topk.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Can AI Assistants Know What They Don't Know?
Proceedings of the Forty-first International Conference on Machine Learning, 2024

ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Scaling Behavior for Large Language Models regarding Numeral Systems: An Example using Pythia.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

LawBench: Benchmarking Legal Knowledge of Large Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

A Task Is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting.
Proceedings of the Computer Vision - ECCV 2024, 2024

ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities.
Proceedings of the Computer Vision - ECCV 2024, 2024

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest.
Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

Open-Vocabulary SAM: Segment and Recognize Twenty-Thousand Classes Interactively.
Proceedings of the Computer Vision - ECCV 2024, 2024

4D Contrastive Superflows are Dense 3D Representation Learners.
Proceedings of the Computer Vision - ECCV 2024, 2024

AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024

MMBench: Is Your Multi-modal Model an All-Around Player?
Proceedings of the Computer Vision - ECCV 2024, 2024

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Towards Language-Driven Video Inpainting via Multimodal Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OMG-Seg: Is One Model Good Enough for all Segmentation?
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

ANAH: Analytical Annotation of Hallucinations in Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
T-Eval: Evaluating the Tool Utilization Capability Step by Step.
CoRR, 2023

Amphion: An Open-Source Audio, Music and Speech Generation Toolkit.
CoRR, 2023

Mixed Pseudo Labels for Semi-Supervised Object Detection.
CoRR, 2023

Evaluating Hallucinations in Chinese Large Language Models.
CoRR, 2023

DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection.
CoRR, 2023

LawBench: Benchmarking Legal Knowledge of Large Language Models.
CoRR, 2023

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.
CoRR, 2023

Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection.
CoRR, 2023

Learning Referring Video Object Segmentation from Weak Annotation.
CoRR, 2023

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest.
CoRR, 2023

MultiModal-GPT: A Vision and Language Model for Dialogue with Humans.
CoRR, 2023

RoboBEV: Towards Robust Bird's Eye View Perception under Corruptions.
CoRR, 2023

RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer.
CoRR, 2023

RTMPose: Real-Time Multi-Person Pose Estimation based on MMPose.
CoRR, 2023

Segment Any Point Cloud Sequences by Distilling Vision Foundation Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

TG-VQA: Ternary Game of Video Question Answering.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Improving Pixel-based MIM by Reducing Wasted Modeling Capability.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Robo3D: Towards Robust and Reliable 3D Perception against Corruptions.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Dense Distinct Query for End-to-End Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

RIFormer: Keep Your Vision Backbone Effective But Removing Token Mixer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
CARAFE++: Unified Content-Aware ReAssembly of FEatures.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

RTMDet: An Empirical Study of Designing Real-Time Object Detectors.
CoRR, 2022

DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition.
CoRR, 2022

What Are Expected Queries in End-to-End Object Detection?
CoRR, 2022

Dense Siamese Network.
CoRR, 2022

Deliberated Domain Bridging for Domain Adaptive Semantic Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MMRotate: A Rotated Object Detection Benchmark using PyTorch.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

PYSKL: Towards Good Practices for Skeleton Action Recognition.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Dense Siamese Network for Dense Unsupervised Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022

Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Group R-CNN for Weakly Semi-supervised Object Detection with Points.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

OCSampler: Compressing Videos to One Clip with Single-step Sampling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Revisiting Skeleton-based Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Towards Balanced Learning for Instance Recognition.
Int. J. Comput. Vis., 2021

STransGAN: An Empirical Study on Transformer in GANs.
CoRR, 2021

WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection.
CoRR, 2021

Revisiting Skeleton-based Action Recognition.
CoRR, 2021

K-Net: Towards Unified Image Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Few-Shot Object Detection via Association and DIscrimination.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Positional Encoding As Spatial Inductive Bias in GANs.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Seesaw Loss for Long-Tailed Instance Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Temporal ROI Align for Video Object Recognition.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Feature Pyramid Grids.
CoRR, 2020

Side-Aware Boundary Localization for More Precise Object Detection.
Proceedings of the Computer Vision - ECCV 2020, 2020

Prime Sample Attention in Object Detection.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
MMDetection: Open MMLab Detection Toolbox and Benchmark.
CoRR, 2019

CARAFE: Content-Aware ReAssembly of FEatures.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Region Proposal by Guided Anchoring.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Libra R-CNN: Towards Balanced Learning for Object Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Hybrid Task Cascade for Instance Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Optimizing Video Object Detection via a Scale-Time Lattice.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Video Object Segmentation with Re-identification.
CoRR, 2017

Discover and Learn New Objects from Documentaries.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

