Xiangxiang Chu

Orcid: 0000-0003-2548-0605

According to our database, Xiangxiang Chu authored at least 146 papers between 2017 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation.
CoRR, May, 2026

RISE: Reliable Improvement in Self-Evolving Vision-Language Models.
CoRR, May, 2026

SaaSBench: Exploring the Boundaries of Coding Agents in Long-Horizon Enterprise SaaS Engineering.
CoRR, May, 2026

D²Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning.
CoRR, May, 2026

Embedding-perturbed Exploration Preference Optimization for Flow Models.
CoRR, May, 2026

Learning Agentic Policy from Action Guidance.
CoRR, May, 2026

Ace-Skill: Bootstrapping Multimodal Agents with Prioritized and Clustered Evolution.
CoRR, May, 2026

FastPillars: A Deployment-Friendly Pillar-Based 3D Detector.
IEEE Trans. Circuits Syst. Video Technol., April, 2026

Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation.
CoRR, April, 2026

LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics.
CoRR, April, 2026

Elucidating the SNR-t Bias of Diffusion Probabilistic Models.
CoRR, April, 2026

CoEvolve: Training LLM Agents via Agent-Data Mutual Evolution.
CoRR, April, 2026

Visual Enhanced Depth Scaling for Multimodal Latent Reasoning.
CoRR, April, 2026

Visually-Guided Policy Optimization for Multimodal Reasoning.
CoRR, April, 2026

Breaking Block Boundaries: Anchor-based History-stable Decoding for Diffusion Large Language Models.
CoRR, April, 2026

SkillClaw: Let Skills Evolve Collectively with Agentic Evolver.
CoRR, April, 2026

MAR-GRPO: Stabilized GRPO for AR-diffusion Hybrid Image Generation.
CoRR, April, 2026

ConceptWeaver: Weaving Disentangled Concepts with Flow.
CoRR, March, 2026

Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models.
CoRR, March, 2026

Video-CoE: Reinforcing Video Event Prediction via Chain of Events.
CoRR, March, 2026

Layer-wise Instance Binding for Regional and Occlusion Control in Text-to-Image Diffusion Transformers.
CoRR, March, 2026

Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing.
CoRR, March, 2026

From Scale to Speed: Adaptive Test-Time Scaling for Image Editing.
CoRR, March, 2026

MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios.
CoRR, February, 2026

IntRR: A Framework for Integrating SID Redistribution and Length Reduction.
CoRR, February, 2026

IntTravel: A Real-World Dataset and Generative Framework for Integrated Multi-Task Travel Recommendation.
CoRR, February, 2026

What if Agents Could Imagine? Reinforcing Open-Vocabulary HOI Comprehension through Generation.
CoRR, February, 2026

Code2World: A GUI World Model via Renderable Code Generation.
CoRR, February, 2026

GenMRP: A Generative Multi-Route Planning Framework for Efficient and Personalized Real-Time Industrial Navigation.
CoRR, February, 2026

SCASRec: A Self-Correcting and Auto-Stopping Model for Generative Route List Recommendation.
CoRR, February, 2026

FASA: Frequency-aware Sparse Attention.
CoRR, February, 2026

AR-MAP: Are Autoregressive Large Language Models Implicit Teachers for Diffusion Large Language Models?
CoRR, February, 2026

Entropy-Guided Data-Efficient Training for Multimodal Reasoning Reward Models.
CoRR, February, 2026

Q-Hawkeye: Reliable Visual Policy Optimization for Image Quality Assessment.
CoRR, January, 2026

Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation.
CoRR, January, 2026

Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation.
CoRR, January, 2026

Ranking-aware Reinforcement Learning for Ordinal Ranking.
CoRR, January, 2026

Latent Temporal Discrepancy as Motion Prior: A Loss-Weighting Strategy for Dynamic Fidelity in T2V.
CoRR, January, 2026

Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models.
CoRR, January, 2026

Artifact-Aware Evaluation for High-Quality Video Generation.
CoRR, January, 2026

TEXTS-Diff: TEXTS-Aware Diffusion Model for Real-World Text Image Super-Resolution.
CoRR, January, 2026

Urban Socio-Semantic Segmentation with Vision-Language Reasoning.
CoRR, January, 2026

Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization.
CoRR, January, 2026

Adaptive Task Balancing for Visual Instruction Tuning via Inter-Task Contribution and Intra-Task Difficulty.
Proceedings of the ACM Web Conference 2026, 2026

SCALAR: Scale-wise Controllable Visual Autoregressive Learning.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

AdaCuRL: Adaptive Curriculum Reinforcement Learning with Invalid Sample Mitigation and Historical Revisiting.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation.
CoRR, December, 2025

Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning.
CoRR, December, 2025

Eevee: Towards Close-up High-resolution Video-based Virtual Try-on.
CoRR, November, 2025

Semantic Context Matters: Improving Conditioning for Autoregressive Models.
CoRR, November, 2025

Where and What Matters: Sensitivity-Aware Task Vectors for Many-Shot Multimodal In-Context Learning.
CoRR, November, 2025

Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training.
CoRR, October, 2025

Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools.
CoRR, October, 2025

Tree Search for LLM Agent Reinforcement Learning.
CoRR, September, 2025

IntSR: An Integrated Generative Framework for Search and Recommendation.
CoRR, September, 2025

From Editor to Dense Geometry Estimator.
CoRR, September, 2025

AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving.
CoRR, September, 2025

RAGSR: Regional Attention Guided Diffusion for Image Super-Resolution.
CoRR, August, 2025

Position Bias Mitigates Position Bias: Mitigate Position Bias Through Inter-Position Knowledge Distillation.
CoRR, August, 2025

S²-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models.
CoRR, August, 2025

Comprehensive Comparison Network: a framework for locality-aware, routes-comparable and interpretable route recommendation.
CoRR, August, 2025

NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation Models.
CoRR, July, 2025

UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning.
CoRR, May, 2025

Effective Probabilistic Time Series Forecasting with Fourier Adaptive Noise-Separated Diffusion.
CoRR, May, 2025

FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing.
CoRR, May, 2025

FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos.
CoRR, April, 2025

GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning.
CoRR, April, 2025

Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model.
CoRR, March, 2025

DSFNet: Learning Disentangled Scenario Factorization for Multi-Scenario Route Ranking.
Proceedings of the Companion Proceedings of the ACM on Web Conference 2025, 2025

FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Contrastive Instruction Fine-Tuning Large Multimodal Model for Hateful Meme Classification.
Proceedings of the Nineteenth International AAAI Conference on Web and Social Media, 2025

UPRE: Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

VMBench: A Benchmark for Perception-Aligned Video Motion Generation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

USP: Unified Self-Supervised Pretraining for Image Generation and Understanding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Lenna: Language Enhanced Reasoning Detection Assistant.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

POSITION BIAS MITIGATES POSITION BIAS: Mitigate Position Bias Through Inter-Position Knowledge Distillation.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Towards Efficient Foundation Model for Zero-shot Amodal Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
AODet: Aerial Object Detection Using Transformers for Foreground Regions.
IEEE Trans. Geosci. Remote. Sens., 2024

Prioritize Denoising Steps on Diffusion Model Preference Alignment via Explicit Denoised Distribution Estimation.
CoRR, 2024

MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective.
CoRR, 2024

DreamCouple: Exploring High Quality Text-to-3D Generation Via Rectified Flow.
CoRR, 2024

PLUG: Revisiting Amodal Segmentation with Foundation Model and Hierarchical Focus.
CoRR, 2024

AdaFedFR: Federated Face Recognition with Adaptive Inter-Class Representation Learning.
CoRR, 2024

LogicalDefender: Discovering, Extracting, and Utilizing Common-Sense Knowledge.
CoRR, 2024

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks.
CoRR, 2024

MobileVLM V2: Faster and Stronger Baseline for Vision Language Model.
CoRR, 2024

Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition.
Proceedings of the Computer Vision - ECCV 2024, 2024

VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks.
Proceedings of the Computer Vision - ECCV 2024, 2024

PeLK: Parameter-Efficient Large Kernel ConvNets with Peripheral Convolution.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SCTNet: Single-Branch CNN with Transformer Semantic Information for Real-Time Segmentation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Norm Tweaking: High-Performance Low-Bit Quantization of Large Language Models.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Make RepVGG Greater Again: A Quantization-Aware Approach.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices.
CoRR, 2023

RobustCalib: Robust Lidar-Camera Extrinsic Calibration with Consistency Learning.
CoRR, 2023

Masked Autoencoders Are Robust Neural Architecture Search Learners.
CoRR, 2023

A Speed Odyssey for Deployable Quantization of LLMs.
CoRR, 2023

FPTQ: Fine-grained Post-Training Quantization for Large Language Models.
CoRR, 2023

FastPillars: A Deployment-friendly Pillar-based 3D Detector.
CoRR, 2023

EfficientRep: An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design.
CoRR, 2023

YOLOv6 v3.0: A Full-Scale Reloading.
CoRR, 2023

Conditional Positional Encodings for Vision Transformers.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

ROME: Robustifying Memory-Efficient NAS via Topology Disentanglement and Gradient Accumulation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MixPath: A Unified Approach for One-shot Neural Architecture Search.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

AeDet: Azimuth-Invariant Multi-View 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications.
CoRR, 2022

PromptDet: Expand Your Detector Vocabulary with Uncurated Images.
CoRR, 2022

Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

SegViT: Semantic Segmentation with Plain Vision Transformers.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

PromptDet: Towards Open-Vocabulary Detection Using Uncurated Images.
Proceedings of the Computer Vision - ECCV 2022, 2022

Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

EAPruning: Evolutionary Pruning for Vision Transformers and CNNs.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

A Unified Mixture-View Framework for Unsupervised Representation Learning.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021
DAAS: Differentiable Architecture and Augmentation Policy Search.
CoRR, 2021

CCTrans: Simplifying and Improving Crowd Counting with Transformer.
CoRR, 2021

Twins: Revisiting Spatial Attention Design in Vision Transformers.
CoRR, 2021

Do We Really Need Explicit Position Encodings for Vision Transformers?
CoRR, 2021

Twins: Revisiting the Design of Spatial Attention in Vision Transformers.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

DARTS-: Robustly Stepping out of Performance Collapse Without Indicators.
Proceedings of the 9th International Conference on Learning Representations, 2021

SCARLET-NAS: Bridging the Gap between Stability and Scalability in Weight-sharing Neural Architecture Search.
Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

AutoKWS: Keyword Spotting with Differentiable Architecture Search.
Proceedings of the IEEE International Conference on Acoustics, 2021

Noisy Differentiable Architecture Search.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020
Beyond Single Instance Multi-view Unsupervised Representation Learning.
CoRR, 2020

ROME: Robustifying Memory-Efficient NAS via Topology Disentanglement and Gradients Accumulation.
CoRR, 2020

Noisy Differentiable Architecture Search.
CoRR, 2020

MixPath: A Unified Approach for One-shot Neural Architecture Search.
CoRR, 2020

Neural Architecture Search on Acoustic Scene Classification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Fast, Accurate and Lightweight Super-Resolution with Neural Architecture Search.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

MoGA: Searching Beyond Mobilenetv3.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search.
Proceedings of the Computer Vision - ECCV 2020, 2020

Multi-objective Reinforced Evolution in Mobile Neural Architecture Search.
Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

Accurate and Efficient Single Image Super-Resolution with Matrix Channel Attention Network.
Proceedings of the Computer Vision - ACCV 2020 - 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30, 2020

2019
ScarletNAS: Bridging the Gap Between Scalability and Fairness in Neural Architecture Search.
CoRR, 2019

FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search.
CoRR, 2019

A Matrix-in-matrix Neural Network for Image Super Resolution.
CoRR, 2019

Fast, Accurate and Lightweight Super-Resolution with Neural Architecture Search.
CoRR, 2019

Multi-Objective Reinforced Evolution in Mobile Neural Architecture Search.
CoRR, 2019

2018
Improved Crowding Distance for NSGA-II.
CoRR, 2018

Policy Optimization With Penalized Point Probability Distance: An Alternative To Proximal Policy Optimization.
CoRR, 2018

2017
Parameter Sharing Deep Deterministic Policy Gradient for Cooperative Multi-agent Reinforcement Learning.
CoRR, 2017

