Jian Yang

Affiliations:
  • Beihang University, SKLCCSE, Beijing, China
  • M-A-P, China
  • ByteDance Seed, China
  • OPPO-AI Team, China


According to our database, Jian Yang authored at least 50 papers between 2024 and 2026.

Bibliography

2026
A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression.
CoRR, April, 2026

InCoder-32B-Thinking: Industrial Code World Model for Thinking.
CoRR, April, 2026

InCoder-32B: Code Foundation Model for Industrial Scenarios.
CoRR, March, 2026

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization.
CoRR, February, 2026

WorldTravel: A Realistic Multimodal Travel-Planning Benchmark with Tightly Coupled Constraints.
CoRR, February, 2026

Large-Scale Terminal Agentic Trajectory Generation from Dockerized Environments.
CoRR, February, 2026

COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2026, 2026

2025
CodeSimpleQA: Scaling Factuality in Code Large Language Models.
CoRR, December, 2025

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents.
CoRR, December, 2025

Multi-Docker-Eval: A 'Shovel of the Gold Rush' Benchmark on Automatic Environment Building for Software Engineering.
CoRR, December, 2025

AI Deception: Risks, Dynamics, and Controls.
CoRR, November, 2025

From Code Foundation Models to Agents and Applications: A Comprehensive Survey and Practical Guide to Code Intelligence.
CoRR, November, 2025

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs.
CoRR, November, 2025

Scaling Latent Reasoning via Looped Language Models.
CoRR, October, 2025

A²FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning.
CoRR, October, 2025

OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs.
CoRR, October, 2025

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling.
CoRR, August, 2025

MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents.
CoRR, August, 2025

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL.
CoRR, August, 2025

Efficient Agents: Building Effective Agents While Reducing Cost.
CoRR, August, 2025

IFEvalCode: Controlled Code Generation.
CoRR, July, 2025

KAT-V1: Kwai-AutoThink Technical Report.
CoRR, July, 2025

A Survey on Latent Reasoning.
CoRR, July, 2025

OAgents: An Empirical Study of Building Effective Agents.
CoRR, June, 2025

Scaling Test-time Compute for LLM Agents.
CoRR, June, 2025

TaskCraft: Automated Generation of Agentic Tasks.
CoRR, June, 2025

USB: A Comprehensive and Unified Safety Evaluation Benchmark for Multimodal Large Language Models.
CoRR, May, 2025

Beyond Safe Answers: A Benchmark for Evaluating True Risk Awareness in Large Reasoning Models.
CoRR, May, 2025

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation.
CoRR, May, 2025

Table-R1: Region-based Reinforcement Learning for Table Understanding.
CoRR, May, 2025

A Comprehensive Survey on Long Context Language Modeling.
CoRR, March, 2025

YuE: Scaling Open Foundation Models for Long-Form Music Generation.
CoRR, March, 2025

SampleMix: A Sample-wise Pre-training Data Mixing Strategy by Coordinating Data Quality and Diversity.
CoRR, March, 2025

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models.
CoRR, February, 2025

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

OAgents: An Empirical Study of Building Effective Agents.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Turning the Tide: Repository-based Code Reflection.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

T2R-BENCH: A Benchmark for Real World Table-to-Report Task.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

SampleMix: A Sample-wise Pre-training Data Mixing Strategy by Coordinating Data Quality and Diversity.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Breaking Size Barrier: Enhancing Reasoning for Large-Size Table Question Answering.
Proceedings of the Database Systems for Advanced Applications, 2025

LIME: Less Is More for MLLM Evaluation.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey.
CoRR, 2024

AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions.
CoRR, 2024

Aligning CodeLLMs with Direct Preference Optimization.
CoRR, 2024

KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks.
CoRR, 2024

OmniBench: Towards The Future of Universal Omni-Language Models.
CoRR, 2024

LIME: Less Is More for MLLM Evaluation.
CoRR, 2024

Towards a Unified View of Preference Learning for Large Language Models: A Survey.
CoRR, 2024

LongIns: A Challenging Long-context Instruction-based Exam for LLMs.
CoRR, 2024

