Jian Yang

Affiliations:

Beihang University, SKLCCSE, Beijing, China
M-A-P, China
ByteDance Seed, China
OPPO-AI Team, China

According to our database¹, Jian Yang authored at least 50 papers between 2024 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression.

CoRR, April, 2026

InCoder-32B-Thinking: Industrial Code World Model for Thinking.

CoRR, April, 2026

InCoder-32B: Code Foundation Model for Industrial Scenarios.

CoRR, March, 2026

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization.

CoRR, February, 2026

WorldTravel: A Realistic Multimodal Travel-Planning Benchmark with Tightly Coupled Constraints.

CoRR, February, 2026

Large-Scale Terminal Agentic Trajectory Generation from Dockerized Environments.

CoRR, February, 2026

COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values.

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2026, 2026

2025

CodeSimpleQA: Scaling Factuality in Code Large Language Models.

CoRR, December, 2025

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents.

CoRR, December, 2025

Multi-Docker-Eval: A 'Shovel of the Gold Rush' Benchmark on Automatic Environment Building for Software Engineering.

CoRR, December, 2025

AI Deception: Risks, Dynamics, and Controls.

CoRR, November, 2025

From Code Foundation Models to Agents and Applications: A Comprehensive Survey and Practical Guide to Code Intelligence.

CoRR, November, 2025

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs.

CoRR, November, 2025

Scaling Latent Reasoning via Looped Language Models.

CoRR, October, 2025

A²FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning.

CoRR, October, 2025

OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs.

CoRR, October, 2025

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling.

CoRR, August, 2025

MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents.

CoRR, August, 2025

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL.

CoRR, August, 2025

Efficient Agents: Building Effective Agents While Reducing Cost.

CoRR, August, 2025

IFEvalCode: Controlled Code Generation.

CoRR, July, 2025

KAT-V1: Kwai-AutoThink Technical Report.

CoRR, July, 2025

A Survey on Latent Reasoning.

CoRR, July, 2025

OAgents: An Empirical Study of Building Effective Agents.

CoRR, June, 2025

Scaling Test-time Compute for LLM Agents.

CoRR, June, 2025

TaskCraft: Automated Generation of Agentic Tasks.

CoRR, June, 2025

USB: A Comprehensive and Unified Safety Evaluation Benchmark for Multimodal Large Language Models.

CoRR, May, 2025

Beyond Safe Answers: A Benchmark for Evaluating True Risk Awareness in Large Reasoning Models.

CoRR, May, 2025

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation.

CoRR, May, 2025

Table-R1: Region-based Reinforcement Learning for Table Understanding.

CoRR, May, 2025

A Comprehensive Survey on Long Context Language Modeling.

CoRR, March, 2025

YuE: Scaling Open Foundation Models for Long-Form Music Generation.

CoRR, March, 2025

SampleMix: A Sample-wise Pre-training Data Mixing Strategy by Coordinating Data Quality and Diversity.

CoRR, March, 2025

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models.

CoRR, February, 2025

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

OAgents: An Empirical Study of Building Effective Agents.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Turning the Tide: Repository-based Code Reflection.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

T2R-BENCH: A Benchmark for Real World Table-to-Report Task.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

SampleMix: A Sample-wise Pre-training Data Mixing Strategy by Coordinating Data Quality and Diversity.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Breaking Size Barrier: Enhancing Reasoning for Large-Size Table Question Answering.

Proceedings of the Database Systems for Advanced Applications, 2025

LIME: Less Is More for MLLM Evaluation.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey.

CoRR, 2024

AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions.

CoRR, 2024

Aligning CodeLLMs with Direct Preference Optimization.

CoRR, 2024

KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks.

CoRR, 2024

OmniBench: Towards The Future of Universal Omni-Language Models.

CoRR, 2024

LIME: Less Is More for MLLM Evaluation.

CoRR, 2024

Towards a Unified View of Preference Learning for Large Language Models: A Survey.

CoRR, 2024

LongIns: A Challenging Long-context Instruction-based Exam for LLMs.

CoRR, 2024
