Jiaming Ji

Orcid: 0000-0002-3769-2077

According to our database, Jiaming Ji authored at least 70 papers between 2021 and 2026.

Bibliography

2026
AI Alignment: A Contemporary Survey.
ACM Comput. Surv., April, 2026

RedVLA: Physical Red Teaming for Vision-Language-Action Models.
CoRR, April, 2026

ShuttleEnv: An Interactive Data-Driven RL Environment for Badminton Strategy Modeling.
CoRR, March, 2026

VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment.
CoRR, March, 2026

Enhance the Safety in Reinforcement Learning by ADRC Lagrangian Methods.
CoRR, January, 2026

What, Whether and How? Unveiling Process Reward Models for Thinking with Images Reasoning.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
VLA-Arena: An Open-Source Framework for Benchmarking Vision-Language-Action Models.
CoRR, December, 2025

Are Your Agents Upward Deceivers?
CoRR, December, 2025

Debate with Images: Detecting Deceptive Behaviors in Multimodal Large Language Models.
CoRR, December, 2025

Arcadia: Toward a Full-Lifecycle Framework for Embodied Lifelong Learning.
CoRR, December, 2025

AI Deception: Risks, Dynamics, and Controls.
CoRR, November, 2025

SafeMT: Multi-turn Safety for Multimodal Language Models.
CoRR, October, 2025

On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations.
CoRR, October, 2025

ReDMan: reliable dexterous manipulation with safe reinforcement learning.
Mach. Learn., August, 2025

Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications.
CoRR, August, 2025

A Game-Theoretic Negotiation Framework for Cross-Cultural Consensus in LLMs.
CoRR, June, 2025

SafeLawBench: Towards Safe Alignment of Large Language Models.
CoRR, June, 2025

InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback.
CoRR, May, 2025

The Mirage of Multimodality: Where Truth is Tested and Honesty Unravels.
CoRR, May, 2025

Mitigating Deceptive Alignment via Self-Monitoring.
CoRR, May, 2025

Generative RLHF-V: Learning Principles from Multi-modal Human Preference.
CoRR, May, 2025

J1: Exploring Simple Test-Time Scaling for LLM-as-a-Judge.
CoRR, May, 2025

A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment.
CoRR, April, 2025

Benchmarking Multi-National Value Alignment for Large Language Models.
CoRR, April, 2025

Safe RLHF-V: Safe Reinforcement Learning from Human Feedback in Multimodal Large Language Models.
CoRR, March, 2025

ThinkPatterns-21k: A Systematic Study on the Impact of Thinking Patterns in LLMs.
CoRR, March, 2025

SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning.
CoRR, March, 2025

RedStar: Does Scaling Long-CoT Data Unlock Better Slow-Reasoning Systems?
CoRR, January, 2025

SAE-V: Interpreting Multimodal Models for Enhanced Alignment.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

A Survey of LLM-based Agents in Medicine: How far are we from Baymax?
Proceedings of the Findings of the Association for Computational Linguistics, 2025

LegalReasoner: Step-wised Verification-Correction for Legal Judgment Reasoning.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Reward Generalization in RLHF: A Topological Perspective.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Benchmarking Multi-National Value Alignment for Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Language Models Resist Alignment: Evidence From Data Compression.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Boosting Policy and Process Reward Models with Monte Carlo Tree Search in Open-Domain QA.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

SafeLawBench: Towards Safe Alignment of Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Bi-DexHands: Towards Human-Level Bimanual Dexterous Manipulation.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

Heterogeneous-Agent Reinforcement Learning.
J. Mach. Learn. Res., 2024

OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research.
J. Mach. Learn. Res., 2024

Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback.
CoRR, 2024

Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback.
CoRR, 2024

PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models.
CoRR, 2024

SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset.
CoRR, 2024

Language Models Resist Alignment.
CoRR, 2024

Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective.
CoRR, 2024

Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction.
CoRR, 2024

ProgressGym: Alignment with a Millennium of Moral Progress.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Aligner: Efficient Alignment by Learning to Correct.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Multi-task Learning LSTM-based Traffic Prediction in Data Center Networks.
Proceedings of the 8th International Conference on Machine Learning and Soft Computing, 2024

Safe RLHF: Safe Reinforcement Learning from Human Feedback.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

SafeDreamer: Safe Reinforcement Learning with World Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Object Detection of Flexible Objects with Arbitrary Orientation Based on Rotation-Adaptive YOLOv5.
Sensors, 2023

AI Alignment: A Comprehensive Survey.
CoRR, 2023

Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark.
CoRR, 2023

Baichuan 2: Open Large-scale Language Models.
CoRR, 2023

Safe DreamerV3: Safe Reinforcement Learning with World Models.
CoRR, 2023

BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset.
CoRR, 2023

Heterogeneous-Agent Reinforcement Learning.
CoRR, 2023

Safety Gymnasium: A Unified Safe Reinforcement Learning Benchmark.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

VOCE: Variational Optimization with Conservative Estimation for Offline Safe Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Augmented Proximal Policy Optimization for Safe Reinforcement Learning.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
CUP: A Conservative Update Policy Algorithm for Safe Reinforcement Learning.
CoRR, 2022

Constrained Update Projection Approach to Safe Policy Optimization.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021

