Yichi Zhang

ORCID: 0000-0002-1894-3977

Affiliations:
  • Tsinghua University, Department of Computer Science and Technology, Institute for Artificial Intelligence, THBI Lab, Beijing, China


According to our database, Yichi Zhang authored at least 25 papers between 2021 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.


Bibliography

2025
DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios.
CoRR, October, 2025

Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention.
CoRR, September, 2025

Oyster-I: Beyond Refusal - Constructive Safety Alignment for Responsible Language Models.
CoRR, September, 2025

Unveiling Trust in Multimodal Large Language Models: Evaluation, Analysis, and Mitigation.
CoRR, August, 2025

A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents.
CoRR, June, 2025

Mitigating Overthinking in Large Reasoning Models via Manifold Steering.
CoRR, May, 2025

Understanding Pre-training and Fine-tuning from Loss Landscape Perspectives.
CoRR, May, 2025

RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability.
CoRR, April, 2025

Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement.
CoRR, February, 2025

STAIR: Improving Safety Alignment with Introspective Reasoning.
CoRR, February, 2025

Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
Scaling Laws for Black-box Adversarial Attacks.
CoRR, 2024

Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study.
CoRR, 2024

MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

PINNacle: A Comprehensive Benchmark of Physics-Informed Neural Networks for Solving PDEs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Rethinking Model Ensemble in Transfer-based Adversarial Attacks.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Exploring the Transferability of Visual Prompting for Multimodal Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
To make yourself invisible with Adversarial Semantic Contours.
Comput. Vis. Image Underst., April, 2023

How Robust is Google's Bard to Adversarial Image Attacks?
CoRR, 2023

Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving.
CoRR, 2023

Rethinking Model Ensemble in Transfer-based Adversarial Attacks.
CoRR, 2023

Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Physics-Informed Machine Learning: A Survey on Problems, Methods and Applications.
CoRR, 2022

2021
Unrestricted Adversarial Attacks on ImageNet Competition.
CoRR, 2021

Adversarial Semantic Contour for Object Detection.
CoRR, 2021
