Yichi Zhang

ORCID: 0000-0002-1894-3977

Affiliations:
  • Tsinghua University, Department of Computer Science and Technology, Institute for Artificial Intelligence, THBI Lab, Beijing, China


According to our database, Yichi Zhang authored at least 25 papers between 2021 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.


Bibliography

2025
DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios.
CoRR, October, 2025

Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention.
CoRR, September, 2025

Oyster-I: Beyond Refusal - Constructive Safety Alignment for Responsible Language Models.
CoRR, September, 2025

Unveiling Trust in Multimodal Large Language Models: Evaluation, Analysis, and Mitigation.
CoRR, August, 2025

A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents.
CoRR, June, 2025

Mitigating Overthinking in Large Reasoning Models via Manifold Steering.
CoRR, May, 2025

Understanding Pre-training and Fine-tuning from Loss Landscape Perspectives.
CoRR, May, 2025

RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability.
CoRR, April, 2025

Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement.
CoRR, February, 2025

STAIR: Improving Safety Alignment with Introspective Reasoning.
CoRR, February, 2025

Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
Scaling Laws for Black-box Adversarial Attacks.
CoRR, 2024

Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study.
CoRR, 2024

MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

PINNacle: A Comprehensive Benchmark of Physics-Informed Neural Networks for Solving PDEs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Rethinking Model Ensemble in Transfer-based Adversarial Attacks.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Exploring the Transferability of Visual Prompting for Multimodal Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
To make yourself invisible with Adversarial Semantic Contours.
Comput. Vis. Image Underst., April, 2023

How Robust is Google's Bard to Adversarial Image Attacks?
CoRR, 2023

Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving.
CoRR, 2023

Rethinking Model Ensemble in Transfer-based Adversarial Attacks.
CoRR, 2023

Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Physics-Informed Machine Learning: A Survey on Problems, Methods and Applications.
CoRR, 2022

2021
Unrestricted Adversarial Attacks on ImageNet Competition.
CoRR, 2021

Adversarial Semantic Contour for Object Detection.
CoRR, 2021
