Tony T. Wang

ORCID: 0000-0001-5991-5625

Affiliations:

MIT, CSAIL, Cambridge, MA, USA

According to our database, Tony T. Wang authored at least 11 papers between 2022 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2025

Learning to Interpret Weight Differences in Language Models.

CoRR, October, 2025

A connectomics-driven analysis reveals novel characterization of border regions in mouse visual cortex.

Neural Networks, 2025

Failures to Find Transferable Image Jailbreaks Between Vision-Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Can Go AIs Be Adversarially Robust?

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach.

CoRR, 2024

Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback.

Trans. Mach. Learn. Res., 2023

Forbidden Facts: An Investigation of Competing Objectives in Llama-2.

CoRR, 2023

Cliff-Learning.

Tony T. Wang, Igor Zablotchi, Nir Shavit, Jonathan S. Rosenfeld

CoRR, 2023

Adversarial Policies Beat Superhuman Go AIs.

Proceedings of the International Conference on Machine Learning, 2023

2022

Adversarial Policies Beat Professional-Level Go AIs.

CoRR, 2022
