Fazl Barez
ORCID: 0009-0008-1889-6577
According to our database, Fazl Barez authored at least 49 papers between 2021 and 2025.
Bibliography
2025
CoRR, August, 2025
CoRR, July, 2025
Beyond Monoliths: Expert Orchestration for More Capable, Democratic, and Safe Large Language Models.
CoRR, June, 2025
CoRR, May, 2025
SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors.
CoRR, May, 2025
CoRR, April, 2025
AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons.
CoRR, March, 2025
Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness.
CoRR, March, 2025
CoRR, February, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, 2025
2024
Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach.
CoRR, 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Proceedings of the Findings of the Association for Computational Linguistics, 2024
2023
CoRR, 2023
DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models.
CoRR, 2023
N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models.
CoRR, 2023
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
The Larger they are, the Harder they Fail: Language Models do not Recognize Identifier Swaps in Python.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
2021
CoRR, 2021