Fazl Barez

According to our database, Fazl Barez authored at least 19 papers between 2021 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions.
CoRR, 2024

Increasing Trust in Language Models through the Reuse of Verified Circuits.
CoRR, 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.
CoRR, 2024

Large Language Models Relearn Removed Concepts.
CoRR, 2024

2023
Measuring Value Alignment.
CoRR, 2023

Locating Cross-Task Sequence Continuation Circuits in Transformers.
CoRR, 2023

Understanding Addition in Transformers.
CoRR, 2023

Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders.
CoRR, 2023

AI Systems of Concern.
CoRR, 2023

DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models.
CoRR, 2023

Neuron to Graph: Interpreting Language Model Neurons at Scale.
CoRR, 2023

N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models.
CoRR, 2023

System III: Learning with Domain Knowledge for Safety Constraints.
CoRR, 2023

Fairness in AI and Its Long-Term Implications on Society.
CoRR, 2023

Exploring the Advantages of Transformers for High-Frequency Trading.
CoRR, 2023

Benchmarking Specialized Databases for High-frequency Data.
CoRR, 2023

Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

The Larger they are, the Harder they Fail: Language Models do not Recognize Identifier Swaps in Python.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2021
ED2: An Environment Dynamics Decomposition Framework for World Model Construction.
CoRR, 2021

