Alex Gu

Orcid: 0000-0002-4814-0796

According to our database, Alex Gu authored at least 29 papers between 2021 and 2026.

Bibliography

2026
SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks.
CoRR, March, 2026

Vibe Code Bench: Evaluating AI Models on End-to-End Web Application Development.
Proceedings of the ACM Conference on AI and Agentic Systems, 2026

2025
Completion ≠ Collaboration: Scaling Collaborative Effort with Agents.
CoRR, October, 2025

ProofOptimizer: Training Language Models to Simplify Proofs without Human Demonstrations.
CoRR, October, 2025

CWM: An Open-Weights LLM for Research on Code Generation with World Models.
CoRR, October, 2025

Investigating Advanced Reasoning of Large Language Models via Black-Box Interaction.
CoRR, August, 2025

Solving Inequality Proofs with Large Language Models.
CoRR, June, 2025

Challenges and Paths Towards AI for Software Engineering.
CoRR, March, 2025

Solving Inequality Proofs with Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Position: Future Research and Challenges Remain Towards AI for Software Engineering.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Mixture of Parrots: Experts improve memorization more than reasoning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective.
Trans. Mach. Learn. Res., 2024

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions.
CoRR, 2024

StarCoder 2 and The Stack v2: The Next Generation.
CoRR, 2024

Language Agnostic Code Embeddings.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

The Counterfeit Conundrum: Can Code Language Models Grasp the Nuances of Their Incorrect Generations?
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
StarCoder: may the source be with you!
Trans. Mach. Learn. Res., 2023

Certified Interpretability Robustness for Class Activation Mapping.
CoRR, 2023

SantaCoder: don't reach for the stars!
CoRR, 2023

LeanDojo: Theorem Proving with Retrieval-Augmented Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Min-Max Multi-objective Bilevel Optimization with Applications in Robust Machine Learning.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022
ObSynth: An Interactive Synthesis System for Generating Object Models from Natural Language Specifications.
CoRR, 2022

Min-Max Bilevel Multi-objective Optimization with Applications in Machine Learning.
CoRR, 2022

The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective.
CoRR, 2022

2021
Reproducibility Report: La-MAML: Look-ahead Meta Learning for Continual Learning.
CoRR, 2021

Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

