Matei Zaharia
ORCID: 0000-0002-7547-7204
Affiliations:
- Stanford University, CA, USA
According to our database, Matei Zaharia authored at least 230 papers between 2006 and 2025.
  
  
Online presence:
- on zbmath.org
- on twitter.com
- on orcid.org
- on id.loc.gov
- on dl.acm.org
Bibliography
  2025
    CoRR, September, 2025
    
  
    Proc. VLDB Endow., August, 2025
    
  
    Proc. VLDB Endow., August, 2025
    
  
DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis.
    
  
    CoRR, August, 2025
    
  
Multi-module GRPO: Composing Policy Gradients and Prompt Optimization for Language Model Programs.
    
  
    CoRR, August, 2025
    
  
Semantic Operators and Their Optimization: Towards AI-Based Data Analytics with Accuracy Guarantees.
    
  
    Proc. VLDB Endow., July, 2025
    
  
    CoRR, July, 2025
    
  
    CoRR, July, 2025
    
  
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!
    
  
    CoRR, February, 2025
    
  
BARE: Combining Base and Instruction-Tuned Language Models for Better Synthetic Data Generation.
    
  
    CoRR, February, 2025
    
  
Identification of cardiac wall motion abnormalities in diverse populations by deep learning of the electrocardiogram.
    
  
    npj Digit. Medicine, 2025
    
  
Databricks Lakeguard: Supporting Fine-grained Access Control and Multi-user Capabilities for Apache Spark Workloads.
    
  
    Proceedings of the Companion of the 2025 International Conference on Management of Data, 2025
    
  
    Proceedings of the Companion of the 2025 International Conference on Management of Data, 2025
    
  
Blink Twice - Automatic Workload Pinning and Regression Detection for Versionless Apache Spark using Retries.
    
  
    Proceedings of the Companion of the 2025 International Conference on Management of Data, 2025
    
  
    Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025
    
  
    Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025
    
  
    Proceedings of the International Conference on Neuro-symbolic Systems, 2025
    
  
    Proceedings of the Thirteenth International Conference on Learning Representations, 2025
    
  
    Proceedings of the Thirteenth International Conference on Learning Representations, 2025
    
  
    Proceedings of the Advances in Information Retrieval, 2025
    
  
    Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025
    
  
  2024
    Proc. VLDB Endow., August, 2024
    
  
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance.
    
  
    Trans. Mach. Learn. Res., 2024
    
  
ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data.
    
  
    Proc. ACM Manag. Data, 2024
    
  
Specifications: The missing link to making the development of LLM systems an engineering discipline.
    
  
    CoRR, 2024
    
  
Networks of Networks: Complexity Class Principles Applied to Compound AI Systems Design.
    
  
    CoRR, 2024
    
  
LOTUS: Enabling Semantic Queries with LLMs Over Tables of Unstructured and Structured Data.
    
  
    CoRR, 2024
    
  
    CoRR, 2024
    
  
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks.
    
  
    Proceedings of the IEEE Security and Privacy, 2024
    
  
Are More LLM Calls All You Need? Towards the Scaling Properties of Compound AI Systems.
    
  
    Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
    
  
    Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
    
  
Everything Everywhere All At Once: Efficient Cross-Service Program Analysis with OverSeer.
    
  
    Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering Workshops, 2024
    
  
    Proceedings of the Twelfth International Conference on Learning Representations, 2024
    
  
    Proceedings of the Twelfth International Conference on Learning Representations, 2024
    
  
    Proceedings of the 4th Workshop on Machine Learning and Systems, 2024
    
  
    Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
    
  
  2023
    Proc. VLDB Endow., 2023
    
  
    Proc. VLDB Endow., 2023
    
  
DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines.
    
  
    CoRR, 2023
    
  
    CoRR, 2023
    
  
    Proceedings of the 29th Symposium on Operating Systems Principles, 2023
    
  
    Proceedings of the Sixth Conference on Machine Learning and Systems, 2023
    
  
    Proceedings of the IEEE INFOCOM 2023, 2023
    
  
    Proceedings of the 13th Conference on Innovative Data Systems Research, 2023
    
  
    Proceedings of the 13th Conference on Innovative Data Systems Research, 2023
    
  
    Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
    
  
    Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
    
  
  2022
Author Correction: Advances, challenges and opportunities in creating data for trustworthy AI.
    
  
    Nat. Mach. Intell., October, 2022
    
  
    Proc. VLDB Endow., 2022
    
  
    Proc. VLDB Endow., 2022
    
  
    Proc. VLDB Endow., 2022
    
  
    Nat. Mach. Intell., 2022
    
  
    Math. Program. Comput., 2022
    
  
    J. Priv. Confidentiality, 2022
    
  
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP.
    
  
    CoRR, 2022
    
  
    Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022
    
  
Finding Label and Model Errors in Perception Data With Learned Observation Assertions.
    
  
    Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022
    
  
    Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022
    
  
    Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, 2022
    
  
    Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
    
  
    Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
    
  
    Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
    
  
    Proceedings of the International Conference on Machine Learning, 2022
    
  
Hindsight: Posterior-guided training of retrievers for improved open-ended generation.
    
  
    Proceedings of the Tenth International Conference on Learning Representations, 2022
    
  
    Proceedings of the Tenth International Conference on Learning Representations, 2022
    
  
    Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022
    
  
    Proceedings of the 12th Conference on Innovative Data Systems Research, 2022
    
  
    Proceedings of the 12th Conference on Innovative Data Systems Research, 2022
    
  
    Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
    
  
  2021
    Trans. Assoc. Comput. Linguistics, 2021
    
  
    Proc. VLDB Endow., 2021
    
  
Toward Compact Parameter Representations for Architecture-Agnostic Neural Network Compression.
    
  
    CoRR, 2021
    
  
DistIR: An Intermediate Representation and Simulator for Efficient Neural Network Distribution.
    
  
    CoRR, 2021
    
  
    CoRR, 2021
    
  
    CoRR, 2021
    
  
Express: Lowering the Cost of Metadata-hiding Communication with Cryptographic Privacy.
    
  
    Proceedings of the 30th USENIX Security Symposium, 2021
    
  
    Proceedings of the SOSP '21: ACM SIGOPS 28th Symposium on Operating Systems Principles, 2021
    
  
    Proceedings of the International Conference for High Performance Computing, 2021
    
  
    Proceedings of the 18th USENIX Symposium on Networked Systems Design and Implementation, 2021
    
  
    Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
    
  
    Proceedings of the 38th International Conference on Machine Learning, 2021
    
  
    Proceedings of the HotOS '21: Workshop on Hot Topics in Operating Systems, 2021
    
  
Don't Hate the Player, Hate the Game: Safety and Utility in Multi-Agent Congestion Control.
    
  
    Proceedings of the HotNets '21: The 20th ACM Workshop on Hot Topics in Networks, 2021
    
  
Clamor: Extending Functional Cluster Computing Frameworks with Fine-Grained Remote Memory Access.
    
  
    Proceedings of the SoCC '21: ACM Symposium on Cloud Computing, 2021
    
  
Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics.
    
  
    Proceedings of the 11th Conference on Innovative Data Systems Research, 2021
    
  
    Proceedings of the 11th Conference on Innovative Data Systems Research, 2021
    
  
  2020
A Demonstration of Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference.
    
  
    Proc. VLDB Endow., 2020
    
  
    Proc. VLDB Endow., 2020
    
  
    Proc. VLDB Endow., 2020
    
  
    CoRR, 2020
    
  
    CoRR, 2020
    
  
Offload Annotations: Bringing Heterogeneous Computing to Existing Libraries and Workloads.
    
  
    Proceedings of the 2020 USENIX Annual Technical Conference, 2020
    
  
    Proceedings of the 2020 USENIX Annual Technical Conference, 2020
    
  
    Proceedings of the SPAA '20: 32nd ACM Symposium on Parallelism in Algorithms and Architectures, 2020
    
  
    Proceedings of the Fourth Workshop on Data Management for End-To-End Machine Learning, 2020
    
  
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT.
    
  
    Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, 2020
    
  
    Proceedings of the International Conference for High Performance Computing, 2020
    
  
    Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020
    
  
    Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
    
  
    Proceedings of the Third Conference on Machine Learning and Systems, 2020
    
  
    Proceedings of the Third Conference on Machine Learning and Systems, 2020
    
  
    Proceedings of the Third Conference on Machine Learning and Systems, 2020
    
  
Improving the Accuracy, Scalability, and Performance of Graph Neural Networks with Roc.
    
  
    Proceedings of the Third Conference on Machine Learning and Systems, 2020
    
  
    Proceedings of the 8th International Conference on Learning Representations, 2020
    
  
    Proceedings of the Heterogeneous Data Management, Polystores, and Analytics for Healthcare, 2020
    
  
    Proceedings of the CCS '20: 2020 ACM SIGSAC Conference on Computer and Communications Security, 2020
    
  
    Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020
    
  
  2019
    login Usenix Mag., 2019
    
  
    ACM SIGOPS Oper. Syst. Rev., 2019
    
  
BlazeIt: Optimizing Declarative Aggregation and Limit Queries for Neural Network-Based Video Analytics.
    
  
    Proc. VLDB Endow., 2019
    
  
From Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of Transient Functional Containers.
    
  
    Proceedings of the 2019 USENIX Annual Technical Conference, 2019
    
  
    Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019
    
  
    Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019
    
  
TASO: optimizing deep learning computation with automatic generation of graph substitutions.
    
  
    Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019
    
  
    Proceedings of the Second Conference on Machine Learning and Systems, SysML 2019, 2019
    
  
    Proceedings of the Second Conference on Machine Learning and Systems, SysML 2019, 2019
    
  
    Proceedings of the 36th International Conference on Machine Learning, 2019
    
  
    Proceedings of the 35th IEEE International Conference on Data Engineering, 2019
    
  
    Proceedings of the ACM Symposium on Cloud Computing, SoCC 2019, 2019
    
  
Challenges and Opportunities in DNN-Based Video Analytics: A Demonstration of the BlazeIt Video Query Engine.
    
  
    Proceedings of the 9th Biennial Conference on Innovative Data Systems Research, 2019
    
  
  2018
    Proceedings of the Encyclopedia of Database Systems, Second Edition, 2018
    
  
    Proc. VLDB Endow., 2018
    
  
    Proc. VLDB Endow., 2018
    
  
    Proc. VLDB Endow., 2018
    
  
Splitability Annotations: Optimizing Black-Box Function Composition in Existing Libraries.
    
  
    CoRR, 2018
    
  
    CoRR, 2018
    
  
    Proceedings of the 2018 International Conference on Management of Data, 2018
    
  
    Proceedings of the 2018 International Conference on Management of Data, 2018
    
  
  2017
    Proc. VLDB Endow., 2017
    
  
    CoRR, 2017
    
  
    Proceedings of the 26th Symposium on Operating Systems Principles, 2017
    
  
    Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation, 2017
    
  
    Proceedings of the 16th ACM Workshop on Hot Topics in Networks, Palo Alto, CA, USA, 2017
    
  
    Proceedings of the 8th Biennial Conference on Innovative Data Systems Research, 2017
    
  
    Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017
    
  
  2016
    Proc. VLDB Endow., 2016
    
  
    IACR Cryptol. ePrint Arch., 2016
    
  
    Proceedings of the 2016 International Conference on Management of Data, 2016
    
  
    Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2016
    
  
    Proceedings of the 2016 International Conference on Management of Data, 2016
    
  
    Proceedings of the 13th USENIX Symposium on Networked Systems Design and Implementation, 2016
    
  
    Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016
    
  
    Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016
    
  
    Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems, Redwood Shores, CA, USA, June 24, 2016
    
  
  2015
    Proceedings of the 25th Symposium on Operating Systems Principles, 2015
    
  
    Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015
    
  
  2014
    Proceedings of the ACM Symposium on Cloud Computing, 2014
    
  
  2013
    PhD thesis, 2013
    
  
Large-Scale Estimation in Cyberphysical Systems Using Streaming Data: A Case Study With Arterial Traffic Estimation.
    
  
    IEEE Trans. Autom. Sci. Eng., 2013
    
  
    Proceedings of the ACM SIGOPS 24th Symposium on Operating Systems Principles, 2013
    
  
    Proceedings of the ACM SIGOPS 24th Symposium on Operating Systems Principles, 2013
    
  
    Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013
    
  
    Proceedings of the Eighth Eurosys Conference 2013, 2013
    
  
  2012
Large Scale Estimation in Cyberphysical Systems using Streaming Data: a Case Study with Smartphone Traces.
    
  
    CoRR, 2012
    
  
    Proceedings of the 2012 USENIX Annual Technical Conference, 2012
    
  
    Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012
    
  
    Proceedings of the ACM SIGCOMM 2012 Conference, 2012
    
  
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing.
    
  
    Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation, 2012
    
  
Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters.
    
  
    Proceedings of the 4th USENIX Workshop on Hot Topics in Cloud Computing, 2012
    
  
    Proceedings of the 34th Annual Meeting of the Cognitive Science Society, 2012
    
  
  2011
    Proceedings of the ACM SIGCOMM 2011 Conference on Applications, 2011
    
  
    Proceedings of the 8th USENIX Symposium on Networked Systems Design and Implementation, 2011
    
  
    Proceedings of the 8th USENIX Symposium on Networked Systems Design and Implementation, 2011
    
  
    Proceedings of the 3rd USENIX Workshop on Hot Topics in Cloud Computing, 2011
    
  
    Proceedings of the ACM Symposium on Cloud Computing in conjunction with SOSP 2011, 2011
    
  
  2010
    Proceedings of the 2nd USENIX Workshop on Hot Topics in Cloud Computing, 2010
    
  
Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling.
    
  
    Proceedings of the European Conference on Computer Systems, 2010
    
  
  2009
    Proceedings of the 2009 International Conference on Information and Communication Technologies and Development, 2009
    
  
    Proceedings of the Workshop on Hot Topics in Cloud Computing, 2009
    
  
  2008
    Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation, 2008
    
  
  2007
    Proceedings of the 6th International workshop on Peer-To-Peer Systems, 2007
    
  
    Proceedings of the 2007 International Conference on Information and Communication Technologies and Development, 2007
    
  
  2006
    Proceedings of the 12th Annual International Conference on Mobile Computing and Networking, 2006
    
  
    Proceedings of the 5th International workshop on Peer-To-Peer Systems, 2006