Matei Zaharia

According to our database, Matei Zaharia authored at least 115 papers between 2006 and 2021.

Bibliography

2021
Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval.
CoRR, 2021

2020
Posh: A Data-Aware Shell.
login Usenix Mag., 2020

A Demonstration of Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference.
Proc. VLDB Endow., 2020

Approximate Selection with Guarantees using Proxies.
Proc. VLDB Endow., 2020

Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores.
Proc. VLDB Endow., 2020

Task-agnostic Indexes for Deep Learning-based Queries over Unstructured Data.
CoRR, 2020

Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads.
CoRR, 2020

Jointly Optimizing Preprocessing and Inference for DNN-based Visual Analytics.
CoRR, 2020

DBOS: A Proposal for a Data-Centric Operating System.
CoRR, 2020

Relevance-guided Supervision for OpenQA with ColBERT.
CoRR, 2020

Similarity Search for Efficient Active Learning and Search of Rare Concepts.
CoRR, 2020

Overlook: Differentially Private Exploratory Visualization for Big Data.
CoRR, 2020

Sparse GPU Kernels for Deep Learning.
CoRR, 2020

Memory-Efficient Pipeline-Parallel DNN Training.
CoRR, 2020

Offload Annotations: Bringing Heterogeneous Computing to Existing Libraries and Workloads.
Proceedings of the 2020 USENIX Annual Technical Conference, 2020

Spectral Lower Bounds on the I/O Complexity of Computation Graphs.
Proceedings of the SPAA '20: 32nd ACM Symposium on Parallelism in Algorithms and Architectures, 2020

Developments in MLflow: A System to Accelerate the Machine Learning Lifecycle.
Proceedings of the Fourth Workshop on Data Management for End-To-End Machine Learning, 2020

ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT.
Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, 2020

FrugalML: How to use ML Prediction APIs more accurately and cheaply.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020


Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference.
Proceedings of Machine Learning and Systems 2020, 2020

Model Assertions for Monitoring and Improving ML Models.
Proceedings of Machine Learning and Systems 2020, 2020

Improving the Accuracy, Scalability, and Performance of Graph Neural Networks with Roc.
Proceedings of Machine Learning and Systems 2020, 2020

Selection via Proxy: Efficient Data Selection for Deep Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020

PPMLP 2020: Workshop on Privacy-Preserving Machine Learning In Practice.
Proceedings of the CCS '20: 2020 ACM SIGSAC Conference on Computer and Communications Security, 2020

Fleet: A Framework for Massively Parallel Streaming on FPGAs.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019
Outsourcing Everyday Jobs to Thousands of Cloud Functions with gg.
login Usenix Mag., 2019

Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark.
ACM SIGOPS Oper. Syst. Rev., 2019

BlazeIt: Optimizing Declarative Aggregation and Limit Queries for Neural Network-Based Video Analytics.
Proc. VLDB Endow., 2019

ObliDB: Oblivious Query Processing for Secure Databases.
Proc. VLDB Endow., 2019

Express: Lowering the Cost of Metadata-hiding Communication with Cryptographic Privacy.
CoRR, 2019

MLPerf Training Benchmark.
CoRR, 2019

Automated Lower Bounds on the I/O Complexity of Computation Graphs.
CoRR, 2019

SysML: The New Frontier of Machine Learning Systems.
CoRR, 2019

From Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of Transient Functional Containers.
Proceedings of the 2019 USENIX Annual Technical Conference, 2019

Optimizing data-intensive computations in existing libraries with split annotations.
Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019

PipeDream: generalized pipeline parallelism for DNN training.
Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019

TASO: optimizing deep learning computation with automatic generation of graph substitutions.
Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019

Beyond Data and Model Parallelism for Deep Neural Networks.
Proceedings of Machine Learning and Systems 2019, 2019

Optimizing DNN Computation with Relaxed Graph Substitutions.
Proceedings of Machine Learning and Systems 2019, 2019

LIT: Learned Intermediate Representation Training for Model Compression.
Proceedings of the 36th International Conference on Machine Learning, 2019

To Index or Not to Index: Optimizing Exact Maximum Inner Product Search.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

Lessons from Large-Scale Software as a Service at Databricks.
Proceedings of the ACM Symposium on Cloud Computing, SoCC 2019, 2019

Challenges and Opportunities in DNN-Based Video Analytics: A Demonstration of the BlazeIt Video Query Engine.
Proceedings of the CIDR 2019, 2019

2018
Big Data Platforms for Data Analytics.
Proceedings of the Encyclopedia of Database Systems, Second Edition, 2018

Evaluating End-to-End Optimization for Data Analytics Applications in Weld.
Proc. VLDB Endow., 2018

Filter Before You Parse: Faster Analytics on Raw Data with Sparser.
Proc. VLDB Endow., 2018

DIFF: A Relational Interface for Large-Scale Data Explanation.
Proc. VLDB Endow., 2018

Accelerating the Machine Learning Lifecycle with MLflow.
IEEE Data Eng. Bull., 2018

Splitability Annotations: Optimizing Black-Box Function Composition in Existing Libraries.
CoRR, 2018

LIT: Block-wise Intermediate Representation Training for Model Compression.
CoRR, 2018

BlazeIt: Fast Exploratory Video Queries using Neural Networks.
CoRR, 2018

MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis.
Proceedings of the 2018 International Conference on Management of Data, 2018

Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark.
Proceedings of the 2018 International Conference on Management of Data, 2018

2017
NoScope: Optimizing Deep CNN-Based Queries over Video Streams at Scale.
Proc. VLDB Endow., 2017

An Oblivious General-Purpose SQL Database for the Cloud.
CoRR, 2017

Weld: Rethinking the Interface Between Data-Intensive Applications.
CoRR, 2017

Optimizing Deep CNN-Based Queries over Video Streams at Scale.
CoRR, 2017

Infrastructure for Usable Machine Learning: The Stanford DAWN Project.
CoRR, 2017

SimDex: Exploiting Model Similarity in Exact Matrix Factorization Recommendations.
CoRR, 2017

Stadium: A Distributed Metadata-Private Messaging System.
Proceedings of the 26th Symposium on Operating Systems Principles, 2017

DIY Hosting for Online Privacy.
Proceedings of the 16th ACM Workshop on Hot Topics in Networks, Palo Alto, CA, USA, 2017

A Common Runtime for High Performance Data Analysis.
Proceedings of the CIDR 2017, 2017

Making caches work for graph analytics.
Proceedings of the 2017 IEEE International Conference on Big Data, BigData 2017, 2017

2016
Voodoo - A Vector Algebra for Portable Database Performance on Modern Hardware.
Proc. VLDB Endow., 2016

MLlib: Machine Learning in Apache Spark.
J. Mach. Learn. Res., 2016

Splinter: Practical Private Queries on Public Data.
IACR Cryptol. ePrint Arch., 2016

Stadium: A Distributed Metadata-Private Messaging System.
IACR Cryptol. ePrint Arch., 2016

Optimizing Cache Performance for Graph Analytics.
CoRR, 2016

Apache Spark: a unified engine for big data processing.
Commun. ACM, 2016

SparkR: Scaling R Programs with Spark.
Proceedings of the 2016 International Conference on Management of Data, 2016

ModelDB: a system for machine learning model management.
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2016

Introduction to Spark 2.0 for Database Researchers.
Proceedings of the 2016 International Conference on Management of Data, 2016

FairRide: Near-Optimal, Fair Cache Sharing.
Proceedings of the 13th USENIX Symposium on Networked Systems Design and Implementation, 2016

Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Matrix Computations and Optimization in Apache Spark.
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016

GraphFrames: an integrated API for mixing graph and relational queries.
Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems, Redwood Shores, CA, USA, June 24, 2016

2015
Scaling Spark in the Real World: Performance and Usability.
Proc. VLDB Endow., 2015

linalg: Matrix Computations in Apache Spark.
CoRR, 2015

Vuvuzela: scalable private messaging resistant to traffic analysis.
Proceedings of the 25th Symposium on Operating Systems Principles, 2015

Spark SQL: Relational Data Processing in Spark.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

2014
Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks.
Proceedings of the ACM Symposium on Cloud Computing, 2014

2013
An Architecture for Fast and General Data Processing on Large Clusters.
PhD thesis, 2013

Large-Scale Estimation in Cyberphysical Systems Using Streaming Data: A Case Study With Arterial Traffic Estimation.
IEEE Trans Autom. Sci. Eng., 2013

Discretized streams: fault-tolerant streaming computation at scale.
Proceedings of the ACM SIGOPS 24th Symposium on Operating Systems Principles, 2013

Sparrow: distributed, low latency scheduling.
Proceedings of the ACM SIGOPS 24th Symposium on Operating Systems Principles, 2013

Shark: SQL and rich analytics at scale.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

Choosy: max-min fair sharing for datacenter jobs with constraints.
Proceedings of the Eighth Eurosys Conference 2013, 2013

2012
Fast and Interactive Analytics over Hadoop Data with Spark.
login Usenix Mag., 2012

Large Scale Estimation in Cyberphysical Systems using Streaming Data: a Case Study with Smartphone Traces.
CoRR, 2012

Cloud Terminal: Secure Access to Sensitive Applications from Untrusted Systems.
Proceedings of the 2012 USENIX Annual Technical Conference, 2012

Shark: fast data analysis using coarse-grained distributed memory.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012

Multi-resource fair queueing for packet processing.
Proceedings of the ACM SIGCOMM 2012 Conference, 2012

Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing.
Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation, 2012

Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters.
Proceedings of the 4th USENIX Workshop on Hot Topics in Cloud Computing, 2012

Optimally Designing Games for Cognitive Science Research.
Proceedings of the 34th Annual Meeting of the Cognitive Science Society, 2012

2011
Mesos: Flexible Resource Sharing for the Cloud.
login Usenix Mag., 2011

Faster and More Accurate Sequence Alignment with SNAP.
CoRR, 2011

Design and implementation of the KioskNet system.
Comput. Networks, 2011

Managing data transfers in computer clusters with orchestra.
Proceedings of the ACM SIGCOMM 2011 Conference on Applications, 2011

Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center.
Proceedings of the 8th USENIX Symposium on Networked Systems Design and Implementation, 2011

Dominant Resource Fairness: Fair Allocation of Multiple Resource Types.
Proceedings of the 8th USENIX Symposium on Networked Systems Design and Implementation, 2011

The Datacenter Needs an Operating System.
Proceedings of the 3rd USENIX Workshop on Hot Topics in Cloud Computing, 2011

Scaling the mobile millennium system in the cloud.
Proceedings of the ACM Symposium on Cloud Computing in conjunction with SOSP 2011, 2011

2010
A view of cloud computing.
Commun. ACM, 2010

Spark: Cluster Computing with Working Sets.
Proceedings of the 2nd USENIX Workshop on Hot Topics in Cloud Computing, 2010

Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling.
Proceedings of the European Conference on Computer Systems, 2010

2009
ICTD for healthcare in Ghana: Two parallel case studies.
Proceedings of the 2009 International Conference on Information and Communication Technologies and Development, 2009

A Common Substrate for Cluster Computing.
Proceedings of the Workshop on Hot Topics in Cloud Computing, 2009

2008
Gossip-based search selection in hybrid peer-to-peer networks.
Concurr. Comput. Pract. Exp., 2008

Improving MapReduce Performance in Heterogeneous Environments.
Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation, 2008

2007
Very low-cost internet access using KioskNet.
Comput. Commun. Rev., 2007

Finding Content in File-Sharing Networks When You Can't Even Spell.
Proceedings of the 6th International workshop on Peer-To-Peer Systems, 2007

Design and implementation of the KioskNet system.
Proceedings of the 2007 International Conference on Information and Communication Technologies and Development, 2007

2006
Low-cost communication for rural internet kiosks using mechanical backhaul.
Proceedings of the 12th Annual International Conference on Mobile Computing and Networking, 2006

