Michael J. Cafarella

ORCID: 0000-0001-6122-0590

  • University of Michigan, Ann Arbor, USA

According to our database, Michael J. Cafarella authored at least 113 papers between 2004 and 2024.

Databases Unbound: Querying All of the World's Bytes with AI.
Proc. VLDB Endow., August, 2024

LucidScript: Bottom-up Standardization for Data Preparation.
Proc. VLDB Endow., August, 2024

Optimizing Video Selection LIMIT Queries With Commonsense Knowledge.
Proc. VLDB Endow., March, 2024

Summarized Causal Explanations For Aggregate Views.
Proc. ACM Manag. Data, February, 2024

MDCR: A Dataset for Multi-Document Conditional Reasoning.
CoRR, 2024

A Declarative System for Optimizing AI Workloads.
CoRR, 2024

Sawmill: From Logs to Causal Diagnosis of Large Systems.
Proceedings of the Companion of the 2024 International Conference on Management of Data, 2024

Digging Up Threats to Validity: A Data Marshalling Approach to Sensitivity Analysis.
Proceedings of the Conference on Governance, 2024

Press ECCS to Doubt (Your Causal Graph).
Proceedings of the Conference on Governance, 2024

Cackle: Analytical Workload Cost and Performance Stability With Elastic Pools.
Proc. ACM Manag. Data, December, 2023

SeeSaw: Interactive Ad-hoc Search Over Image Databases.
Proc. ACM Manag. Data, December, 2023

Causal Data Integration.
Proc. VLDB Endow., 2023

Pando: Enhanced Data Skipping with Logical Data Partitioning.
Proc. VLDB Endow., 2023

R³: Record-Replay-Retroaction for Database-Backed Applications.
Proc. VLDB Endow., 2023

PAINE Demo: Optimizing Video Selection Queries With Commonsense Knowledge.
Proc. VLDB Endow., 2023

SEED: Simple, Efficient, and Effective Data Management via Large Language Models.
CoRR, 2023

NEXUS: On Explaining Confounding Bias.
Proceedings of the Companion of the 2023 International Conference on Management of Data, 2023

On Explaining Confounding Bias.
Proceedings of the 39th IEEE International Conference on Data Engineering, 2023

Transactions Make Debugging Easy.
Proceedings of the 13th Conference on Innovative Data Systems Research, 2023

Apiary: A DBMS-Backed Transactional Function-as-a-Service Framework.
CoRR, 2022

Infrastructure for Rapid Open Knowledge Network Development.
AI Mag., 2022

Enabling useful provenance in scripting languages with a human-in-the-loop.
Proceedings of HILDA@SIGMOD 2022: Workshop on Human-In-the-Loop Data Analytics, 2022

Controlled Intentional Degradation in Analytical Video Systems.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

HILDA'22: The SIGMOD 2022 Workshop on Human-in-the-Loop Data Analytics.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

Debugging the OmniTable Way.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

Building a Shared Conceptual Model of Complex, Heterogeneous Data Systems: A Demonstration.
Proceedings of the 12th Conference on Innovative Data Systems Research, 2022

A Progress Report on DBOS: A Database-oriented Operating System.
Proceedings of the 12th Conference on Innovative Data Systems Research, 2022

Replicated Layout for In-Memory Database Systems.
Proc. VLDB Endow., 2021

DBOS: A DBMS-oriented Operating System.
Proc. VLDB Endow., 2021

SkyQuery: An Aerial Drone Video Sensing Platform.
CoRR, 2021

TagMe: GPS-Assisted Automatic Object Annotation in Videos.
CoRR, 2021

Technical Report on Data Integration and Preparation.
CoRR, 2021

Data Governance in a Database Operating System (DBOS).
Proceedings of the Heterogeneous Data Management, Polystores, and Analytics for Healthcare, 2021

DBOS: A Proposal for a Data-Centric Operating System.
CoRR, 2020

Unnatural Language Processing: Bridging the Gap Between Synthetic and Natural Language Data.
CoRR, 2020

A Method for Optimizing Opaque Filter Queries.
Proceedings of the 2020 International Conference on Management of Data, 2020

MIRIS: Fast Object Track Queries in Video.
Proceedings of the 2020 International Conference on Management of Data, 2020

Duoquest: A Dual-Specification System for Expressive SQL Queries.
Proceedings of the 2020 International Conference on Management of Data, 2020

BeeCluster: drone orchestration via predictive optimization.
Proceedings of the MobiSys '20: The 18th Annual International Conference on Mobile Systems, 2020

Towards Data Discovery by Example.
Proceedings of the Heterogeneous Data Management, Polystores, and Analytics for Healthcare, 2020

A Polystore Based Database Operating System (DBOS).
Proceedings of the Heterogeneous Data Management, Polystores, and Analytics for Healthcare, 2020

Constructing Expressive Relational Queries with Dual-Specification Synthesis.
Proceedings of the 10th Conference on Innovative Data Systems Research, 2020

Knowledge Graph Programming with a Human-in-the-Loop: Preliminary Results.
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2019

Physical Representation-Based Predicate Optimization for a Visual Analytics Database.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

You can't debug what you can't see: Expanding observability with the OmniTable.
Proceedings of the Workshop on Hot Topics in Operating Systems, 2019

CLX: Towards verifiable PBE data transformation.
Proceedings of the Advances in Database Technology, 2019

Demonstration of a Multiresolution Schema Mapping System.
Proceedings of the 9th Biennial Conference on Innovative Data Systems Research, 2019

Context-specific Language Modeling for Human Trafficking Detection from Online Advertisements.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Ten Years of WebTables.
Proc. VLDB Endow., 2018

Constraint-based Explanation and Repair of Filter-Based Transformations.
Proc. VLDB Endow., 2018

Unifacta: Profiling-driven String Pattern Standardization.
CoRR, 2018

Beaver: Towards a Declarative Schema Mapping.
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2018

Sledgehammer: Cluster-Fueled Debugging.
Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

DeepDive: declarative knowledge base construction.
Commun. ACM, 2017

Database Learning: Toward a Database that Becomes Smarter Every Time.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

Foofah: A Programming-By-Example System for Synthesizing Data Transformation Programs.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

Foofah: Transforming Data By Example.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

Spreadsheet Property Detection With Rule-assisted Active Learning.
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017

A Declarative Query Processing System for Nowcasting.
Proc. VLDB Endow., 2016

Runtime Support for Human-in-the-Loop Feature Engineering System.
IEEE Data Eng. Bull., 2016

Long-tail Vocabulary Dictionary Extraction from the Web.
Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, 2016

Extracting Databases from Dark Data with DeepDive.
Proceedings of the 2016 International Conference on Management of Data, 2016

DQBarge: Improving Data-Quality Tradeoffs in Large-Scale Internet Services.
Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, 2016

HARE: Hardware accelerator for regular expressions.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

HAWK: Hardware support for unstructured log processing.
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016

Visualization-aware sampling for very large databases.
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016

Dark Data: Are we solving the right problems?
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016

A query system for social media signals.
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016

Input selection for fast feature engineering.
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016

DBExplorer: Exploratory Search in Databases.
Proceedings of the 19th International Conference on Extending Database Technology, 2016

Mindtagger: A Demonstration of Data Labeling in Knowledge Base Construction.
Proc. VLDB Endow., 2015

Neighbor-Sensitive Hashing.
Proc. VLDB Endow., 2015

Link-Prediction Enhanced Consensus Clustering for Complex Networks.
CoRR, 2015

DiagramFlyer: A Search Engine for Data-Driven Diagrams.
Proceedings of the 24th International Conference on World Wide Web Companion, 2015

Machine Learning and Databases: The Sound of Things to Come or a Cacophony of Hype?
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

Synthesizing Data Programs.
Proceedings of the Seventh Biennial Conference on Innovative Data Systems Research, 2015

An Integrated Development Environment for Faster Feature Engineering.
Proc. VLDB Endow., 2014

Using web corpus statistics for program analysis.
Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, 2014

Integrating spreadsheet data via accurate and low-effort extraction.
Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014

Reducing MapReduce Abstraction Costs for Text-centric Applications.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

Senbazuru: A Prototype Spreadsheet Database Management System.
Proc. VLDB Endow., 2013

Ringtail: A Generalized Nowcasting System.
Proc. VLDB Endow., 2013

Ringtail: Feature Selection For Easier Nowcasting.
Proceedings of the 16th International Workshop on the Web and Databases 2013, 2013

Automatic web spreadsheet data extraction.
Proceedings of the 3rd International Workshop on Semantic Search over the Web, 2013

Minimizing Remote Accesses in MapReduce Clusters.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Leveraging Noisy Lists for Social Feed Ranking.
Proceedings of the Seventh International Conference on Weblogs and Social Media, 2013

Brainwash: A Data System for Feature Engineering.
Proceedings of the Sixth Biennial Conference on Innovative Data Systems Research, 2013

Sample-driven schema mapping.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012

Automatic Optimization for MapReduce Programs.
Proc. VLDB Endow., 2011

Structured data on the web.
Commun. ACM, 2011

Web data management.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2011

Manimal: Relational Optimization for Data-Intensive Programs.
Proceedings of the 13th International Workshop on the Web and Databases 2010, 2010

Data Integration for the Relational Web.
Proc. VLDB Endow., 2009

How Best to Build Web-Scale Data Managers? A Panel Discussion.
Proc. VLDB Endow., 2009

Extracting and Querying a Comprehensive Web Database.
Proceedings of the Fourth Biennial Conference on Innovative Data Systems Research, 2009

Ontology-driven, unsupervised instance population.
J. Web Semant., 2008

Web-scale extraction of structured data.
SIGMOD Rec., 2008

Data management projects at Google.
SIGMOD Rec., 2008

WebTables: exploring the power of tables on the web.
Proc. VLDB Endow., 2008

Uncovering the Relational Web.
Proceedings of the 11th International Workshop on the Web and Databases, 2008

Navigating Extracted Data with Schema Discovery.
Proceedings of the Tenth International Workshop on the Web and Databases, 2007

TextRunner: Open Information Extraction on the Web.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2007

Open Information Extraction from the Web.
Proceedings of the IJCAI 2007, 2007

Structured Querying of Web Text Data: A Technical Challenge.
Proceedings of the Third Biennial Conference on Innovative Data Systems Research, 2007

Structured Queries Over Web Text.
IEEE Data Eng. Bull., 2006

Ontology-Driven Information Extraction with OntoSyphon.
Proceedings of the Semantic Web - ISWC 2006, 5th International Semantic Web Conference, 2006

Machine Reading.
Proceedings of the Proceedings, 2006

Unsupervised named-entity extraction from the Web: An experimental study.
Artif. Intell., 2005

A search engine for natural language applications.
Proceedings of the 14th international conference on World Wide Web, 2005

KnowItNow: Fast, Scalable Information Extraction from the Web.
Proceedings of the HLT/EMNLP 2005, 2005

Building Nutch: Open Source Search.
ACM Queue, 2004

Web-scale information extraction in KnowItAll: (preliminary results).
Proceedings of the 13th international conference on World Wide Web, 2004

Methods for Domain-Independent Information Extraction from the Web: An Experimental Comparison.
Proceedings of the Nineteenth National Conference on Artificial Intelligence, 2004
