# Christopher Ré

Affiliations:
• Stanford University, Department of Computer Science
• University of Washington, Seattle, Washington, USA

According to our database, Christopher Ré authored at least 281 papers between 2002 and 2022.

## Bibliography

2022
Contrastive Adapters for Foundation Model Group Robustness.
CoRR, 2022

How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections.
CoRR, 2022

On the Parameterization and Initialization of Diagonal State Space Models.
CoRR, 2022

The Importance of Background Information for Out of Distribution Generalization.
CoRR, 2022

Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees.
CoRR, 2022

Decentralized Training of Foundation Models in Heterogeneous Environments.
CoRR, 2022

Comparing interpretation methods in mental state decoding analyses with deep learning models.
CoRR, 2022

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness.
CoRR, 2022

Can Foundation Models Help Us Achieve Perfect Secrecy?
CoRR, 2022

Can Foundation Models Wrangle Your Data?
CoRR, 2022

Domino: Discovering Systematic Errors with Cross-Modal Embeddings.
CoRR, 2022

Shoring Up the Foundations: Fusing Model Embeddings and Weak Supervision.
CoRR, 2022

Reasoning over Public and Private Data in Retrieval-Based Systems.
CoRR, 2022

SKM-TEA: A Dataset for Accelerated MRI Reconstruction with Dense Image Labels for Quantitative Clinical Evaluation.
CoRR, 2022

BARACK: Partially Supervised Group Robustness With Guarantees.
CoRR, 2022

Declarative machine learning systems.
Commun. ACM, 2022

Is Data Management the Beating Heart of AI Systems?
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

The DB Community vis-à-vis Environmental, Health, and Societal Grand Challenges: Innovation Engine, Plumber, or Bystander?
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

Correct-N-Contrast: a Contrastive Approach for Improving Robustness to Spurious Correlations.
Proceedings of the International Conference on Machine Learning, 2022

It's Raw! Audio Generation with State-Space Models.
Proceedings of the International Conference on Machine Learning, 2022

Monarch: Expressive Structured Matrices for Efficient and Accurate Training.
Proceedings of the International Conference on Machine Learning, 2022

Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning.
Proceedings of the International Conference on Machine Learning, 2022

dcbench: a benchmark for data-centric AI systems.
Proceedings of DEEM '22: Sixth Workshop on Data Management for End-To-End Machine Learning, Philadelphia, 2022

Data Management Opportunities for Foundation Models.
Proceedings of the 12th Conference on Innovative Data Systems Research, 2022

TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Metadata Shaping: A Simple Approach for Knowledge-Enhanced Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021
Declarative Machine Learning Systems: The future of machine learning will depend on it being in the hands of the rest of us.
ACM Queue, 2021

Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins.
Proc. VLDB Endow., 2021

Comparison of segmentation-free and segmentation-dependent computer-aided diagnosis of breast masses on a public mammography dataset.
J. Biomed. Informatics, 2021

ML-In-Databases: Assessment and Prognosis.
IEEE Data Eng. Bull., 2021

Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models.
CoRR, 2021

Efficiently Modeling Long Sequences with Structured State Spaces.
CoRR, 2021

Scatterbrain: Unifying Sparse and Low-rank Attention Approximation.
CoRR, 2021

Metadata Shaping: Natural Language Annotations for the Tail.
CoRR, 2021

Noise2Recon: A Semi-Supervised Framework for Joint MRI Reconstruction and Denoising.
CoRR, 2021

Challenges for cognitive decoding using deep learning methods.
CoRR, 2021

Robustness Gym: Unifying the NLP Evaluation Landscape.
CoRR, 2021

Rethinking Neural Operations for Diverse Tasks.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Personalized Benchmarking with the Ludwig Benchmarking Toolkit.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Combining Recurrent, Convolutional, and Continuous-time Models with Linear State Space Layers.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

SKM-TEA: A Dataset for Accelerated MRI Reconstruction with Dense Image Labels for Quantitative Clinical Evaluation.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Scatterbrain: Unifying Sparse and Low-rank Attention.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Robustness Gym: Unifying the NLP Evaluation Landscape.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations, 2021

Goodwill Hunting: Analyzing and Repurposing Off-the-Shelf Named Entity Linking Systems.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers, 2021

PipeMare: Asynchronous Pipeline Parallel DNN Training.
Proceedings of Machine Learning and Systems 2021, 2021

Observational Supervision for Medical Image Classification Using Gaze Data.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27, 2021

Catformer: Designing Stable Transformers via Sensitivity Analysis.
Proceedings of the 38th International Conference on Machine Learning, 2021

Mandoline: Model Evaluation under Distribution Shift.
Proceedings of the 38th International Conference on Machine Learning, 2021

HoroPCA: Hyperbolic Dimensionality Reduction via Horospherical Projections.
Proceedings of the 38th International Conference on Machine Learning, 2021

Cut out the annotator, keep the cutout: better segmentation with weak supervision.
Proceedings of the 9th International Conference on Learning Representations, 2021

Model Patching: Closing the Subgroup Performance Gap with Data Augmentation.
Proceedings of the 9th International Conference on Learning Representations, 2021

MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training.
Proceedings of the 9th International Conference on Learning Representations, 2021

Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation.
Proceedings of the 11th Conference on Innovative Data Systems Research, 2021

Comparing the Value of Labeled and Unlabeled Data in Method-of-Moments Latent Variable Estimation.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020
Creating Hardware Component Knowledge Bases with Training Data Generation and Multi-task Learning.
ACM Trans. Embed. Comput. Syst., 2020

Leveraging Organizational Resources to Adapt Models to New Data Modalities.
Proc. VLDB Endow., 2020

Cross-Modal Data Programming Enables Rapid Medical Machine Learning.
Patterns, 2020

Classifying non-small cell lung cancer types and transcriptomic subtypes using convolutional neural networks.
J. Am. Medical Informatics Assoc., 2020

Sharp Bias-variance Tradeoffs of Hard Parameter Sharing in High-dimensional Linear Regression.
CoRR, 2020

GRIP: A Graph Neural Network Accelerator Architecture.
CoRR, 2020

Train and You'll Miss It: Interactive Model Iteration with Weak Supervision and Pre-Trained Embeddings.
CoRR, 2020

Machine Learning on Graphs: A Model and Comprehensive Taxonomy.
CoRR, 2020

Assessing Robustness to Noise: Low-Cost Head CT Triage.
CoRR, 2020

Extracting chemical reactions from text using Snorkel.
BMC Bioinform., 2020

No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

HiPPO: Recurrent Memory with Optimal Polynomial Projections.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Understanding the Downstream Instability of Word Embeddings.
Proceedings of Machine Learning and Systems 2020, 2020

On the Generalization Effects of Linear Transformations in Data Augmentation.
Proceedings of the 37th International Conference on Machine Learning, 2020

Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods.
Proceedings of the 37th International Conference on Machine Learning, 2020

Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps.
Proceedings of the 8th International Conference on Learning Representations, 2020

Understanding and Improving Information Transfer in Multi-Task Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020

Sparse Recovery for Orthogonal Polynomial Transforms.
Proceedings of the 47th International Colloquium on Automata, Languages, and Programming, 2020

Overton: A Data System for Monitoring and Improving Machine-Learned Products.
Proceedings of the 10th Conference on Innovative Data Systems Research, 2020

Hidden stratification causes clinically meaningful failures in machine learning for medical imaging.
Proceedings of the ACM CHIL '20: ACM Conference on Health, 2020

Ivy: Instrumental Variable Synthesis for Causal Inference.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Low-Dimensional Hyperbolic Knowledge Graph Embeddings.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Contextual Embeddings: When Are They Worth It?
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark.
ACM SIGOPS Oper. Syst. Rev., 2019

The Seattle Report on Database Research.
SIGMOD Rec., 2019

Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels.
CoRR, 2019

Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices.
CoRR, 2019

Overton: A Data System for Monitoring and Improving Machine-Learned Products.
CoRR, 2019

Low-Memory Neural Network Training: A Technical Report.
CoRR, 2019

Medical device surveillance with electronic health records.
CoRR, 2019

Osprey: Weak Supervision of Imbalanced Extraction Problems without Code.
Proceedings of the 3rd International Workshop on Data Management for End-to-End Machine Learning, 2019

Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale.
Proceedings of the 2019 International Conference on Management of Data, 2019

Multi-Resolution Weak Supervision for Sequential Data.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

On the Downstream Performance of Compressed Word Embeddings.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Hyperbolic Graph Convolutional Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Doubly Weak Supervision of Deep Learning Models for Head CT.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2019, 2019

Automating the generation of hardware component knowledge bases.
Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, 2019

Utilizing Weak Supervision to Infer Complex Objects and Situations in Autonomous Driving Data.
Proceedings of the 2019 IEEE Intelligent Vehicles Symposium, 2019

Learning Dependency Structures for Weak Supervision Models.
Proceedings of the 36th International Conference on Machine Learning, 2019

A Kernel Theory of Modern Data Augmentation.
Proceedings of the 36th International Conference on Machine Learning, 2019

Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations.
Proceedings of the 36th International Conference on Machine Learning, 2019

Learning Mixed-Curvature Representations in Product Spaces.
Proceedings of the 7th International Conference on Learning Representations, 2019

A Formal Framework for Probabilistic Unclean Databases.
Proceedings of the 22nd International Conference on Database Theory, 2019

Scene Graph Prediction With Limited Labels.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

The Role of Massively Multi-Task and Weak Supervision in Software 2.0.
Proceedings of the 9th Biennial Conference on Innovative Data Systems Research, 2019

Classifying Non-Small Cell Lung Cancer Histopathology Types and Transcriptomic Subtypes using Convolutional Neural Networks.
Proceedings of the AMIA 2019, 2019

Low-Precision Random Fourier Features for Memory-constrained Kernel Approximation.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

Training Complex Models with Multi-Task Weak Supervision.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
A Relational Framework for Classifier Engineering.
SIGMOD Rec., 2018

Knowledge Base Construction in the Machine-learning Era.
ACM Queue, 2018

Snuba: Automating Weak Supervision to Label Training Data.
Proc. VLDB Endow., 2018

It's All a Matter of Degree - Using Degree Information to Optimize Multiway Joins.
Theory Comput. Syst., 2018

Worst-case Optimal Join Algorithms.
J. ACM, 2018

Hypertree Decompositions Revisited for PGMs.
CoRR, 2018

High-Accuracy Low-Precision Training.
CoRR, 2018

Research for practice: knowledge base construction in the machine-learning era.
Commun. ACM, 2018

A Two-pronged Progress in Structured Dense Matrix Vector Multiplication.
Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, 2018

Exploring the Utility of Developer Exhaust.
Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning, 2018

Snorkel MeTaL: Weak Supervision for Multi-Task Learning.
Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning, 2018

Fonduer: Knowledge Base Construction from Richly Formatted Data.
Proceedings of the 2018 International Conference on Management of Data, 2018

Machine learning and deep analytics for biocomputing: Call for better explainability.
Proceedings of the Biocomputing 2018: Proceedings of the Pacific Symposium, 2018

Learning Compressed Transforms with Low Displacement Rank.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Software 2.0 and Snorkel: Beyond Hand-Labeled Data.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

Proceedings of the 35th International Conference on Machine Learning, 2018

Learning Invariance with Compact Transforms.
Proceedings of the 6th International Conference on Learning Representations, 2018

Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

Unraveling the Molecular Basis of Lung Adenocarcinoma Dedifferentiation and Prognosis by Integrating Omics and Histopathology.
Proceedings of the AMIA 2018, 2018

Accelerated Stochastic Power Iteration.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

Training Classifiers with Natural Language Explanations.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
Incremental knowledge base construction using DeepDive.
VLDB J., 2017

EmptyHeaded: A Relational Engine for Graph Processing.
ACM Trans. Database Syst., 2017

Report from the third workshop on Algorithms and Systems for MapReduce and Beyond (BeyondMR'16).
SIGMOD Rec., 2017

HoloClean: Holistic Data Repairs with Probabilistic Inference.
Proc. VLDB Endow., 2017

Snorkel: Rapid Training Data Creation with Weak Supervision.
Proc. VLDB Endow., 2017

Proc. VLDB Endow., 2017

Weighted SGD for $\ell_p$ Regression with Randomized Preconditioning.
J. Mach. Learn. Res., 2017

LevelHeaded: Making Worst-Case Optimal Joins Work in the Common Case.
CoRR, 2017

YellowFin and the Art of Momentum Tuning.
CoRR, 2017

SwellShark: A Generative Model for Biomedical Named Entity Recognition without Labeled Data.
CoRR, 2017

Infrastructure for Usable Machine Learning: The Stanford DAWN Project.
CoRR, 2017

DeepDive: declarative knowledge base construction.
Commun. ACM, 2017

Snorkel: Beyond Hand-labeled Data.
Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, 2017

Flipper: A Systematic Approach to Debugging Training Sets.
Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics, 2017

SLiMFast: Guaranteed Results for Data Fusion and Source Reliability.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

Snorkel: Fast Training Set Generation for Information Extraction.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

Inferring Generative Model Structure with Static Analysis.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Learning to Compose Domain-Specific Transformations for Data Augmentation.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

ShortFuse: Biomedical Time Series Representations in the Presence of Structured Information.
Proceedings of the Machine Learning for Health Care Conference, 2017

Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Learning the Structure of Generative Models without Labeled Data.
Proceedings of the 34th International Conference on Machine Learning, 2017

GYM: A Multiround Distributed Join Algorithm.
Proceedings of the 20th International Conference on Database Theory, 2017

Snorkel: A System for Lightweight Extraction.
Proceedings of the 8th Biennial Conference on Innovative Data Systems Research, 2017

Predicting Non-Small Cell Lung Cancer Diagnosis and Prognosis by Fully Automated Microscopic Pathology Image Features.
Proceedings of the AMIA 2017, 2017

2016
Materialization Optimizations for Feature Selection Workloads.
ACM Trans. Database Syst., 2016

Joins via Geometric Resolutions: Worst Case and Beyond.
ACM Trans. Database Syst., 2016

DeepDive: Declarative Knowledge Base Construction.
SIGMOD Rec., 2016

Parallel SGD: When does averaging help?
CoRR, 2016

Socratic Learning.
CoRR, 2016

CoRR, 2016

Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs.
CoRR, 2016

Recurrence Width for Structured Dense Matrix Vector Multiplication.
CoRR, 2016

Large-scale extraction of gene interactions from full-text literature using DeepDive.
Bioinform., 2016

Weighted SGD for $\ell_p$ Regression with Randomized Preconditioning.
Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, 2016

Extracting Databases from Dark Data with DeepDive.
Proceedings of the 2016 International Conference on Management of Data, 2016

Data programming with DDLite: putting humans in a different part of the loop.
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2016

EmptyHeaded: A Relational Engine for Graph Processing.
Proceedings of the 2016 International Conference on Management of Data, 2016

AJAR: Aggregations and Joins over Annotated Relations.
Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 2016

Sub-sampled Newton Methods with Non-uniform Sampling.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Data Programming: Creating Large Training Sets, Quickly.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

High Performance Parallel Stochastic Gradient Descent in Shared Memory.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Wikipedia Knowledge Graph with DeepDive.
Proceedings of the Wiki, 2016

Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Dark Data: Are we solving the right problems?
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016

Old techniques for new join algorithms: A case study in RDF processing.
Proceedings of the 32nd IEEE International Conference on Data Engineering Workshops, 2016

Asynchrony begets momentum, with an application to deep learning.
Proceedings of the 54th Annual Allerton Conference on Communication, 2016

2015
Incremental Knowledge Base Construction Using DeepDive.
Proc. VLDB Endow., 2015

Mindtagger: A Demonstration of Data Labeling in Knowledge Base Construction.
Proc. VLDB Endow., 2015

An asynchronous parallel stochastic coordinate descent algorithm.
J. Mach. Learn. Res., 2015

The mobilize center: an NIH big data to knowledge center to advance human movement research and improve mobility.
J. Am. Medical Informatics Assoc., 2015

Building a Large-scale Multimodal Knowledge Base for Visual Question Answering.
CoRR, 2015

Incremental Knowledge Base Construction Using DeepDive.
CoRR, 2015

Exploiting Features for Data Source Quality Estimation.
CoRR, 2015

Aggregations over Generalized Hypertree Decompositions.
CoRR, 2015

EmptyHeaded: Boolean Algebra Based Graph Processing.
CoRR, 2015

Energy-Efficient Abundant-Data Computing: The N3XT 1,000x.
Computer, 2015

DunceCap: Query Plans Using Generalized Hypertree Decompositions.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

Machine Learning and Databases: The Sound of Things to Come or a Cacophony of Hype?
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

DunceCap: Compiling Worst-Case Optimal Query Plans.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

Join Processing for Graph Patterns: An Old Dog with New Tricks.
Proceedings of the Third International Workshop on Graph Data Management Experiences and Systems, 2015

Exploiting Correlations for Expensive Predicate Evaluation.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

Caffe con Troll: Shallow Ideas to Speed Up Deep Learning.
Proceedings of the Fourth Workshop on Data analytics in the Cloud, 2015

Taming the Wild: A Unified Analysis of Hogwild-Style Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Asynchronous stochastic convex optimization: the noise is in the noise and SGD don't care.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Jedi: A Storage Manager for SIMD-aware, Worst-case Optimal Join Processing.
Proceedings of the Workshops of the EDBT/ICDT 2015 Joint Conference (EDBT/ICDT), 2015

A Database Framework for Classifier Engineering.
Proceedings of the 9th Alberto Mendelzon International Workshop on Foundations of Data Management, Lima, Peru, May 6, 2015

2014
The Beckman Report on Database Research.
SIGMOD Rec., 2014

DimmWitted: A Study of Main-Memory Statistical Analytics.
Proc. VLDB Endow., 2014

Transducing Markov sequences.
J. ACM, 2014

Approximation trade-offs in a Markovian stream warehouse: An empirical study.
Inf. Syst., 2014

Feature Engineering for Knowledge Base Construction.
IEEE Data Eng. Bull., 2014

Global Convergence of Stochastic Gradient Descent for Some Nonconvex Matrix Problems.
CoRR, 2014

A machine-compiled macroevolutionary history of Phanerozoic life.
CoRR, 2014

GYM: A Multiround Join Algorithm In MapReduce.
CoRR, 2014

Tradeoffs in Main-Memory Statistical Analytics from Impala to DimmWitted.
Proceedings of the 2nd International Workshop on In Memory Data Management and Analytics, 2014

Beyond worst-case analysis for joins with minesweeper.
Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2014

Parallel Feature Selection Inspired by Group Testing.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Effectively Creating Weakly Labeled Training Examples via Approximate Domain Knowledge.
Proceedings of the Inductive Logic Programming - 24th International Conference, 2014

The Theory of Zeta Graphs with an Application to Random Networks.
Proceedings of the 17th International Conference on Database Theory (ICDT), 2014

Links between Join Processing and Convex Geometry.
Proceedings of the 17th International Conference on Database Theory (ICDT), 2014

2013
Probabilistic Web Data Management.
World Wide Web, 2013

Skew strikes back: new developments in the theory of join algorithms.
SIGMOD Rec., 2013

Feature Selection in Enterprise Analytics: A Demonstration using an R-based Data Analytics System.
Proc. VLDB Endow., 2013

Ringtail: A Generalized Nowcasting System.
Proc. VLDB Endow., 2013

Parallel stochastic gradient algorithms for large-scale matrix completion.
Math. Program. Comput., 2013

Towards Instance Optimal Join Algorithms for Data in Indexes
CoRR, 2013

An Approximate, Efficient Solver for LP Rounding.
CoRR, 2013

Hazy: making it easier to build and maintain big-data analytics.
Commun. ACM, 2013

Ringtail: Feature Selection For Easier Nowcasting.
Proceedings of the 16th International Workshop on the Web and Databases 2013, 2013

Bootstrapping Knowledge Base Acceleration.
Proceedings of The Twenty-Second Text REtrieval Conference, 2013

Evaluating Stream Filtering for Entity Profile Updates for TREC 2013.
Proceedings of The Twenty-Second Text REtrieval Conference, 2013

Towards high-throughput gibbs sampling at scale: a study across storage managers.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

GeoDeepDive: statistical inference using familiar data-processing languages.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

An Approximate, Efficient LP Solver for LP Rounding.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Brainwash: A Data System for Feature Engineering.
Proceedings of the Sixth Biennial Conference on Innovative Data Systems Research, 2013

A Tutorial on Trained Systems: A New Generation of Data Management Systems?
Proceedings of the Big Data - 29th British National Conference on Databases, 2013

Understanding Tables in Context Using Standard NLP Toolkits.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

Using Commonsense Knowledge to Automatically Create (Noisy) Training Examples from Text.
Proceedings of the Statistical Relational Artificial Intelligence, 2013

2012
Understanding cardinality estimation using entropy maximization.
ACM Trans. Database Syst., 2012

Proc. VLDB Endow., 2012

Toward a Noncommutative Arithmetic-geometric Mean Inequality: Conjectures, Case-studies, and Consequences.
Proceedings of the COLT 2012, 2012

Elementary: Large-Scale Knowledge-Base Construction via Machine Learning and Statistical Inference.
Int. J. Semantic Web Inf. Syst., 2012

DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference.
Proceedings of the Second International Workshop on Searching and Integrating New Web Data Sources, 2012

Building an Entity-Centric Stream Filtering Test Collection for TREC 2012.
Proceedings of The Twenty-First Text REtrieval Conference, 2012

Towards a unified architecture for in-RDBMS analytics.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012

Worst-case optimal join algorithms: [extended abstract].
Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2012

Factoring nonnegative matrices with linear programs.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

Scaling Inference for Markov Logic via Dual Decomposition.
Proceedings of the 12th IEEE International Conference on Data Mining, 2012

Optimizing Statistical Information Extraction Programs over Evolving Text.
Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE 2012), 2012

Big Data versus the Crowd: Looking for Relationships in All the Right Places.
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, July 8-14, 2012, Jeju Island, Korea, 2012

2011
Probabilistic Databases
Synthesis Lectures on Data Management, Morgan & Claypool Publishers, 2011

Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS.
Proc. VLDB Endow., 2011

Probabilistic Management of OCR Data using an RDBMS.
Proc. VLDB Endow., 2011

Incrementally maintaining classification using an RDBMS.
Proc. VLDB Endow., 2011

Automatic Optimization for MapReduce Programs.
Proc. VLDB Endow., 2011

Queries and materialized views on probabilistic databases.
J. Comput. Syst. Sci., 2011

Felix: Scaling Inference for Markov Logic with an Operator-based Approach.
CoRR, 2011

Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

2010
Manimal: Relational Optimization for Data-Intensive Programs.
Proceedings of the 13th International Workshop on the Web and Databases 2010, 2010

Approximation trade-offs in Markovian stream processing: An empirical study.
Proceedings of the 26th International Conference on Data Engineering, 2010

2009
The trichotomy of HAVING queries on a probabilistic database.
VLDB J., 2009

Repeatability & workability evaluation of SIGMOD 2009.
SIGMOD Rec., 2009

Lahar Demonstration: Warehousing Markovian Streams.
Proc. VLDB Endow., 2009

Probabilistic databases: diamonds in the dirt.
Commun. ACM, 2009

Query Containment of Tier-2 Queries over a Probabilistic Database.
Proceedings of the Third VLDB workshop on Management of Uncertain Data (MUD2009) in conjunction with VLDB 2009, 2009

Access Methods for Markovian Streams.
Proceedings of the 25th International Conference on Data Engineering, 2009

Large-Scale Deduplication with Constraints Using Dedupalog.
Proceedings of the 25th International Conference on Data Engineering, 2009

General Database Statistics Using Entropy Maximization.
Proceedings of the Database Programming Languages, 2009

2008
Approximate lineage for probabilistic databases.
Proc. VLDB Endow., 2008

Systems aspects of probabilistic data management.
Proc. VLDB Endow., 2008

Challenges for Event Queries over Markovian Streams.
IEEE Internet Comput., 2008

Managing Probabilistic Data with MystiQ: The Can-Do, the Could-Do, and the Can't-Do.
Proceedings of the Scalable Uncertainty Management, Second International Conference, 2008

Event queries on correlated probabilistic streams.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2008

A demonstration of Cascadia through a digital diary application.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2008

Implementing NOT EXISTS Predicates over a Probabilistic Database.
Proceedings of the International Workshop on Quality in Databases and Management of Uncertain Data, 2008

08421 Working Group: Report of the Probabilistic Databases Benchmarking.
Proceedings of the Uncertainty Management in Information Systems, 12.10. - 17.10.2008, 2008

2007
Managing Uncertainty in Social Networks.
IEEE Data Eng. Bull., 2007

Materialized Views in Probabilistic Databases for Information Exchange and Query Optimization.
Proceedings of the 33rd International Conference on Very Large Data Bases, 2007

Efficient Top-k Query Evaluation on Probabilistic Data.
Proceedings of the 23rd International Conference on Data Engineering, 2007

Efficient Evaluation of.
Proceedings of the Database Programming Languages, 11th International Symposium, 2007

Management of data with uncertainties.
Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, 2007

Structured Querying of Web Text Data: A Technical Challenge.
Proceedings of the Third Biennial Conference on Innovative Data Systems Research, 2007

2006
Query Evaluation on Probabilistic Databases.
IEEE Data Eng. Bull., 2006

A Complete and Efficient Algebraic Compiler for XQuery.
Proceedings of the 22nd International Conference on Data Engineering, 2006

XQuery!: An XML Query Language with Side Effects.
Proceedings of the Current Trends in Database Technology - EDBT 2006, 2006

2005
A Framework for XML-Based Integration of Data, Visualization and Analysis in a Biomedical Domain.
Proceedings of the Database and XML Technologies, 2005

MYSTIQ: a system for finding more answers by using probabilities.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2005

Supporting workflow in a course management system.
Proceedings of the 36th SIGCSE Technical Symposium on Computer Science Education, 2005

2003
WS-Membership - Failure Management in a Web-Services World.
Proceedings of the Twelfth International World Wide Web Conference, 2003

2002
A Collaborative Infrastructure for Scalable and Robust News Delivery.
Proceedings of the 22nd International Conference on Distributed Computing Systems, 2002