Felix Naumann

Hazar Harmouch

Proceedings of the 28th International Conference on Extending Database Technology, 2025

PRISMA: A Privacy-Preserving Schema Matcher using Functional Dependencies.

Proceedings of the 28th International Conference on Extending Database Technology, 2025

Table Dissolution: Adding Salt To Your Data.

Francesco Pugnaloni

Tassilo Klein

Proceedings of the Workshop on Data Management for End-to-End Machine Learning, 2025

ReCLAIM: An Integrated Platform for Data on Nazi-Looted Cultural Assets.

Carl Friedrich Mecking

Konstantin Sturtzkopf

Proceedings of the Datenbanksysteme für Business, 2025

Data Quality in the Age of AI.

Proceedings of the Advances in Databases and Information Systems, 2025

2024

Incremental Detection of Denial Constraint Violations.

Youri Kaminsky

Proc. VLDB Endow., December, 2024

AutoTSAD: Unsupervised Holistic Anomaly Detection for Time Series Data.

Sebastian Schmidl

Proc. VLDB Endow., July, 2024

Determining the Largest Overlap between Tables.

Proc. ACM Manag. Data, February, 2024

Discovering Functional Dependencies through Hitting Set Enumeration.

Proc. ACM Manag. Data, February, 2024

Enabling Data Dependency-based Query Optimization.

Daniel Lindner

Daniel Ritter

CoRR, 2024

Data Quality Assessment: Challenges and Opportunities.

CoRR, 2024

Overlap-Based Duplicate Table Detection.

Proceedings of the 32nd Symposium of Advanced Database Systems, 2024

Discovering Denial Constraints in Dynamic Datasets.

Fábio Porto

Proceedings of the 40th IEEE International Conference on Data Engineering, 2024

Efficient Discovery of Temporal Inclusion Dependencies in Wikipedia Tables.

Fatemeh Nargesian

Proceedings of the 27th International Conference on Extending Database Technology, 2024

TASHEEH: Repairing Row-Structure in Raw CSV Files.

Proceedings of the 27th International Conference on Extending Database Technology, 2024

2023

Editorial: Special Issue for Selected Papers of VLDB 2021.

VLDB J., November, 2023

Correction to: Data dependencies for query optimization: a survey.

Jan Kossmann

VLDB J., March, 2023

BrewER: Entity Resolution On-Demand.

Proc. VLDB Endow., 2023

Pollock: A Data Loading Benchmark.

Proc. VLDB Endow., 2023

Discovering Similarity Inclusion Dependencies.

Youri Kaminsky

Proc. ACM Manag. Data, 2023

Matching Roles from Temporal Data: Why Joe Biden is not only President, but also Commander-in-Chief.

Fatemeh Nargesian

Proc. ACM Manag. Data, 2023

Preface QDB.

Joint Proceedings of Workshops at the 49th International Conference on Very Large Data Bases (VLDB 2023), Vancouver, Canada, August 28, 2023

BCNF* - From Normalized- to Star-Schemas and Back Again.

Proceedings of the Companion of the 2023 International Conference on Management of Data, 2023

Entity Resolution On-Demand for Querying Dirty Datasets.

Proceedings of the 31st Symposium of Advanced Database Systems, 2023

Detecting Stale Data in Wikipedia Infoboxes.

Proceedings of the 26th International Conference on Extending Database Technology, 2023

MORPHER: Structural Transformation of Ill-formed Rows.

Mazhar Hameed

Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023

ExtracTable: Extracting Tables from Raw Data Files.

Leonardo Hübscher

Proceedings of the Datenbanksysteme für Business, 2023

2022

Data dependencies for query optimization: a survey.

Jan Kossmann

VLDB J., 2022

Diversity and Inclusion Activities in Database Conferences: A 2021 Report.

SIGMOD Rec., 2022

Entity Resolution On-Demand.

Proc. VLDB Endow., 2022

Fast Algorithms for Denial Constraint Discovery.

Fábio Porto

Proc. VLDB Endow., 2022

Frost: A Platform for Benchmarking and Exploring Data Matching Results.

Roland Gremmelspacher

Fabian Panse

Proc. VLDB Endow., 2022

AI Compliance - Challenges of Bridging Data Science and Law.

ACM J. Data Inf. Qual., 2022

Data Errors: Symptoms, Causes and Origins.

Ihab F. Ilyas

IEEE Data Eng. Bull., 2022

The Effects of Data Quality on ML-Model Performance.

CoRR, 2022

Mondrian: Spreadsheet Layout Detection.

Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

Mining Change Rules.

Proceedings of the 25th International Conference on Extending Database Technology, 2022

SURAGH: Syntactic Pattern Matching to Identify Ill-Formed Records.

Proceedings of the 25th International Conference on Extending Database Technology, 2022

Aggregation Detection in CSV Files.

Proceedings of the 25th International Conference on Extending Database Technology, 2022

Exploring and Analyzing Change: The Janus Project.

Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022

Workload-driven, Lazy Discovery of Data Dependencies for Query Optimization.

Proceedings of the 12th Conference on Innovative Data Systems Research, 2022

2021

Discovering Relaxed Functional Dependencies Based on Multi-Attribute Dominance.

IEEE Trans. Knowl. Data Eng., 2021

VLDB 2021: Designing a Hybrid Conference.

SIGMOD Rec., 2021

How Inclusive are We?

SIGMOD Rec., 2021

Detecting Layout Templates in Complex Multiregion Files.

Proc. VLDB Endow., 2021

Fast Detection of Denial Constraint Violations.

Eduardo Cunha de Almeida

Proc. VLDB Endow., 2021

Knowledge Transfer for Entity Resolution with Siamese Neural Networks.

Michael Loster

ACM J. Data Inf. Qual., 2021

Ein Data Engineering Kurs für 10.000 Teilnehmer. [A Data Engineering Course for 10,000 Participants]

Datenbank-Spektrum, 2021

Frost: Benchmarking and Exploring Data Matching Results.

Roland Gremmelspacher

Fabian Panse

CoRR, 2021

Few-Shot Knowledge Validation using Rules.

Proceedings of the WWW '21: The Web Conference 2021, 2021

The Secret Life of Wikipedia Tables.

Proceedings of the 2nd Workshop on Search, 2021

Evaluation of Duplicate Detection Algorithms: From Quality Measures to Test Data Generation.

Fabian Panse

Proceedings of the 37th IEEE International Conference on Data Engineering, 2021

Relational Header Discovery using Similarity Search in a Table Corpus.

Hazar Harmouch

Proceedings of the 37th IEEE International Conference on Data Engineering, 2021

Discovering Relaxed Functional Dependencies based on Multi-attribute Dominance [Extended Abstract].

Proceedings of the 37th IEEE International Conference on Data Engineering, 2021

Structured Object Matching across Web Page Revisions.

Proceedings of the 37th IEEE International Conference on Data Engineering, 2021

Structure Detection in Verbose CSV Files.

Proceedings of the 24th International Conference on Extending Database Technology, 2021

2020

RHEEMix in the data jungle: a cost-based optimizer for cross-platform systems.

Zoi Kaoudi

Bertty Contreras-Rojas

Sanjay Chawla

VLDB J., 2020

Efficient Discovery of Matching Dependencies.

Philipp Schirmer

ACM Trans. Database Syst., 2020

Data Preparation: A Survey of Commercial Tools.

Mazhar Hameed

SIGMOD Rec., 2020

MDedup: Duplicate Detection with Matching Dependencies.

Proc. VLDB Endow., 2020

Front Matter.

Proc. VLDB Endow., 2020

Hitting Set Enumeration with Partial Information for Unique Column Combination Discovery.

Proc. VLDB Endow., 2020

Holistic primary key and foreign key detection.

J. Intell. Inf. Syst., 2020

Data Preparation for Duplicate Detection.

ACM J. Data Inf. Qual., 2020

Transforming Pairwise Duplicates to Entity Clusters for High-quality Duplicate Detection.

Uwe Draisbach

Peter Christen

ACM J. Data Inf. Qual., 2020

Explainable AI under contract and tort law: legal incentives and technical challenges.

Artif. Intell. Law, 2020

Natural Key Discovery in Wikipedia Tables.

Proceedings of the WWW '20: The Web Conference 2020, Taipei, Taiwan, April 20-24, 2020

Data Profiling in the Relational World (invited paper).

Joint Proceedings of Workshops AI4LEGAL2020, 2020

Sense Tree: Discovery of New Word Senses with Graph-based Scoring.

Proceedings of the Conference "Lernen, Wissen, Daten, Analysen", 2020

Discovering Biased News Articles Leveraging Multiple Human Annotations.

Alexander Löser

Maria Mestre

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Efficient Detection of Data Dependency Violations.

Edson Ramiro Lucas Filho

Eduardo C. de Almeida

Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020

2019

Discovery of Approximate (and Exact) Denial Constraints.

Eduardo C. de Almeida

Proc. VLDB Endow., 2019

Editorial.

Datenbank-Spektrum, 2019

Exploring Change.

Theodore Johnson

Proceedings of the 27th Italian Symposium on Advanced Database Systems, 2019

A Scoring-based Approach for Data Preparator Suggestion.

Saravanan Thirumuruganathan

Proceedings of the Conference on "Lernen, Wissen, Daten, Analysen", Berlin, Germany, September 30, 2019

Discovery of Genuine Functional Dependencies from Relational Data with Missing Values.

Proceedings of the 37th INFORSID Congress (Actes du XXXVIIème Congrès INFORSID), Paris, France, June 11-14, 2019

Optimizing Cross-Platform Data Movement.

Zoi Kaoudi

Sanjay Chawla

Bertty Contreras-Rojas

Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

DynFD: Functional Dependency Discovery in Dynamic Datasets.

Daniel Neuschäfer-Rube

Proceedings of the Advances in Database Technology, 2019

Inclusion Dependency Discovery: An Experimental Evaluation of Thirteen Algorithms.

Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019

DBChEx: Interactive Exploration of Data and Schema Change.

Proceedings of the 9th Biennial Conference on Innovative Data Systems Research, 2019

The relational database management systems genealogy.

Proceedings of the Making Databases Work: the Pragmatic Wisdom of Michael Stonebraker, 2019

2018

Data Profiling

Synthesis Lectures on Data Management, Morgan & Claypool Publishers, ISBN: 978-3-031-01865-7, 2018

Exploring Change - A New Dimension of Data Analytics.

Theodore Johnson

Proc. VLDB Endow., 2018

Efficient Discovery of Approximate Dependencies.

Proc. VLDB Endow., 2018

Experience: Enhancing Address Matching with Geocoding and Similarity Measure Selection.

Axel Kroschk

Clifford Mosley

ACM J. Data Inf. Qual., 2018

Data Change Exploration Using Time Series Clustering.

Datenbank-Spektrum, 2018

RHEEMix in the Data Jungle - A Cross-Platform Query Optimizer -.

Zoi Kaoudi

Sanjay Chawla

Bertty Contreras

CoRR, 2018

Where in the World Is Carmen Sandiego?: Detecting Person Locations via Social Media Discussions.

Proceedings of the 10th ACM Conference on Web Science, 2018

The Challenges of Creating, Maintaining and Exploring Graphs of Financial Entities.

Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets, 2018

Towards Progressive Search-driven Entity Resolution.

Proceedings of the 26th Italian Symposium on Advanced Database Systems, 2018

Dissecting Company Names using Sequence Labeling.

Proceedings of the Conference "Lernen, Wissen, Daten, Analysen", 2018

Piggyback Profiling: Enhancing Query Results with Metadata.

Proceedings of the Conference "Lernen, Wissen, Daten, Analysen", 2018

CurEx: A System for Extracting, Curating, and Exploring Domain-Specific Knowledge Graphs from Text.

Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018

2017

Detecting Inclusion Dependencies on Very Many Tables.

Fabian Tschirschnitz

ACM Trans. Database Syst., 2017

Data Quality: The Role of Empiricism.

SIGMOD Rec., 2017

Cardinality Estimation: An Experimental Survey.

Hazar Harmouch

Proc. VLDB Endow., 2017

Efficient Denial Constraint Discovery with Hydra.

Proc. VLDB Endow., 2017

Das Fachgebiet "Informationssysteme" am Hasso-Plattner-Institut. [The "Information Systems" Group at the Hasso Plattner Institute]

Datenbank-Spektrum, 2017

What was Hillary Clinton doing in Katy, Texas?

Proceedings of the 26th International Conference on World Wide Web Companion, 2017

Enabling Change Exploration: Vision Paper.

Theodore Johnson

Vladislav Shkapenyuk

Proceedings of ExploreDB'17, Chicago, IL, USA, May 19, 2017

Data Profiling: A Tutorial.

Lukasz Golab

Proceedings of the 2017 ACM International Conference on Management of Data, 2017

Uncovering Business Relationships: Context-sensitive Relationship Extraction for Difficult Relationship Types.

Proceedings of the Conference "Lernen, Wissen, Daten, Analysen", 2017

Identifying Media Bias by Analyzing Reported Speech.

Proceedings of the 2017 IEEE International Conference on Data Mining, 2017

Data-driven Schema Normalization.

Proceedings of the 20th International Conference on Extending Database Technology, 2017

Improving Company Recognition from Unstructured Text by using Dictionaries.

Proceedings of the 20th International Conference on Extending Database Technology, 2017

Metacrate: Organize and Analyze Millions of Data Profiles.

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017

A Hybrid Approach for Efficient Unique Column Combination Discovery.

Proceedings of the Datenbanksysteme für Business, 2017

Fast Approximate Discovery of Inclusion Dependencies.

Proceedings of the Datenbanksysteme für Business, 2017

2016

CohEEL: Coherent and efficient named entity linking through random walks.

J. Web Semant., 2016

Efficient order dependency detection.

Philipp Langer

VLDB J., 2016

The Information Systems Group at HPI.

SIGMOD Rec., 2016

Data Anamnesis: Admitting Raw Data into an Organization.

IEEE Data Eng. Bull., 2016

Which Answer is Best?: Predicting Accepted Answers in MOOC Forums.

Maximilian Jenders

Proceedings of the 25th International Conference on World Wide Web, 2016

A Hybrid Approach to Functional Dependency Discovery.

Proceedings of the 2016 International Conference on Management of Data, 2016

RDFind: Scalable Conditional Inclusion Dependency Discovery in RDF Datasets.

Proceedings of the 2016 International Conference on Management of Data, 2016

Topic Shifts in StackOverflow: Ask it Like Socrates.

Proceedings of the Natural Language Processing and Information Systems, 2016

Cluster-Based Sorted Neighborhood for Efficient Duplicate Detection.

Ahmad Samiei

Proceedings of the IEEE International Conference on Data Mining Workshops, 2016

Data profiling.

Lukasz Golab

Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016

Holistic Data Profiling: Simultaneous Discovery of Various Metadata.

Proceedings of the 19th International Conference on Extending Database Technology, 2016

Combination of Rule-based and Textual Similarity Approaches to Match Financial Entities.

Ahmad Samiei

Michael Loster

Proceedings of the Second International Workshop on Data Science for Macro-Modeling, 2016

Approximate Discovery of Functional Dependencies for Large Datasets.

Proceedings of the 25th ACM International Conference on Information and Knowledge Management, 2016

2015

Profiling relational data: a survey.

Lukasz Golab

VLDB J., 2015

Progressive Duplicate Detection.

IEEE Trans. Knowl. Data Eng., 2015

Divide & Conquer-based Inclusion Dependency Discovery.

Proc. VLDB Endow., 2015

Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms.

Proc. VLDB Endow., 2015

Data Profiling with Metanome.

Proc. VLDB Endow., 2015

Front Matter.

Proc. VLDB Endow., 2015

Editorial.

ACM J. Data Inf. Qual., 2015

SOFA: An extensible logical optimizer for UDF-heavy data flows.

Inf. Syst., 2015

Who wants a computer to be a millionaire?

Saeedeh Momtazi

Inf. Process. Lett., 2015

Uniqueness, Density, and Keyness: Exploring Class Hierarchies.

Anja Jentzsch

Hannes Mühleisen

Proceedings of the 6th International Workshop on Consuming Linked Data (COLD 2015) co-located with 14th International Semantic Web Conference (ISWC 2015), 2015

Exploring Linked Data Graph Structures.

Proceedings of the ISWC 2015 Posters & Demonstrations Track co-located with the 14th International Semantic Web Conference (ISWC-2015), 2015

A Serendipity Model for News Recommendation.

Proceedings of the KI 2015: Advances in Artificial Intelligence, 2015

Estimating Data Integration and Cleaning Effort.

Paolo Papotti

Proceedings of the 18th International Conference on Extending Database Technology, 2015

Scaling Out the Discovery of Inclusion Dependencies.

Proceedings of the Datenbanksysteme für Business, 2015

2014

The Stratosphere platform for big data analytics.

Alexander Alexandrov

Rico Bergmann

Stephan Ewen

Johann-Christoph Freytag

VLDB J., 2014

Reach for gold: An annealing standard to evaluate duplicate detection results.

ACM J. Data Inf. Qual., 2014

Ein Datenbankkurs mit 6000 Teilnehmern - Erfahrungen auf der openHPI MOOC Plattform. [A Database Course with 6000 Participants: Experiences on the openHPI MOOC Platform]

Maximilian Jenders

Inform. Spektrum, 2014

Semi-Supervised Consensus Clustering: Reducing Human Effort.

Tobias Vogel

Proceedings of the 2014 IEEE International Conference on Data Mining Workshops, 2014

Bootstrapping Wikipedia to answer ambiguous person name queries.

Workshop Proceedings of the 30th International Conference on Data Engineering, 2014

Detecting unique column combinations on dynamic data.

Proceedings of the IEEE 30th International Conference on Data Engineering, Chicago, 2014

Profiling and mining RDF data with ProLOD++.

Proceedings of the IEEE 30th International Conference on Data Engineering, Chicago, 2014

LODOP - Multi-Query Optimization for Linked Data Profiling Queries.

Benedikt Forchhammer

Anja Jentzsch

Proceedings of the 1st International Workshop on Dataset PROFIling & fEderated Search for Linked Data co-located with the 11th Extended Semantic Web Conference, 2014

Amending RDF Entities with New Facts.

Proceedings of the Semantic Web: ESWC 2014 Satellite Events, 2014

BEL: Bagging for Entity Linking.

Proceedings of the COLING 2014, 2014

Estimating the Number and Sizes of Fuzzy-Duplicate Clusters.

Gjergji Kasneci

Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, 2014

DFD: Efficient Functional Dependency Discovery.

Patrick Schulze

Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, 2014

2013

Topic modeling for expert finding using latent Dirichlet allocation.

Saeedeh Momtazi

WIREs Data Mining Knowl. Discov., 2013

Data profiling revisited.

SIGMOD Rec., 2013

Scalable Discovery of Unique Column Combinations.

Anja Jentzsch

Proc. VLDB Endow., 2013

Fusion Cubes: Towards Self-Service Business Intelligence.

Int. J. Data Warehous. Min., 2013

Cross-lingual entity matching and infobox alignment in Wikipedia.

Daniel Rinser

Inf. Syst., 2013

Cost-aware query planning for similarity search.

Inf. Syst., 2013

Improving RDF Data Through Association Rule Mining.

Datenbank-Spektrum, 2013

SOFA: An Extensible Logical Optimizer for UDF-heavy Dataflows.

CoRR, 2013

Bootstrapped Grouping of Results to Ambiguous Person Name Queries.

CoRR, 2013

Analyzing and predicting viral tweets.

Maximilian Jenders

Gjergji Kasneci

Proceedings of the 22nd International World Wide Web Conference, 2013

Bulk sorted access for efficient top-k retrieval.

Proceedings of the Conference on Scientific and Statistical Database Management, 2013

On choosing thresholds for duplicate detection.

Uwe Draisbach

Proceedings of the 18th International Conference on Information Quality, 2013

Systematic ETL management - Experiences with high-level operators.

Proceedings of the 18th International Conference on Information Quality, 2013

Caching and Prefetching Strategies for SPARQL Queries.

Proceedings of the Semantic Web: ESWC 2013 Satellite Events, 2013

Detecting SPARQL Query Templates for Data Prefetching.

Proceedings of the Semantic Web: Semantics and Big Data, 10th International Conference, 2013

Synonym Analysis for Predicate Expansion.

Proceedings of the Semantic Web: Semantics and Big Data, 10th International Conference, 2013

Duplicate Detection on GPUs.

Proceedings of the Datenbanksysteme für Business, 2013

2012

Integrating open government data with stratosphere for more transparency.

J. Web Semant., 2012

Scalable Iterative Graph Duplicate Detection.

IEEE Trans. Knowl. Data Eng., 2012

The data analytics group at the Qatar Computing Research Institute.

Nan Tang

SIGMOD Rec., 2012

Holistic and Scalable Ontology Alignment for Linked Open Data.

Proceedings of the WWW2012 Workshop on Linked Data on the Web, 2012

GovWILD: integrating open government data for transparency.

Peter Haase

Michael Schmidt

Proceedings of the 21st World Wide Web Conference, 2012

Efficient Similarity Search in Very Large String Sets.

Proceedings of the Scientific and Statistical Database Management, 2012

The Quality of Web Data.

Proceedings of the 17th International Conference on Information Quality, 2012

Adaptive Windows for Duplicate Detection.

Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE 2012), 2012

Scalable peer-to-peer-based RDF management.

Daniel Hefenbrock

Proceedings of the I-SEMANTICS 2012 - 8th International Conference on Semantic Systems, 2012

Schema Decryption for Large Extract-Transform-Load Systems.

Proceedings of the Conceptual Modeling, 2012

LINDA: distributed web-of-data-scale entity matching.

Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012

Latent topics in graph-structured data.

Gjergji Kasneci

Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012

Discovering conditional inclusion dependencies.

Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012

Reconciling ontologies and the web of data.

Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012

2011

Creating voiD descriptions for Web-scale data.

J. Web Semant., 2011

Eliminating NULLs with Subsumption and Complementation.

Melanie Herschel

IEEE Data Eng. Bull., 2011

Projektseminar "Similarity Search Algorithms". [Project Seminar "Similarity Search Algorithms"]

Datenbank-Spektrum, 2011

Kurz erklärt: Datenfusion. [Briefly Explained: Data Fusion]

Datenbank-Spektrum, 2011

Instance-Based 'One-to-Some' Assignment of Similarity Measures to Attributes - (Short Paper).

Tobias Vogel

Proceedings of the On the Move to Meaningful Internet Systems: OTM 2011, 2011

A generalization of blocking and windowing algorithms for duplicate detection.

Uwe Draisbach

Proceedings of the 2011 International Conference on Data and Knowledge Engineering, 2011

Dr. Crowdsource: or how I learned to stop worrying and love web data.

Proceedings of the 2nd International Workshop on Business intelligencE and the WEB, 2011

SPRINT: ranking search results by paths.

Proceedings of the EDBT 2011, 2011

Extreme web data integration.

Proceedings of the 4th workshop on Workshop for Ph.D. students in information & knowledge management, 2011

Black swan: augmenting statistics with event data.

Armin Zamani Farahani

Robert Christoph Peschel

Stephan Richter

Thomas Stening

Sven Viehmeier

Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011

Efficient similarity search: arbitrary similarity measures, arbitrary composition.

Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011

Frequency-aware similarity measures: why Arnold Schwarzenegger is always a duplicate.

Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011

Advancing the discovery of unique column combinations.

Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011

Improving Service Discovery through Enriched Service Descriptions.

Proceedings of the Datenbanksysteme für Business, 2011

2010

An Introduction to Duplicate Detection

Melanie Herschel

Synthesis Lectures on Data Management, Morgan & Claypool Publishers, ISBN: 978-3-031-01835-0, 2010

13th international workshop on the web and databases: WebDB 2010.

SIGMOD Rec., 2010

Graph-based concept identification and disambiguation for enterprise search.

Wojciech M. Barczynski

Proceedings of the 19th International Conference on World Wide Web, 2010

ECIR - A Lightweight Approach for Entity-Centric Information Retrieval.

Wojciech M. Barczynski

Falk Brauer

Proceedings of The Nineteenth Text REtrieval Conference, 2010

Towards a diamond SOA operational model.

Proceedings of the IEEE International Conference on Service-Oriented Computing and Applications, 2010

Collecting, Annotating, and Classifying Public Web Services.

Mircea Craculeac

Proceedings of the On the Move to Meaningful Internet Systems: OTM 2010, 2010

Profiling linked open data with ProLOD.

Workshop Proceedings of the 26th International Conference on Data Engineering, 2010

Complement union for data integration.

Workshop Proceedings of the 26th International Conference on Data Engineering, 2010

Linking open government data: what journalists wish they had known.

Proceedings of the 6th International Conference on Semantic Systems, 2010

Towards Granular Data Placement Strategies for Cloud Platforms.

Proceedings of the 2010 IEEE International Conference on Granular Computing, 2010

Subsumption and complementation as data fusion operators.

Proceedings of the EDBT 2010, 2010

Dynamic tags for dynamic data web services.

Proceedings of the 5th Workshop on Emerging Web Services Technology, 2010

Extracting structured information from Wikipedia articles to populate infoboxes.

Proceedings of the 19th ACM Conference on Information and Knowledge Management, 2010

2009

Data fusion - Resolving Data Conflicts for Integration.

Proc. VLDB Endow., 2009

Guest Editorial for the Special Issue on Data Quality in Databases.

Louiqa Raschid

ACM J. Data Inf. Qual., 2009

A Machine Learning Approach to Foreign Key Discovery.

Proceedings of the 12th International Workshop on the Web and Databases, 2009

METL: Managing and Integrating ETL Processes.

Proceedings of the VLDB 2009 PhD Workshop. Co-located with the 35th International Conference on Very Large Data Bases (VLDB 2009). Lyon, 2009

Encapsulating Multi-stepped Web Forms as Web Services.

Tobias Vogel

Frank Kaufer

Proceedings of the Service-Oriented Computing. ICSOC/ServiceWave 2009 Workshops, 2009

Information Quality.

Mary Roth

Proceedings of the Database Technologies: Concepts, 2009

2008

Industry-scale duplicate detection.

Proc. VLDB Endow., 2008

A research agenda for query processing in large-scale peer data management systems.

Inf. Syst., 2008

Data fusion.

ACM Comput. Surv., 2008

Managing ETL Processes.

Proceedings of the International Workshop on New Trends in Information Integration, 2008

Scaling up duplicate detection in graph data.

Melanie Herschel

Proceedings of the 17th ACM Conference on Information and Knowledge Management, 2008

2007

Datenqualität. [Data Quality]

Inform. Spektrum, 2007

Peer-Daten-Management-Systems - PDMS (Kurz erklärt). [Peer Data Management Systems (PDMS), Briefly Explained]

Datenbank-Spektrum, 2007

FuSem - Exploring Different Semantics of Data Fusion.

Karsten Draba

Proceedings of the 33rd International Conference on Very Large Data Bases, 2007

Networked PIM Using PDMS.

Proceedings of the Third International Workshop on Networking Meets Databases, 2007

Rule-Based Measurement Of Data Quality In Nominal Data.

Proceedings of the 12th International Conference on Information Quality, 2007

Emergent Data Quality Annotation And Visualization.

Paul Führing

Proceedings of the 12th International Conference on Information Quality, 2007

Efficiently Detecting Inclusion Dependencies.

Proceedings of the 23rd International Conference on Data Engineering, 2007

System P: Completeness-driven Query Answering in Peer Data Management Systems.

Proceedings of the Datenbanksysteme in Business, 2007

Schema- und Metadatenmanagement in Peer Data Management Systemen. [Schema and Metadata Management in Peer Data Management Systems]

Proceedings of the Datenbanksysteme in Business, 2007

A Classification of Schema Mappings and Analysis of Mapping Tools.

Frank Legler

Proceedings of the Datenbanksysteme in Business, 2007

Informationsintegration - Architekturen und Methoden zur Integration verteilter und heterogener Datenquellen.

dpunkt.verlag, 2007

2006

Data Fusion in Three Steps: Resolving Schema, Tuple, and Value Inconsistencies.

IEEE Data Eng. Bull., 2006

Detecting Duplicates in Complex XML Data.

Proceedings of the 22nd International Conference on Data Engineering, 2006

XStruct: Efficient Schema Extraction from Multiple and Large XML Documents.

Jan Hegewald

Proceedings of the 22nd International Conference on Data Engineering Workshops, 2006

Efficiently Computing Inclusion Dependencies for Schema Discovery.

Jana Bauckmann

Proceedings of the 22nd International Conference on Data Engineering Workshops, 2006

XML Duplicate Detection Using Sorted Neighborhoods.

Sven Puhlmann

Proceedings of the Advances in Database Technology, 2006

Query Planning in the Presence of Overlapping Sources.

Proceedings of the Advances in Database Technology, 2006

Assessing the Completeness of Sensor Data.

Jit Biswas

Qiang Qiu

Proceedings of the Database Systems for Advanced Applications, 2006

Informationsintegration: Architekturen und Methoden zur Integration verteilter und heterogener Datenquellen.

dpunkt, ISBN: 3-89864-400-6, 2006

2005

Ein Data-Quality-Wettbewerb.

Michael Mielke

Heiko Müller

Datenbank-Spektrum, 2005

A Data Model and Query Language to Explore Enhanced Links and Paths in Life Science Sources.

Proceedings of the Eighth International Workshop on the Web & Databases (WebDB 2005), 2005

Automatic Data Fusion with HumMer.

Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30, 2005

DogmatiX Tracks down Duplicates in XML.

Proceedings of the ACM SIGMOD International Conference on Management of Data, 2005

Clio: A Schema Mapping Tool for Information Integration.

Lucian Popa

Howard Ho

Proceedings of the 8th International Symposium on Parallel Architectures, 2005

Schema Matching using Duplicates.

Alexander Bilke

Proceedings of the 21st International Conference on Data Engineering, 2005

Benefit and Cost of Query Answering in PDMS.

Proceedings of the Databases, 2005

(Almost) Hands-Off Information Integration for the Life Sciences.

Proceedings of the Second Biennial Conference on Innovative Data Systems Research, 2005

Self-Extending Peer Data Management.

Proceedings of the Datenbanksysteme in Business, 2005

Declarative Data Fusion - Syntax, Semantics, and Implementation.

Proceedings of the Advances in Databases and Information Systems, 2005

2004

BioFast: Challenges in Exploring Linked Life Science Sources.

SIGMOD Rec., 2004

Completeness of integrated information sources.

Johann Christoph Freytag

Inf. Syst., 2004

Eine Übung zur Vorlesung Informationsintegration.

Datenbank-Spektrum, 2004

Detecting Duplicate Objects in XML Documents.

Proceedings of the IQIS 2004, 2004

Information Quality: How Good Are Off-The-Shelf DBMS?

Mary Roth

Proceedings of the Ninth International Conference on Information Quality (ICIQ 2004), 2004

Qualitäts- und Semantik-gesteuerte Anfragebearbeitung für Peer-basierte Datenmanagementsysteme (PDMS).

Proceedings of the 34. Jahrestagung der Gesellschaft für Informatik, 2004

FUSE BY: Syntax und Semantik zur Informationsfusion in SQL.

Proceedings of the 34. Jahrestagung der Gesellschaft für Informatik, 2004

Links and Paths through Life Sciences Data Sources.

Proceedings of the Data Integration in the Life Sciences, First International Workshop, 2004

Labeling and Enhancing Life Sciences Links.

Proceedings of the 3rd International IEEE Computer Society Computational Systems Bioinformatics Conference, 2004

2003

Qualitätsgesteuerte Anfragebearbeitung für Integrierte Informationssysteme.

it Inf. Technol., 2003

Data Quality in Genome Databases.

Heiko Müller

Proceedings of the Eighth International Conference on Information Quality (ICIQ 2003), 2003

Exploring Life Sciences Data Sources.

Proceedings of IJCAI-03 Workshop on Information Integration on the Web (IIWeb-03), 2003

Super-Fast XML Wrapper Generation in DB2: A Demonstration.

Vanja Josifovski

Sabine Massmann

Proceedings of the 19th International Conference on Data Engineering, 2003

Semantic Overlay Clusters within Super-Peer Networks.

Proceedings of the Databases, 2003

2002

Schema Management.

Anastasios Kementsietsidis

C. T. Howard Ho

IEEE Data Eng. Bull., 2002

Declarative Data Merging with Conflict Resolution.

Matthias Häussler

Proceedings of the Seventh International Conference on Information Quality (ICIQ 2002), 2002

Attribute Classification Using Feature Analysis.

Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, USA, February 26, 2002

Mapping XML and Relational Schemas with Clio.

Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, USA, February 26, 2002

Quality-Driven Query Answering for Integrated Information Systems.

Lecture Notes in Computer Science 2261, Springer, ISBN: 3-540-43349-X, 2002

2001

From Databases to Information Systems - Information Quality Makes the Difference.

Proceedings of the Sixth Conference on Information Quality (IQ 2001), 2001

2000

Assessment Methods for Information Quality Criteria.

Claudia Rolker

Proceedings of the Fifth Conference on Information Quality (IQ 2000), 2000

Qualitätsgesteuerte Anfragebearbeitung für Integrierte Informationssysteme.

Proceedings of the Ausgezeichnete Informatikdissertationen 2000, 2000

Query Planning with Information Quality Bounds.

Proceedings of the Flexible Query Answering Systems, 2000

Quality-driven Query Planning.

Proceedings of the 7th EDBT 2000 PhD Workshop, March 31 - April 1, 2000. Konstanz, Germany, 2000

1999

Quality-driven Integration of Heterogenous Information Systems.

Johann Christoph Freytag

Proceedings of the VLDB'99, 1999

Do Metadata Models meet IQ Requirements?

Claudia Rolker

Proceedings of the Fourth Conference on Information Quality (IQ 1999), 1999

Density Scores for Cooperative Query Answering.

Proceedings of the 4. Workshop Föderierte Datenbanken, 1999

1998

Quality Driven Source Selection Using Data Envelope Analysis.
