Felix Naumann

According to our database, Felix Naumann authored at least 202 papers between 1998 and 2020.

Bibliography

2020
MDedup: Duplicate Detection with Matching Dependencies.
PVLDB, 2020

2019
Discovery of Approximate (and Exact) Denial Constraints.
PVLDB, 2019

Editorial.
Datenbank-Spektrum, 2019

Exploring Change.
Proceedings of the 27th Italian Symposium on Advanced Database Systems, 2019

A Scoring-based Approach for Data Preparator Suggestion.
Proceedings of the Conference on "Lernen, Wissen, Daten, Analysen", Berlin, Germany, September 30, 2019

Optimizing Cross-Platform Data Movement.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

DynFD: Functional Dependency Discovery in Dynamic Datasets.
Proceedings of the Advances in Database Technology, 2019

Inclusion Dependency Discovery: An Experimental Evaluation of Thirteen Algorithms.
Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019

DBChEx: Interactive Exploration of Data and Schema Change.
Proceedings of the CIDR 2019, 2019

The relational database management systems genealogy.
Proceedings of the Making Databases Work: the Pragmatic Wisdom of Michael Stonebraker, 2019

2018
Data Profiling
Synthesis Lectures on Data Management, Morgan & Claypool Publishers, 2018

Exploring Change - A New Dimension of Data Analytics.
PVLDB, 2018

Discovery of Genuine Functional Dependencies from Relational Data with Missing Values.
PVLDB, 2018

Efficient Discovery of Approximate Dependencies.
PVLDB, 2018

Experience: Enhancing Address Matching with Geocoding and Similarity Measure Selection.
J. Data and Information Quality, 2018

Data Change Exploration Using Time Series Clustering.
Datenbank-Spektrum, 2018

RHEEMix in the Data Jungle - A Cross-Platform Query Optimizer -.
CoRR, 2018

Where in the World Is Carmen Sandiego?: Detecting Person Locations via Social Media Discussions.
Proceedings of the 10th ACM Conference on Web Science, 2018

The Challenges of Creating, Maintaining and Exploring Graphs of Financial Entities.
Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets, 2018

Towards Progressive Search-driven Entity Resolution.
Proceedings of the 26th Italian Symposium on Advanced Database Systems, 2018

Dissecting Company Names using Sequence Labeling.
Proceedings of the Conference "Lernen, Wissen, Daten, Analysen", 2018

Piggyback Profiling: Enhancing Query Results with Metadata.
Proceedings of the Conference "Lernen, Wissen, Daten, Analysen", 2018

CurEx: A System for Extracting, Curating, and Exploring Domain-Specific Knowledge Graphs from Text.
Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018

2017
Detecting Inclusion Dependencies on Very Many Tables.
ACM Trans. Database Syst., 2017

Data Quality: The Role of Empiricism.
SIGMOD Record, 2017

Cardinality Estimation: An Experimental Survey.
PVLDB, 2017

Efficient Denial Constraint Discovery with Hydra.
PVLDB, 2017

Das Fachgebiet "Informationssysteme" am Hasso-Plattner-Institut.
Datenbank-Spektrum, 2017

What was Hillary Clinton doing in Katy, Texas?
Proceedings of the 26th International Conference on World Wide Web Companion, 2017

Enabling Change Exploration: Vision Paper.
Proceedings of the ExploreDB'17, Chicago, IL, USA, May 19, 2017

Data Profiling: A Tutorial.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

Uncovering Business Relationships: Context-sensitive Relationship Extraction for Difficult Relationship Types.
Proceedings of the Lernen, 2017

Identifying Media Bias by Analyzing Reported Speech.
Proceedings of the 2017 IEEE International Conference on Data Mining, 2017

Data-driven Schema Normalization.
Proceedings of the 20th International Conference on Extending Database Technology, 2017

Improving Company Recognition from Unstructured Text by using Dictionaries.
Proceedings of the 20th International Conference on Extending Database Technology, 2017

Metacrate: Organize and Analyze Millions of Data Profiles.
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017

A Hybrid Approach for Efficient Unique Column Combination Discovery.
Proceedings of the Datenbanksysteme für Business, 2017

Fast Approximate Discovery of Inclusion Dependencies.
Proceedings of the Datenbanksysteme für Business, 2017

2016
CohEEL: Coherent and efficient named entity linking through random walks.
J. Web Semant., 2016

Efficient order dependency detection.
VLDB J., 2016

The Information Systems Group at HPI.
SIGMOD Record, 2016

Data Anamnesis: Admitting Raw Data into an Organization.
IEEE Data Eng. Bull., 2016

Which Answer is Best?: Predicting Accepted Answers in MOOC Forums.
Proceedings of the 25th International Conference on World Wide Web, 2016

A Hybrid Approach to Functional Dependency Discovery.
Proceedings of the 2016 International Conference on Management of Data, 2016

RDFind: Scalable Conditional Inclusion Dependency Discovery in RDF Datasets.
Proceedings of the 2016 International Conference on Management of Data, 2016

Topic Shifts in StackOverflow: Ask it Like Socrates.
Proceedings of the Natural Language Processing and Information Systems, 2016

Cluster-Based Sorted Neighborhood for Efficient Duplicate Detection.
Proceedings of the IEEE International Conference on Data Mining Workshops, 2016

Data profiling.
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016

Holistic Data Profiling: Simultaneous Discovery of Various Metadata.
Proceedings of the 19th International Conference on Extending Database Technology, 2016

Combination of Rule-based and Textual Similarity Approaches to Match Financial Entities.
Proceedings of the Second International Workshop on Data Science for Macro-Modeling, 2016

Approximate Discovery of Functional Dependencies for Large Datasets.
Proceedings of the 25th ACM International Conference on Information and Knowledge Management, 2016

2015
Profiling relational data: a survey.
VLDB J., 2015

Progressive Duplicate Detection.
IEEE Trans. Knowl. Data Eng., 2015

Divide & Conquer-based Inclusion Dependency Discovery.
PVLDB, 2015

Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms.
PVLDB, 2015

Data Profiling with Metanome.
PVLDB, 2015

Front Matter.
PVLDB, 2015

Editorial.
J. Data and Information Quality, 2015

SOFA: An extensible logical optimizer for UDF-heavy data flows.
Inf. Syst., 2015

Who wants a computer to be a millionaire?
Inf. Process. Lett., 2015

Uniqueness, Density, and Keyness: Exploring Class Hierarchies.
Proceedings of the 6th International Workshop on Consuming Linked Data co-located with 14th International Semantic Web Conference (ISWC 2015), 2015

Exploring Linked Data Graph Structures.
Proceedings of the ISWC 2015 Posters & Demonstrations Track co-located with the 14th International Semantic Web Conference (ISWC-2015), 2015

A Serendipity Model for News Recommendation.
Proceedings of the KI 2015: Advances in Artificial Intelligence, 2015

Estimating Data Integration and Cleaning Effort.
Proceedings of the 18th International Conference on Extending Database Technology, 2015

Scaling Out the Discovery of Inclusion Dependencies.
Proceedings of the Datenbanksysteme für Business, 2015

2014
The Stratosphere platform for big data analytics.
VLDB J., 2014

Reach for gold: An annealing standard to evaluate duplicate detection results.
J. Data and Information Quality, 2014

Ein Datenbankkurs mit 6000 Teilnehmern - Erfahrungen auf der openHPI MOOC Plattform.
Informatik Spektrum, 2014

Semi-Supervised Consensus Clustering: Reducing Human Effort.
Proceedings of the 2014 IEEE International Conference on Data Mining Workshops, 2014

Bootstrapping Wikipedia to answer ambiguous person name queries.
Workshops Proceedings of the 30th International Conference on Data Engineering, 2014

Detecting unique column combinations on dynamic data.
Proceedings of the IEEE 30th International Conference on Data Engineering, Chicago, 2014

Profiling and mining RDF data with ProLOD++.
Proceedings of the IEEE 30th International Conference on Data Engineering, Chicago, 2014

LODOP - Multi-Query Optimization for Linked Data Profiling Queries.
Proceedings of the 1st International Workshop on Dataset PROFIling & fEderated Search for Linked Data co-located with the 11th Extended Semantic Web Conference, 2014

Amending RDF Entities with New Facts.
Proceedings of the 3rd Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data co-located with 11th Extended Semantic Web Conference (ESWC 2014), 2014

BEL: Bagging for Entity Linking.
Proceedings of the COLING 2014, 2014

Estimating the Number and Sizes of Fuzzy-Duplicate Clusters.
Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, 2014

DFD: Efficient Functional Dependency Discovery.
Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, 2014

2013
Topic modeling for expert finding using latent Dirichlet allocation.
Wiley Interdiscip. Rev. Data Min. Knowl. Discov., 2013

Data profiling revisited.
SIGMOD Record, 2013

Scalable Discovery of Unique Column Combinations.
PVLDB, 2013

Fusion Cubes: Towards Self-Service Business Intelligence.
IJDWM, 2013

Cross-lingual entity matching and infobox alignment in Wikipedia.
Inf. Syst., 2013

Cost-aware query planning for similarity search.
Inf. Syst., 2013

Improving RDF Data Through Association Rule Mining.
Datenbank-Spektrum, 2013

SOFA: An Extensible Logical Optimizer for UDF-heavy Dataflows.
CoRR, 2013

Bootstrapped Grouping of Results to Ambiguous Person Name Queries.
CoRR, 2013

Analyzing and predicting viral tweets.
Proceedings of the 22nd International World Wide Web Conference, 2013

Bulk sorted access for efficient top-k retrieval.
Proceedings of the Conference on Scientific and Statistical Database Management, 2013

On choosing thresholds for duplicate detection.
Proceedings of the 18th International Conference on Information Quality, 2013

Systematic ETL management - Experiences with high-level operators.
Proceedings of the 18th International Conference on Information Quality, 2013

Caching and Prefetching Strategies for SPARQL Queries.
Proceedings of the Semantic Web: ESWC 2013 Satellite Events, 2013

Detecting SPARQL Query Templates for Data Prefetching.
Proceedings of the Semantic Web: Semantics and Big Data, 10th International Conference, 2013

Synonym Analysis for Predicate Expansion.
Proceedings of the Semantic Web: Semantics and Big Data, 10th International Conference, 2013

Duplicate Detection on GPUs.
Proceedings of the Datenbanksysteme für Business, 2013

2012
Integrating open government data with stratosphere for more transparency.
J. Web Semant., 2012

Scalable Iterative Graph Duplicate Detection.
IEEE Trans. Knowl. Data Eng., 2012

The data analytics group at the qatar computing research institute.
SIGMOD Record, 2012

Holistic and Scalable Ontology Alignment for Linked Open Data.
Proceedings of the WWW2012 Workshop on Linked Data on the Web, 2012

GovWILD: integrating open government data for transparency.
Proceedings of the 21st World Wide Web Conference, 2012

Efficient Similarity Search in Very Large String Sets.
Proceedings of the Scientific and Statistical Database Management, 2012

The Quality of Web Data.
Proceedings of the 17th International Conference on Information Quality, 2012

Adaptive Windows for Duplicate Detection.
Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE 2012), 2012

Scalable peer-to-peer-based RDF management.
Proceedings of the I-SEMANTICS 2012 - 8th International Conference on Semantic Systems, 2012

Schema Decryption for Large Extract-Transform-Load Systems.
Proceedings of the Conceptual Modeling, 2012

LINDA: distributed web-of-data-scale entity matching.
Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012

Latent topics in graph-structured data.
Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012

Discovering conditional inclusion dependencies.
Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012

Reconciling ontologies and the web of data.
Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012

2011
Creating voiD descriptions for Web-scale data.
J. Web Semant., 2011

Eliminating NULLs with Subsumption and Complementation.
IEEE Data Eng. Bull., 2011

Projektseminar "Similarity Search Algorithms".
Datenbank-Spektrum, 2011

Kurz erklärt: Datenfusion.
Datenbank-Spektrum, 2011

Instance-Based 'One-to-Some' Assignment of Similarity Measures to Attributes - (Short Paper).
Proceedings of the On the Move to Meaningful Internet Systems: OTM 2011, 2011

A generalization of blocking and windowing algorithms for duplicate detection.
Proceedings of the 2011 International Conference on Data and Knowledge Engineering, 2011

Dr. Crowdsource: or how i learned to stop worrying and love web data.
Proceedings of the 2nd International Workshop on Business intelligencE and the WEB, 2011

SPRINT: ranking search results by paths.
Proceedings of the EDBT 2011, 2011

Extreme web data integration.
Proceedings of the 4th Workshop for Ph.D. Students in Information & Knowledge Management, 2011

Black swan: augmenting statistics with event data.
Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011

Efficient similarity search: arbitrary similarity measures, arbitrary composition.
Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011

Frequency-aware similarity measures: why Arnold Schwarzenegger is always a duplicate.
Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011

Advancing the discovery of unique column combinations.
Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011

Improving Service Discovery through Enriched Service Descriptions.
Proceedings of the Datenbanksysteme für Business, 2011

2010
An Introduction to Duplicate Detection
Synthesis Lectures on Data Management, Morgan & Claypool Publishers, 2010

13th international workshop on the web and databases: WebDB 2010.
SIGMOD Record, 2010

Graph-based concept identification and disambiguation for enterprise search.
Proceedings of the 19th International Conference on World Wide Web, 2010

ECIR - A Lightweight Approach for Entity-Centric Information Retrieval.
Proceedings of The Nineteenth Text REtrieval Conference, 2010

Towards a diamond SOA operational model.
Proceedings of the IEEE International Conference on Service-Oriented Computing and Applications, 2010

Collecting, Annotating, and Classifying Public Web Services.
Proceedings of the On the Move to Meaningful Internet Systems: OTM 2010, 2010

Profiling linked open data with ProLOD.
Workshops Proceedings of the 26th International Conference on Data Engineering, 2010

Complement union for data integration.
Workshops Proceedings of the 26th International Conference on Data Engineering, 2010

Linking open government data: what journalists wish they had known.
Proceedings of the 6th International Conference on Semantic Systems, 2010

Towards Granular Data Placement Strategies for Cloud Platforms.
Proceedings of the 2010 IEEE International Conference on Granular Computing, 2010

Subsumption and complementation as data fusion operators.
Proceedings of the EDBT 2010, 2010

Dynamic tags for dynamic data web services.
Proceedings of the 5th Workshop on Emerging Web Services Technology, 2010

Extracting structured information from Wikipedia articles to populate infoboxes.
Proceedings of the 19th ACM Conference on Information and Knowledge Management, 2010

2009
Data fusion - Resolving Data Conflicts for Integration.
PVLDB, 2009

Guest Editorial for the Special Issue on Data Quality in Databases.
J. Data and Information Quality, 2009

A Machine Learning Approach to Foreign Key Discovery.
Proceedings of the 12th International Workshop on the Web and Databases, 2009

METL: Managing and Integrating ETL Processes.
Proceedings of the VLDB 2009 PhD Workshop. Co-located with the 35th International Conference on Very Large Data Bases (VLDB 2009). Lyon, 2009

Encapsulating Multi-stepped Web Forms as Web Services.
Proceedings of the Service-Oriented Computing. ICSOC/ServiceWave 2009 Workshops, 2009

Information Quality.
Proceedings of the Database Technologies: Concepts, 2009

2008
Industry-scale duplicate detection.
PVLDB, 2008

A research agenda for query processing in large-scale peer data management systems.
Inf. Syst., 2008

Data fusion.
ACM Comput. Surv., 2008

Managing ETL Processes.
Proceedings of the International Workshop on New Trends in Information Integration, 2008

Scaling up duplicate detection in graph data.
Proceedings of the 17th ACM Conference on Information and Knowledge Management, 2008

2007
Datenqualität.
Informatik Spektrum, 2007

Peer-Daten-Management-Systems - PDMS (Kurz erklärt).
Datenbank-Spektrum, 2007

FuSem - Exploring Different Semantics of Data Fusion.
Proceedings of the 33rd International Conference on Very Large Data Bases, 2007

Networked PIM Using PDMS.
Proceedings of the Third International Workshop on Networking Meets Databases, 2007

Rule-Based Measurement Of Data Quality In Nominal Data.
Proceedings of the 12th International Conference on Information Quality, 2007

Emergent Data Quality Annotation And Visualization.
Proceedings of the 12th International Conference on Information Quality, 2007

Efficiently Detecting Inclusion Dependencies.
Proceedings of the 23rd International Conference on Data Engineering, 2007

System P: Completeness-driven Query Answering in Peer Data Management Systems.
Proceedings of the Datenbanksysteme in Business, 2007

Schema- und Metadatenmanagement in Peer Data Management Systemen.
Proceedings of the Datenbanksysteme in Business, 2007

A Classification of Schema Mappings and Analysis of Mapping Tools.
Proceedings of the Datenbanksysteme in Business, 2007

Informationsintegration - Architekturen und Methoden zur Integration verteilter und heterogener Datenquellen.
dpunkt.verlag, 2007

2006
Data Fusion in Three Steps: Resolving Schema, Tuple, and Value Inconsistencies.
IEEE Data Eng. Bull., 2006

Detecting Duplicates in Complex XML Data.
Proceedings of the 22nd International Conference on Data Engineering, 2006

XStruct: Efficient Schema Extraction from Multiple and Large XML Documents.
Proceedings of the 22nd International Conference on Data Engineering Workshops, 2006

Efficiently Computing Inclusion Dependencies for Schema Discovery.
Proceedings of the 22nd International Conference on Data Engineering Workshops, 2006

XML Duplicate Detection Using Sorted Neighborhoods.
Proceedings of the Advances in Database Technology, 2006

Query Planning in the Presence of Overlapping Sources.
Proceedings of the Advances in Database Technology, 2006

Assessing the Completeness of Sensor Data.
Proceedings of the Database Systems for Advanced Applications, 2006

Informationsintegration: Architekturen und Methoden zur Integration verteilter und heterogener Datenquellen.
dpunkt, ISBN: 3-89864-400-6, 2006

2005
Ein Data-Quality-Wettbewerb.
Datenbank-Spektrum, 2005

A Data Model and Query Language to Explore Enhanced Links and Paths in Life Science Sources.
Proceedings of the Eighth International Workshop on the Web & Databases (WebDB 2005), 2005

Automatic Data Fusion with HumMer.
Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30, 2005

DogmatiX Tracks down Duplicates in XML.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2005

Clio: A Schema Mapping Tool for Information Integration.
Proceedings of the 8th International Symposium on Parallel Architectures, 2005

Schema Matching using Duplicates.
Proceedings of the 21st International Conference on Data Engineering, 2005

Benefit and Cost of Query Answering in PDMS.
Proceedings of the Databases, 2005

(Almost) Hands-Off Information Integration for the Life Sciences.
Proceedings of the CIDR 2005, 2005

Self-Extending Peer Data Management.
Proceedings of the Datenbanksysteme in Business, 2005

Declarative Data Fusion - Syntax, Semantics, and Implementation.
Proceedings of the Advances in Databases and Information Systems, 2005

2004
BioFast: Challenges in Exploring Linked Life Science Sources.
SIGMOD Record, 2004

Completeness of integrated information sources.
Inf. Syst., 2004

Eine Übung zur Vorlesung Informationsintegration.
Datenbank-Spektrum, 2004

Detecting Duplicate Objects in XML Documents.
Proceedings of the IQIS 2004, 2004

Information Quality: How Good Are Off-The-Shelf DBMS?
Proceedings of the Ninth International Conference on Information Quality (ICIQ 2004), 2004

Qualitäts- und Semantik-gesteuerte Anfragebearbeitung für Peer-basierte Datenmanagementsysteme (PDMS).
Proceedings of the INFORMATIK 2004, 2004

FUSE BY: Syntax und Semantik zur Informationsfusion in SQL.
Proceedings of the INFORMATIK 2004, 2004

Links and Paths through Life Sciences Data Sources.
Proceedings of the Data Integration in the Life Sciences, First International Workshop, 2004

Labeling and Enhancing Life Sciences Links.
Proceedings of the 3rd International IEEE Computer Society Computational Systems Bioinformatics Conference, 2004

2003
Qualitätsgesteuerte Anfragebearbeitung für Integrierte Informationssysteme.
it - Information Technology, 2003

Data Quality in Genome Databases.
Proceedings of the Eighth International Conference on Information Quality (ICIQ 2003), 2003

Exploring Life Sciences Data Sources.
Proceedings of IJCAI-03 Workshop on Information Integration on the Web (IIWeb-03), 2003

Super-Fast XML Wrapper Generation in DB2: A Demonstration.
Proceedings of the 19th International Conference on Data Engineering, 2003

Semantic Overlay Clusters within Super-Peer Networks.
Proceedings of the Databases, 2003

2002
Schema Management.
IEEE Data Eng. Bull., 2002

Declarative Data Merging with Conflict Resolution.
Proceedings of the Seventh International Conference on Information Quality (ICIQ 2002), 2002

Attribute Classification Using Feature Analysis.
Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, USA, February 26, 2002

Mapping XML and Relational Schemas with Clio.
Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, USA, February 26, 2002

Quality-Driven Query Answering for Integrated Information Systems
Lecture Notes in Computer Science 2261, Springer, ISBN: 3-540-43349-X, 2002

2001
From Databases to Information Systems - Information Quality Makes the Difference.
Proceedings of the Sixth Conference on Information Quality (IQ 2001), 2001

2000
Assessment Methods for Information Quality Criteria.
Proceedings of the Fifth Conference on Information Quality (IQ 2000), 2000

Query Planning with Information Quality Bounds.
Proceedings of the Flexible Query Answering Systems, 2000

Quality-driven Query Planning.
Proceedings of the 7th EDBT 2000 PhD Workshop, March 31 - April 1, 2000. Konstanz, Germany, 2000

1999
Quality-driven Integration of Heterogenous Information Systems.
Proceedings of the VLDB'99, 1999

Do Metadata Models meet IQ Requirements?
Proceedings of the Fourth Conference on Information Quality (IQ 1999), 1999

Density Scores for Cooperative Query Answering.
Proceedings of the 4. Workshop Föderierte Datenbanken, 1999

1998
Quality Driven Source Selection Using Data Envelope Analysis.
Proceedings of the Third Conference on Information Quality (IQ 1998), 1998
