AnHai Doan

Affiliations:
  • University of Wisconsin, Madison, WI, USA


According to our database, AnHai Doan authored at least 118 papers between 1994 and 2023.

Bibliography

2023
Effective entity matching with transformers.
VLDB J., November, 2023

Sparkly: A Simple yet Surprisingly Strong TF/IDF Blocker for Entity Matching.
Proc. VLDB Endow., 2023

2022
Cloud Data Systems: What are the Opportunities for the Database Research Community?
Proc. VLDB Endow., 2022

Toward Data Cleaning with a Target Accuracy: A Case Study for Value Normalization.
Proceedings of the IEEE International Conference on Big Data, 2022

2021
Deep Learning for Blocking in Entity Matching: A Design Space Exploration.
Proc. VLDB Endow., 2021

2020
Advice from SIGMOD/PODS 2020.
SIGMOD Rec., 2020

Deep Entity Matching with Pre-Trained Language Models.
Proc. VLDB Endow., 2020

Magellan: toward building ecosystems of entity matching solutions.
Commun. ACM, 2020

CoClean: Collaborative Data Cleaning.
Proceedings of the 2020 International Conference on Management of Data, 2020

Manually Detecting Errors for Data Cleaning Using Adaptive Crowdsourcing Strategies.
Proceedings of the 23rd International Conference on Extending Database Technology, 2020

Data Curation with Deep Learning.
Proceedings of the 23rd International Conference on Extending Database Technology, 2020

2019
The Seattle Report on Database Research.
SIGMOD Rec., 2019

Entity Matching Meets Data Science: A Progress Report from the Magellan Project.
Proceedings of the 2019 International Conference on Management of Data, 2019

Executing Entity Matching End to End: A Case Study.
Proceedings of the 22nd International Conference on Extending Database Technology, 2019

2018
CloudMatcher: A Hands-Off Cloud/Crowd Service for Entity Matching.
Proc. VLDB Endow., 2018

Smurf: Self-Service String Matching Using Random Forests.
Proc. VLDB Endow., 2018

Toward a System Building Agenda for Data Integration (and Data Science).
IEEE Data Eng. Bull., 2018

BigGorilla: An Open-Source Ecosystem for Data Preparation and Integration.
IEEE Data Eng. Bull., 2018

Deep Learning for Entity Matching: A Design Space Exploration.
Proceedings of the 2018 International Conference on Management of Data, 2018

Human-in-the-Loop Data Analysis: A Personal Perspective.
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2018

MatchCatcher: A Debugger for Blocking in Entity Matching.
Proceedings of the 21st International Conference on Extending Database Technology, 2018

2017
Toward a System Building Agenda for Data Integration.
CoRR, 2017

MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive.
Bioinform., 2017

Human-in-the-Loop Challenges for Entity Matching: A Midterm Report.
Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics, 2017

Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

Towards Interactive Debugging of Rule-based Entity Matching.
Proceedings of the 20th International Conference on Extending Database Technology, 2017

Entity Matching Using Magellan: Matching Drug Reference Tables.
Proceedings of the Summit on Clinical Research Informatics, 2017

What is Our Agenda for Data Science?
Proceedings of the 8th Biennial Conference on Innovative Data Systems Research, 2017

bigNN: An open-source big data toolkit focused on biomedical sentence classification.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

2016
Magellan: Toward Building Entity Matching Management Systems over Data Science Stacks.
Proc. VLDB Endow., 2016

Magellan: Toward Building Entity Matching Management Systems.
Proc. VLDB Endow., 2016

2015
Why Big Data Industrial Systems Need Rules and What We Can Do About It.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

On Debugging Non-Answers in Keyword Search Systems.
Proceedings of the 18th International Conference on Extending Database Technology, 2015

2014
The Beckman Report on Database Research.
SIGMOD Rec., 2014

Chimera: Large-Scale Classification using Machine Learning, Rules, and Crowdsourcing.
Proc. VLDB Endow., 2014

Tracking Entities in the Dynamic World: A Fast Algorithm for Matching Temporal Records.
Proc. VLDB Endow., 2014

Corleone: hands-off crowdsourcing for entity matching.
Proceedings of the International Conference on Management of Data, 2014

Modeling entity evolution for temporal record matching.
Proceedings of the International Conference on Management of Data, 2014

2013
Entity Extraction, Linking, Classification, and Tagging for Social Media: A Wikipedia-Based Approach.
Proc. VLDB Endow., 2013

Social Media Analytics: The Kosmix Story.
IEEE Data Eng. Bull., 2013

Badger: Toward Crowdsourcing the Building of Structured Knowledge Bases.
Proceedings of the 16th International Workshop on the Web and Databases 2013, 2013

Building, maintaining, and using knowledge bases: a report from the trenches.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

2012
Muppet: MapReduce-Style Processing of Fast Data.
Proc. VLDB Endow., 2012

Principles of Data Integration.
Morgan Kaufmann, ISBN: 978-0-12-416044-6, 2012

2011
Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS.
Proc. VLDB Endow., 2011

Crowdsourcing Applications and Platforms: A Data Management Perspective.
Proc. VLDB Endow., 2011

Crowdsourcing systems on the World-Wide Web.
Commun. ACM, 2011

2010
Toward Scalable Keyword Search over Relational Data.
Proc. VLDB Endow., 2010

Crowds, clouds, and algorithms: exploring the human side of "big data" applications.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010

Toward industrial-strength keyword search systems over relational data.
Proceedings of the 26th International Conference on Data Engineering, 2010

2009
Modeling and Extracting Deep-Web Query Interfaces.
Advances in Information and Intelligent Systems, 2009

Combining keyword search and forms for ad hoc querying of databases.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2009

Optimizing complex extraction programs over evolving text data.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2009

Efficiently incorporating user feedback into information extraction and integration programs.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2009

Weighted Proximity Best-Joins for Information Retrieval.
Proceedings of the 25th International Conference on Data Engineering, 2009

Join Optimization of Information Extraction Output: Quality Matters!
Proceedings of the 25th International Conference on Data Engineering, 2009

The Case for a Structured Approach to Managing Unstructured Data.
Proceedings of the Fourth Biennial Conference on Innovative Data Systems Research, 2009

2008
Information extraction challenges in managing unstructured data.
SIGMOD Rec., 2008

Databases and Web 2.0 panel at VLDB 2007.
SIGMOD Rec., 2008

The Claremont report on database research.
SIGMOD Rec., 2008

On the provenance of non-answers to queries over extracted data.
Proc. VLDB Endow., 2008

Analyzing and revising data integration schemas to improve their matchability.
Proc. VLDB Endow., 2008

Toward best-effort information extraction.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2008

Building Structured Web Community Portals Via Extraction, Integration, and Mass Collaboration.
Proceedings of PRICAI 2008: Trends in Artificial Intelligence, 2008

A Discriminative Approach to Ontology Mapping.
Proceedings of the International Workshop on New Trends in Information Integration, 2008

Matching Schemas in Online Communities: A Web 2.0 Approach.
Proceedings of the 24th International Conference on Data Engineering, 2008

Optimizing SQL Queries over Text Databases.
Proceedings of the 24th International Conference on Data Engineering, 2008

Building Community Wikipedias: A Machine-Human Partnership Approach.
Proceedings of the 24th International Conference on Data Engineering, 2008

Efficient Information Extraction over Evolving Text Data.
Proceedings of the 24th International Conference on Data Engineering, 2008

2007
eTuner: tuning schema matching software using synthetic scenarios.
VLDB J., 2007

User-Centric Research Challenges in Community Information Management Systems.
IEEE Data Eng. Bull., 2007

Declarative Information Extraction Using Datalog with Embedded Extraction Predicates.
Proceedings of the 33rd International Conference on Very Large Data Bases, 2007

Building Structured Web Community Portals: A Top-Down, Compositional, and Incremental Approach.
Proceedings of the 33rd International Conference on Very Large Data Bases, 2007

A Relational Approach to Incrementally Extracting and Querying Structure in Unstructured Data.
Proceedings of the 33rd International Conference on Very Large Data Bases, 2007

OLAP over Imprecise Data with Domain Constraints.
Proceedings of the 33rd International Conference on Very Large Data Bases, 2007

Data Quality Challenges in Community Systems.
Proceedings of the Fifth International Workshop on Quality in Databases, 2007

Source-aware Entity Matching: A Compositional Approach.
Proceedings of the 23rd International Conference on Data Engineering, 2007

Efficient Keyword Search Across Heterogeneous Relational Databases.
Proceedings of the 23rd International Conference on Data Engineering, 2007

SQL Queries Over Unstructured Text Databases.
Proceedings of the 23rd International Conference on Data Engineering, 2007

K-Anonymization as Spatial Indexing: Toward Scalable and Incremental Anonymization.
Proceedings of the 23rd International Conference on Data Engineering, 2007

DBLife: A Community Information Management Platform for the Database Research Community (Demo).
Proceedings of the Third Biennial Conference on Innovative Data Systems Research, 2007

2006
Community Information Management.
IEEE Data Eng. Bull., 2006

Managing information extraction: state of the art and research directions.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2006

WebIQ: Learning from the Web to Match Deep-Web Query Interfaces.
Proceedings of the 22nd International Conference on Data Engineering, 2006

2005
Semantic Integration.
AI Mag., 2005

Semantic Integration Research in the Database Community: A Brief Survey.
AI Mag., 2005

Tuning Schema Matching Software using Synthetic Scenarios.
Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30, 2005

Mapping Maintenance for Data Integration Systems.
Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30, 2005

Bootstrapping Domain Ontology for Semantic Web Services from Source Web Sites.
Proceedings of Technologies for E-Services, 6th International Workshop, 2005

Merging Interface Schemas on the Deep Web via Clustering Aggregation.
Proceedings of the 5th IEEE International Conference on Data Mining (ICDM 2005), 2005

Integrating Data from Disparate Sources: A Mass Collaboration Approach.
Proceedings of the 21st International Conference on Data Engineering, 2005

Corpus-based Schema Matching.
Proceedings of the 21st International Conference on Data Engineering, 2005

Collaborative Development of Information Integration Systems.
Knowledge Collection from Volunteer Contributors, AAAI Spring Symposium, 2005

Constraint-Based Entity Matching.
Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI 2005), 2005

2004
Introduction to the Special Issue on Semantic Integration.
SIGMOD Rec., 2004

Semantic Integration Workshop at the 2nd International Semantic Web Conference (ISWC-2003).
SIGMOD Rec., 2004

Semantic Integration Workshop at the Second International Semantic Web Conference (ISWC-2003).
AI Mag., 2004

An Interactive Clustering-based Approach to Integrating Source Query interfaces on the Deep Web.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2004

iMAP: Discovering Complex Mappings between Database Schemas.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2004

Privacy-preserving data integration and sharing.
Proceedings of the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2004

Ontology Matching: A Machine Learning Approach.
Handbook on Ontologies, 2004

2003
Learning to match ontologies on the Semantic Web.
VLDB J., 2003

Learning to Match the Schemas of Data Sources: A Multistrategy Approach.
Mach. Learn., 2003

Profile-Based Object Matching for Information Integration.
IEEE Intell. Syst., 2003

Building Data Integration Systems: A Mass Collaboration Approach.
Proceedings of the International Workshop on Web and Databases, 2003

Building Data Integration Systems: A Mass Collaboration Approach.
Proceedings of IJCAI-03 Workshop on Information Integration on the Web (IIWeb-03), 2003

Object Matching for Information Integration: A Profiler-Based Approach.
Proceedings of IJCAI-03 Workshop on Information Integration on the Web (IIWeb-03), 2003

Crossing the Structure Chasm.
Proceedings of the First Biennial Conference on Innovative Data Systems Research, 2003

2002
Database Research at the University of Illinois at Urbana-Champaign.
SIGMOD Rec., 2002

Learning to map between ontologies on the semantic web.
Proceedings of the Eleventh International World Wide Web Conference, 2002

Efficiently Ordering Query Plans for Data Integration.
Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, USA, February 26, 2002

2001
Reconciling Schemas of Disparate Data Sources: A Machine-Learning Approach.
Proceedings of the 2001 ACM SIGMOD international conference on Management of data, 2001

2000
Learning Source Description for Data Integration.
Proceedings of the Third International Workshop on the Web and Databases, 2000

1998
Geometric Foundations for Interval-Based Probabilities.
Ann. Math. Artif. Intell., 1998

1996
Sound Abstraction of Probabilistic Actions in The Constraint Mass Assignment Framework.
UAI '96: Proceedings of the Twelfth Annual Conference on Uncertainty in Artificial Intelligence, 1996

Modeling Probabilistic Actions for Practical Decision-Theoretic Planning.
Proceedings of the Third International Conference on Artificial Intelligence Planning Systems, 1996

1995
Efficient Decision-Theoretic Planning: Techniques and Empirical Analysis.
UAI '95: Proceedings of the Eleventh Annual Conference on Uncertainty in Artificial Intelligence, 1995

1994
Abstracting Probabilistic Actions.
UAI '94: Proceedings of the Tenth Annual Conference on Uncertainty in Artificial Intelligence, 1994
