Dong Deng

ORCID: 0000-0002-4596-3850

Affiliations:
  • Rutgers University, New Brunswick, NJ, USA
  • Massachusetts Institute of Technology, CSAIL, Cambridge, MA, USA (former)
  • Tsinghua University, Beijing, China (PhD 2016)


According to our database, Dong Deng authored at least 54 papers between 2010 and 2024.

Bibliography

2024
Neural Locality Sensitive Hashing for Entity Blocking.
CoRR, 2024

2023
ARKGraph: All-Range Approximate K-Nearest-Neighbor Graph.
Proc. VLDB Endow., 2023

Near-Duplicate Sequence Search at Scale for Large Language Model Memorization Evaluation.
Proc. ACM Manag. Data, 2023

The Case for Learned Provenance Graph Storage Systems.
Proceedings of the 32nd USENIX Security Symposium, 2023

2022
G-SLIDE: A GPU-Based Sub-Linear Deep Learning Engine via LSH Sparsification.
IEEE Trans. Parallel Distributed Syst., 2022

Efficient Load-Balanced Butterfly Counting on GPU.
Proc. VLDB Endow., 2022

Spine: Scaling up Programming-by-Negative-Example for String Filtering and Transformation.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12-17, 2022

TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12-17, 2022

2021
Correction to: Internal and external memory set containment join.
VLDB J., 2021

Internal and external memory set containment join.
VLDB J., 2021

Preface.
J. Comput. Sci. Technol., 2021

Allign: Aligning All-Pair Near-Duplicate Passages in Long Texts.
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

2020
DeltaPQ: Lossless Product Quantization Code Compression for High Dimensional Similarity Search.
Proc. VLDB Endow., 2020

Efficient Locality-Sensitive Hashing Over High-Dimensional Data Streams.
Proceedings of the 36th IEEE International Conference on Data Engineering, 2020

2019
Balance-Aware Distributed String Similarity-Based Query Processing System.
Proc. VLDB Endow., 2019

Technical Report: Optimizing Human Involvement for Entity Matching and Consolidation.
CoRR, 2019

JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes.
Proceedings of the 2019 International Conference on Management of Data, 2019

2ED: An Efficient Entity Extraction Algorithm Using Two-Level Edit-Distance.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

LCJoin: Set Containment Join via List Crosscutting.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

Unsupervised String Transformation Learning for Entity Consolidation.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

2018
A partial-order-based framework for cost-effective crowdsourced entity resolution.
VLDB J., 2018

Overlap Set Similarity Joins with Theoretical Guarantees.
Proceedings of the 2018 International Conference on Management of Data, 2018

Building Data Civilizer Pipelines with an Advanced Workflow Engine.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

2017
A unified framework for string similarity search with edit-distance constraint.
VLDB J., 2017

Approximate String Joins with Abbreviations.
Proc. VLDB Endow., 2017

Dima: A Distributed In-Memory Similarity-Based Query Processing System.
Proc. VLDB Endow., 2017

SilkMoth: An Efficient Method for Finding Related Sets with Maximum Matching Constraints.
Proc. VLDB Endow., 2017

Error-Tolerant Big Data Processing.
CoRR, 2017

Entity Consolidation: The Golden Record Problem.
CoRR, 2017

A Technical Report: Entity Extraction using Both Character-based and Token-based Similarity.
CoRR, 2017

What to do about database decay.
Commun. ACM, 2017

A Demo of the Data Civilizer System.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

The Data Civilizer System.
Proceedings of the 8th Biennial Conference on Innovative Data Systems Research, 2017

2016
META: An Efficient Matching-Based Method for Error-Tolerant Autocompletion.
Proc. VLDB Endow., 2016

Detecting Data Errors: Where are we and what needs to be done?
Proc. VLDB Endow., 2016

String similarity search and join: a survey.
Frontiers Comput. Sci., 2016

Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach.
Proceedings of the 2016 International Conference on Management of Data, 2016

Database decay and how to avoid it.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

2015
A unified framework for approximate dictionary-based entity extraction.
VLDB J., 2015

An Efficient Partition Based Method for Exact Set Similarity Joins.
Proc. VLDB Endow., 2015

Efficient Similarity Join and Search on Multi-Attribute Data.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31 - June 4, 2015

Two birds with one stone: An efficient hierarchical framework for top-k and threshold-based string similarity search.
Proceedings of the 31st IEEE International Conference on Data Engineering, 2015

2014
State-of-the-art in string similarity search and join.
SIGMOD Rec., 2014

Distributed Graph Simulation: Impossibility and Possibility.
Proc. VLDB Endow., 2014

A pivotal prefix based filtering algorithm for string similarity search.
Proceedings of the International Conference on Management of Data, 2014

MassJoin: A mapreduce-based method for scalable string similarity joins.
Proceedings of the IEEE 30th International Conference on Data Engineering, Chicago, 2014

2013
A partition-based method for string similarity joins with edit-distance constraints.
ACM Trans. Database Syst., 2013

Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases.
Proc. VLDB Endow., 2013

Top-k string similarity search with edit-distance constraints.
Proceedings of the 29th IEEE International Conference on Data Engineering, 2013

Efficient parallel partition-based algorithms for similarity search and join with edit distance constraints.
Proceedings of the Joint 2013 EDBT/ICDT Conferences, 2013

2012
An Efficient Trie-based Method for Approximate Entity Extraction with Edit-Distance Constraints.
Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE 2012), 2012

2011
PASS-JOIN: A Partition-based Method for Similarity Joins.
Proc. VLDB Endow., 2011

Faerie: efficient filtering algorithms for approximate dictionary-based entity extraction.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2011

2010
Extending dictionary-based entity extraction to tolerate errors.
Proceedings of the 19th ACM Conference on Information and Knowledge Management, 2010
