Sandeep Tata

Orcid: 0009-0007-7785-5516

According to our database, Sandeep Tata authored at least 50 papers between 2003 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
STRUM-LLM: Attributed and Structured Contrastive Summarization.
CoRR, 2024

2023
STRUM: Extractive Aspect-Based Contrastive Summarization.
Companion Proceedings of the ACM Web Conference 2023, 2023

VRDU: A Benchmark for Visually-rich Document Understanding.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

Selective Labeling: How to Radically Lower Data-Labeling Costs for Document Extraction Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022
An Augmentation Strategy for Visually Rich Documents.
CoRR, 2022

A Benchmark for Structured Extractions from Complex Documents.
CoRR, 2022

Radically Lower Data-Labeling Costs for Visually Rich Document Extraction Models.
CoRR, 2022

Data-Efficient Information Extraction from Form-Like Documents.
CoRR, 2022

Learning Transferable Node Representations for Attribute Extraction from Web Documents.
Proceedings of the WSDM '22: The Fifteenth ACM International Conference on Web Search and Data Mining, Virtual Event / Tempe, AZ, USA, February 21, 2022

DI-2022: The Third Document Intelligence Workshop.
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

2021
Glean: Structured Extractions from Templatic Documents.
Proc. VLDB Endow., 2021

Simplified DOM Trees for Transferable Attribute Extraction from the Web.
CoRR, 2021

DI-2021: The Second Document Intelligence Workshop.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

2020
Active Learning for Skewed Data Sets.
CoRR, 2020

FreeDOM: A Transferable Neural Architecture for Structured Information Extraction on Web Documents.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

Improving Recommendation Quality in Google Drive.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

Migrating a Privacy-Safe Information Extraction System to a Software 2.0 Design.
Proceedings of the 10th Conference on Innovative Data Systems Research, 2020

Representation Learning for Information Extraction from Form-like Documents.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
Online Template Induction for Machine-Generated Emails.
Proc. VLDB Endow., 2019

RiSER: Learning Better Representations for Richly Structured Emails.
Proceedings of the World Wide Web Conference, 2019

ItemSuggest: A Data Management Platform for Machine Learned Ranking Services.
Proceedings of the 9th Biennial Conference on Innovative Data Systems Research, 2019

2018
Query Languages and Evaluation Techniques for Biological Sequence Data.
Encyclopedia of Database Systems, Second Edition, 2018

Hidden in Plain Sight: Classifying Emails Using Embedded Image Contents.
Proceedings of the 2018 World Wide Web Conference on World Wide Web, 2018

Anatomy of a Privacy-Safe Large-Scale Information Extraction System Over Email.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

Recommendations for All: Solving Thousands of Recommendation Problems Daily.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

2017
Quick Access: Building a Smart Experience for Google Drive.
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13, 2017

2014
Diff-Index: Differentiated Index in Distributed Log-Structured Data Stores.
Proceedings of the 17th International Conference on Extending Database Technology, 2014

2013
Toward a scale-out data-management middleware for low-latency enterprise computing.
IBM J. Res. Dev., 2013

A platform for eXtreme Analytics.
IBM J. Res. Dev., 2013

BlueSNP: R package for highly scalable genome-wide association studies using Hadoop clusters.
Bioinform., 2013

Sparkler: supporting large-scale matrix factorization.
Proceedings of the Joint 2013 EDBT/ICDT Conferences, 2013

2012
Clydesdale: structured data processing on Hadoop.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012

Clydesdale: structured data processing on MapReduce.
Proceedings of the 15th International Conference on Extending Database Technology, 2012

2011
Efficient and Accurate Discovery of Patterns in Sequence Data Sets.
IEEE Trans. Knowl. Data Eng., 2011

Using Paxos to Build a Scalable, Consistent, and Highly Available Datastore.
Proc. VLDB Endow., 2011

Column-Oriented Storage Techniques for MapReduce.
Proc. VLDB Endow., 2011

2010
Efficient and accurate discovery of patterns in sequence datasets.
Proceedings of the 26th International Conference on Data Engineering, 2010

2009
Query Languages and Evaluation Techniques for Biological Sequence Data.
Encyclopedia of Database Systems, 2009

Towards a Scalable Enterprise Content Analytics Platform.
IEEE Data Eng. Bull., 2009

Leveraging a scalable row store to build a distributed text index.
Proceedings of the First International CIKM Workshop on Cloud Data Management, 2009

2008
SQAK: doing more with keywords.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2008

On common tools for databases - The case for a client-based index advisor.
Proceedings of the 24th International Conference on Data Engineering Workshops, 2008

FLAME: Shedding Light on Hidden Frequent Patterns in Sequence Datasets.
Proceedings of the 24th International Conference on Data Engineering, 2008

2007
Declarative Querying For Biological Sequences.
PhD thesis, 2007

Estimating the selectivity of tf-idf based cosine similarity predicates.
SIGMOD Rec., 2007

Periscope/SQ: Interactive Exploration of Biological Sequence Databases.
Proceedings of the 33rd International Conference on Very Large Data Bases, 2007

2006
Declarative Querying for Biological Sequences.
Proceedings of the 22nd International Conference on Data Engineering, 2006

2005
Practical methods for constructing suffix trees.
VLDB J., 2005

2004
Practical Suffix Tree Construction.
(e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB 2004, Toronto, Canada, August 31, 2004

2003
PiQA: An Algebra for Querying Protein Data Sets.
Proceedings of the 15th International Conference on Scientific and Statistical Database Management (SSDBM 2003), 2003
