Alex Szalay

ORCID: 0000-0002-4108-3282

  • Johns Hopkins University, USA

According to our database, Alex Szalay authored at least 142 papers between 1999 and 2023.


ACM Fellow

ACM Fellow 2023, "For contributions in systems, big data, open data, and for service to the community".

Short-read aligner performance in germline variant identification.
Bioinform., August, 2023

First Organoid Intelligence (OI) workshop to form an OI community.
Frontiers Artif. Intell., February, 2023

Playing catch-up in building an open research commons.
CoRR, 2022

Performance optimization in DNA short-read alignment.
Bioinform., 2022

A Light-Weight Interpretable Model for Nuclei Detection and Weakly-Supervised Segmentation.
Proceedings of the Medical Optical Imaging and Virtual Microscopy Image Analysis, 2022

A Light-weight Interpretable CompositionalNetwork for Nuclei Detection and Weakly-supervised Segmentation.
CoRR, 2021

Wireless Sensor Network for in situ Soil Moisture Monitoring.
Proceedings of the 10th International Conference on Sensor Networks, 2021

Arioc: High-concurrency short-read alignment on multiple GPUs.
PLoS Comput. Biol., November, 2020

Baryon acoustic oscillations reconstruction using convolutional neural networks.
CoRR, 2020

SciServer: A science platform for astronomy and beyond.
Astron. Comput., 2020

Sketch and Scale Geo-distributed tSNE and UMAP.
Proceedings of the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), 2020

The Terabase Search Engine: a large-scale relational database of short-read sequences.
Bioinform., 2019

StePS: A multi-GPU cosmological N-body Code for compactified simulations.
Astron. Comput., 2019

Big data and extreme-scale computing.
Int. J. High Perform. Comput. Appl., 2018

Realizing the potential of data science.
Commun. ACM, 2018

Arioc: GPU-accelerated alignment of short bisulfite-treated reads.
Bioinform., 2018

Scalable streaming tools for analyzing N-body simulations: Finding halos and investigating excursion sets in one pass.
Astron. Comput., 2018

Database-Centric Scientific Computing - (In Memoriam Jim Gray).
Proceedings of the Advances in Databases and Information Systems, 2018

Photo-z-SQL: Integrated, flexible photometric redshift computation in a database.
Astron. Comput., 2017

Extreme Event Analysis in Next Generation Simulation Architectures.
Proceedings of the High Performance Computing - 32nd International Conference, 2017

Accurately initializing real time clocks to provide synchronized time in sensor networks.
Proceedings of the 2017 International Conference on Computing, 2017

An SSD-based eigensolver for spectral analysis on billion-node graphs.
CoRR, 2016

Keynote speaker: Exascale numerical laboratories.
Proceedings of the 6th IEEE Symposium on Large Data Analysis and Visualization, 2016

A fast algorithm for neutrally-buoyant Lagrangian particles in numerical ocean modeling.
Proceedings of the 12th IEEE International Conference on e-Science, 2016

DOT-K: A distributed online top-K elements algorithm using extreme value statistics.
Proceedings of the 12th IEEE International Conference on e-Science, 2016

The WFIRST Science Archive and Analysis Center.
Proceedings of the Astroinformatics 2016, Sorrento, Italy, October 19-25, 2016, 2016

Optimize Unsynchronized Garbage Collection in an SSD Array.
CoRR, 2015

Delivering SKA Science.
CoRR, 2015

FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs.
Proceedings of the 13th USENIX Conference on File and Storage Technologies, 2015

Streaming Algorithms for Halo Finders.
Proceedings of the 11th IEEE International Conference on e-Science, 2015

Ten Years of SkyServer II: How Astronomers and the Public Have Embraced e-Science.
Comput. Sci. Eng., 2014

Ten Years of SkyServer I: Tracking Web and SQL e-Science Usage.
Comput. Sci. Eng., 2014

Hadoop in Low-Power Processors.
CoRR, 2014

FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs.
CoRR, 2014

From simulations to interactive numerical laboratories.
Proceedings of the 2014 Winter Simulation Conference, 2014

The future of computerized decision making.
Proceedings of the 2014 Winter Simulation Conference, 2014

Efficient classification of billions of points into complex geographic regions using hierarchical triangular mesh.
Proceedings of the Conference on Scientific and Statistical Database Management, 2014

Point cloud databases.
Proceedings of the Conference on Scientific and Statistical Database Management, 2014

Robust time synchronization in wireless sensor networks using real time clock.
Proceedings of the 12th ACM Conference on Embedded Network Sensor Systems, 2014

Real time change point detection by incremental PCA in large scale sensor data.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2014

Flux-freezing breakdown in high-conductivity magnetohydrodynamic turbulence.
Nat., 2013

From Large Simulations to Interactive Numerical Laboratories.
IEEE Data Eng. Bull., 2013

SkyQuery: Federating Astronomy Archives.
Comput. Sci. Eng., 2013

Adaptive exploration for large-scale protein analysis in the molecular dynamics database.
Proceedings of the Conference on Scientific and Statistical Database Management, 2013

Graywulf: a platform for federated scientific databases and services.
Proceedings of the Conference on Scientific and Statistical Database Management, 2013

Inverted indices for particle tracking in petascale cosmological simulations.
Proceedings of the Conference on Scientific and Statistical Database Management, 2013

The open connectome project data cluster: scalable analysis and vision for high-throughput neuroscience.
Proceedings of the Conference on Scientific and Statistical Database Management, 2013

Toward millions of file system IOPS on low-cost, commodity hardware.
Proceedings of the International Conference for High Performance Computing, 2013

Turbulence Visualization at the Terascale on Desktop PCs.
IEEE Trans. Vis. Comput. Graph., 2012

Just-in-Time Analytics on Large File Systems.
IEEE Trans. Computers, 2012

Vortices within vortices: hierarchical nature of vortex tubes in turbulence
CoRR, 2012

Single parameter galaxy classification: The Principal Curve through the multi-dimensional space of galaxy properties
CoRR, 2012

SkyQuery: An Implementation of a Parallel Probabilistic Join Engine for Cross-Identification of Multiple Astronomical Databases.
Proceedings of the Scientific and Statistical Database Management, 2012

Incremental and Parallel Analytics on Astrophysical Data Streams.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Data-intensive spatial filtering in large numerical simulation datasets.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

The Future of Scientific Data Bases.
Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE 2012), 2012

Data-intensive discoveries in science: the fourth paradigm.
Proceedings of the DIDC'12, 2012

A Parallel Page Cache: IOPS and Caching for Multicore Systems.
Proceedings of the 4th USENIX Workshop on Hot Topics in Storage and File Systems, 2012

Large science databases - are cloud services ready for them?
Sci. Program., 2011

Extreme Data-Intensive Scientific Computing.
Comput. Sci. Eng., 2011

Big Data [Guest editorial].
Comput. Sci. Eng., 2011

Implementing a General Spatial Indexing Library for Relational Databases of Large Numerical Simulations.
Proceedings of the Scientific and Statistical Database Management, 2011

I/O streaming evaluation of batch queries for data-intensive computational turbulence.
Proceedings of the Conference on High Performance Computing Networking, 2011

MPI-DB, A Parallel Database Services Software Library for Scientific Computing.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

Performance modeling and analysis of flash-based storage devices.
Proceedings of the IEEE 27th Symposium on Mass Storage Systems and Technologies, 2011

Array requirements for scientific applications and an implementation for microsoft SQL server.
Proceedings of the 2011 EDBT/ICDT Workshop on Array Databases, 2011

VisWeek Capstone Address.
IEEE Trans. Vis. Comput. Graph., 2010

Low-power amdahl-balanced blades for data intensive computing.
ACM SIGOPS Oper. Syst. Rev., 2010

Scientific data management at the Johns Hopkins institute for data intensive engineering and science.
SIGMOD Rec., 2010

Wireless sensor networks for soil science.
Int. J. Sens. Networks, 2010

Middleware support for many-task computing.
Clust. Comput., 2010

JAWS: Job-Aware Workload Scheduling for the Exploration of Turbulence Simulations.
Proceedings of the Conference on High Performance Computing Networking, 2010

Mobile air pollution monitoring network.
Proceedings of the 2010 ACM Symposium on Applied Computing (SAC), 2010

Geometry of the Cosmic Web: Minkowski Functionals from the Delaunay Tessellation.
Proceedings of the Seventh International Symposium on Voronoi Diagrams in Science and Engineering, 2010

Migrating a (large) science database to the cloud.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

An overview of the Open Science Data Cloud.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

Phoenix: An Epidemic Approach to Time Reconstruction.
Proceedings of the Wireless Sensor Networks, 7th European Conference, 2010

Model-Based Event Detection in Wireless Sensor Networks
CoRR, 2009

GrayWulf: Scalable Clustered Architecture for Data Intensive Computing.
Proceedings of the 42nd Hawaii International Conference on Systems Science (HICSS-42 2009), 2009

GrayWulf: Scalable Software Architecture for Data Intensive Computing.
Proceedings of the 42nd Hawaii International Conference on Systems Science (HICSS-42 2009), 2009

Sundial: Using Sunlight to Reconstruct Global Timestamps.
Proceedings of the Wireless Sensor Networks, 6th European Conference, 2009

Proceedings of the Euro-Par 2009 Parallel Processing, 2009

Building Reliable Data Pipelines for Managing Community Data Using Scientific Workflows.
Proceedings of the Fifth International Conference on e-Science, 2009

Gray's laws: database-centric computing in science.
Proceedings of the Fourth Paradigm: Data-Intensive Scientific Discovery, 2009

The Sloan Digital Sky Survey and beyond.
SIGMOD Rec., 2008

The Claremont report on database research.
SIGMOD Rec., 2008

The Catalog Archive Server Database Management System.
Comput. Sci. Eng., 2008

The sqlLoader Data-Loading Pipeline.
Comput. Sci. Eng., 2008

Accelerating Large-scale Data Exploration through Data Diffusion
CoRR, 2008

Data Diffusion: Dynamic Resource Provision and Data-Aware Scheduling for Data Intensive Applications
CoRR, 2008

Data-Intensive Computing in the 21st Century.
Computer, 2008

Jim Gray, astronomer.
Commun. ACM, 2008

New Challenges in Petascale Scientific Databases.
Proceedings of the Scientific and Statistical Database Management, 2008

Scientific publishing in the era of petabyte data.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 2008

On Building Scientific Workflow Systems for Data Management in the Cloud.
Proceedings of the Fourth International Conference on e-Science, 2008

Efficient scheduling of scientific workflows in a high performance computing cluster.
Proceedings of the 6th International Workshop on Challenges of Large Applications in Distributed Environments, 2008

Data Management in the Worldwide Sensor Web.
IEEE Pervasive Comput., 2007

Digital Data Preservation for Scholarly Publications in Astronomy.
Int. J. Digit. Curation, 2007

SkyServer Traffic Report - The First Five Years
CoRR, 2007

Cross-Matching Multiple Spatial Observations and Dealing with Missing Data
CoRR, 2007

The Zones Algorithm for Finding Points-Near-a-Point or Cross-Matching Spatial Datasets
CoRR, 2007

Life Under Your Feet: An End-to-End Soil Ecology Sensor Network, Database, Web Server, and Analysis Service
CoRR, 2007

Large-Scale Query and XMatch, Entering the Parallel Zone
CoRR, 2007

Indexing the Sphere with the Hierarchical Triangular Mesh
CoRR, 2007

Using Table Valued Functions in SQL Server 2005 To Implement a Spatial Data Library
CoRR, 2007

Spatial Indexing of Large Multidimensional Databases.
Proceedings of the Third Biennial Conference on Innovative Data Systems Research, 2007

Data mining middleware for wide-area high-performance networks.
Future Gener. Comput. Syst., 2006

Designing a Multi-petabyte Database for LSST
CoRR, 2006

Petascale Computational Systems.
Computer, 2006

Data analysis tools for sensor-based science.
Proceedings of the 4th International Conference on Embedded Networked Sensor Systems, 2006

Poster reception - Harnessing grid resources to enable the dynamic analysis of large astronomy datasets.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Data management and query - Estimating query result sizes for proxy caching in scientific database federations.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Bandwidth challenge - Transporting sloan digital sky survey data using SECTOR.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Digital Data Preservation and Curation: A Collaboration Among Libraries, Publishers, and the Virtual Observatory - A pilot project aimed at preserving, curating, and enabling access to digital data and associated electronic journals content.
Proceedings of the 3rd International Conference on Digital Preservation, 2006

Distributing the Sloan Digital Sky Survey Using UDT and Sector.
Proceedings of the Second International Conference on e-Science and Grid Technologies (e-Science 2006), 2006

Scientific data management in the coming decade.
SIGMOD Rec., 2005

Batch is back: CasJobs, serving multi-TB data on the Web
CoRR, 2005

Batch is Back: CasJobs, Serving Multi-TB Data on the Web.
Proceedings of the 2005 IEEE International Conference on Web Services (ICWS 2005), 2005

When Database Systems Meet the Grid.
Proceedings of the Second Biennial Conference on Innovative Data Systems Research, 2005

Where the Rubber Meets the Sky: Bridging the Gap between Databases and Science.
IEEE Data Eng. Bull., 2004

There Goes the Neighborhood: Relational Algebra for Spatial Data Search
CoRR, 2004

The Sloan Digital Sky Survey Science Archive: Migrating a Multi-Terabyte Astronomical Archive from Object to Relational DBMS
CoRR, 2004

The World Wide Telescope: An Archetype for Online Science
CoRR, 2004

Extending the SDSS Batch Query System to the National Virtual Observatory Grid
CoRR, 2004

Scientific Data Federation.
Proceedings of the Grid 2 - Blueprint for a New Computing Infrastructure, Second Edition, 2004

Migrating a multiterabyte archive from object to relational databases.
Comput. Sci. Eng., 2003

SkyQuery: A Web Service Approach to Federate Databases.
Proceedings of the First Biennial Conference on Innovative Data Systems Research, 2003

TeraScale SneakerNet: Using Inexpensive Disks for Backup, Archiving, and Data Exchange
CoRR, 2002

Online Scientific Data Curation, Publication, and Archiving
CoRR, 2002

Web Services for the Virtual Observatory
CoRR, 2002

SkyQuery: A WebService Approach to Federate Databases
CoRR, 2002

Spatial Clustering of Galaxies in Large Datasets
CoRR, 2002

Petabyte Scale Data Mining: Dream or Reality?
CoRR, 2002

The world-wide telescope.
Commun. ACM, 2002

Data Mining the SDSS SkyServer Database.
Proceedings of the Distributed Data & Structures 4, 2002

The SDSS skyserver: public access to the sloan digital sky server data.
Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, 2002

Very Fast Outlier Detection in Large Multidimensional Data Sets.
Proceedings of the 2002 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2002

Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan Digital Sky Survey.
Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, 2000

Astronomical archives of the future: a Virtual Observatory.
Future Gener. Comput. Syst., 1999

The Sloan Digital Sky Survey.
Comput. Sci. Eng., 1999

Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan Digital Sky Survey
CoRR, 1999

Digital Sky - Panel.
Proceedings of the IEEE Forum on Research and Technology Advances in Digital Libraries, 1999
