Nadathur Satish

ORCID: 0000-0002-8065-3401

According to our database, Nadathur Satish authored at least 56 papers between 2005 and 2021.

Bibliography

2021
Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale.
IEEE Micro, 2021

First-Generation Inference Accelerator Deployment at Facebook.
CoRR, 2021

2019
Parallelizing Word2Vec in Shared and Distributed Memory.
IEEE Trans. Parallel Distributed Syst., 2019

2018
Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications.
CoRR, 2018

Glow: Graph Lowering Compiler Techniques for Neural Networks.
CoRR, 2018

2017
Bridging the Gap between HPC and Big Data frameworks.
Proc. VLDB Endow., 2017

Deep learning at 15PF: supervised and semi-supervised classification for scientific data.
Proceedings of the International Conference for High Performance Computing, 2017

Galactos: computing the anisotropic 3-point correlation function for 2 billion galaxies.
Proceedings of the International Conference for High Performance Computing, 2017

Banshee: bandwidth-efficient DRAM caching via software/hardware cooperation.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

2016
Efficient Approximation Algorithms for Weighted b-Matching.
SIAM J. Sci. Comput., 2016

BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies.
Proceedings of the 4th International Conference on Learning Representations, 2016

Parallelizing Word2Vec in Multi-Core and Many-Core Architectures.
CoRR, 2016

Designing scalable b-Matching algorithms on distributed memory multiprocessors by approximation.
Proceedings of the International Conference for High Performance Computing, 2016

Graphicionado: A high-performance and energy-efficient accelerator for graph analytics.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

High Performance Parallel Stochastic Gradient Descent in Shared Memory.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

GraphPad: Optimized Graph Primitives for Parallel and Distributed Platforms.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Data tiering in heterogeneous memory systems.
Proceedings of the Eleventh European Conference on Computer Systems, 2016

2015
GraphMat: High performance graph analytics made productive.
Proc. VLDB Endow., 2015

GraphMat: High performance graph analytics made productive.
CoRR, 2015

Can traditional programming bridge the ninja performance gap for parallel computing applications?
Commun. ACM, 2015

Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms.
Proceedings of the High Performance Computing - 30th International Conference, 2015

Exploiting NVM in large-scale graph analytics.
Proceedings of the 3rd Workshop on Interactions of NVM/FLASH with Operating Systems and Workloads, 2015

Full correlation matrix analysis of fMRI data on Intel® Xeon Phi™ coprocessors.
Proceedings of the International Conference for High Performance Computing, 2015

BD-CATS: big data clustering at trillion particle scale.
Proceedings of the International Conference for High Performance Computing, 2015

Improving graph partitioning for modern graphs and architectures.
Proceedings of the 5th Workshop on Irregular Applications - Architectures and Algorithms, 2015

IMP: indirect memory prefetcher.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Scalable Bayesian Optimization Using Deep Neural Networks.
Proceedings of the 32nd International Conference on Machine Learning, 2015

2014
GenBase: a complex analytics genomics benchmark.
Proceedings of the International Conference on Management of Data, 2014

Navigating the maze of graph analytics frameworks using massive graph datasets.
Proceedings of the International Conference on Management of Data, 2014

Pardicle: Parallel Approximate Density-Based Clustering.
Proceedings of the International Conference for High Performance Computing, 2014

2013
Streaming Similarity Search over one Billion Tweets using Parallel Locality-Sensitive Hashing.
Proc. VLDB Endow., 2013

2012
DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing.
IEEE Micro, 2012

CloudRAMSort: fast and efficient large-scale distributed RAM sort on shared-nothing cluster.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012

Analysis and Optimization of Financial Analytics Benchmark on Modern Multi- and Many-core IA-Based Architectures.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Large-scale energy-efficient graph traversal: a path to efficient data-intensive supercomputing.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

GPP-Grep: High-Speed Regular Expression Processing Engine on General Purpose Processors.
Proceedings of the Research in Attacks, Intrusions, and Defenses, 2012

Fast and Efficient Graph Traversal Algorithm for CPUs: Maximizing Single-Node Efficiency.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

2011
Designing fast architecture-sensitive tree search on modern multicore/many-core processors.
ACM Trans. Database Syst., 2011

PALM: Parallel Architecture-Friendly Latch-Free Modifications to B+ Trees on Many-Core Processors.
Proc. VLDB Endow., 2011

Fast Updates on Read-Optimized Databases Using Multi-Core CPUs.
Proc. VLDB Endow., 2011

2010
Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010

FAST: fast architecture sensitive tree search on modern CPUs and GPUs.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010

3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs.
Proceedings of the Conference on High Performance Computing Networking, 2010

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

2009
Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs.
Proc. VLDB Endow., 2009

ClearPath: highly parallel collision avoidance for multi-agent simulation.
Proceedings of the 2009 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2009

Interactive Modeling, Simulation and Control of Large-Scale Crowds and Traffic.
Proceedings of the Motion in Games, Second International Workshop, 2009

Designing efficient sorting algorithms for manycore GPUs.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Optimizing the use of GPU memory in applications with large data sets.
Proceedings of the 16th International Conference on High Performance Computing, 2009

2008
Scheduling task dependence graphs with variable task execution times onto heterogeneous multiprocessors.
Proceedings of the 8th ACM & IEEE International conference on Embedded software, 2008

2007
Efficient Parallelization of H.264 Decoding with Macro Block Level Scheduling.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

A decomposition-based constraint optimization approach for statically scheduling task graphs with communication delays to multiprocessors.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

2005
An FPGA-based Soft Multiprocessor System for IPv4 Packet Forwarding.
Proceedings of the 2005 International Conference on Field Programmable Logic and Applications (FPL), 2005

Soft multiprocessor systems for network applications (abstract only).
Proceedings of the ACM/SIGDA 13th International Symposium on Field Programmable Gate Arrays, 2005

An automated exploration framework for FPGA-based soft multiprocessor systems.
Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2005

