Pradeep Dubey

David Esteban Bernal Neira

CoRR, January, 2026

2025

Intel Xeon 6 Product Family.

Michael D. Powell

Patrick Fleming

Venkidesh Iyer Krishna

IEEE Micro, 2025

2023

Microscaling Data Formats for Deep Learning.

CoRR, 2023

AUTOSPARSE: Towards Automated Sparse Training of Deep Neural Networks.

CoRR, 2023

HamLib: A Library of Hamiltonians for Benchmarking Quantum Algorithms and Hardware.

Alicia B. Magann

Shavindra P. Premaratne

Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

2022

FP8 Formats for Deep Learning.

Richard Grisenthwaite

CoRR, 2022

2020

Systolic Computing on GPUs for Productive Performance.

CoRR, 2020

MISIM: An End-to-End Neural Code Similarity System.

CoRR, 2020

Context-Aware Parse Trees.

CoRR, 2020

SuSy: A Programming Model for Productive Construction of High-Performance Systolic Arrays on FPGAs.

Christopher J. Hughes

Dimitris S. Papailiopoulos

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

2019

Parallelizing Word2Vec in Shared and Distributed Memory.

IEEE Trans. Parallel Distributed Syst., 2019

K-TanH: Hardware Efficient Activations For Deep Learning.

CoRR, 2019

A Study of BFLOAT16 for Deep Learning Training.

Nataraj Jammalamadaka

CoRR, 2019

SysML: The New Frontier of Machine Learning Systems.

Alexandros G. Dimakis

Anastasios Kyrillidis

Shivaram Venkataraman

CoRR, 2019

T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations.

Nitish Kumar Srivastava

Christopher J. Hughes

Timothy G. Mattson

Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

2018

Graphical exchange mechanisms.

Siddhartha Sahi

Martin Shubik

Games Econ. Behav., 2018

Money as minimal complexity.

Siddhartha Sahi

Martin Shubik

Games Econ. Behav., 2018

On Scale-out Deep Learning Training for Cloud and HPC.

Srinivas Sridharan

CoRR, 2018

Mixed Precision Training of Convolutional Neural Networks using Integer Operations.

Proceedings of the 6th International Conference on Learning Representations, 2018

2017

Ternary Neural Networks with Fine-Grained Quantization.

CoRR, 2017

Ternary Residual Networks.

CoRR, 2017

Deep learning at 15PF: supervised and semi-supervised classification for scientific data.

Proceedings of the International Conference for High Performance Computing, 2017

Galactos: computing the anisotropic 3-point correlation function for 2 billion galaxies.

Brian Friesen

Proceedings of the International Conference for High Performance Computing, 2017

The Quest for The Ultimate Learning Machine.

Proceedings of the 2017 ACM on International Symposium on Physical Design, 2017

ScaleDeep: A Scalable Compute Architecture for Learning and Evaluating Deep Networks.

Swagath Venkataramani

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Faster CNNs with Direct Sparse Convolutions and Guided Pruning.

Proceedings of the 5th International Conference on Learning Representations, 2017

2016

Full-Stack Architecting to Achieve a Billion-Requests-Per-Second Throughput on a Single Key-Value Store Server Platform.

ACM Trans. Comput. Syst., 2016

Efficient Approximation Algorithms for Weighted b-Matching.

Arif M. Khan

Alex Pothen

Fredrik Manne

Mahantesh Halappanavar

SIAM J. Sci. Comput., 2016

Achieving One Billion Key-Value Requests per Second on a Single Server.

IEEE Micro, 2016

Optimizations in a high-performance conjugate gradient benchmark for IA-based multi- and many-core processors.

Dhiraj D. Kalamkar

Int. J. High Perform. Comput. Appl., 2016

Scaling up Hartree-Fock calculations on Tianhe-2.

Int. J. High Perform. Comput. Appl., 2016

Eliciting performance: deterministic versus proportional prizes.

Siddhartha Sahi

Int. J. Game Theory, 2016

Holistic SparseCNN: Forging the Trident of Accuracy, Speed, and Size.

CoRR, 2016

BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies.

Shihao Ji

S. V. N. Vishwanathan

Proceedings of the 4th International Conference on Learning Representations, 2016

Parallelizing Word2Vec in Multi-Core and Many-Core Architectures.

CoRR, 2016

Distributed Deep Learning Using Synchronous Stochastic Gradient Descent.

Dipankar Das

Sasikanth Avancha

Dheevatsa Mudigere

CoRR, 2016

High Order Seismic Simulations on the Intel Xeon Phi Processor (Knights Landing).

Proceedings of the High Performance Computing - 31st International Conference, 2016

Designing scalable b-Matching algorithms on distributed memory multiprocessors by approximation.

Arif M. Khan

Alex Pothen

Mahantesh Halappanavar

Proceedings of the International Conference for High Performance Computing, 2016

PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

GraphPad: Optimized Graph Primitives for Parallel and Distributed Platforms.

Theodore L. Willke

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

2015

GraphMat: High performance graph analytics made productive.

Subramanya Dulloor

Satya Gautam Vadlamudi

Dipankar Das

Proc. VLDB Endow., 2015

Beacon: Deployment and Application of Intel Xeon Phi Coprocessors for Scientific Computing.

Comput. Sci. Eng., 2015

GraphMat: High performance graph analytics made productive.

Subramanya Dulloor

Satya Gautam Vadlamudi

Dipankar Das

CoRR, 2015

Decentralization of a Machine: Some Definitions.

CoRR, 2015

Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms.

Satya Gautam Vadlamudi

Proceedings of the High Performance Computing - 30th International Conference, 2015

BD-CATS: big data clustering at trillion particle scale.

Surendra Byna

Proceedings of the International Conference for High Performance Computing, 2015

High-performance algebraic multigrid solver optimized for multi-core based distributed parallel systems.

Proceedings of the International Conference for High Performance Computing, 2015

Improving graph partitioning for modern graphs and architectures.

Dominique LaSalle

Proceedings of the 5th Workshop on Irregular Applications - Architectures and Algorithms, 2015

Architecting to achieve a billion requests per second throughput on a single key-value store server platform.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

2014

Sparsifying Synchronization for High-Performance Shared-Memory Sparse Triangular Solver.

Proceedings of the Supercomputing - 29th International Conference, 2014

Navigating the maze of graph analytics frameworks using massive graph datasets.

Jiwon Seo

Muhammad Amber Hassaan

Shubho Sengupta

Zhaoming Yin

Proceedings of the International Conference on Management of Data, 2014

Pardicle: Parallel Approximate Density-Based Clustering.

Proceedings of the International Conference for High Performance Computing, 2014

Efficient Shared-Memory Implementation of High-Performance Conjugate Gradient Benchmark and its Application to Unstructured Matrices.

Dhiraj D. Kalamkar

Xing Liu

Yutong Lu

Proceedings of the International Conference for High Performance Computing, 2014

Lattice QCD with Domain Decomposition on Intel® Xeon Phi Co-Processors.

Tilo Wettig

Proceedings of the International Conference for High Performance Computing, 2014

Petascale High Order Dynamic Rupture Earthquake Simulations on Heterogeneous Supercomputers.

Alexander Breuer

Sebastian Rettenberger

Proceedings of the International Conference for High Performance Computing, 2014

Improving the energy efficiency of Big Cores.

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Improving Communication Performance and Scalability of Native Applications on Intel Xeon Phi Coprocessor Clusters.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

2013

Intel "big data" science and technology center vision and execution plan.

Michael Stonebraker

Sam Madden

SIGMOD Rec., 2013

Streaming Similarity Search over one Billion Tweets using Parallel Locality-Sensitive Hashing.

Aizana Turmukhametova

Proc. VLDB Endow., 2013

Lattice QCD on Intel® Xeon Phi™ Coprocessors.

Bálint Joó

Dhiraj D. Kalamkar

William A. Watson III

Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013

Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors.

Ganesh Bikshandi

Ping Tak Peter Tang

Daehyun Kim

Proceedings of the International Conference for High Performance Computing, 2013

Design and Implementation of the Linpack Benchmark for Single and Multi-node Systems Based on Intel® Xeon Phi Coprocessor.

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Efficient sparse matrix-vector multiplication on x86-based many-core processors.

Proceedings of the International Conference on Supercomputing, 2013

2012

Large-scale fluid simulation using velocity-vorticity domain decomposition.

ACM Trans. Graph., 2012

CloudRAMSort: fast and efficient large-scale distributed RAM sort on shared-nothing cluster.

Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012

Optimization of geometric multigrid for emerging multi- and manycore processors.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

Analysis and Optimization of Financial Analytics Benchmark on Modern Multi- and Many-core IA-Based Architectures.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Large-scale energy-efficient graph traversal: a path to efficient data-intensive supercomputing.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

Billion-particle SIMD-friendly two-point correlation on large-scale HPC cluster systems.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

GPP-Grep: High-Speed Regular Expression Processing Engine on General Purpose Processors.

Proceedings of the Research in Attacks, Intrusions, and Defenses, 2012

Can traditional programming bridge the Ninja performance gap for parallel computing applications?

Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

High Performance Non-uniform FFT on Modern X86-based Multi-core Systems.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Fast and Efficient Graph Traversal Algorithm for CPUs: Maximizing Single-Node Efficiency.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Emerging Applications.

Fundamentals of Multicore Software Development, 2012

2011

Designing fast architecture-sensitive tree search on modern multicore/many-core processors.

ACM Trans. Database Syst., 2011

PALM: Parallel Architecture-Friendly Latch-Free Modifications to B+ Trees on Many-Core Processors.

Proc. VLDB Endow., 2011

Fast Updates on Read-Optimized Databases Using Multi-Core CPUs.

Proc. VLDB Endow., 2011

High-Performance 3D Compressive Sensing MRI Reconstruction Using Many-Core Architectures.

Int. J. Biomed. Imaging, 2011

Designing and dynamically load balancing hybrid LU for multi/many-core.

Comput. Sci. Res. Dev., 2011

Interactive hybrid simulation of large-scale traffic.

Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2011

PhysBAM: physically based simulation.

Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2011

High-performance lattice QCD for multi-core based parallel systems using a cache-friendly hybrid threaded-MPI approach.

Proceedings of the Conference on High Performance Computing Networking, 2011

2010

Credit cards and inflation.

John Geanakoplos

Games Econ. Behav., 2010

A celebration of Robert Aumann's achievements on the occasion of his 80th birthday.

Eric Maskin

Yair Tauman

Games Econ. Behav., 2010

Grading exams: 100, 99, 98, ... or A, B, C?

John Geanakoplos

Games Econ. Behav., 2010

Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort.

Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010

FAST: fast architecture sensitive tree search on modern CPUs and GPUs.

Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010

PLEdestrians: A Least-Effort Approach to Crowd Simulation.

Proceedings of the 2010 Eurographics/ACM SIGGRAPH Symposium on Computer Animation, 2010

3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs.

Proceedings of the Conference on High Performance Computing Networking, 2010

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU.

Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

2009

Mapping High-Fidelity Volume Rendering for Medical Imaging to CPU, GPU and Many-Core Architectures.

IEEE Trans. Vis. Comput. Graph., 2009

Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs.

Proc. VLDB Endow., 2009

Larrabee: A Many-Core x86 Architecture for Visual Computing.

IEEE Micro, 2009

Perfect competition in an oligopoly (including bilateral monopoly).

Dieter Sondermann

Games Econ. Behav., 2009

ClearPath: highly parallel collision avoidance for multi-agent simulation.

Proceedings of the 2009 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2009

Interactive Modeling, Simulation and Control of Large-Scale Crowds and Traffic.

Proceedings of the Motion in Games, Second International Workshop, 2009

2008

Efficient implementation of sorting on multi-core SIMD CPU architecture.

Proc. VLDB Endow., 2008

Convergence of Recognition, Mining, and Synthesis Workloads and Its Implications.

Yen-Kuang Chen

Jatin Chhugani

Christopher J. Hughes

Proc. IEEE, 2008

Second Life and the New Generation of Virtual Worlds.

Computer, 2008

2007

Cache-conscious frequent pattern mining on modern and emerging processors.

Amol Ghoting

Gregory Buehrer

Srinivasan Parthasarathy

VLDB J., 2007

Scaling performance of interior-point method on large-scale chip multiprocessor system.

Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

2006

Strategic complements and substitutes, and potential games.

Andriy Zapechelnyuk

Games Econ. Behav., 2006

Competing for Customers in a Social Network: The Quasi-linear Case.

Rahul Garg

Bernard De Meyer

Proceedings of the Internet and Network Economics, Second International Workshop, 2006

Games of Connectivity.

Rahul Garg

Proceedings of the Internet and Network Economics, Second International Workshop, 2006

2005

Compound voting and the Banzhaf index.

Ezra Einy

Games Econ. Behav., 2005

Cache-conscious Frequent Pattern Mining on a Modern Processor.

Amol Ghoting

Gregory Buehrer

Srinivasan Parthasarathy

Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30, 2005

A Characterization of Data Mining Workloads on a Modern Processor.

Amol Ghoting

Gregory Buehrer

Srinivasan Parthasarathy

Proceedings of the Workshop on Data Management on New Hardware, 2005

2004

Learning with perfect information.

Games Econ. Behav., 2004

2003

Optimal scrutiny in multi-period promotion tournaments.

Games Econ. Behav., 2003

2000

Compression Tolerant Watermarking for Image Verification.

Proceedings of the 2000 International Conference on Image Processing, 2000

1997

Characterizing vulnerability of parallelism to resource constraints.

V. Vivekanand

K. Gopinath

Proceedings of the Fourth International Conference on High-Performance Computing, 1997

1986

Inefficiency of Nash Equilibria.

Math. Oper. Res., 1986

1984

Totally balanced games arising from controlled programming problems.

Lloyd S. Shapley

Math. Program., 1984

1981

Information Conditions, Communication and General Equilibrium.

Martin Shubik

Math. Oper. Res., 1981

Value Theory Without Efficiency.

Abraham Neyman

Robert James Weber

Math. Oper. Res., 1981

1980

Asymptotic Semivalues and a Short Proof of Kannai's Theorem.

Math. Oper. Res., 1980

1979

Mathematical Properties of the Banzhaf Power Index.
