Oreste Villa

Stephen W. Keckler

Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2021

2020

HMG: Extending Cache Coherence Protocols Across Modern Hierarchical Multi-GPU Systems.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

2019

Special Issue on: Systems for Learning, Inferencing, and Discovering (SLID).

J. Parallel Distributed Comput., 2019

NVBit: A Dynamic Binary Instrumentation Framework for NVIDIA GPUs.

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

2018

Combining HW/SW Mechanisms to Improve NUMA Performance of Multi-GPU Systems.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

2017

Exploring Efficient Hardware Support for Applications with Irregular Memory Patterns on Multinode Manycore Architectures.

IEEE Trans. Parallel Distributed Syst., 2017

Beyond the socket: NUMA-aware GPUs.

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability.

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

2016

Special Issue on Theory and Practice of Irregular Applications (TaPIA).

Parallel Comput., 2016

2015

Designing Efficient Heterogeneous Memory Architectures.

IEEE Micro, 2015

Special Issue on Architectures and Algorithms for Irregular Applications (AAIA) - Guest editors' introduction.

J. Parallel Distributed Comput., 2015

In-Memory Graph Databases for Web-Scale Data.

Computer, 2015

High-Performance, Distributed Dictionary Encoding of RDF Datasets.

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

GEMS: Graph Database Engine for Multithreaded Systems.

Jesse Weaver

Gregory Todd Williams

David J. Haglin

Proceedings of the Big Data - Algorithms, Analytics, and Applications, 2015

2014

Toward a data scalable solution for facilitating discovery of science resources.

Jesse Weaver

Parallel Comput., 2014

Scaling Semantic Graph Databases in Size and Performance.

IEEE Micro, 2014

Scaling the Power Wall: A Path to Exascale.

Proceedings of the International Conference for High Performance Computing, 2014

Scaling Irregular Applications through Data Aggregation and Software Multithreading.

Daniel G. Chavarría-Miranda

Mateo Valero

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

A Flexible CUDA LU-Based Solver for Small, Batched Linear Systems.

Nitin Gawande

Proceedings of the Numerical Computations with GPUs, 2014

2013

Optimizing tensor contraction expressions for hybrid CPU-GPU execution.

Wenjing Ma

Karol Kowalski

Gagan Agrawal

Clust. Comput., 2013

Composing Data Parallel Code for a SPARQL Graph Engine.

Proceedings of the International Conference on Social Computing, SocialCom 2013, 2013

Toward a data scalable solution for facilitating discovery of scientific data resources.

Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems, 2013

Exploiting points-to maps for de-/serialization code generation.

Selim Ciraci

Proceedings of the 28th Annual ACM Symposium on Applied Computing, 2013

YAPPA: A compiler-based parallelization framework for irregular applications on MPSoCs.

Proceedings of the 24th IEEE International Symposium on Rapid System Prototyping, 2013

Prototyping hardware support for irregular applications.

Proceedings of the 2013 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, 2013

Exploring manycore multinode systems for irregular applications with FPGA prototyping.

Proceedings of the 2013 IEEE Hot Chips 25 Symposium (HCS), 2013

Power/Performance Trade-Offs of Small Batched LU Based Solvers on GPUs.

Proceedings of the Euro-Par 2013 Parallel Processing, 2013

Accelerating subsurface transport simulation on heterogeneous clusters.

Nitin Gawande

Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

Accelerating semantic graph databases on commodity clusters.

Proceedings of the 2013 IEEE International Conference on Big Data (IEEE BigData 2013), 2013

Exploring hardware support for scaling irregular applications on multi-node multi-core architectures.

Proceedings of the 24th International Conference on Application-Specific Systems, 2013

2012

Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer.

IEEE Trans. Parallel Distributed Syst., 2012

Aho-Corasick String Matching on Shared and Distributed-Memory Parallel Architectures.

Daniel G. Chavarría-Miranda

IEEE Trans. Parallel Distributed Syst., 2012

Approximate weighted matching on emerging manycore and multithreaded architectures.

Mahantesh Halappanavar

Int. J. High Perform. Comput. Appl., 2012

Designing Next-Generation Massively Multithreaded Architectures for Irregular Applications.

Computer, 2012

A High Performance Computing Network and System Simulator for the Power Grid: NGNS^2.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Efficient Sorting on the Tilera Manycore Architecture.

Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

A Bandwidth-Optimized Multi-core Architecture for Irregular Applications.

Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

2011

Towards efficient execution of irregular applications: panel outline.

Proceedings of the first workshop on Irregular applications: architectures and algorithm, 2011

Irregular applications: architectures & algorithms.

Proceedings of the first workshop on Irregular applications: architectures and algorithm, 2011

Exploring Fine-Grained Task-Based Execution on Multi-GPU Systems.

Long Chen

Guang R. Gao

Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT.

Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

Experiences with String Matching on the Fermi Architecture.

Proceedings of the Architecture of Computing Systems - ARCS 2011, 2011

2010

Applications in Data-Intensive Computing.

Adv. Comput., 2010

Accelerating DNA analysis applications on GPU clusters.

Proceedings of the IEEE 8th Symposium on Application Specific Processors, 2010

High performance Molecular Dynamic simulation on single and multi-GPU systems.

Long Chen

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

Dynamic load balancing on single- and multi-GPU systems.

Long Chen

Guang R. Gao

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Acceleration of Streamed Tensor Contraction Expressions on GPGPU-Based Clusters.

Wenjing Ma

Karol Kowalski

Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010

Efficient pattern matching on GPUs for intrusion detection systems.

Donatella Sciuto

Proceedings of the 7th Conference on Computing Frontiers, 2010

2009

Input-independent, scalable and fast string matching on the Cray XMT.

Daniel G. Chavarría-Miranda

Kristyn J. Maschhoff

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Scalable transparent checkpoint-restart of global address space applications on virtual machines over infiniband.

Jarek Nieplocha

David M. Brown Jr.

Proceedings of the 6th Conference on Computing Frontiers, 2009

2008

Efficient Breadth-First Search on the Cell/BE Processor.

IEEE Trans. Parallel Distributed Syst., 2008

Accelerating Real-Time String Searching with Multicore Processors.

Computer, 2008

High-speed string searching against large dictionaries on the Cell/B.E. Processor.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

A Modular Approach to Model Heterogeneous MPSoC at Cycle Level.

Proceedings of the 11th Euromicro Conference on Digital System Design: Architectures, 2008

Exact multi-pattern string matching on the Cell/B.E. processor.

Proceedings of the 5th Conference on Computing Frontiers, 2008

Efficiency and scalability of barrier synchronization on NoC based many-core architectures.

Gianluca Palermo

Cristina Silvano

Proceedings of the 2008 International Conference on Compilers, 2008

2007

Challenges in Mapping Graph Exploration Algorithms on Advanced Multi-core Processors.

Juan Fernández Peinador

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Peak-Performance DFA-based String Matching on the Cell Processor.

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Transparent system-level migration of PGAS applications using Xen on InfiniBand.
