Oreste Villa

According to our database, Oreste Villa authored at least 67 papers between 2005 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
FinePack: Transparently Improving the Efficiency of Fine-Grained Transfers in Multi-GPU Systems.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Parsimony: Enabling SIMD/Vector Programming in Standard Compiler Flows.
Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 2023

2021
Need for Speed: Experiences Building a Trustworthy System-Level GPU Simulator.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

NVBitFI: Dynamic Fault Injection for GPUs.
Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2021

2020
HMG: Extending Cache Coherence Protocols Across Modern Hierarchical Multi-GPU Systems.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

2019
Special Issue on: Systems for Learning, Inferencing, and Discovering (SLID).
J. Parallel Distributed Comput., 2019

NVBit: A Dynamic Binary Instrumentation Framework for NVIDIA GPUs.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

2018
Combining HW/SW Mechanisms to Improve NUMA Performance of Multi-GPU Systems.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

2017
Exploring Efficient Hardware Support for Applications with Irregular Memory Patterns on Multinode Manycore Architectures.
IEEE Trans. Parallel Distributed Syst., 2017

Beyond the socket: NUMA-aware GPUs.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

2016
Special Issue on Theory and Practice of Irregular Applications (TaPIA).
Parallel Comput., 2016

2015
Designing Efficient Heterogeneous Memory Architectures.
IEEE Micro, 2015

Special Issue on Architectures and Algorithms for Irregular Applications (AAIA) - Guest editors' introduction.
J. Parallel Distributed Comput., 2015

In-Memory Graph Databases for Web-Scale Data.
Computer, 2015

High-Performance, Distributed Dictionary Encoding of RDF Datasets.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

GEMS: Graph Database Engine for Multithreaded Systems.
Proceedings of the Big Data - Algorithms, Analytics, and Applications, 2015

2014
Toward a data scalable solution for facilitating discovery of science resources.
Parallel Comput., 2014

Scaling Semantic Graph Databases in Size and Performance.
IEEE Micro, 2014

Scaling the Power Wall: A Path to Exascale.
Proceedings of the International Conference for High Performance Computing, 2014

Scaling Irregular Applications through Data Aggregation and Software Multithreading.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

A Flexible CUDA LU-Based Solver for Small, Batched Linear Systems.
Proceedings of the Numerical Computations with GPUs, 2014

2013
Optimizing tensor contraction expressions for hybrid CPU-GPU execution.
Clust. Comput., 2013

Composing Data Parallel Code for a SPARQL Graph Engine.
Proceedings of the International Conference on Social Computing, SocialCom 2013, 2013

Toward a data scalable solution for facilitating discovery of scientific data resources.
Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems, 2013

Exploiting points-to maps for de-/serialization code generation.
Proceedings of the 28th Annual ACM Symposium on Applied Computing, 2013

YAPPA: A compiler-based parallelization framework for irregular applications on MPSoCs.
Proceedings of the 24th IEEE International Symposium on Rapid System Prototyping, 2013

Prototyping hardware support for irregular applications.
Proceedings of the 2013 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, 2013

Exploring manycore multinode systems for irregular applications with FPGA prototyping.
Proceedings of the 2013 IEEE Hot Chips 25 Symposium (HCS), 2013

Power/Performance Trade-Offs of Small Batched LU Based Solvers on GPUs.
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

Accelerating subsurface transport simulation on heterogeneous clusters.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

Accelerating semantic graph databases on commodity clusters.
Proceedings of the 2013 IEEE International Conference on Big Data (IEEE BigData 2013), 2013

Exploring hardware support for scaling irregular applications on multi-node multi-core architectures.
Proceedings of the 24th International Conference on Application-Specific Systems, 2013

2012
Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer.
IEEE Trans. Parallel Distributed Syst., 2012

Aho-Corasick String Matching on Shared and Distributed-Memory Parallel Architectures.
IEEE Trans. Parallel Distributed Syst., 2012

Approximate weighted matching on emerging manycore and multithreaded architectures.
Int. J. High Perform. Comput. Appl., 2012

Designing Next-Generation Massively Multithreaded Architectures for Irregular Applications.
Computer, 2012

A High Performance Computing Network and System Simulator for the Power Grid: NGNS^2.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Efficient Sorting on the Tilera Manycore Architecture.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

A Bandwidth-Optimized Multi-core Architecture for Irregular Applications.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

2011
Towards efficient execution of irregular applications: panel outline.
Proceedings of the first workshop on Irregular applications: architectures and algorithms, 2011

Irregular applications: architectures & algorithms.
Proceedings of the first workshop on Irregular applications: architectures and algorithms, 2011

Exploring Fine-Grained Task-Based Execution on Multi-GPU Systems.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT.
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

Experiences with String Matching on the Fermi Architecture.
Proceedings of the Architecture of Computing Systems - ARCS 2011, 2011

2010
Applications in Data-Intensive Computing.
Adv. Comput., 2010

Accelerating DNA analysis applications on GPU clusters.
Proceedings of the IEEE 8th Symposium on Application Specific Processors, 2010

High performance Molecular Dynamic simulation on single and multi-GPU systems.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

Dynamic load balancing on single- and multi-GPU systems.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Acceleration of Streamed Tensor Contraction Expressions on GPGPU-Based Clusters.
Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010

Efficient pattern matching on GPUs for intrusion detection systems.
Proceedings of the 7th Conference on Computing Frontiers, 2010

2009
Input-independent, scalable and fast string matching on the Cray XMT.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Scalable transparent checkpoint-restart of global address space applications on virtual machines over infiniband.
Proceedings of the 6th Conference on Computing Frontiers, 2009

2008
Efficient Breadth-First Search on the Cell/BE Processor.
IEEE Trans. Parallel Distributed Syst., 2008

Accelerating Real-Time String Searching with Multicore Processors.
Computer, 2008

High-speed string searching against large dictionaries on the Cell/B.E. Processor.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

A Modular Approach to Model Heterogeneous MPSoC at Cycle Level.
Proceedings of the 11th Euromicro Conference on Digital System Design: Architectures, 2008

Exact multi-pattern string matching on the Cell/B.E. processor.
Proceedings of the 5th Conference on Computing Frontiers, 2008

Efficiency and scalability of barrier synchronization on NoC based many-core architectures.
Proceedings of the 2008 International Conference on Compilers, 2008

2007
Exploration of distributed shared memory architectures for NoC-based multiprocessors.
J. Syst. Archit., 2007

Challenges in Mapping Graph Exploration Algorithms on Advanced Multi-core Processors.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Peak-Performance DFA-based String Matching on the Cell Processor.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Transparent system-level migration of PGAS applications using Xen on InfiniBand.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

2006
Efficient Synchronization for Embedded On-Chip Multiprocessors.
IEEE Trans. Very Large Scale Integr. Syst., 2006

An efficient synchronization technique for multiprocessor systems on-chip.
SIGARCH Comput. Archit. News, 2006

Power/performance hardware optimization for synchronization intensive applications in MPSoCs.
Proceedings of the Conference on Design, Automation and Test in Europe, 2006

2005
Fast Dynamic Memory Integration in Co-Simulation Frameworks for Multiprocessor System on-Chip.
Proceedings of the 2005 Design, 2005

