Marcelo Cintra

According to our database, Marcelo Cintra authored at least 43 papers between 2006 and 2020.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2020
PIUMA: Programmable Integrated Unified Memory Architecture.
CoRR, 2020

2018
DHTM: Durable Hardware Transactional Memory.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

2017
ATOM: Atomic Durability in Non-volatile Memory through Hardware Logging.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

2016
Fence Placement for Legacy Data-Race-Free Programs via Synchronization Read Detection.
ACM Trans. Archit. Code Optim., 2016

2015
Adaptive Selection of Cache Indexing Bits for Removing Conflict Misses.
IEEE Trans. Computers, 2015

REWIND: Recovery Write-Ahead System for In-Memory Non-Volatile Data-Structures.
Proc. VLDB Endow., 2015

Understanding the Effects of Data Corruption on Application Behavior Based on Data Characteristics.
Proceedings of the Computer Safety, Reliability, and Security, 2015

Efficient persist barriers for multicores.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

2014
Automatic Skeleton-Driven Memory Affinity for Transactional Worklist Applications.
Int. J. Parallel Program., 2014

Static Approximation of MPI Communication Graphs for Optimized Process Placement.
Proceedings of the Languages and Compilers for Parallel Computing, 2014

2013
Characterizing the impact of process variation on write endurance enhancing techniques for non-volatile memory systems.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2013

LUCAS: latency-adaptive unified cluster assignment and instruction scheduling.
Proceedings of the SIGPLAN/SIGBED Conference on Languages, 2013

Aligned Scheduling: Cache-Efficient Instruction Scheduling for VLIW Processors.
Proceedings of the Languages and Compilers for Parallel Computing, 2013

DRIFT: Decoupled CompileR-Based Instruction-Level Fault-Tolerance.
Proceedings of the Languages and Compilers for Parallel Computing, 2013

CASTED: Core-Adaptive Software Transient Error Detection for Tightly Coupled Cores.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Acceldroid: Co-designed acceleration of Android bytecode.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

CAeSaR: Unified cluster-assignment scheduling and communication reuse for clustered VLIW processors.
Proceedings of the International Conference on Compilers, 2013

2012
Autotuning Skeleton-Driven Optimizations for Transactional Worklist Applications.
IEEE Trans. Parallel Distributed Syst., 2012

Mixed speculative multithreaded execution models.
ACM Trans. Archit. Code Optim., 2012

UCIFF: Unified Cluster Assignment Instruction Scheduling and Fast Frequency Selection for Heterogeneous Clustered VLIW Cores.
Proceedings of the Languages and Compilers for Parallel Computing, 2012

ASCIB: adaptive selection of cache indexing bits for removing conflict misses.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

SuperCoP: a general, correct, and performance-efficient supervised memory system.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

2011
Software-Based Cache Coherence with Hardware-Assisted Selective Self-Invalidations Using Bloom Filters.
IEEE Trans. Computers, 2011

An Evaluation of an OS-Based Coherence Scheme for Tiled CMPs.
Int. J. Parallel Program., 2011

Complementing user-level coarse-grain parallelism with implicit speculative parallelism.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Increasing the energy efficiency of TLS systems using intermediate checkpointing.
Proceedings of the 18th International Conference on High Performance Computing, 2011

A machine learning-based approach for thread mapping on transactional memory applications.
Proceedings of the 18th International Conference on High Performance Computing, 2011

Phase-Based Application-Driven Hierarchical Power Management on the Single-chip Cloud Computer.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Modeling Multithreaded Query Execution on Chip Multiprocessors.
Proceedings of the International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures, 2010

Improving compiler-runtime separation with XIR.
Proceedings of the 6th International Conference on Virtual Execution Environments, 2010

Profitability-based power allocation for speculative multithreaded systems.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Toward a more accurate understanding of the limits of the TLS execution paradigm.
Proceedings of the 2010 IEEE International Symposium on Workload Characterization, 2010

Generating code for holistic query evaluation.
Proceedings of the 26th International Conference on Data Engineering, 2010

Handling branches in TLS systems with Multi-Path Execution.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

Compiler-Directed Performance Model Construction for Parallel Programs.
Proceedings of the Architecture of Computing Systems, 2010

2009
Stream chaining: exploiting multiple levels of correlation in data prefetching.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Combining thread level speculation helper threads and runahead execution.
Proceedings of the 23rd international conference on Supercomputing, 2009

Distance-aware round-robin mapping for large NUCA caches.
Proceedings of the 16th International Conference on High Performance Computing, 2009

2008
An OS-based alternative to full hardware coherence on tiled CMPs.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

A Generic Tool Supporting Cache Designs and Optimisation on Shared Memory Systems.
Proceedings of the 9th Workshop on Parallel Systems and Algorithms (PASA) held at the 21st Conference on the Architecture of Computing Systems (ARCS), 2008

2007
Introduction to Part 2.
Trans. High Perform. Embed. Archit. Compil., 2007

Using Predictive Modeling for Cross-Program Design Space Exploration in Multicore Systems.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006
Quantifying Uncertainty in Points-To Relations.
Proceedings of the Languages and Compilers for Parallel Computing, 2006

