Guido Araujo

Orcid: 0000-0003-4869-5190

Affiliations:
  • University of Campinas (UNICAMP), Institute of Computing, Sao Paulo, Brazil


According to our database, Guido Araujo authored at least 132 papers between 1994 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Enabling HW-Based Task Scheduling in Large Multicore Architectures.
IEEE Trans. Computers, January, 2024

2023
Advancing Direct Convolution Using Convolution Slicing Optimization and ISA Extensions.
ACM Trans. Archit. Code Optim., December, 2023

Fast matrix multiplication via compiler-only layered data reorganization and intrinsic lowering.
Softw. Pract. Exp., September, 2023

Source Matching and Rewriting for MLIR Using String-Based Automata.
ACM Trans. Archit. Code Optim., June, 2023

MassCCS: A High-Performance Collision Cross-Section Software for Large Macromolecular Assemblies.
J. Chem. Inf. Model., June, 2023

Tensor slicing and optimization for multicore NPUs.
J. Parallel Distributed Comput., May, 2023

On the impact of mode transition on phased transactional memory performance.
J. Parallel Distributed Comput., March, 2023

2022
Using Barrier Elision to Improve Transactional Code Generation.
ACM Trans. Archit. Code Optim., 2022

Special Issue on Compiling for Accelerators.
IEEE Micro, 2022

Source Matching and Rewriting.
CoRR, 2022

Implementing the Broadcast Operation in a Distributed Task-based Runtime.
Proceedings of the International Symposium on Computer Architecture and High Performance Computing Workshops, 2022

An OpenMP-only Linear Algebra Library for Distributed Architectures.
Proceedings of the International Symposium on Computer Architecture and High Performance Computing Workshops, 2022

Ion-Molecule Collision Cross-Section Simulation using Linked-cell and Trajectory Parallelization.
Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022

The OpenMP Cluster Programming Model.
Workshop Proceedings of the 51st International Conference on Parallel Processing, 2022

Improving Convolution via Cache Hierarchy Tiling and Reduced Packing.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
KernelFaRer: Replacing Native-Code Idioms with High-Performance Library Calls.
ACM Trans. Archit. Code Optim., 2021

Efficient Tensor Slicing for Multicore NPUs using Memory Burst Modeling.
Proceedings of the 33rd IEEE International Symposium on Computer Architecture and High Performance Computing, 2021

Improving Phased Transactional Memory via Commit Throughput and Capacity Estimation.
Proceedings of the 33rd IEEE International Symposium on Computer Architecture and High Performance Computing, 2021

Pooling Acceleration in the DaVinci Architecture Using Im2col and Col2im Instructions.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

Enabling OpenMP Task Parallelism on Multi-FPGAs.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

Accelerating Graph Applications Using Phased Transactional Memory.
Proceedings of the Euro-Par 2021: Parallel Processing, 2021

2020
Acceleration Opportunities in Linear Algebra Applications via Idiom Recognition.
Proceedings of the Companion of the 2020 ACM/SPEC International Conference on Performance Engineering, 2020

OmpTracing: Easy Profiling of OpenMP Programs.
Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020

Using OpenMP to Detect and Speculate Dynamic DOALL Loops.
Proceedings of the OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020

Improving Transactional Code Generation via Variable Annotation and Barrier Elision.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

NV-PhTM: An Efficient Phase-Based Transactional System for Non-volatile Memory.
Proceedings of the Euro-Par 2020: Parallel Processing, 2020

2019
The Case for Phase-Based Transactional Memory.
IEEE Trans. Parallel Distributed Syst., 2019

Data-flow analysis and optimization for data coherence in heterogeneous architectures.
J. Parallel Distributed Comput., 2019

Adding Tightly-Integrated Task Scheduling Acceleration to a RISC-V Multi-core Processor.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Circumventing Uniqueness of XOR Arbiter PUFs.
Proceedings of the 22nd Euromicro Conference on Digital System Design, 2019

2018
Using Hardware-Transactional-Memory Support to Implement Thread-Level Speculation.
IEEE Trans. Parallel Distributed Syst., 2018

Cluster Programming using the OpenMP Accelerator Model.
ACM Trans. Archit. Code Optim., 2018

CRPUF: A modeling-resistant delay PUF based on cylindrical reconvergence.
Microprocess. Microsystems, 2018

High performance collision cross section calculation - HPCCS.
J. Comput. Chem., 2018

Automatic Ray-Tracer Cloud Offloading in OPENMP.
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

DOACROSS Parallelization Based on Component Annotation and Loop-Carried Probability.
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

Automatic Offloading of Cluster Accelerators.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

Automatic annotation of tasks in structured code.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017
DawnCC: Automatic Annotation for Data Parallelism and Offloading.
ACM Trans. Archit. Code Optim., 2017

Automatic Scan Parallelization in OpenMP.
Proceedings of the 2017 International Symposium on Computer Architecture and High Performance Computing Workshops, 2017

Data Coherence Analysis and Optimization for Heterogeneous Computing.
Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017

Compiling and Optimizing OpenMP 4.X Programs to OpenCL and SPIR.
Proceedings of the Scaling OpenMP for Exascale Performance and Portability, 2017

Revisiting phased transactional memory.
Proceedings of the International Conference on Supercomputing, 2017

The Cloud as an OpenMP Offloading Device.
Proceedings of the 46th International Conference on Parallel Processing, 2017

Performance Evaluation of Thread-Level Speculation in Off-the-Shelf Hardware Transactional Memories.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

2016
Study of hardware transactional memory characteristics and serialization policies on Haswell.
Parallel Comput., 2016

Parallel Computation for the All-Pairs Suffix-Prefix Problem.
Proceedings of the String Processing and Information Retrieval, 2016

Automatic Insertion of Copy Annotation in Data-Parallel Programs.
Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016

Evaluating and Improving Thread-Level Speculation in Hardware Transactional Memories.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Task parallel programming model + hardware acceleration = performance advantage.
Proceedings of the 2016 IEEE Hot Chips 28 Symposium (HCS), 2016

Cylindrical Reconvergence Physical Unclonable Function.
Proceedings of the 2016 Euromicro Conference on Digital System Design, 2016

2015
Guest Editorial: SBAC-PAD 2013.
Int. J. Parallel Program., 2015

Improving the Statistical Variability of Delay-based Physical Unclonable Functions.
Proceedings of the 28th Symposium on Integrated Circuits and Systems Design, 2015

Using Hardware Transactional Memory to Enable Speculative Trace Optimization.
Proceedings of the 2015 International Symposium on Computer Architecture and High Performance Computing Workshops, 2015

Serialization Management for Best-Effort Hardware Transactional Memory.
Proceedings of the 27th International Symposium on Computer Architecture and High Performance Computing, 2015

Performance implications of dynamic memory allocators on transactional memory systems.
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

The Batched DOACROSS loop parallelization algorithm.
Proceedings of the 2015 International Conference on High Performance Computing & Simulation, 2015

Computer security by hardware-intrinsic authentication.
Proceedings of the 2015 International Conference on Hardware/Software Codesign and System Synthesis, 2015

2014
Microcode Compression Using Structured-Constrained Clustering.
Int. J. Parallel Program., 2014

Cloud-based OpenMP Parallelization Using a MapReduce Runtime.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Multi-dimensional Evaluation of Haswell's Transactional Memory Performance.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Loop-Carried Dependence Verification in OpenMP.
Proceedings of the Using and Improving OpenMP for Devices, Tasks, and More, 2014

Measuring Effective Work to Reward Success in Dynamic Transaction Scheduling.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

Wear-out analysis of Error Correction Techniques in Phase-Change Memory.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

2013
Extending decoupled software pipeline to parallelize Java programs.
Softw. Pract. Exp., 2013

Transaction Scheduling Using Dynamic Conflict Avoidance.
Int. J. Parallel Program., 2013

Modeling virtual machines misprediction overhead.
Proceedings of the IEEE International Symposium on Workload Characterization, 2013

Transaction scheduling using conflict avoidance and Contention Intensity.
Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

Cache-based cross-iteration coherence for speculative parallelization.
Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

2012
Data center power and performance optimization through global selection of P-states and utilization rates.
Sustain. Comput. Informatics Syst., 2012

Computational reflection and its application to platform verification.
Des. Autom. Embed. Syst., 2012

Exploring Dynamic Program Behavior with Frames and Phases.
Proceedings of the 13th Symposium on Computer Systems, 2012

2011
Structure-Constrained Microcode Compression.
Proceedings of the 23rd International Symposium on Computer Architecture and High Performance Computing, 2011

LUTS: A Lightweight User-Level Transaction Scheduler.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2011

2010
ISAMAP: Instruction Mapping Driven by Dynamic Binary Translation.
Proceedings of the Computer Architecture, 2010

Trace Execution Automata in Dynamic Binary Translation.
Proceedings of the Computer Architecture, 2010

Reducing False Aborts in STM Systems.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2010

T-DRE: a hardware trusted computing base for direct recording electronic vote machines.
Proceedings of the Twenty-Sixth Annual Computer Security Applications Conference, 2010

2009
A Multi-Model Engine for High-Level Power Estimation Accuracy Optimization.
IEEE Trans. Very Large Scale Integr. Syst., 2009

Characterizing the Energy Consumption of Software Transactional Memory.
IEEE Comput. Archit. Lett., 2009

On the energy-efficiency of software transactional memory.
Proceedings of the 22nd Annual Symposium on Integrated Circuits and Systems Design: Chip on the Dunes, 2009

2008
Instruction Scheduling Based on Subgraph Isomorphism for a High Performance Computer Processor.
J. Univers. Comput. Sci., 2008

2007
A Custom Instruction Approach for Hardware and Software Implementations of Finite Field Arithmetic over F_{2^163} using Gaussian Normal Bases.
J. VLSI Signal Process., 2007

A Flexible Platform Framework for Rapid Transactional Memory Systems Prototyping and Evaluation.
Proceedings of the 18th IEEE International Workshop on Rapid System Prototyping (RSP 2007), 2007

A Methodology and Toolset to Enable SystemC and VHDL Co-simulation.
Proceedings of the 2007 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2007), 2007

On the Limitations of Power Macromodeling Techniques.
Proceedings of the 2007 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2007), 2007

A multi-model power estimation engine for accuracy optimization.
Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007

The Image Forest Transform Architecture.
Proceedings of the 2007 International Conference on Field-Programmable Technology, 2007

A computational reflection mechanism to support platform debugging in SystemC.
Proceedings of the 5th International Conference on Hardware/Software Codesign and System Synthesis, 2007

2006
Offset assignment using simultaneous variable coalescing.
ACM Trans. Embed. Comput. Syst., 2006

Exploiting dynamic reconfiguration techniques: the 2D-VLIW approach.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Clustering-Based Microcode Compression.
Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006

Software-Based Transparent and Comprehensive Control-Flow Error Detection.
Proceedings of the Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 2006

2D-VLIW: An Architecture Based on the Geometry of Computation.
Proceedings of the 2006 IEEE International Conference on Application-Specific Systems, 2006

2005
Efficient datapath merging for partially reconfigurable architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2005

Dynamic binary control-flow errors detection.
SIGARCH Comput. Archit. News, 2005

The datapath merging problem in reconfigurable systems: Complexity, dual bounds and heuristic evaluation.
ACM J. Exp. Algorithmics, 2005

The ArchC Architecture Description Language and Tools.
Int. J. Parallel Program., 2005

Platform designer: An approach for modeling multiprocessor platforms based on SystemC.
Des. Autom. Embed. Syst., 2005

A SystemC-only design methodology and the CINE-IP multimedia platform.
Des. Autom. Embed. Syst., 2005

Design of a decompressor engine on a SPARC processor.
Proceedings of the 18th Annual Symposium on Integrated Circuits and Systems Design, 2005

High-Level Switching Activity Prediction Through Sampled Monitored Simulation.
Proceedings of the 2005 International Symposium on System-on-Chip, 2005

A custom instruction approach for hardware and software implementations of finite field arithmetic over F_{2^63} using Gaussian normal bases.
Proceedings of the 2005 IEEE International Conference on Field-Programmable Technology, 2005

Processor Centric Specification and Modelling of MPSoCs.
Proceedings of the Forum on specification and Design Languages, 2005

2004
The design of dynamically reconfigurable datapath coprocessors.
ACM Trans. Embed. Comput. Syst., 2004

The Datapath Merging Problem in Reconfigurable Systems: Lower Bounds and Heuristic Evaluation.
Proceedings of the Experimental and Efficient Algorithms, Third International Workshop, 2004

Teaching computer architecture using an architecture description language.
Proceedings of the 2004 workshop on Computer architecture education, 2004

An automatic testbench generation tool for a SystemC functional verification methodology.
Proceedings of the 17th Annual Symposium on Integrated Circuits and Systems Design, 2004

ArchC: A SystemC-Based Architecture Description Language.
Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2004), 2004

Multi-Profile Instruction Based Compression.
Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2004), 2004

Optimizations for Compiled Simulation Using Instruction Type Information.
Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2004), 2004

Fast instruction set customization.
Proceedings of the 2nd Workshop on Embedded Systems for Real-Time Multimedia, 2004

Modeling and Simulating Memory Hierarchies in a Platform-Based Design Methodology.
Proceedings of the 2004 Design, 2004

Multi-profile based code compression.
Proceedings of the 41st Design Automation Conference, 2004

2003
Address register allocation for arrays in loops of embedded programs.
Microelectron. J., 2003

Improving Offset Assignment through Simultaneous Variable Coalescing.
Proceedings of the Software and Compilers for Embedded Systems, 7th International Workshop, 2003

Exploring Memory Hierarchy with ArchC.
Proceedings of the 15th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2003), 2003

Mixed static/dynamic profiling for dictionary based code compression.
Proceedings of the 2003 International Symposium on System-on-Chip, 2003

2002
Global array reference allocation.
ACM Trans. Design Autom. Electr. Syst., 2002

Datapath Merging and Interconnection Sharing for Reconfigurable Architectures.
Proceedings of the 15th International Symposium on System Synthesis (ISSS 2002), 2002

2001
A retargetable VLIW compiler framework for DSPs with instruction-level parallelism.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2001

Optimal Live Range Merge for Address Register Allocation in Embedded Programs.
Proceedings of the Compiler Construction, 10th International Conference, 2001

Tailoring pipeline bypassing and functional unit mapping to application in clustered VLIW architectures.
Proceedings of the 2001 International Conference on Compilers, 2001

2000
Expression-tree-based algorithms for code compression on embedded RISC architectures.
IEEE Trans. Very Large Scale Integr. Syst., 2000

Array Reference Allocation Using SSA-Form and Live Range Growth.
Proceedings of the Languages, 2000

1999
Compressed Code Execution on DSP Architectures.
Proceedings of the 12th International Symposium on System Synthesis, 1999

1998
Code generation for fixed-point DSPs.
ACM Trans. Design Autom. Electr. Syst., 1998

Code Compression Based on Operand Factorization.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

1996
Instruction Set Design and Optimizations for Address Computation in DSP Architectures.
Proceedings of the 9th International Symposium on System Synthesis, 1996

Using Register-Transfer Paths in Code Generation for Heterogeneous Memory-Register Architectures.
Proceedings of the 33rd Conference on Design Automation, 1996

1995
Optimal code generation for embedded memory non-homogeneous register architectures.
Proceedings of the 8th International Symposium on System Synthesis (ISSS 1995), 1995

1994
Challenges in code generation for embedded processors.
Proceedings of the Code Generation for Embedded Processors, Dagstuhl Workshop, Dagstuhl, Germany, August 31, 1994

