Adrián Cristal

Orcid: 0000-0003-1277-9296

According to our database1, Adrián Cristal authored at least 205 papers between 2003 and 2023.

Collaborative distances:

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

On csauthors.net:

Bibliography

2023
Vitruvius+: An Area-Efficient RISC-V Decoupled Vector Coprocessor for High Performance Computing Applications.
ACM Trans. Archit. Code Optim., June, 2023

Mix-GEMM: An efficient HW-SW Architecture for Mixed-Precision Quantized Deep Neural Networks Inference on Edge Devices.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

VAQUERO: A Scratchpad-based Vector Accelerator for Query Processing.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023


2022
MoRS: An Approximate Fault Modeling Framework for Reduced-Voltage SRAMs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Can We Trust Undervolting in FPGA-Based Deep Learning Designs at Harsh Conditions?
IEEE Micro, 2022

Adaptable Register File Organization for Vector Processors.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022


BiSon-e: a lightweight and high-performance accelerator for narrow integer linear algebra computing on the edge.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021
MoRS: An Approximate Fault Modelling Framework for Reduced-Voltage SRAMs.
CoRR, 2021

VIA: A Smart Scratchpad for Vector Units with Application to Sparse Matrix Computations.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Towards Zero-Waste Recovery and Zero-Overhead Checkpointing in Ensemble Data Assimilation.
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

Understanding Power Consumption and Reliability of High-Bandwidth Memory with Voltage Underscaling.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

2020
A RISC-V Simulator and Benchmark Suite for Designing and Evaluating Vector Architectures.
ACM Trans. Archit. Code Optim., 2020

Exceeding Conservative Limits: A Consolidated Analysis on Modern Hardware Margins.
CoRR, 2020

Power and Accuracy of Multi-Layer Perceptrons (MLPs) under Reduced-voltage FPGA BRAMs Operation.
CoRR, 2020

On the Resilience of Deep Learning for Reduced-voltage FPGAs.
Proceedings of the 28th Euromicro International Conference on Parallel, 2020

An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration.
Proceedings of the 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2020



2019
LEGaTO: Low-Energy, Secure, and Resilient Toolset for Heterogeneous Computing.
CoRR, 2019

TauRieL: Targeting Traveling Salesman Problem with a deep reinforcement learning inspired architecture.
CoRR, 2019

Evaluating Built-In ECC of FPGA On-Chip Memories for the Mitigation of Undervolting Faults.
Proceedings of the 27th Euromicro International Conference on Parallel, 2019

Modern Hardware Margins: CPUs, GPUs, FPGAs Recent System-Level Studies.
Proceedings of the 25th IEEE International Symposium on On-Line Testing and Robust System Design, 2019

A Novel FPGA-Based High Throughput Accelerator For Binary Search Trees.
Proceedings of the 17th International Conference on High Performance Computing & Simulation, 2019

Ground-Truth Prediction to Accelerate Soft-Error Impact Analysis for Iterative Methods.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

2018
Memory Controller for Vector Processor.
J. Signal Process. Syst., 2018

Vector Processing-Aware Advanced Clock-Gating Techniques for Low-Power Fused Multiply-Add.
IEEE Trans. Very Large Scale Integr. Syst., 2018

Exploring the capabilities of support vector machines in detecting silent data corruptions.
Sustain. Comput. Informatics Syst., 2018

A General Guide to Applying Machine Learning to Computer Architecture.
Supercomput. Front. Innov., 2018

EMVS: Embedded Multi Vector-core System.
J. Syst. Archit., 2018

Low-Precision Floating-Point Schemes for Neural Network Training.
CoRR, 2018

On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation.
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018


Comprehensive Evaluation of Supply Voltage Underscaling in FPGA on-Chip Memories.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

A Demo of FPGA Aggressive Voltage Downscaling: Power and Reliability Tradeoffs.
Proceedings of the 28th International Conference on Field Programmable Logic and Applications, 2018

Fault Characterization Through FPGA Undervolting.
Proceedings of the 28th International Conference on Field Programmable Logic and Applications, 2018


2017
An Integrated Vector-Scalar Design on an In-Order ARM Core.
ACM Trans. Archit. Code Optim., 2017

Determinism at Standard-Library Level in TM-Based Applications.
Int. J. Parallel Program., 2017

Mth: Codesigned Hardware/Software Support for Fine Grain Threads.
IEEE Comput. Archit. Lett., 2017

A Machine Learning Approach for Performance Prediction and Scheduling on Heterogeneous CPUs.
Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017

RETHINK big: European roadmap for hardware anc networking optimizations for big data.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

MACORD: Online Adaptive Machine Learning Framework for Silent Error Detection.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

A Deep Learning Mapper (DLM) for Scheduling on Heterogeneous Systems.
Proceedings of the High Performance Computing - 4th Latin American Conference, 2017

2016
Range Translations for Fast Virtual Memory.
IEEE Micro, 2016

Architectural support for efficient message passing on shared memory multi-cores.
J. Parallel Distributed Comput., 2016

Implications of non-volatile memory as primary storage for database management systems.
Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016

Exploring Energy Reduction in Future Technology Nodes via Voltage Scaling with Application to 10nm.
Proceedings of the 24th Euromicro International Conference on Parallel, 2016

A Fully Parameterizable Low Power Design of Vector Fused Multiply-Add Using Active Clock-Gating Techniques.
Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016

Future Vector Microprocessor Extensions for Data Aggregations.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

CRC-Based Memory Reliability for Task-Parallel HPC Applications.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Energy-efficient address translation.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Energy minimization at all layers of the data center: The ParaDIME project.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

Towards low-power embedded vector processor.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

Spatial Support Vector Regression to Detect Silent Errors in the Exascale Era.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

Accelerating Hash-Based Query Processing Operations on FPGAs by a Hash Table Caching Technique.
Proceedings of the High Performance Computing - Third Latin American Conference, 2016

POSTER: An Integrated Vector-Scalar Design on an In-order ARM Core.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

Big Data and HPC Acceleration with Vivado HLS.
Proceedings of the FPGAs for Software Programmers, 2016

2015
TRADE: Precise Dynamic Race Detection for Scalable Transactional Memory Systems.
ACM Trans. Parallel Comput., 2015

DaSH: A benchmark suite for hybrid dataflow and shared memory programming models.
Parallel Comput., 2015

Reimagining Heterogeneous Computing: A Functional Instruction-Set Architecture Computing Model.
IEEE Micro, 2015

Kernel-to-User-Mode Transition-Aware Hardware Scheduling.
IEEE Micro, 2015

ParaDIME: Parallel Distributed Infrastructure for Minimization of Energy for data centers.
Microprocess. Microsystems, 2015

Thread Lock Section-Aware Scheduling on Asymmetric Single-ISA Multi-Core.
IEEE Comput. Archit. Lett., 2015

Chapter One - An Overview of Architecture-Level Power- and Energy-Efficient Design Techniques.
Adv. Comput., 2015

FAcET: Fast and accurate power/energy estimation tool for CPU-GPU platforms at architectural-level.
Proceedings of the 28th IEEE International System-on-Chip Conference, 2015

Performance and Energy Efficient Hardware-Based Scheduler for Symmetric/Asymmetric CMPs.
Proceedings of the 27th International Symposium on Computer Architecture and High Performance Computing, 2015

Tidy Cache: Improving Data Placement in Die-Stacked DRAM Caches.
Proceedings of the 27th International Symposium on Computer Architecture and High Performance Computing, 2015

Imposing coarse-grained reconfiguration to general purpose processors.
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015

NanoCheckpoints: A Task-Based Asynchronous Dataflow Framework for Efficient and Scalable Checkpoint/Restart.
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

NEMsCAM: A novel CAM cell based on nano-electro-mechanical switch and CMOS for energy efficient TLBs.
Proceedings of the 2015 IEEE/ACM International Symposium on Nanoscale Architectures, 2015

Joint Circuit-System Design Space Exploration of Multiplier Unit Structure for Energy-Efficient Vector Processors.
Proceedings of the 2015 IEEE Computer Society Annual Symposium on VLSI, 2015

JSRAM: A Circuit-Level Technique for Trading-Off Robustness and Capacity in Cache Memories.
Proceedings of the 2015 IEEE Computer Society Annual Symposium on VLSI, 2015

Redundant memory mappings for fast access to large memories.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

DiMP: Architectural Support for Direct Message Passing on Shared Memory Multi-cores.
Proceedings of the 44th International Conference on Parallel Processing, 2015

VPM: Virtual power meter tool for low-power many-core/heterogeneous data center prototypes.
Proceedings of the 33rd IEEE International Conference on Computer Design, 2015

VSR sort: A novel vectorised sorting algorithm & architecture extensions for future microprocessors.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Trigeneous Platforms for Energy Efficient Computing of HPC Applications.
Proceedings of the 22nd IEEE International Conference on High Performance Computing, 2015

High Level Synthesis Based Hardware Accelerator Design for Processing SQL Queries.
Proceedings of the 12th FPGAworld Conference 2015, 2015

Accelerating Complete Decision Support Queries Through High-Level Synthesis Technology (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

Heterogeneous Platform to Accelerate Compute Intensive Applications.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

High-Level Debugging and Verification for FPGA-Based Multicore Architectures.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

Hardware Round-Robin Scheduler for Single-ISA Asymmetric Multi-core.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015


Verification Tools for Transactional Programs.
Proceedings of the Transactional Memory. Foundations, Algorithms, Tools, and Applications, 2015

An energy efficient hybrid FPGA-GPU based embedded platform to accelerate face recognition application.
Proceedings of the 2015 IEEE Symposium in Low-Power and High-Speed Chips, 2015

Programmer-directed partial redundancy for resilient HPC.
Proceedings of the 12th ACM International Conference on Computing Frontiers, 2015

ViPS: Visual processing system for medical imaging.
Proceedings of the 8th International Conference on Biomedical Engineering and Informatics, 2015

2014
Exploiting Existing Comparators for Fine-Grained Low-Cost Error Detection.
ACM Trans. Archit. Code Optim., 2014

Bit Impact Factor: Towards making fair vulnerability comparison.
Microprocess. Microsystems, 2014

Using Dynamic Runtime Testing for Rapid Development of Architectural Simulators.
Int. J. Parallel Program., 2014

DESSERT: DESign Space ExploRation Tool based on power and energy at System-Level.
Proceedings of the 27th IEEE International System-on-Chip Conference, 2014

Neighbor-cell assisted error correction for MLC NAND flash memories.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2014

DeTrans: Deterministic and Parallel execution of Transactions.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

A Case Study of Hybrid Dataflow and Shared-Memory Programming Models: Dependency-Based Parallel Game Engine.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Dynamic-vector execution on a general purpose EDGE chip multiprocessor.
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

Dynamic Verification for Hybrid Concurrent Programming Models.
Proceedings of the Runtime Verification - 5th International Conference, 2014

PAMS: Pattern Aware Memory System for embedded systems.
Proceedings of the 2014 International Conference on ReConFigurable Computing and FPGAs, 2014

System-level power estimation tool for embedded processor based platforms.
Proceedings of the 2014 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, 2014

Combining Error Detection and Transactional Memory for Energy-Efficient Computing below Safe Operation Margins.
Proceedings of the 22nd Euromicro International Conference on Parallel, 2014

VPPET: Virtual platform power and energy estimation tool for heterogeneous MPSoC based FPGA platforms.
Proceedings of the 24th International Workshop on Power and Timing Modeling, 2014

System-Level Power and Energy Estimation Methodology for Open Multimedia Applications Platforms.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2014

Physical vs. Physically-Aware Estimation Flow: Case Study of Design Space Exploration of Adders.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2014

PETS: Power and energy estimation tool at system-level.
Proceedings of the Fifteenth International Symposium on Quality Electronic Design, 2014

Exploiting a fast and simple ECC for scaling supply voltage in level-1 caches.
Proceedings of the 2014 IEEE 20th International On-Line Testing Symposium, 2014

Performance analysis of the memory management unit under scale-out workloads.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

Evaluation of vectorization potential of Graph500 on Intel's Xeon Phi.
Proceedings of the International Conference on High Performance Computing & Simulation, 2014

Advanced Pattern based Memory Controller for FPGA based HPC applications.
Proceedings of the International Conference on High Performance Computing & Simulation, 2014

AMMC: Advanced Multi-Core Memory Controller.
Proceedings of the 2014 International Conference on Field-Programmable Technology, 2014

MAPC: Memory access pattern based controller.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

An empirical evaluation of High-Level Synthesis languages and tools for database acceleration.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

Power estimation tool for system on programmable chip based platforms (abstract only).
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

APMC: advanced pattern based memory controller (abstract only).
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

T-Rex: a dynamic race detection tool for C/C++ transactional memory applications.
Proceedings of the Ninth Eurosys Conference 2014, 2014

System-level power & energy estimation methodology and optimization techniques for CPU-GPU based mobile platforms.
Proceedings of the 12th IEEE Symposium on Embedded Systems for Real-time Multimedia, 2014

ParaDIME: Parallel Distributed Infrastructure for Minimization of Energy.
Proceedings of the 17th Euromicro Conference on Digital System Design, 2014

EVX: Vector execution on low power EDGE cores.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

Dynamic transaction coalescing.
Proceedings of the Computing Frontiers Conference, CF'14, 2014

VALib and SimpleVector: tools for rapid initial research on vector architectures.
Proceedings of the Computing Frontiers Conference, CF'14, 2014

DaSH: a benchmark suite for hybrid dataflow and shared memory programming models: with comparative evaluation of three hybrid dataflow models.
Proceedings of the Computing Frontiers Conference, CF'14, 2014

Flexicache: Highly Reliable and Low Power Cache under Supply Voltage Scaling.
Proceedings of the High Performance Computing - First HPCLATAM, 2014

PVMC: Programmable Vector Memory Controller.
Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014

Stand-Alone Memory Controller for Graphics System.
Proceedings of the Reconfigurable Computing: Architectures, Tools, and Applications, 2014

2013
Profile-guided transaction coalescing - lowering transactional overheads by merging transactions.
ACM Trans. Archit. Code Optim., 2013

Techniques to improve performance in requester-wins hardware transactional memory.
ACM Trans. Archit. Code Optim., 2013

On the selection of adder unit in energy efficient vector processing.
Proceedings of the International Symposium on Quality Electronic Design, 2013

TM-dietlibc: A TM-aware Real-World System Library.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

EcoTM: Conflict-Aware Economical Unbounded Hardware Transactional Memory.
Proceedings of the International Conference on Computational Science, 2013

HARP: Adaptive abort recurrence prediction for Hardware Transactional Memory.
Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

Circuit design of a novel adaptable and reliable L1 data cache.
Proceedings of the Great Lakes Symposium on VLSI 2013 (part of ECRC), 2013

FaulTM: error detection and recovery using hardware transactional memory.
Proceedings of the Design, Automation and Test in Europe, 2013

Improving the energy efficiency of hardware-assisted watchpoint systems.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

Fault tolerance for multi-threaded applications by leveraging hardware transactional memory.
Proceedings of the Computing Frontiers Conference, 2013

2012
Hardware transactional memory with software-defined conflicts.
ACM Trans. Archit. Code Optim., 2012

Resource-bounded multicore emulation using Beefarm.
Microprocess. Microsystems, 2012

Circuit design of a dual-versioning L1 data cache.
Integr., 2012

Profiling and Optimizing Transactional Memory Applications.
Int. J. Parallel Program., 2012

Integrating Dataflow Abstractions into the Shared Memory Model.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

PaRV: Parallelizing Runtime Detection and Prevention of Concurrency Errors.
Proceedings of the Runtime Verification, Third International Conference, 2012

Novel SRAM bias control circuits for a low power L1 data cache.
Proceedings of the NORCHIP 2012, Copenhagen, Denmark, November 12-13, 2012, 2012

Vector Extensions for Decision Support DBMS Acceleration.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Enhancing the performance of assisted execution runtime systems through hardware/software techniques.
Proceedings of the International Conference on Supercomputing, 2012

Flash correct-and-refresh: Retention-aware error management for increased flash memory lifetime.
Proceedings of the 30th International IEEE Conference on Computer Design, 2012

A Low-Overhead Profiling and Visualization Framework for Hybrid Transactional Memory.
Proceedings of the 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, 2012

TagTM - accelerating STMs with hardware tags for fast meta-data access.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Transactional prefetching: narrowing the window of contention in hardware transactional memory.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Supporting stateful tasks in a dataflow graph.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
RMS-TM: a comprehensive benchmark suite for transactional memory systems (abstracts only).
SIGMETRICS Perform. Evaluation Rev., 2011

Hybrid Transactional Memory with Pessimistic Concurrency Control.
Int. J. Parallel Program., 2011

RMS-TM: a comprehensive benchmark suite for transactional memory systems.
Proceedings of the ICPE'11, 2011

Rapid Development of Error-Free Architectural Simulators Using Dynamic Runtime Testing.
Proceedings of the 23rd International Symposium on Computer Architecture and High Performance Computing, 2011

FIMSIM: A fault injection infrastructure for microarchitectural simulators.
Proceedings of the IEEE 29th International Conference on Computer Design, 2011

Circuit design of a dual-versioning L1 data cache for optimistic concurrency.
Proceedings of the 21st ACM Great Lakes Symposium on VLSI 2010, 2011

TMbox: A Flexible and Reconfigurable 16-Core Hybrid Transactional Memory System.
Proceedings of the IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, 2011

From Plasma to BeeFarm: Design Experience of an FPGA-Based Multicore Prototype.
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2011

SymptomTM: Symptom-Based Error Detection and Recovery Using Hardware Transactional Memory.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

DiDi: Mitigating the Performance Impact of TLB Shootdowns Using a Shared TLB Directory.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

STM2: A Parallel STM for High Performance Simultaneous Multithreading Systems.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

Using a Reconfigurable L1 Data Cache for Efficient Version Management in Hardware Transactional Memory.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
The Velox Transactional Memory Stack.
IEEE Micro, 2010

Debugging programs that use atomic blocks and transactional memory.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Architectural Support for Fair Reader-Writer Locking.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Dynamic filtering: multi-purpose architecture support for language runtime systems.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

Discovering and understanding performance bottlenecks in transactional applications.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
Atomic quake: using transactional memory in an interactive multiplayer game server.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Turbocharging boosted transactions or: how i learnt to stop worrying and love longer transactions.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

EazyHTM: eager-lazy hardware transactional memory.
Proceedings of the 42st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Taking the heat off transactions: Dynamic selection of pessimistic concurrency control.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Clock gate on abort: Towards energy-efficient hardware Transactional Memory.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

QuakeTM: parallelizing a complex sequential application using transactional memory.
Proceedings of the 23rd international conference on Supercomputing, 2009

Dynamically Filtering Thread-Local Variables in Lazy-Lazy Hardware Transactional Memory.
Proceedings of the 11th IEEE International Conference on High Performance Computing and Communications, 2009

2008
Nebelung: Execution Environment for Transactional OpenMP.
Int. J. Parallel Program., 2008

A distributed processor state management architecture for large-window processors.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

WormBench: a configurable workload for evaluating transactional memory systems.
Proceedings of the 9th workshop on MEmory performance, 2008

A Two-Level Load/Store Queue Based on Execution Locality.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

The limits of software transactional memory (STM): dissecting Haskell STM applications on a many-core environment.
Proceedings of the 5th Conference on Computing Frontiers, 2008

2007
Transactional Memory: An Overview.
IEEE Micro, 2007

unreadTVar: Extending Haskell Software Transactional Memory for Performance.
Proceedings of the Eighth Symposium on Trends in Functional Programming, 2007

Multithreaded software transactional memory and OpenMP.
Proceedings of the 2007 workshop on MEmory performance, 2007

Transactional Memory and OpenMP.
Proceedings of the A Practical Programming Model for the Multi-Core Era, 2007

Hardware Transactional Memory with Operating System Support, HTMOS.
Proceedings of the Euro-Par 2007 Workshops: Parallel Processing, 2007

Implicit Transactional Memory in Kilo-Instruction Multiprocessors.
Proceedings of the Advances in Computer Systems Architecture, 2007

A Flexible Heterogeneous Multi-Core Architecture.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006
A decoupled KILO-instruction processor.
Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

2005
Kilo-Instruction Processors: Overcoming the Memory Wall.
IEEE Micro, 2005

Chained In-Order/Out-of-Order DoubleCore Architecture.
Proceedings of the 17th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2005), 2005

Decoupled State-Execute Architecture.
Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Exploiting Execution Locality with a Decoupled Kilo-Instruction Processor.
Proceedings of the High-Performance Computing - 6th International Symposium, 2005

An asymmetric clustered processor based on value content.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Implementing Kilo-Instruction Multiprocessors.
Proceedings of the International Conference on Pervasive Services 2005, 2005

A New Pointer-based Instruction Queue Design and Its Power-Performance Evaluation.
Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

2004
Toward kilo-instruction processors.
ACM Trans. Archit. Code Optim., 2004

A case for resource-conscious out-of-order processors: towards kilo-instruction in-flight processors.
SIGARCH Comput. Archit. News, 2004

A partitioned instruction queue to reduce instruction wakeup energy.
Int. J. High Perform. Comput. Netw., 2004

Future ILP processors.
Int. J. High Perform. Comput. Netw., 2004

Evaluating kilo-instruction multiprocessors.
Proceedings of the 3rd Workshop on Memory Performance Issues, 2004

An Optimized Front-End Physical Register File with Banking and Writeback Filtering.
Proceedings of the Power-Aware Computer Systems, 4th International Workshop, 2004

A Content Aware Integer Register File Organization.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Out-of-Order Commit Processors.
Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

Maintaining Thousands of In-flight Instructions.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

A first glance at Kilo-instruction based multiprocessors.
Proceedings of the First Conference on Computing Frontiers, 2004

2003
A Case for Resource-conscious Out-of-order Processors.
IEEE Comput. Archit. Lett., 2003

A Simple Low-Energy Instruction Wakeup Mechanism.
Proceedings of the High Performance Computing, 5th International Symposium, 2003

Kilo-instruction Processors.
Proceedings of the High Performance Computing, 5th International Symposium, 2003


  Loading...