Josep Torrellas

Orcid: 0000-0003-2595-5228

Affiliations:
  • University of Illinois at Urbana-Champaign, Urbana, IL, USA


According to our database, Josep Torrellas authored at least 264 papers between 1990 and 2024.

Awards

ACM Fellow

ACM Fellow 2010, "For contributions to shared-memory multiprocessor architectures and thread-level speculation."

IEEE Fellow

IEEE Fellow 2004, "For contributions to shared-memory multiprocessors."

Bibliography

2024
Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference.
CoRR, 2024

MINOS: Distributed Consistency and Persistency Protocol Implementation & Offloading to SmartNICs.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

HotTiles: Accelerating SpMM with Heterogeneous Accelerator Architectures.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

Last-Level Cache Side-Channel Attacks Are Feasible in the Modern Public Cloud.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Everywhere All at Once: Co-Location Attacks on Public Cloud FaaS.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Two-Face: Combining Collective and One-Sided Communication for Efficient Distributed SpMM.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Input-sensitive dense-sparse primitive compositions for GNN acceleration.
CoRR, 2023

Defensive ML: Defending Architectural Side-channels with Adversarial Obfuscation.
CoRR, 2023

WISE: Predicting the Performance of Sparse Matrix Vector Multiplication with Machine Learning.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

Micro-Armed Bandit: Lightweight & Reusable Reinforcement Learning for Microarchitecture Decision-Making.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

MXFaaS: Resource Sharing in Serverless Environments for Parallelism and Efficiency.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

μManycore: A Cloud-Native CPU for Tail at Scale.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

SPADE: A Flexible and Scalable Accelerator for SpMM and SDDMM.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

SpecFaaS: Accelerating Serverless Applications with Speculative Function Execution.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Memory-Efficient Hashed Page Tables.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Untangle: A Principled Framework to Design Low-Leakage, High-Performance Dynamic Partitioning Schemes.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
Distributed Data Persistency.
IEEE Micro, 2022

Binoculars: Contention-Based Side-Channel Attacks Exploiting the Page Walker.
Proceedings of the 31st USENIX Security Symposium, 2022

Graphite: optimizing graph neural networks on CPUs through cooperative software-hardware techniques.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Dense dynamic blocks: optimizing SpMM for processors with vector and matrix units using machine learning techniques.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Cloak: tolerating non-volatile cache read latency.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Pinned loads: taming speculative loads in secure processors.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022

Parallel virtualized memory translation with nested elastic cuckoo page tables.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022

2021
BabelFish: Fusing Address Translations for Containers.
IEEE Micro, 2021

A Method for Hiding the Increased Non-Volatile Cache Read Latency.
CoRR, 2021

UniHeap: managing persistent objects across managed runtimes for non-volatile memory.
Proceedings of the SYSTOR '21: The 14th ACM International Systems and Storage Conference, 2021

One Protocol to Rule Them All: Wireless Network-on-Chip using Deep Reinforcement Learning.
Proceedings of the 18th USENIX Symposium on Networked Systems Design and Implementation, 2021

Execution Dependence Extension (EDE): ISA Support for Eliminating Fences.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Maya: Using Formal Control to Obfuscate Power Side Channels.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

WiDir: A Wireless-Enabled Directory Cache Coherence Protocol.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Fuzzy-Token: An Adaptive MAC Protocol for Wireless-Enabled Manycores.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

Speculative interference attacks: breaking invisible speculation schemes.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

Jamais vu: thwarting microarchitectural replay attacks.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

2020
Engineer the Channel and Adapt to it: Enabling Wireless Intra-Chip Communication.
IEEE Trans. Commun., 2020

Speculative Taint Tracking (STT): A Comprehensive Protection for Speculatively Accessed Data.
IEEE Micro, 2020

MicroScope: Enabling Microarchitectural Replay Attacks.
IEEE Micro, 2020

Cache Telepathy: Leveraging Shared Resource Attacks to Learn DNN Architectures.
Proceedings of the 29th USENIX Security Symposium, 2020

Millimeter wave wireless network on chip using deep reinforcement learning.
Proceedings of the SIGCOMM '20: ACM SIGCOMM 2020 Conference, 2020

Speeding up SpMV for power-law graph analytics by enhancing locality & vectorization.
Proceedings of the International Conference for High Performance Computing, 2020

Speculation Invariance (InvarSpec): Faster Safe Execution Through Program Analysis.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Draco: Architectural and Operating System Support for System Call Security.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

P-INSPECT: Architectural Support for Programmable Non-Volatile Memory Frameworks.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

SAVE: Sparsity-Aware Vector Engine for Accelerating DNN Training and Inference on CPUs.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Speculative Data-Oblivious Execution: Mobilizing Safe Prediction For Safe and Efficient Speculative Execution.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

V-Combiner: speeding-up iterative graph processing on a shared-memory platform with vertex merging.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Snug: architectural support for relaxed concurrent priority queueing in chip multiprocessors.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Elastic Cuckoo Page Tables: Rethinking Virtual Memory Translation for Parallelism.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

SparseTrain: Leveraging Dynamic Sparsity in Software for Training DNNs on General-Purpose SIMD Processors.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
SparseTrain: Leveraging Dynamic Sparsity in Training DNNs on General-Purpose SIMD Processors.
CoRR, 2019

Maya: Falsifying Power Sidechannels with Operating System Support.
CoRR, 2019

QuickCheck: using speculation to reduce the overhead of checks in NVM frameworks.
Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2019

Attack Directories, Not Caches: Side Channel Attacks in a Non-Inclusive World.
Proceedings of the 2019 IEEE Symposium on Security and Privacy, 2019

Understanding priority-based scheduling of graph algorithms on a shared-memory platform.
Proceedings of the International Conference for High Performance Computing, 2019

AutoPersist: an easy-to-use Java NVM framework based on reachability.
Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

Reusable inline caching for JavaScript performance.
Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

InvisiSpec: Making Speculative Execution Invisible in the Cache Hierarchy (Corrigendum).
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Tangram: Integrated Control of Heterogeneous Computers.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Opportunistic Beamforming in Wireless Network-on-Chip.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2019

SecDir: a secure directory to defeat directory side-channel attacks.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

Designing vertical processors in monolithic 3D.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

NoMap: Speeding-Up JavaScript Using Hardware Transactional Memory.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

PageSeer: Using Page Walks to Trigger Page Swaps in Hybrid Memory Systems.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

Replica: A Wireless Manycore for Communication-Intensive and Approximate Data.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018
OrthoNoC: A Broadcast-Oriented Dual-Plane Wireless Network-on-Chip Architecture.
IEEE Trans. Parallel Distributed Syst., 2018

An empirical study of the effect of source-level loop transformations on compiler stability.
Proc. ACM Program. Lang., 2018

Medium Access Control in Wireless Network-on-Chip: A Context Analysis.
IEEE Commun. Mag., 2018

Defining a high-level programming model for emerging NVRAM technologies.
Proceedings of the 15th International Conference on Managed Languages & Runtimes, 2018

InvisiSpec: Making Speculative Execution Invisible in the Cache Hierarchy.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Millimeter-Wave Propagation within a Computer Chip Package.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2018

Yukta: Multilayer Resource Controllers to Maximize Efficiency.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

HetCore: TFET-CMOS Hetero-Device Architecture for CPUs and GPUs.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Record-Replay Architecture as a General Security Framework.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Structured Singular Value Control for Modular Resource Management in Multilayer Computers.
Proceedings of the 57th IEEE Conference on Decision and Control, 2018

Biased reference counting: minimizing atomic operations in garbage collection.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017
Survive: Pointer-Based In-DRAM Incremental Checkpointing for Low-Cost Data Persistence and Rollback-Recovery.
IEEE Comput. Archit. Lett., 2017

Pageforge: a near-memory content-aware page-merging architecture.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Xylem: enhancing vertical thermal conduction in 3D processor-memory stacks.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Secure Hierarchy-Aware Cache Replacement Policy (SHARP): Defending Against Cache-Based Side Channel Attacks.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

ShortCut: Architectural Support for Fast Object Access in Scripting Languages.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

LORE: A loop repository for the evaluation of compilers.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Multilayer Compute Resource Management with Robust Control Theory.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

Sthira: A Formal Approach to Minimize Voltage Guardbands under Variation in Networks-on-Chip for Energy Efficiency.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
21st Century Computer Architecture.
CoRR, 2016

ReplayConfusion: Detecting cache-based covert channel attacks using record and replay.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Snatch: Opportunistically reassigning power allocation between processor and memory in 3D stacks.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

A MAC protocol for Reliable Broadcast Communications in Wireless Network-on-Chip.
Proceedings of the 9th International Workshop on Network on Chip Architectures, 2016

Using Multiple Input, Multiple Output Formal Control to Maximize Resource Efficiency in Architectures.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Architecting and Programming a Hardware-Incoherent Multiprocessor Cache Hierarchy.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

ScalCore: Designing a core for voltage scalability.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

SCsafe: Logging sequential consistency violations continuously and precisely.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Toward Extreme-Scale Processor Chips.
Proceedings of the 23rd IEEE International Conference on High Performance Computing, 2016

Compiler Support for Software Cache Coherence.
Proceedings of the 23rd IEEE International Conference on High Performance Computing, 2016

CASPAR: Breaking Serialization in Lock-Free Multicore Synchronization.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

WiSync: An Architecture for Fast Synchronization through On-Chip Wireless Communication.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

WearCore: A Core for Wearable Workloads.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

Many-Core Architecture for NTC: Energy Efficiency from the Ground Up.
Proceedings of the Near Threshold Computing, Technology, Methods and Applications, 2016

2015
Asymmetric Memory Fences: Optimizing Both Performance and Implementability.
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2014
Using an Adaptive HPC Runtime System to Reconfigure the Cache Hierarchy.
Proceedings of the International Conference for High Performance Computing, 2014

Improving JavaScript performance by deconstructing the type system.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014

OmniOrder: Directory-based conflict serialization of transactions.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Replay debugging: Leveraging record and replay for program debugging.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Dynamically detecting and tolerating IF-Condition Data Races.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Tangle: Route-oriented dynamic voltage minimization for variation-afflicted, energy-efficient on-chip networks.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Mosaic: Exploiting the spatial locality of process variation to reduce refresh energy in on-chip eDRAM modules.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Extreme-scale computer architecture: Energy efficiency from the ground up.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

RelaxReplay: record and replay for relaxed-consistency multiprocessors.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

2013
Coping with Parametric Variation at Near-Threshold Voltages.
IEEE Micro, 2013

BulkCommit: scalable and fast commit of atomic blocks in a lazy multiprocessor environment.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

QuickRec: prototyping an intel architecture extension for record and replay of multithreaded programs.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

WeeFence: toward making fences free in TSO.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

EnergySmart: Toward energy-efficient manycores for Near-Threshold Computing.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Runnemede: An architecture for Ubiquitous High-Performance Computing.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Illusionist: Transforming lightweight cores into aggressive cores on demand.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Refrint: Intelligent refresh to minimize power in on-chip multiprocessor cache hierarchies.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Volition: scalable and precise sequential consistency violation detection.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013

Cyrus: unintrusive application-level record-replay for replay parallelism.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013

DeAliaser: alias speculation using atomic region support.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013

Extreme scale computer architecture: Energy efficiency from the ground up.
Proceedings of the 24th International Conference on Application-Specific Systems, 2013

2012
2012 International Symposium on Computer Architecture Influential Paper Award.
IEEE Micro, 2012

Vulcan: Hardware Support for Detecting Sequential Consistency Violations Dynamically.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Comparing the power and performance of Intel's SCC to state-of-the-art CPUs and GPUs.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

FlexRAM: Toward an advanced Intelligent Memory system: A retrospective paper.
Proceedings of the 30th International IEEE Conference on Computer Design, 2012

FlexRAM: Toward an advanced Intelligent Memory system.
Proceedings of the 30th International IEEE Conference on Computer Design, 2012

BulkSMT: Designing SMT processors for atomic-block execution.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

Pacman: Tolerating asymmetric data races with unintrusive hardware.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

BulkCompactor: Optimized deterministic execution via Conflict-Aware commit of atomic blocks.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

VARIUS-NTV: A microarchitectural model to capture the increased sensitivity of manycores to process variations at near-threshold voltages.
Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks, 2012

2011
Speculation, Thread-Level.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Cache-Only Memory Architecture (COMA).
Proceedings of the Encyclopedia of Parallel Computing, 2011

FlexBulk: intelligently forming atomic blocks in blocked-execution multiprocessors to minimize squashes.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Rebound: scalable checkpointing for coherent shared memory.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

2010
ScalableBulk: Scalable Cache Coherence for Atomic Blocks in a Lazy Environment.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

InstantCheck: Checking the Determinism of Parallel Programs Using On-the-Fly Incremental Hashing.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

AtomTracker: A Comprehensive Approach to Atomic Region Inference and Violation Detection.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Extreme scale computing: Challenges and opportunities.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

LeadOut: Composing low-overhead frequency-enhancing techniques for single-thread performance in configurable multicores.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

2009
SoftSig: Software-Exposed Hardware Signatures for Code Analysis and Optimization.
IEEE Micro, 2009

Architectures for Extreme-Scale Computing.
Computer, 2009

The Bulk Multicore architecture for improved programmability.
Commun. ACM, 2009

Two hardware-based approaches for deterministic multiprocessor replay.
Commun. ACM, 2009

Light64: lightweight hardware support for data race detection during systematic testing of parallel programs.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

The BubbleWrap many-core: popping cores for sequential acceleration.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

BulkCompiler: high-performance sequential consistency through cooperative compiler and hardware support.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

SigRace: signature-based data race detection.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

How to build a useful thousand-core manycore system?
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Blueshift: Designing processors for timing speculation from the ground up.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

Capo: a software-hardware interface for practical deterministic multiprocessor replay.
Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 2009

2008
Facelift: Hiding and slowing down aging in multicores.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

EVAL: Utilizing processors with variation-induced timing errors.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Concurrency control with data coloring.
Proceedings of the 2008 ACM SIGPLAN workshop on Memory Systems Performance and Correctness: held in conjunction with the Thirteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '08), 2008

2007
Patching Processor Design Errors with Programmable Hardware.
IEEE Micro, 2007

Estimating design time for system circuits.
Proceedings of the IFIP VLSI-SoC 2007, 2007

Mitigating Parameter Variation with Dynamic Fine-Grain Body Biasing.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

A Model for Timing Errors in Processors with Parameter Variation.
Proceedings of the 8th International Symposium on Quality of Electronic Design (ISQED 2007), 2007

Threshold Voltage Variation Effects on Aging-Related Hard Failure Rates.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2007), 2007

ReCycle: pipeline adaptation to tolerate process variation.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

BulkSC: bulk enforcement of sequential consistency.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

CAP: Criticality analysis for power-efficient speculative multithreading.
Proceedings of the 25th International Conference on Computer Design, 2007

Colorama: Architectural Support for Data-Centric Synchronization.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

Using Register Lifetime Predictions to Protect Register Files against Soft Errors.
Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2007

Paceline: Improving Single-Thread Performance in Nanoscale CMPs through Core Overclocking.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006
CAVA: Using checkpoint-assisted value prediction to hide L2 misses.
ACM Trans. Archit. Code Optim., 2006

Guest Editor's Introduction: Micro's Top Picks from Microarchitecture Conferences.
IEEE Micro, 2006

SWICH: A Prototype for Efficient Cache-Level Checkpointing and Rollback.
IEEE Micro, 2006

Energy-Efficient Thread-Level Speculation.
IEEE Micro, 2006

POSH: a TLS compiler that exploits program structure.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2006

Scalable Cache Miss Handling for High Memory-Level Parallelism.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Phoenix: Detecting and Recovering from Permanent Processor Design Bugs with Programmable Hardware.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

PathExpander: Architectural Support for Increasing the Path Coverage of Dynamic Bug Detection.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

Bulk Disambiguation of Speculative Threads in Multiprocessors.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

ReViveI/O: efficient handling of I/O in highly-available rollback-recovery servers.
Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

CADRE: Cycle-Accurate Deterministic Replay for Hardware Debugging.
Proceedings of the 2006 International Conference on Dependable Systems and Networks (DSN 2006), 2006

Accurate and efficient filtering for the Intel thread checker race detector.
Proceedings of the 1st Workshop on Architectural and System Support for Improving Software Dependability, 2006

2005
Efficient and flexible architectural support for dynamic monitoring.
ACM Trans. Archit. Code Optim., 2005

Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors.
ACM Trans. Archit. Code Optim., 2005

Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Thread-Level Speculation on a CMP can be energy efficient.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Prototyping Architectural Support for Program Rollback Using FPGAs.
Proceedings of the 13th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2005), 2005

2004
iWatcher: Simple, General Architectural Support for Software Debugging.
IEEE Micro, 2004

CAVA: Hiding L2 Misses with Checkpoint-Assisted Value Prediction.
IEEE Comput. Archit. Lett., 2004

AccMon: Automatically Detecting Memory-Related Bugs via Program Counter-Based Invariants.
Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004

iWatcher: Efficient Architectural Support for Software Debugging.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

2003
Correlation Prefetching with a User-Level Memory Thread.
IEEE Trans. Parallel Distributed Syst., 2003

Speculative Synchronization: Programmability and Performance for Parallel Codes.
IEEE Micro, 2003

Programming the FlexRAM parallel intelligent memory system.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2003

ReEnact: Using Thread-Level Speculation Mechanisms to Debug Data Races in Multithreaded Codes.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

Positional Adaptation of Processors: Application to Energy Reduction.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

Design Trade-Offs in High-Throughput Coherence Controllers.
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 27 September, 2003

Using Software Logging to Support Multi-Version Buffering in Thread-Level Speculation.
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 27 September, 2003

2002
Software Trace Cache for Commercial Applications.
Int. J. Parallel Program., 2002

Cherry: checkpointed early resource recycling in out-of-order microprocessors.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Energy-efficient hybrid wakeup logic.
Proceedings of the 2002 International Symposium on Low Power Electronics and Design, 2002

Using a User-Level Memory Thread for Correlation Prefetching.
Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors.
Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

SmartApps: An Application Centric Approach to High Performance Computing: Compiler-Assisted Software and Hardware Support for Reduction Operations.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Eliminating Squashes through Learning Cross-Thread Violations in Speculative Parallelization for Multiprocessors.
Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

Speculative synchronization: applying thread-level speculation to explicitly parallel applications.
Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

2001
Automatic Code Mapping on an Intelligent Memory Architecture.
IEEE Trans. Computers, 2001

The Design of DEETM: a Framework for Dynamic Energy Efficiency and Temperature Management.
J. Instr. Level Parallelism, 2001

The Need for Fast Communication in Hardware-Based Speculative Chip Multiprocessors.
Int. J. Parallel Program., 2001

L1 data cache decomposition for energy efficiency.
Proceedings of the 2001 International Symposium on Low Power Electronics and Design, 2001

Removing architectural bottlenecks to the scalability of speculative parallelization.
Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001

Automatically Mapping Code on an Intelligent Memory Architecture.
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

Architectural Support for Parallel Reductions in Scalable Shared-Memory Multiprocessors.
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001

2000
A framework for dynamic energy efficiency and temperature management.
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

SmartApps: An Application Centric Approach to High Performance Computing.
Proceedings of the Languages and Compilers for Parallel Computing, 2000

Architectural support for scalable speculative parallelization in shared-memory multiprocessors.
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

Adaptively Mapping Code in an Intelligent Memory Architecture.
Proceedings of the Intelligent Memory Systems, Second International Workshop, 2000

Energy/Performance Design of Memory Hierarchies for Processor-in-Memory Chips.
Proceedings of the Intelligent Memory Systems, Second International Workshop, 2000

Unified Fine-Granularity Buffering of Index and Data: Approach and Implementation.
Proceedings of the IEEE International Conference On Computer Design: VLSI In Computers & Processors, 2000

Toward a Cost-Effective DSM Organization That Exploits Processor-Memory Integration.
Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

1999
Excel-NUMA: Toward Programmability, Simplicity, and High Performance.
IEEE Trans. Computers, 1999

Comprehensive Hardware and Software Support for Operating Systems to Exploit.
IEEE Trans. Computers, 1999

A Chip-Multiprocessor Architecture with Speculative Multithreading.
IEEE Trans. Computers, 1999

Cache-Only Memory Architectures.
Computer, 1999

Scal-Tool: Pinpointing and Quantifying Scalability Bottlenecks in DSM Multiprocessors.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1999

Upcoming Architectural Advances in DSM Machines and Their Impact on Programmability.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

Software trace cache.
Proceedings of the 13th international conference on Supercomputing, 1999

Improving the performance of bristled CC-NUMA systems using virtual channels and adaptivity.
Proceedings of the 13th international conference on Supercomputing, 1999

Optimization of Instruction Fetch for Decision Support Workloads.
Proceedings of the International Conference on Parallel Processing 1999, 1999

Compiler Support for Data Forwarding in Scalable Shared-Memory Multiprocessors.
Proceedings of the International Conference on Parallel Processing 1999, 1999

Cache Optimization for Memory-Resident Decision Support Commercial Workloads.
Proceedings of the IEEE International Conference On Computer Design, 1999

Detailed Characterization of a Quad Pentium Pro Server Running TPC-D.
Proceedings of the IEEE International Conference On Computer Design, 1999

Hardware for Speculative Parallelization of Partially-Parallel Loops in DSM Multiprocessors.
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

Second Workshop on Computer Architecture Evaluation Using Commercial Workloads.
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

1998
Optimizing the Instruction Cache Performance of the Operating System.
IEEE Trans. Computers, 1998

Computer architecture education at the University of Illinois.
Proceedings of the 1998 workshop on Computer architecture education, 1998

A Clustered Approach to Multithreaded Processors.
Proceedings of the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP '98), March 30, 1998

Hardware and Software Support for Speculative Execution of Sequential Binaries on a Chip-multiprocessor.
Proceedings of the 12th international conference on Supercomputing, 1998

Comparing Data Forwarding and Prefetching for Communication-induced Misses in Shared-memory MPs.
Proceedings of the 12th international conference on Supercomputing, 1998

An IRAM architecture for image analysis and pattern recognition.
Proceedings of the Fourteenth International Conference on Pattern Recognition, 1998

Use IRAM for Rasterization.
Proceedings of the 1998 IEEE International Conference on Image Processing, 1998

Hardware for Speculative Run-Time Parallelization in Distributed Shared-Memory Multiprocessors.
Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31, 1998

Enhancing Memory Use in Simple COMA: Multiplexed Simple COMA.
Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31, 1998

A Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors.
Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998

1997
The Performance of the Cedar Multistage Switching Network.
IEEE Trans. Parallel Distributed Syst., 1997

Reducing Remote Conflict Misses: NUMA with Remote Cache versus COMA.
Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture (HPCA '97), 1997

Speeding up the Memory Hierarchy in Flat COMA Multiprocessors.
Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture (HPCA '97), 1997

The Memory Performance of DSS Commercial Workloads in Shared-Memory Multiprocessors.
Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture (HPCA '97), 1997

1996
Data Forwarding in Scalable Shared-Memory Multiprocessors.
IEEE Trans. Parallel Distributed Syst., 1996

Computer architecture education at the University of Illinois: current status and some thoughts.
Proceedings of the 1996 workshop on Computer architecture education, 1996

An efficient implementation of tree-based multicast routing for distributed shared-memory multiprocessors.
Proceedings of the Eighth IEEE Symposium on Parallel and Distributed Processing, 1996

Instruction Prefetching of Systems Codes with Layout Optimized for Reduced Cache Misses.
Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

Optimizing Primary Data Caches for Parallel Scientific Applications: The Pool Buffer Approach.
Proceedings of the 10th international conference on Supercomputing, 1996

The Impact of Speeding up Critical Sections with Data Prefetching and Forwarding.
Proceedings of the 1996 International Conference on Parallel Processing, 1996

The Augmint multiprocessor simulation toolkit for Intel x86 architectures.
Proceedings of the 1996 International Conference on Computer Design (ICCD '96), 1996

Improving the Data Cache Performance of Multiprocessor Operating Systems.
Proceedings of the Second International Symposium on High-Performance Computer Architecture, 1996

Distance-Adaptive Update Protocols for Scalable Shared-Memory Multiprocessors.
Proceedings of the Second International Symposium on High-Performance Computer Architecture, 1996

1995
Evaluating the Performance of Cache-Affinity Scheduling in Shared-Memory Multiprocessors.
J. Parallel Distributed Comput., 1995

Speeding Up Irregular Applications in Shared-Memory Multiprocessors: Memory Binding and Group Prefetching.
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

Optimizing Instruction Cache Performance for Operating System Intensive Workloads.
Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture (HPCA 1995), 1995

1994
False Sharing and Spatial Locality in Multiprocessor Caches.
IEEE Trans. Computers, 1994

An efficient algorithm for the run-time parallelization of DOACROSS loops.
Proceedings of Supercomputing '94, 1994

Comparing the Performance of the DASH and CEDAR Multiprocessors.
Proceedings of the 1994 International Conference on Parallel Processing, 1994

1993
Benefits of Cache-Affinity Scheduling in Shared-Memory Multiprocessors: A Summary.
Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems, 1993

1992
Characterizing the Caching and Synchronization Performance of a Multiprocessor Operating System.
Proceedings of ASPLOS-V, 1992

1990
Analysis of Critical Architectural and Program Parameters in a Hierarchical Shared Memory Multiprocessor.
Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems, 1990

Shared Data Placement Optimizations to Reduce Multiprocessor Cache Miss Rates.
Proceedings of the 1990 International Conference on Parallel Processing, 1990

Estimating the Performance Advantages of Relaxing Consistency in a Shared Memory Multiprocessor.
Proceedings of the 1990 International Conference on Parallel Processing, 1990

