Christoforos E. Kozyrakis

Orcid: 0000-0002-3154-7530

Affiliations:
  • Stanford University, USA


According to our database, Christoforos E. Kozyrakis authored at least 214 papers between 1997 and 2024.

Awards

ACM Fellow

ACM Fellow 2016, "For contributions to transactional memory and data center architecture".

IEEE Fellow

IEEE Fellow 2015, "For contributions to high-performance, energy-efficient and secure memory systems".

Bibliography

2024
cedar: Composable and Optimized Machine Learning Input Data Pipelines.
CoRR, 2024

2023
R³: Record-Replay-Retroaction for Database-Backed Applications.
Proc. VLDB Endow., 2023

Efficiently Programming Large Language Models using SGLang.
CoRR, 2023

Zelda: Video Analytics using Vision-Language Models.
CoRR, 2023

FlexShard: Flexible Sharding for Industry-Scale Sequence Recommendation Models.
CoRR, 2023

Honeycomb: Secure and Efficient GPU Executions via Static Validation.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Transactions Make Debugging Easy.
Proceedings of the 13th Conference on Innovative Data Systems Research, 2023

2022
RAIL: Predictable, Low Tail Latency for NVMe Flash.
ACM Trans. Storage, 2022

Optimizing Video Analytics with Declarative Model Relationships.
Proc. VLDB Endow., 2022

RecD: Deduplication for End-to-End Deep Learning Recommendation Model Training Infrastructure.
CoRR, 2022

Apiary: A DBMS-Backed Transactional Function-as-a-Service Framework.
CoRR, 2022

Towards μs tail latency and terabit ethernet: disaggregating the host network stack.
Proceedings of the SIGCOMM '22: ACM SIGCOMM 2022 Conference, Amsterdam, The Netherlands, August 22, 2022

Understanding data storage and ingestion for large-scale deep recommendation model training: industrial product.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Hermod: principled and practical scheduling for serverless functions.
Proceedings of the 13th Symposium on Cloud Computing, SoCC 2022, 2022

VIVA: An End-to-End System for Interactive Video Analytics.
Proceedings of the 12th Conference on Innovative Data Systems Research, 2022

A Progress Report on DBOS: A Database-oriented Operating System.
Proceedings of the 12th Conference on Innovative Data Systems Research, 2022

ShEF: shielded enclaves for cloud FPGAs.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

SOL: safe on-node learning in cloud platforms.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

RecShard: statistical feature-based memory optimization for industry-scale neural recommendation.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021
DBOS: A DBMS-oriented Operating System.
Proc. VLDB Endow., 2021

Practical Scheduling for Real-World Serverless Computing.
CoRR, 2021

Understanding and Co-designing the Data Ingestion Pipeline for Industry-Scale RecSys Training.
CoRR, 2021

RAMBO: Resource Allocation for Microservices Using Bayesian Optimization.
IEEE Comput. Archit. Lett., 2021

INFaaS: Automated Model-less Inference Serving.
Proceedings of the 2021 USENIX Annual Technical Conference, 2021

Syrup: User-Defined Scheduling Across the Stack.
Proceedings of the SOSP '21: ACM SIGOPS 28th Symposium on Operating Systems Principles, 2021

ghOSt: Fast & Flexible User-Space Delegation of Linux Scheduling.
Proceedings of the SOSP '21: ACM SIGOPS 28th Symposium on Operating Systems Principles, 2021

A case against (most) context switches.
Proceedings of the HotOS '21: Workshop on Hot Topics in Operating Systems, 2021

SmartHarvest: harvesting idle CPUs safely and efficiently in the cloud.
Proceedings of the EuroSys '21: Sixteenth European Conference on Computer Systems, 2021

Interference-Aware Scheduling for Inference Serving.
Proceedings of the EuroMLSys@EuroSys 2021, 2021

Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines.
Proceedings of the SoCC '21: ACM Symposium on Cloud Computing, 2021

Faa$T: A Transparent Auto-Scaling Cache for Serverless Applications.
Proceedings of the SoCC '21: ACM Symposium on Cloud Computing, 2021

2020
AsmDB: Understanding and Mitigating Front-End Stalls in Warehouse-Scale Computers.
IEEE Micro, 2020

The Hot Chips Renaissance.
IEEE Micro, 2020

RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers (Technical Report).
CoRR, 2020

DBOS: A Proposal for a Data-Centric Operating System.
CoRR, 2020

RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

A Polystore Based Database Operating System (DBOS).
Proceedings of the Heterogeneous Data Management, Polystores, and Analytics for Healthcare, 2020

Leveraging application classes to save power in highly-utilized data centers.
Proceedings of the SoCC '20: ACM Symposium on Cloud Computing, 2020

Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

Classifying Memory Access Patterns for Prefetching.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019
Pocket: Elastic Ephemeral Storage for Serverless Analytics.
login Usenix Mag., 2019

Outsourcing Everyday Jobs to Thousands of Cloud Functions with gg.
login Usenix Mag., 2019

INFaaS: Managed & Model-less Inference Serving.
CoRR, 2019

A New Frontier for Pull-Based Graph Processing.
CoRR, 2019

From Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of Transient Functional Containers.
Proceedings of the 2019 USENIX Annual Technical Conference, 2019

Shinjuku: Preemptive Scheduling for μsecond-scale Tail Latency.
Proceedings of the 16th USENIX Symposium on Networked Systems Design and Implementation, 2019

A Case for Managed and Model-less Inference Serving.
Proceedings of the Workshop on Hot Topics in Operating Systems, 2019

Mind the Gap: A Case for Informed Request Scheduling at the NIC.
Proceedings of the 18th ACM Workshop on Hot Topics in Networks, 2019

Centralized Core-granular Scheduling for Serverless Functions.
Proceedings of the ACM Symposium on Cloud Computing, SoCC 2019, 2019

TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018
QuMan: Profile-based Improvement of Cluster Utilization.
ACM Trans. Archit. Code Optim., 2018

Plasticine: A Reconfigurable Accelerator for Parallel Patterns.
IEEE Micro, 2018

Uncovering the Security Implications of Cloud Multi-Tenancy with Bolt.
IEEE Micro, 2018

Trevor: Automatic configuration and scaling of stream processing pipelines.
CoRR, 2018

DNN Dataflow Choice Is Overrated.
CoRR, 2018

Amdahl's law for tail latency.
Commun. ACM, 2018

Understanding Ephemeral Storage for Serverless Analytics.
Proceedings of the 2018 USENIX Annual Technical Conference, 2018

Selecta: Heterogeneous Cloud Storage Configuration for Data Analytics.
Proceedings of the 2018 USENIX Annual Technical Conference, 2018

Making pull-based graph processing performant.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Spatial: a language and compiler for application accelerators.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

Learning Memory Access Patterns.
Proceedings of the 35th International Conference on Machine Learning, 2018

GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Memory Hierarchy for Web Search.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017
Corrigendum to "The IX Operating System: Combining Low Latency, High Throughput and Efficiency in a Protected Dataplane".
ACM Trans. Comput. Syst., 2017

The IX Operating System: Combining Low Latency, High Throughput, and Efficiency in a Protected Dataplane.
ACM Trans. Comput. Syst., 2017

DRAF: A Low-Power DRAM-Based Reconfigurable Acceleration Fabric.
IEEE Micro, 2017

AppSwitch: Resolving the Application Identity Crisis.
CoRR, 2017

Persona: A High-Performance Bioinformatics Framework.
Proceedings of the 2017 USENIX Annual Technical Conference, 2017

Plasticine: A Reconfigurable Architecture For Parallel Patterns.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

3D nanosystems enable embedded abundant-data computing: special session paper.
Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion, 2017

ReFlex: Remote Flash ≈ Local Flash.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

Bolt: I Know What You Did Last Summer... In The Cloud.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016
Improving Resource Efficiency at Scale with Heracles.
ACM Trans. Comput. Syst., 2016

Security Implications of Data Mining in Cloud Scheduling.
IEEE Comput. Archit. Lett., 2016

Automatic Generation of Efficient Accelerators for Reconfigurable Hardware.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

HRL: Efficient and flexible reconfigurable logic for near-data processing.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Flash storage disaggregation.
Proceedings of the Eleventh European Conference on Computer Systems, 2016

Generating Configurable Hardware from Parallel Patterns.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

HCloud: Resource-Efficient Provisioning in Shared Cloud Systems.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

2015
Energy-Efficient Abundant-Data Computing: The N3XT 1,000x.
Computer, 2015

Convolution engine: balancing efficiency and flexibility in specialized computing.
Commun. ACM, 2015

Heracles: improving resource efficiency at scale.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Energy proportionality and workload consolidation for latency-critical applications.
Proceedings of the Sixth ACM Symposium on Cloud Computing, 2015

Tarcil: reconciling scheduling speed and quality in large shared clusters.
Proceedings of the Sixth ACM Symposium on Cloud Computing, 2015

Practical Near-Data Processing for In-Memory Analytics Frameworks.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
Quality-of-Service-Aware Scheduling in Heterogeneous Data centers with Paragon.
IEEE Micro, 2014

IX: A Protected Dataplane Operating System for High Throughput and Low Latency.
Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation, 2014

Towards energy proportionality for large-scale latency-critical workloads.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Dynamic management of TurboMode in modern multi-core chips.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Reconciling high server utilization and sub-millisecond quality-of-service.
Proceedings of the Ninth Eurosys Conference 2014, 2014

Quasar: resource-efficient and QoS-aware cluster management.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

2013
QoS-Aware scheduling in heterogeneous datacenters with paragon.
ACM Trans. Comput. Syst., 2013

Measuring and analyzing the energy use of enterprise computing systems.
Sustain. Comput. Informatics Syst., 2013

Selected Research from Hot Chips 24.
IEEE Micro, 2013

The Netflix Challenge: Datacenter Edition.
IEEE Comput. Archit. Lett., 2013

Locality-aware task management for unstructured parallelism: a quantitative limit study.
Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures, 2013

Advancing computer systems without technology progress.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

ZSim: fast and accurate microarchitectural simulation of thousand-core systems.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Convolution engine: balancing efficiency & flexibility in specialized computing.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

iBench: Quantifying interference for datacenter applications.
Proceedings of the IEEE International Symposium on Workload Characterization, 2013

QoS-Aware Admission Control in Heterogeneous Datacenters.
Proceedings of the 10th International Conference on Autonomic Computing, 2013

Resource efficient computing for warehouse-scale datacenters.
Proceedings of the Design, Automation and Test in Europe, 2013

Paragon: QoS-aware scheduling for heterogeneous datacenters.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013

2012
Improving System Energy Efficiency with Memory Rank Subsetting.
ACM Trans. Archit. Code Optim., 2012

Scalable and Efficient Fine-Grained Cache Partitioning with Vantage.
IEEE Micro, 2012

Decoupling Datacenter Storage Studies from Access to Large-Scale Applications.
IEEE Comput. Archit. Lett., 2012

Dune: Safe User-level Access to Privileged CPU Features.
Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, 2012

Towards energy-proportional datacenter memory with mobile DRAM.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

ECHO: Recreating network traffic maps for datacenters with tens of thousands of servers.
Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

SCD: A scalable coherence directory with flexible sharer set encoding.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

Welcome to Hot Chips 24.
Proceedings of the 2012 IEEE Hot Chips 24 Symposium (HCS), 2012

Green enterprise computing data: Assumptions and realities.
Proceedings of the 2012 International Green Computing Conference, 2012

A case of system-level hardware/software co-design and co-verification of a commodity multi-processor system with custom hardware.
Proceedings of the 10th International Conference on Hardware/Software Codesign and System Synthesis, 2012

2011
The case for RAMCloud.
Commun. ACM, 2011

Understanding sources of inefficiency in general-purpose chips.
Commun. ACM, 2011

Time and Cost-Efficient Modeling and Generation of Large-Scale TPCC/TPCE/TPCH Workloads.
Proceedings of the Topics in Performance Evaluation, Measurement and Characterization, 2011

MARS: adaptive remote execution for multi-threaded mobile devices.
Proceedings of the 3rd ACM SOSP Workshop on Networking, 2011

Storage I/O generation and replay for datacenter applications.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

Vantage: scalable and efficient fine-grain cache partitioning.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Decoupling datacenter studies from access to large-scale applications: A modeling approach for storage workloads.
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

Cross-Examination of Datacenter Workload Modeling Techniques.
Proceedings of the 31st IEEE International Conference on Distributed Computing Systems Workshops (ICDCS 2011 Workshops), 2011

A few ways can take you a long way: Efficient and highly associative caches with scalable partitioning for many-core CMPs.
Proceedings of the 2011 IEEE Hot Chips 23 Symposium (HCS), 2011

Hardware acceleration of transactional memory on commodity systems.
Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, 2011

Dynamic Fine-Grain Scheduling of Pipeline Parallelism.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
An analysis of on-chip interconnection networks for large-scale chip multiprocessors.
ACM Trans. Archit. Code Optim., 2010

On the energy (in)efficiency of Hadoop clusters.
ACM SIGOPS Oper. Syst. Rev., 2010

Tainting is not pointless.
ACM SIGOPS Oper. Syst. Rev., 2010

Server Engineering Insights for Large-Scale Online Services.
IEEE Micro, 2010

Implementing and evaluating nested parallel transactions in software transactional memory.
Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2010), 2010

Evaluating Bufferless Flow Control for On-chip Networks.
Proceedings of the NOCS 2010, 2010

The ZCache: Decoupling Ways and Associativity.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Understanding sources of inefficiency in general-purpose chips.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Eigenbench: A simple exploration tool for orthogonal TM characteristics.
Proceedings of the 2010 IEEE International Symposium on Workload Characterization, 2010

Making nested parallel transactions practical using lightweight hardware support.
Proceedings of the 24th International Conference on Supercomputing, 2010

Implementing and Evaluating a Model Checker for Transactional Memory Systems.
Proceedings of the 15th IEEE International Conference on Engineering of Complex Computer Systems, 2010

FARM: A Prototyping Environment for Tightly-Coupled, Heterogeneous Architectures.
Proceedings of the 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2010

Evaluating impact of manageability features on device performance.
Proceedings of the 6th International Conference on Network and Service Management, 2010

Flexible architectural support for fine-grain scheduling.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

2009
Optimizing Memory Transactions for Multicore Systems.
Proceedings of the Multicore Processors and Systems, 2009

The case for RAMClouds: scalable high-performance storage entirely in DRAM.
ACM SIGOPS Oper. Syst. Rev., 2009

Guest Editors' Introduction: Hot Chips Turns 20.
IEEE Micro, 2009

Power Management of Datacenter Workloads Using Per-Core Power Gating.
IEEE Comput. Archit. Lett., 2009

Nemesis: Preventing Authentication & Access Control Vulnerabilities in Web Applications.
Proceedings of the 18th USENIX Security Symposium, 2009

Future scaling of processor-memory interfaces.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Feedback-directed barrier optimization in a strongly isolated STM.
Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2009

A memory system design framework: creating smart memories.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Phoenix rebirth: Scalable MapReduce on a large-scale shared-memory system.
Proceedings of the 2009 IEEE International Symposium on Workload Characterization, 2009

Fast memory snapshot for concurrent programming without synchronization.
Proceedings of the 23rd international conference on Supercomputing, 2009

The stanford pervasive parallelism lab.
Proceedings of the 2009 IEEE Hot Chips 21 Symposium (HCS), 2009

Decoupling Dynamic Information Flow Tracking with a dedicated coprocessor.
Proceedings of the 2009 IEEE/IFIP International Conference on Dependable Systems and Networks, 2009

2008
Comparative evaluation of memory models for chip multiprocessors.
ACM Trans. Archit. Code Optim., 2008

Transactional memory.
Commun. ACM, 2008

Real-World Buffer Overflow Protection for Userspace and Kernelspace.
Proceedings of the 17th USENIX Security Symposium, 2008

Improving software concurrency with hardware-assisted memory snapshot.
Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2008), 2008

Ased: availability, security, and debugging support using transactional memory.
Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2008), 2008

Hardware Enforcement of Application Security Policies Using Tagged Memory.
Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation, 2008

A Comparison of High-Level Full-System Power Models.
Proceedings of the Workshop on Power Aware Computing and Systems, 2008

STAMP: Stanford Transactional Applications for Multi-Processing.
Proceedings of the 4th International Symposium on Workload Characterization (IISWC 2008), 2008

Thread-safe dynamic binary translation using transactional memory.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

2007
From chaos to QoS: case studies in CMP resource management.
SIGARCH Comput. Archit. News, 2007

RAMP: Research Accelerator for Multiple Processors.
IEEE Micro, 2007

Transactional Memory: The Hardware-Software Interface.
IEEE Micro, 2007

Models and Metrics to Enable Energy-Efficiency Optimizations.
Computer, 2007

Towards soft optimization techniques for parallel cognitive applications.
Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2007), 2007

JouleSort: a balanced energy-efficiency benchmark.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2007

Transactional collection classes.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Transactional programming in a multi-core environment.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Potential show-stoppers for transactional synchronization.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

An effective hybrid transactional memory system with strong isolation guarantees.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Comparing memory systems for chip multiprocessors.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Raksha: a flexible information flow architecture for software security.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Evaluating MapReduce for Multi-core and Multiprocessor Systems.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

A Scalable, Non-blocking Approach to Transactional Memory.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

A practical FPGA-based framework for novel CMP research.
Proceedings of the ACM/SIGDA 15th International Symposium on Field Programmable Gate Arrays, 2007

Register pointer architecture for efficient embedded processors.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

ATLAS: a chip-multiprocessor with transactional memory support.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

A low power front-end for embedded processors using a block-aware instruction set.
Proceedings of the 2007 International Conference on Compilers, 2007

The OpenTM Transactional Application Programming Interface.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006
Block-aware instruction set architecture.
ACM Trans. Archit. Code Optim., 2006

Executing Java programs with transactional memory.
Sci. Comput. Program., 2006

Unlocking concurrency.
ACM Queue, 2006

The Atomos transactional programming language.
Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, 2006

Architectural Semantics for Practical Transactional Memory.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

Vector Lane Threading.
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

The common case transactional behavior of multithreaded programs.
Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

Research accelerator for multiple processors.
Proceedings of the 2006 IEEE Hot Chips 18 Symposium (HCS), 2006

Transactional memory implementation overview.
Proceedings of the 2006 IEEE Hot Chips 18 Symposium (HCS), 2006

Simultaneously improving code size, performance, and energy in embedded processors.
Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Tradeoffs in transactional memory virtualization.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

Testing implementations of transactional memory.
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

2005
Energy-efficient and high-performance instruction fetch using a block-aware ISA.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

TAPE: a transactional application profiling environment.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Heuristics for Profile-Driven Method-Level Speculative Parallelization.
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

Automatic power management schemes for Internet servers and data centers.
Proceedings of the Global Telecommunications Conference, 2005. GLOBECOM '05, St. Louis, Missouri, USA, 28 November, 2005

Improving Instruction Delivery with a Block-Aware ISA.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

Characterization of TCC on Chip-Multiprocessors.
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

2004
Transactional Coherence and Consistency: Simplifying Parallel Hardware and Software.
IEEE Micro, 2004

Transactional Memory Coherence and Consistency.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Programming with transactional coherence and consistency (TCC).
Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004

The Stream Virtual Machine.
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004), 29 September, 2004

2003
Scalable Vector Processors for Embedded Systems.
IEEE Micro, 2003

Overcoming the Limitations of Conventional Vector Processors.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

2002
Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

2001
Hardware/compiler codevelopment for an embedded media processor.
Proc. IEEE, 2001

2000
Exploiting On-Chip Memory Bandwidth in the VIRAM Compiler.
Proceedings of the Intelligent Memory Systems, Second International Workshop, 2000

How to Solve the Current Memory Access and Data Transfer Bottlenecks: At the Processor Architecture or at the Compiler Level?
Proceedings of the 2000 Design, 2000

1998
A New Direction for Computer Architecture Research.
Computer, 1998

Embedded memories in system design - from technology to systems architecture.
Proceedings of the 1998 IEEE/ACM International Conference on Computer-Aided Design, 1998

1997
A case for intelligent RAM.
IEEE Micro, 1997

Scalable Processors in the Billion-Transistor Era: IRAM.
Computer, 1997

The Energy Efficiency of IRAM Architectures.
Proceedings of the 24th International Symposium on Computer Architecture, 1997

Intelligent RAM (IRAM): The Industrial Setting, Applications and Architectures.
Proceedings of the 1997 International Conference on Computer Design: VLSI in Computers & Processors, 1997

Pipelined Multi-Queue Management in a VLSI ATM Switch Chip with Credit-Based Flow-Control.
Proceedings of the 17th Conference on Advanced Research in VLSI (ARVLSI '97), 1997
