David J. Lilja

Orcid: 0000-0003-3785-8206

According to our database, David J. Lilja authored at least 233 papers between 1988 and 2022.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Awards

IEEE Fellow

IEEE Fellow 2006, "For contributions to statistical methodologies for performance assessment of computing systems."

Bibliography

2022
Work-in-Progress: ExpCache: Online-Learning based Cache Replacement Policy for Non-Volatile Memory.
Proceedings of the International Conference on Compilers, 2022

2021
High Quality Down-Sampling for Deterministic Approaches to Stochastic Computing.
IEEE Trans. Emerg. Top. Comput., 2021

Analysis of a ThunderX2 System Using Top-Down and Purchasing Power Parity Methods.
Proceedings of the PEARC '21: Practice and Experience in Advanced Research Computing, 2021

HeuristicDB: a hybrid storage database system using a non-volatile memory block device.
Proceedings of the SYSTOR '21: The 14th ACM International Systems and Storage Conference, 2021

2020
Exploring Performance Characteristics of the Optane 3D Xpoint Storage Technology.
ACM Trans. Model. Perform. Evaluation Comput. Syst., 2020

Adaptive-Length Coding of Image Data for Low-Cost Approximate Storage.
IEEE Trans. Computers, 2020

Enhancing the Top-Down Microarchitectural Analysis Method Using Purchasing Power Parity Theory.
Proceedings of the Languages and Compilers for Parallel Computing, 2020

Energy-Efficient Pulse-Based Convolution for Near-Sensor Processing.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2020

AdaEmb-Encoder: Adaptive Embedding Spatial Encoder-Based Deduplication for Backing Up Classifier Training Data.
Proceedings of the 39th IEEE International Performance Computing and Communications Conference, 2020

PBCCF: Accelerated Deduplication by Prefetching Backup Content Correlated Fingerprints.
Proceedings of the 38th IEEE International Conference on Computer Design, 2020

2019
Performing Stochastic Computation Deterministically.
IEEE Trans. Very Large Scale Integr. Syst., 2019

NetStorage: A synchronized trace-driven replayer for network-storage system evaluation.
Perform. Evaluation, 2019

Neural Network Classifiers Using a Hardware-Based Approximate Activation Function with a Hybrid Stochastic Multiplier.
ACM J. Emerg. Technol. Comput. Syst., 2019

Low-Cost Stochastic Hybrid Multiplier for Quantized Neural Networks.
ACM J. Emerg. Technol. Comput. Syst., 2019

Exploring A Forecasting Structure for the Capacity Usage in Backup Storage Systems.
Proceedings of the 10th IEEE Annual Ubiquitous Computing, 2019

Accelerating Deterministic Bit-Stream Computing with Resolution Splitting.
Proceedings of the 20th International Symposium on Quality Electronic Design, 2019

Using DCT-based Approximate Communication to Improve MPI Performance in Parallel Clusters.
Proceedings of the 38th IEEE International Performance Computing and Communications Conference, 2019

HAML-SSD: A Hardware Accelerated Hotness-Aware Machine Learning based SSD Management.
Proceedings of the International Conference on Computer-Aided Design, 2019

Low Cost Hybrid Spin-CMOS Compressor for Stochastic Neural Networks.
Proceedings of the 2019 on Great Lakes Symposium on VLSI, 2019

Energy-Efficient Convolutional Neural Networks with Deterministic Bit-Stream Processing.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

Energy-Efficient Near-Sensor Convolution using Pulsed Unary Processing.
Proceedings of the 30th IEEE International Conference on Application-specific Systems, 2019

2018
Low-Cost Sorting Network Circuits Using Unary Processing.
IEEE Trans. Very Large Scale Integr. Syst., 2018

Approximate Communication: Techniques for Reducing Communication Bottlenecks in Large-Scale Parallel Systems.
ACM Comput. Surv., 2018

On Memory System Design for Stochastic Computing.
IEEE Comput. Archit. Lett., 2018

Enhancing the Ensemble of Exemplar-SVMs for Binary Classification Using Concurrent Selection and Ensemble Learning.
Proceedings of the 9th IEEE Annual Ubiquitous Computing, 2018

Reducing Relational Database Performance Bottlenecks Using 3D XPoint Storage Technology.
Proceedings of the 17th IEEE International Conference On Trust, 2018

Efficient and Fast Approximate Consensus with Epidemic Failure Detection at Extreme Scale.
Proceedings of the 26th Euromicro International Conference on Parallel, 2018

Tier-Code: An XOR-Based RAID-6 Code with Improved Write and Degraded-Mode Read Performance.
Proceedings of the 2018 IEEE International Conference on Networking, 2018

Architectural Support for Probabilistic Branches.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Towards Theoretical Cost Limit of Stochastic Number Generators for Stochastic Computing.
Proceedings of the 2018 IEEE Computer Society Annual Symposium on VLSI, 2018

Parallel implementation of finite state machines for reducing the latency of stochastic computing.
Proceedings of the 19th International Symposium on Quality Electronic Design, 2018

Quantized neural networks with new stochastic multipliers.
Proceedings of the 19th International Symposium on Quality Electronic Design, 2018

TNT: A Solver for Large Dense Least-Squares Problems that Takes Conjugate Gradient from Bad in Theory, to Good in Practice.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

HyperProtect: Enhancing the Performance of a Dynamic Backup System Using Intelligent Scheduling.
Proceedings of the 37th IEEE International Performance Computing and Communications Conference, 2018

Deterministic methods for stochastic computing using low-discrepancy sequences.
Proceedings of the International Conference on Computer-Aided Design, 2018

2017
Time-Encoded Values for Highly Efficient Stochastic Circuits.
IEEE Trans. Very Large Scale Integr. Syst., 2017

Polysynchronous Clocking: Exploiting the Skew Tolerance of Stochastic Circuits.
IEEE Trans. Computers, 2017

An Overview of Time-Based Computing with Stochastic Constructs.
IEEE Micro, 2017

A Reconfigurable Architecture with Sequential Logic-Based Stochastic Computing.
ACM J. Emerg. Technol. Comput. Syst., 2017

Impact of spintronic memory on multicore cache hierarchy design.
IET Comput. Digit. Tech., 2017

TraceRAR: An I/O Performance Evaluation Tool for Replaying, Analyzing, and Regenerating Traces.
Proceedings of the 2017 International Conference on Networking, Architecture, and Storage, 2017

Cost-quality trade-offs of approximate memory repair mechanisms for image data.
Proceedings of the 18th International Symposium on Quality Electronic Design, 2017

Determining work partitioning on closely coupled heterogeneous computing systems using statistical design of experiments.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Kinetic Action: Performance Analysis of Integrated Key-Value Storage Devices vs. LevelDB Servers.
Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017

Power and Area Efficient Sorting Networks Using Unary Processing.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

Neural Network Classifiers Using Stochastic Computing with a Hardware-Oriented Approximate Activation Function.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

TNT-NN: A Fast Active Set Method for Solving Large Non-Negative Least Squares Problems.
Proceedings of the International Conference on Computational Science, 2017

High-speed stochastic circuits using synchronous analog pulses.
Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017

Stochastic computing implementation of trigonometric and hyperbolic functions.
Proceedings of the 12th IEEE International Conference on ASIC, 2017

2016
A High-Capacity Separable Reversible Method for Hiding Multiple Messages in Encrypted Images.
CoRR, 2016

Ps-Code: A New Code for Improved Degraded Mode Read and Write Performance of RAID Systems.
Proceedings of the IEEE International Conference on Networking, 2016

Using Stochastic Computing to Reduce the Hardware Requirements for a Restricted Boltzmann Machine Classifier.
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

Polysynchronous stochastic circuits.
Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016

2015
Design space exploration for efficient computing in Solid State drives with the Storage Processing Unit.
Proceedings of the 10th IEEE International Conference on Networking, 2015

GPU-Accelerated Nick Local Image Thresholding Algorithm.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

A hardware implementation of a radial basis function neural network using stochastic logic.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

An FPGA implementation of a Restricted Boltzmann Machine classifier using stochastic bit streams.
Proceedings of the 26th IEEE International Conference on Application-specific Systems, 2015

2014
Computation on Stochastic Bit Streams: Digital Image Processing Case Studies.
IEEE Trans. Very Large Scale Integr. Syst., 2014

Introduction.
ACM Trans. Parallel Comput., 2014

Logical Computation on Stochastic Bit Streams with Linear Finite-State Machines.
IEEE Trans. Computers, 2014

Improving Energy and Performance with Spintronics Caches in Multicore Systems.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

IIR filters using stochastic arithmetic.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

2013
Introduction to the theme issue on performance modeling.
Softw. Syst. Model., 2013

Comparing the performance of stochastic simulation on GPUs and OpenMP.
Int. J. Comput. Sci. Eng., 2013

Stochastic functions using sequential logic.
Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013

Exploiting free silicon for energy-efficient computing directly in NAND flash-based solid-state storage systems.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2013

A Stepwise Approach to Software-Hardware Performance Co-optimization Using Design of Experiments.
Proceedings of the 39th International Computer Measurement Group Conference, 2013

A divide-and-conquer approach for solving singular value decomposition on a heterogeneous system.
Proceedings of the Computing Frontiers Conference, 2013

Accelerating the performance of stochastic encoding-based computations by sharing bits in consecutive bit streams.
Proceedings of the 24th International Conference on Application-Specific Systems, 2013

2012
Sparse Fast Fourier Transform on GPUs and Multi-core CPUs.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

Case Studies of Logical Computation on Stochastic Bit Streams.
Proceedings of the Integrated Circuit and System Design. Power and Timing Modeling, 2012

PASS: A Hybrid Storage System for Performance-Synchronization Tradeoffs Using SSDs.
Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

Memory module-level testing and error behaviors for phase change memory.
Proceedings of the 30th International IEEE Conference on Computer Design, 2012

A stochastic reconfigurable architecture for fault-tolerant computation with sequential logic.
Proceedings of the 30th International IEEE Conference on Computer Design, 2012

An efficient implementation of numerical integration using logical computation on stochastic bit streams.
Proceedings of the 2012 IEEE/ACM International Conference on Computer-Aided Design, 2012

The synthesis of complex arithmetic computation on stochastic bit streams using sequential logic.
Proceedings of the 2012 IEEE/ACM International Conference on Computer-Aided Design, 2012

Weighted area technique for electromechanically enabled logic computation with cantilever-based NEMS switches.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Romano: autonomous storage management using performance prediction in multi-tenant datacenters.
Proceedings of the ACM Symposium on Cloud Computing, SOCC '12, 2012

The synthesis of linear Finite State Machine-based Stochastic Computational Elements.
Proceedings of the 17th Asia and South Pacific Design Automation Conference, 2012

Design of a storage processing unit.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
An Architecture for Fault-Tolerant Computation with Stochastic Logic.
IEEE Trans. Computers, 2011

Fault tolerance for nanotechnology devices at the bit and module levels with history index of correct computation.
IET Comput. Digit. Tech., 2011

Performance analysis of single-phase, multiphase, and multicomponent lattice-Boltzmann fluid flow simulations on GPU clusters.
Concurr. Comput. Pract. Exp., 2011

Sampling-based garbage collection metadata management scheme for flash-based storage.
Proceedings of the IEEE 27th Symposium on Mass Storage Systems and Technologies, 2011

BloomFlash: Bloom Filter on Flash-Based Storage.
Proceedings of the 2011 International Conference on Distributed Computing Systems, 2011

Using stochastic computing to implement digital image processing algorithms.
Proceedings of the IEEE 29th International Conference on Computer Design, 2011

A programmable and scalable technique to design spintronic logic circuits based on magnetic tunnel junctions.
Proceedings of the 21st ACM Great Lakes Symposium on VLSI 2010, 2011

Performing bitwise logic operations in cache using spintronics-based magnetic tunnel junctions.
Proceedings of the 8th Conference on Computing Frontiers, 2011

A low power fault-tolerance architecture for the kernel density estimation based image segmentation algorithm.
Proceedings of the 22nd IEEE International Conference on Application-specific Systems, 2011

2010
Cross-layer speculative architecture for end systems and gateways in computer networks with lossy links.
Wirel. Networks, 2010

Using Resampling Techniques to Compute Confidence Intervals for the Harmonic Mean of Rate-Based Performance Metrics.
IEEE Comput. Archit. Lett., 2010

High performance solid state storage under Linux.
Proceedings of the IEEE 26th Symposium on Mass Storage Systems and Technologies, 2010

Deferred updates for flash-based storage.
Proceedings of the IEEE 26th Symposium on Mass Storage Systems and Technologies, 2010

Characterizing datasets for data deduplication in backup applications.
Proceedings of the 2010 IEEE International Symposium on Workload Characterization, 2010

Spintronic logic gates for spintronic data using magnetic tunnel junctions.
Proceedings of the 28th International Conference on Computer Design, 2010

2009
History Index of Correct Computation for Fault-Tolerant Nano-Computing.
IEEE Trans. Very Large Scale Integr. Syst., 2009

Accelerating geoscience and engineering system simulations on graphics hardware.
Comput. Geosci., 2009

Improving risk assessment methodology: a statistical design of experiments approach.
Proceedings of the 2nd International Conference on Security of Information and Networks, 2009

Large Block CLOCK (LB-CLOCK): A write caching algorithm for solid state disks.
Proceedings of the 17th Annual Meeting of the IEEE/ACM International Symposium on Modelling, 2009

Accelerating Lattice Boltzmann Fluid Flow Simulations Using Graphics Processors.
Proceedings of the ICPP 2009, 2009

The synthesis of combinational logic to generate probabilities.
Proceedings of the 2009 International Conference on Computer-Aided Design, 2009

A reconfigurable stochastic architecture for highly reliable computing.
Proceedings of the 19th ACM Great Lakes Symposium on VLSI 2009, 2009

Using a Statistical Approach for Optimal Security Parameter Determination.
Proceedings of the 2009 International Conference on Security & Management, 2009

2008
MMV: A Metamodeling Based Microprocessor Validation Environment.
IEEE Trans. Very Large Scale Integr. Syst., 2008

Exploiting the Impact of Database System Configuration Parameters: A Design of Experiments Approach.
IEEE Data Eng. Bull., 2008

Independent Component Analysis and Evolutionary Algorithms for Building Representative Benchmark Subsets.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2008

Low power/area branch prediction using complementary branch predictors.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

SARD: A statistical approach for ranking database tuning parameters.
Proceedings of the 24th International Conference on Data Engineering Workshops, 2008

Statistically translating low-level error probabilities to increase the accuracy and efficiency of reliability simulations in hardware description languages.
Proceedings of the 18th ACM Great Lakes Symposium on VLSI 2008, 2008

Guiding Circuit Level Fault-Tolerance Design with Statistical Methods.
Proceedings of the Design, Automation and Test in Europe, 2008

Archer: A Community Distributed Computing Infrastructure for Computer Architecture Research and Education.
Proceedings of the Collaborative Computing: Networking, 2008

Design of a spintronic arithmetic and logic unit using magnetic tunnel junctions.
Proceedings of the 5th Conference on Computing Frontiers, 2008

2007
Speed versus Accuracy Trade-Offs in Microarchitectural Simulations.
IEEE Trans. Computers, 2007

An adaptive dual control framework for QoS design.
Clust. Comput., 2007

CIM: A Reliable Metric for Evaluating Program Phase Classifications.
IEEE Comput. Archit. Lett., 2007

Model Based Test Generation for Microprocessor Architecture Validation.
Proceedings of the 20th International Conference on VLSI Design (VLSI Design 2007), 2007

Improving nanoelectronic designs using a statistical approach to identify key parameters in circuit level SEU simulations.
Proceedings of the 2007 IEEE International Symposium on Nanoscale Architectures, 2007

MEMESTAR: A Simulation Framework for Reliability Evaluation over Multiple Environments.
Proceedings of the 8th International Symposium on Quality of Electronic Design (ISQED 2007), 2007

SCRAP: A Statistical Approach for Creating a Database Query Workload Based on Performance Bottlenecks.
Proceedings of the IEEE 10th International Symposium on Workload Characterization, 2007

Analysis of Statistical Sampling in Microarchitecture Simulation: Metric, Methodology and Program Characterization.
Proceedings of the IEEE 10th International Symposium on Workload Characterization, 2007

Exploring subsets of standard cell libraries to exploit natural fault masking capabilities for reliable logic.
Proceedings of the 17th ACM Great Lakes Symposium on VLSI 2007, 2007

Scaling Analytical Models for Soft Error Rate Estimation Under a Multiple-Fault Environment.
Proceedings of the Tenth Euromicro Conference on Digital System Design: Architectures, 2007

Design fault directed test generation for microprocessor validation.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

2006
Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations.
IEEE Trans. Computers, 2006

Circulating shared-registers for multiprocessor systems.
J. Syst. Archit., 2006

Layered view of QoS issues in IP-based mobile wireless networks.
Int. J. Commun. Syst., 2006

The Future of Simulation: A Field of Dreams.
Computer, 2006

Comparing simulation techniques for microarchitecture-aware floorplanning.
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

Evaluating the efficacy of statistical simulation for design space exploration.
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

Temperature-aware floorplanning of microarchitecture blocks with IPC-power dependence modeling and transient analysis.
Proceedings of the 2006 International Symposium on Low Power Electronics and Design, 2006

Evaluating Benchmark Subsetting Approaches.
Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

The exigency of benchmark and compiler drift: designing tomorrow's processors with yesterday's tools.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

MMV: Metamodeling Based Microprocessor Validation Environment.
Proceedings of the Eleventh Annual IEEE International High-Level Design Validation and Test Workshop 2006, 2006

Computer Architecture.
Proceedings of the Handbook of Nature-Inspired and Innovative Computing, 2006

2005
The Impact of Incorrectly Speculated Memory Operations in a Multithreaded Architecture.
IEEE Trans. Parallel Distributed Syst., 2005

Improving Computer Architecture Simulation Methodology by Adding Statistical Rigor.
IEEE Trans. Computers, 2005

A Novel Memory Structure for Embedded Systems: Flexible Sequential and Random Access Memory.
J. Comput. Sci. Technol., 2005

Communicating Quality of Service Requirements to an Object-Based Storage Device.
Proceedings of the 22nd IEEE / 13th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2005), 2005

The Applicability of Adaptive Control Theory to QoS Design: Limitations and Solutions.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Characterizing and Comparing Prevailing Simulation Techniques.
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

Dynamic Code Region (DCR) Based Program Phase Tracking and Prediction for Dynamic Optimizations.
Proceedings of the High Performance Embedded Architectures and Compilers, 2005

Microarchitecture-aware floorplanning using a statistical design of experiments approach.
Proceedings of the 42nd Design Automation Conference, 2005

Efficiently generating test vectors with state pruning.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

2004
Static Classification of Value Predictability Using Compiler Hints.
IEEE Trans. Computers, 2004

State Pruning for Test Vector Generation for a Multiprocessor Cache Coherence Protocol.
Proceedings of the 15th IEEE International Workshop on Rapid System Prototyping (RSP 2004), 2004

The NanoBox Project: Exploring Fabrics of Self-Correcting Logic Blocks for High Defect Rate Molecular Device Technologies.
Proceedings of the 2004 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2004), 2004

Self-tuning Speculation for Maintaining the Consistency of Client-Cached Data.
Proceedings of the 10th International Conference on Parallel and Distributed Systems, 2004

Using ECN Marks to Improve TCP Performance over Lossy Links.
Proceedings of the ICETE 2004, 2004

Comparing Exact and Approximate Spatial Auto-regression Model Solutions for Spatial Data Analysis.
Proceedings of the Geographic Information Science, Third International Conference, 2004

Improving Data Cache Performance via Address Correlation: An Upper Bound Study.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

The Recursive NanoBox Processor Grid: A Reliable System Architecture for Unreliable Nanotechnology Devices.
Proceedings of the 2004 International Conference on Dependable Systems and Networks (DSN 2004), 28 June, 2004

An active data-aware cache consistency protocol for highly-scalable data-shipping DBMS architectures.
Proceedings of the First Conference on Computing Frontiers, 2004

Wireless Sensor Network for Aircraft Health Monitoring.
Proceedings of the 1st International Conference on Broadband Networks (BROADNETS 2004), 2004

Enhancing the Memory Performance of Embedded Systems with the Flexible Sequential and Random Access Memory.
Proceedings of the Advances in Computer Systems Architecture, 9th Asia-Pacific Conference, 2004

2003
Balancing Reuse Opportunities and Performance Gains with Subblock Value Reuse.
IEEE Trans. Computers, 2003

Challenges in Computer Architecture Evaluation.
Computer, 2003

Address Correlation: Exceeding the Limits of Locality.
IEEE Comput. Archit. Lett., 2003

Using Incorrect Speculation to Prefetch Data in a Concurrent Multithreaded Processor.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

So Many States, So Little Time: Verifying Memory Coherence in the Cray X1.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Exploring Memory Access Regularity in Pointer-Intensive Application Programs.
Proceedings of the Intelligent Data Engineering and Automated Learning, 2003

A Statistically Rigorous Approach for Improving Simulation Methodology.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

2002
Dynamically adapting to system load and program behavior in multiprogrammed multiprocessor systems.
Concurr. Comput. Pract. Exp., 2002

MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research.
IEEE Comput. Archit. Lett., 2002

Improving Processor Performance by Simplifying and Bypassing Trivial Computations.
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

Increasing Instruction-Level Parallelism with Instruction Precomputation (Research Note).
Proceedings of the Euro-Par 2002, 2002

Exploiting the Prefetching Effect Provided by Executing Mispredicted Load Instructions.
Proceedings of the Euro-Par 2002, 2002

2001
Coarse-Grained Thread Pipelining: A Speculative Parallel Execution Model for Shared-Memory Multiprocessors.
IEEE Trans. Parallel Distributed Syst., 2001

Teaching computer systems performance analysis.
IEEE Trans. Educ., 2001

Implementing a dynamic processor allocation policy for multiprogrammed parallel applications in the Solaris.
Concurr. Comput. Pract. Exp., 2001

Compiler-Directed Classification of Value Locality Behavior.
Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

Automatic Verification of Instruction Set Simulation Using Synchronized State Comparison.
Proceedings of the 34th Annual Simulation Symposium (SS 2001), 2001

2000
Dynamic Task Scheduling Using Online Optimization.
IEEE Trans. Parallel Distributed Syst., 2000

Extending Value Reuse to Basic Blocks with Compiler Support.
IEEE Trans. Computers, 2000

JaViz: A client/server Java profiling tool.
IBM Syst. J., 2000

Data prefetch mechanisms.
ACM Comput. Surv., 2000

Techniques for obtaining high performance in Java programs.
ACM Comput. Surv., 2000

Shared-memory multiprocessing: Current state and future directions.
Adv. Comput., 2000

JavaSpMT: A Speculative Thread Pipelining Parallelization Model for Java Programs.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

A Comprehensive Dynamic Processor Allocation Scheme for Multiprogrammed Multiprocessor Systems.
Proceedings of the 2000 International Conference on Parallel Processing, 2000

A Balanced Approach to High-Level Verification: Performance Trade-Offs in Verifying Large-Scale Multiprocessors.
Proceedings of the 2000 International Conference on Parallel Processing, 2000

Exploring Sub-Block Value Reuse for Superscalar Processors.
Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT'00), 2000

1999
Performance-Based Path Determination for Interprocessor Communication in Distributed Computing Systems.
IEEE Trans. Parallel Distributed Syst., 1999

The Superthreaded Processor Architecture.
IEEE Trans. Computers, 1999

Special Issue on Compilation and Architectural Support for Parallel Applications - Guest Editor's Introduction.
J. Parallel Distributed Comput., 1999

Education at a distance: a report from the front.
Proceedings of the 1999 workshop on Computer architecture education, 1999

A Network Status Predictor to Support Dynamic Scheduling in Network-Based Computing Systems.
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

A Compiler-Assisted Data Prefetch Controller.
Proceedings of the IEEE International Conference On Computer Design, 1999

Exploiting Basic Block Value Locality with Block Reuse.
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

1998
Comparing Processor Allocation Strategies in Multiprogrammed Shared-Memory Multiprocessors.
J. Parallel Distributed Comput., 1998

Integrating Parallelizing Compilation Technology and Processor Architecture for Cost-Effective Concurrent multithreading.
J. Inf. Sci. Eng., 1998

A comparative analysis of parallel programming language complexity and performance.
Concurr. Pract. Exp., 1998

An Efficient Strategy for Developing a Simulator for a Novel Concurrent Multithreaded Processor Architecture.
Proceedings of the MASCOTS 1998, 1998

Dynamic Processor Allocation with the Solaris Operating System.
Proceedings of the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP '98), March 30, 1998

Coarse-grained Speculative Execution in Shared-memory Multiprocessors.
Proceedings of the 12th international conference on Supercomputing, 1998

The Effect of using State-Based Priority Information in a Shared-Memory Multiprocessor Cache Replacement Policy.
Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

High-Level Information - An Approach for Integrating Front-End and Back-End Compilers.
Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

Characterization of Communication Patterns in Message-Passing Parallel Scientific Application Programs.
Proceedings of the Network-Based Parallel Computing: Communication, 1998

1997
An Effective Processor Allocation Strategy for Multiprogrammed Shared-Memory Multiprocessors.
IEEE Trans. Parallel Distributed Syst., 1997

Low-Cost, High-Performance Barrier Synchronization on Networks of Workstations.
J. Parallel Distributed Comput., 1997

When Caches Aren't Enough: Data Prefetching Techniques.
Computer, 1997

Trends in Shared Memory Multiprocessing.
Computer, 1997

Changing Interaction of Compiler and Architecture.
Computer, 1997

Utilizing Heterogeneous Networks in Distributed Parallel Computing Systems.
Proceedings of the 6th International Symposium on High Performance Distributed Computing, 1997

Complexity and Performance in Parallel Programming Languages.
Proceedings of the 1997 Workshop on High-Level Programming Models and Supportive Environments (HIPS '97), 1997

Exploiting multiple heterogeneous networks to reduce communication costs in parallel programs.
Proceedings of the 6th Heterogeneous Computing Workshop, 1997

1996
Computer architecture research: teaching the basic skills.
Proceedings of the 1996 workshop on Computer architecture education, 1996

Efficient Execution of Parallel Applications in Multiprogrammed Multiprocessor Systems.
Proceedings of IPPS '96, 1996

Performance Analysis and Prediction of Processor Scheduling Strategies in Multiprogrammed Shared-Memory Multiprocessors.
Proceedings of the 1996 International Conference on Parallel Processing, 1996

Dynamic Scheduling Strategies for Shared-memory Multiprocessors.
Proceedings of the 16th International Conference on Distributed Computing Systems, 1996

1995
The Potential of Compile-Time Analysis to Adapt the Cache Coherence Enforcement Strategy to the Data Sharing Characteristics.
IEEE Trans. Parallel Distributed Syst., 1995

Partitioning tasks between a pair of interconnected heterogeneous processors: A case study.
Concurr. Pract. Exp., 1995

Dynamic scheduling techniques for heterogeneous computing systems.
Concurr. Pract. Exp., 1995

Loop-Level Process Control: An Effective Processor Allocation Policy for Multiprogrammed Shared-Memory Multiprocessors.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 1995

A Circulating Active Barrier Synchronization Mechanism.
Proceedings of the 1995 International Conference on Parallel Processing, 1995

Write buffer design for cache-coherent shared-memory multiprocessors.
Proceedings of the 1995 International Conference on Computer Design (ICCD '95), 1995

Parameter estimation for a generalized parallel loop scheduling algorithm.
Proceedings of the 28th Annual Hawaii International Conference on System Sciences (HICSS-28), 1995

1994
The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared Memory Multiprocessor.
IEEE Trans. Parallel Distributed Syst., 1994

A Multiprocessor Architecture Combining Fine-Grained and Coarse-Grained Parallelism Strategies.
Parallel Comput., 1994

Exploiting the Parallelism Available in Loops.
Computer, 1994

A Compiler-Assisted Scheme for Adaptive Cache Coherence Enforcement.
Proceedings of the Parallel Architectures and Compilation Techniques, 1994

An evaluation of a compiler optimization for improving the performance of a coherence directory.
Proceedings of the 8th international conference on Supercomputing, 1994

A Distributed Hardware Mechanism for Process Synchronization on Shared-Bus Multiprocessors.
Proceedings of the 1994 International Conference on Parallel Processing, 1994

Self-Adjusting Scheduling: An On-Line Optimization Technique for Locality Management and Load Balancing.
Proceedings of the 1994 International Conference on Parallel Processing, 1994

A Superassociative Tagged Cache Coherence Directory.
Proceedings of the 1994 IEEE International Conference on Computer Design: VLSI in Computer & Processors, 1994

A Data Parallel Implementation of the TRFD Program from the Perfect Benchmarks.
Proceedings of the Massively Parallel Processing Applications and Development, 1994

1993
Improving Memory Utilization in Cache Coherence Directories.
IEEE Trans. Parallel Distributed Syst., 1993

Cache Coherence in Large-Scale Shared-Memory Multiprocessors: Issues and Comparisons.
ACM Comput. Surv., 1993

Efficient Use of Dynamically Tagged Directories Through Compiler Analysis.
Proceedings of the 1993 International Conference on Parallel Processing, 1993

1991
Processor parallelism considerations and memory latency reduction in shared memory multiprocessors.
PhD thesis, 1991

Combining hardware and software cache coherence strategies.
Proceedings of the 5th international conference on Supercomputing, 1991

Architectural alternatives for exploiting parallelism.
IEEE, ISBN: 978-0-8186-2642-5, 1991

1990
Comparing Parallelism Extraction Techniques: Superscalar Processors, Pipelined Processors, and Multiprocessors.
Proceedings of the 1990 International Conference on Parallel Processing, 1990

1988
Reducing the Branch Penalty in Pipelined Processors.
Computer, 1988

