Lizy Kurian John

Orcid: 0000-0002-8747-5214

Affiliations:
  • University of Texas at Austin, USA


According to our database, Lizy Kurian John authored at least 317 papers between 1991 and 2024.

Awards

ACM Fellow

ACM Fellow 2020, "For contributions to the design, modeling and benchmarking of computer architectures".

IEEE Fellow

IEEE Fellow 2009, "For contributions to power modeling and performance evaluation of microprocessors".

Bibliography

2024
NEM-GNN: DAC/ADC-less, Scalable, Reconfigurable, Graph and Sparsity-Aware Near-Memory Accelerator for Graph Neural Networks.
ACM Trans. Archit. Code Optim., June, 2024

SecurityCloak: Protection against cache timing and speculative memory access attacks.
J. Syst. Archit., 2024

HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond.
CoRR, 2024

Accelerating ML Workloads using GPU Tensor Cores: The Good, the Bad, and the Ugly.
Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering, 2024

SACHI: A Stationarity-Aware, All-Digital, Near-Memory, Ising Architecture.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

Cross-FPGA Power Estimation from High Level Synthesis via Transfer-Learning.
Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024

BLQ: Light-Weight Locality-Aware Runtime for Blocking-Less Queuing.
Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, 2024

2023
ULEEN: A Novel Architecture for Ultra-low-energy Edge Neural Networks.
ACM Trans. Archit. Code Optim., December, 2023

Koios 2.0: Open-Source Deep Learning Benchmarks for FPGA Architecture and CAD Research.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2023

A conditional branch predictor based on weightless neural networks.
Neurocomputing, October, 2023

CoMeFa: Deploying Compute-in-Memory on FPGAs for Deep Learning Acceleration.
ACM Trans. Reconfigurable Technol. Syst., September, 2023

TinyML but by No Means a Tiny Feat!
IEEE Micro, 2023

Hardware Security and Privacy: Threats and Opportunities.
IEEE Micro, 2023

Top Picks From Computer Architecture Conferences!
IEEE Micro, 2023

Hot Chips 34 and More!
IEEE Micro, 2023

Emerging System Interconnects Enabling More Opportunities Than Ever!
IEEE Micro, 2023

Environmentally Sustainable Computing.
IEEE Micro, 2023

PIMSAB: A Processing-In-Memory System with Spatially-Aware Communication and Bit-Serial-Aware Computation.
CoRR, 2023

HLSDataset: Open-Source Dataset for ML-Assisted FPGA Design using High Level Synthesis.
CoRR, 2023

Guard Cache: Creating Noisy Side-Channels.
IEEE Comput. Archit. Lett., 2023

Guard Cache: Creating False Cache Hits and Misses To Mitigate Side-Channel Attacks.
Proceedings of the Silicon Valley Cybersecurity Conference, 2023

Performance Implications of Async Memcpy and UVM: A Tale of Two Data Transfer Modes.
Proceedings of the IEEE International Symposium on Workload Characterization, 2023

Do Video Encoding Workloads Stress the Microarchitecture?
Proceedings of the IEEE International Symposium on Workload Characterization, 2023

Dendrite-inspired Computing to Improve Resilience of Neural Networks to Faults in Emerging Memory Technologies.
Proceedings of the IEEE International Conference on Rebooting Computing, 2023

NextGen-Malloc: Giving Memory Allocator Its Own Room in the House.
Proceedings of the 19th Workshop on Hot Topics in Operating Systems, 2023

An FPGA-Based Weightless Neural Network for Edge Network Intrusion Detection.
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023

LAWS: Large-Scale Accelerated Wave Simulations on FPGAs.
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023

Infinity Stream: Portable and Programmer-Friendly In-/Near-Memory Fusion.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

HLSDataset: Open-Source Dataset for ML-Assisted FPGA Design using High Level Synthesis.
Proceedings of the 34th IEEE International Conference on Application-specific Systems, 2023

COIN: Combinational Intelligent Networks.
Proceedings of the 34th IEEE International Conference on Application-specific Systems, 2023

FAWS: FPGA Acceleration of Large-Scale Wave Simulations.
Proceedings of the 34th IEEE International Conference on Application-specific Systems, 2023

2022
Tensor Slices: FPGA Building Blocks For The Deep Learning Era.
ACM Trans. Reconfigurable Technol. Syst., 2022

Artificial Intelligence at the Edge: Designs and Architectures for Pervasive Intelligence.
IEEE Micro, 2022

Automatic Compilation Will Be Key for Success of the Accelerator Revolution!
IEEE Micro, 2022

Top Picks from 2021 Computer Architecture Conferences!
IEEE Micro, 2022

Hot Chips 33 and More!
IEEE Micro, 2022

Special Issue on Cool Chips and Hot Interconnects.
IEEE Micro, 2022

Smart Agriculture and Smart Memories.
IEEE Micro, 2022

Performance of Java in Function-as-a-Service Computing.
Proceedings of the 15th IEEE/ACM International Conference on Utility and Cloud Computing, 2022

Performance Impact of NVMe-Over-TCP on HDFS Workloads.
Proceedings of the 15th IEEE/ACM International Conference on Utility and Cloud Computing, 2022

LogGen: A Parameterized Generator for Designing Floating-Point Logarithm Units for Deep Learning.
Proceedings of the 23rd International Symposium on Quality Electronic Design, 2022

Hardware-aware 3D Model Workload Selection and Characterization for Graphics and ML Applications.
Proceedings of the 23rd International Symposium on Quality Electronic Design, 2022

Microarchitectural Performance Evaluation of AV1 Video Encoding Workloads.
Proceedings of the International IEEE Symposium on Performance Analysis of Systems and Software, 2022

GAPS: GPU-acceleration of PDE solvers for wave simulation.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

SPAMeR: Speculative Push for Anticipated Message Requests in Multi-Core Systems.
Proceedings of the 51st International Conference on Parallel Processing, 2022

MathRAMs: Configurable Fused Compute-Memory Blocks for FPGAs.
Proceedings of the FPGA '22: The 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Virtual Event, USA, 27 February 2022, 2022

CoMeFa: Compute-in-Memory Blocks for FPGAs.
Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022

A WiSARD-based conditional branch predictor.
Proceedings of the 30th European Symposium on Artificial Neural Networks, 2022

Distributive Thermometer: A New Unary Encoding for Weightless Neural Networks.
Proceedings of the 30th European Symposium on Artificial Neural Networks, 2022

Characterization of Emerging AI Workloads: Neural Logic Machines and Graph Convolutional Networks.
Proceedings of the International Conference on Computational Science and Computational Intelligence, 2022

LogicWiSARD: Memoryless Synthesis of Weightless Neural Networks.
Proceedings of the 33rd IEEE International Conference on Application-specific Systems, 2022

Weightless Neural Networks for Efficient Edge Inference.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
Hardware Accelerator Integration Tradeoffs for High-Performance Computing: A Case Study of GEMM Acceleration in N-Body Methods.
IEEE Trans. Parallel Distributed Syst., 2021

Microprocessor at 50: Industry Leaders Speak.
IEEE Micro, 2021

Microprocessor at 50: A Time to Celebrate and Energize for the Future.
IEEE Micro, 2021

From the Memory Lane!
IEEE Micro, 2021

Microprocessor at 50: Looking Back and Looking Forward.
IEEE Micro, 2021

Quantum Computing and More!
IEEE Micro, 2021

FPGA Computing and More!
IEEE Micro, 2021

Top Picks From Year 2020.
IEEE Micro, 2021

CPUs, GPUs, and More From Hot Chips'32.
IEEE Micro, 2021

Connectivity - More Needed Than Ever Before.
IEEE Micro, 2021

Intel Wins in Four Decades, but AMD Catches Up.
IEEE Micro, 2021

Neuro-Symbolic AI: An Emerging Class of AI Workloads and their Characterization.
CoRR, 2021

Improving CNN performance on FPGA clusters through topology exploration.
Proceedings of the SAC '21: The 36th ACM/SIGAPP Symposium on Applied Computing, 2021

Virtual-Link: A Scalable Multi-Producer Multi-Consumer Message Queue Architecture for Cross-Core Communication.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Wave-PIM: Accelerating Wave Simulation Using Processing-in-Memory.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

Koios: A Deep Learning Benchmark Suite for FPGA Architecture and CAD Research.
Proceedings of the 31st International Conference on Field-Programmable Logic and Applications, 2021

Tensor Slices to the Rescue: Supercharging ML Acceleration on FPGAs.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

Compute RAMs: Adaptable Compute and Storage Blocks for DL-Optimized FPGAs.
Proceedings of the 55th Asilomar Conference on Signals, Systems, and Computers, 2021

2020
Chip Design 2020.
IEEE Micro, 2020

Machine Learning for Systems, Biological Computing, and More.
IEEE Micro, 2020

Agile Hardware Design.
IEEE Micro, 2020

Enjoy These Top Picks, While You Work From Home!
IEEE Micro, 2020

Did ML Chips Heat Up the Chip Design Arena?
IEEE Micro, 2020

Connectivity! Connectivity! Connectivity! May You Be More Connected Than Ever!!
IEEE Micro, 2020

Demystifying graph processing frameworks and benchmarks.
Sci. China Inf. Sci., 2020

Demystifying the MLPerf Training Benchmark Suite.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

SimTrace: Capturing over Time Program Phase Behavior.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

Accelerating Force-directed Graph Layout with Processing-in-Memory Architecture.
Proceedings of the 27th IEEE International Conference on High Performance Computing, 2020

The Case for Hard Matrix Multiplier Blocks in an FPGA.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

Design Space Exploration for Softmax Implementations.
Proceedings of the 31st IEEE International Conference on Application-specific Systems, 2020

Hamamu: Specializing FPGAs for ML Applications by Adding Hard Matrix Multiplier Blocks.
Proceedings of the 31st IEEE International Conference on Application-specific Systems, 2020

ATTC (@C): Addressable-TLB based Translation Coherence.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
UT-LCA/Scalability-Phase-Simpoint-of-SPEC-CPU2017: SPEC CPU2017 Integer Speed Suite SimPoint Pinballs.
Dataset, August, 2019

SelSMaP: A Selective Stride Masking Prefetching Scheme.
ACM Trans. Archit. Code Optim., 2019

3-D Chips! Chips are Getting Denser and Taller Than Ever!!
IEEE Micro, 2019

Machine Learning Accelerators and More.
IEEE Micro, 2019

Secure Architectures.
IEEE Micro, 2019

Top Picks.
IEEE Micro, 2019

Emerging Hot Chips and Systems.
IEEE Micro, 2019

To the Era of Intelligent Chips and Systems.
IEEE Micro, 2019

Efficient Prediction of Network Traffic for Real-Time Applications.
J. Comput. Networks Commun., 2019

Demystifying the MLPerf Benchmark Suite.
CoRR, 2019

A Study of Core Utilization and Residency in Heterogeneous Smart Phone Architectures.
Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering, 2019

Can we trust profiling results?: understanding and fixing the inaccuracy in modern profilers.
Proceedings of the ACM International Conference on Supercomputing, 2019

Reducing Data Movement and Energy in Multilevel Cache Hierarchies without Losing Performance: Can you have it all?
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018
Benchmarking Big Data Systems: A Review.
IEEE Trans. Serv. Comput., 2018

Start Late or Finish Early: A Distributed Graph Processing System with Redundancy Reduction.
Proc. VLDB Endow., 2018

Memristor-Based Computing.
IEEE Micro, 2018

Invited Paper for the Hot Workloads Special Session: Hot Regions in SPEC CPU2017.
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

A Case for Granularity Aware Page Migration.
Proceedings of the 32nd International Conference on Supercomputing, 2018

HALO: A Hierarchical Memory Access Locality Modeling Technique For Memory System Explorations.
Proceedings of the 32nd International Conference on Supercomputing, 2018

Puzzle Memory: Multifractional Partitioned Heterogeneous Memory Scheme.
Proceedings of the 36th IEEE International Conference on Computer Design, 2018

Wait of a Decade: Did SPEC CPU 2017 Broaden the Performance Horizon?
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Characterization of Smartphone Governor Strategies.
Proceedings of the Euro-Par 2018: Parallel Processing, 2018

CAMP: Accurate modeling of core and memory locality for proxy generation of big-data applications.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

BUQS: Battery- and user-aware QoS scaling for interactive mobile devices.
Proceedings of the 23rd Asia and South Pacific Design Automation Conference, 2018

ComP-net: command processor networking for efficient intra-kernel communications on GPUs.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017
Source-Level Performance, Energy, Reliability, Power and Thermal (PERPT) Simulation.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017

LACross: Learning-Based Analytical Cross-Platform Performance and Power Prediction.
Int. J. Parallel Program., 2017

GPU triggered networking for intra-kernel communications.
Proceedings of the International Conference for High Performance Computing, 2017

CSALT: context switch aware large TLB.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Prefetching for cloud workloads: An analysis based on address patterns.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

Accurate address streams for LLC and beyond (SLAB): A methodology to enable system exploration.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

Machine learning for performance and power modeling/prediction.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

Rethinking TLB Designs in Virtualized Environments: A Very Large Part-of-Memory TLB.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Cloud-Guided QoS and Energy Management for Mobile Interactive Web Applications.
Proceedings of the 4th IEEE/ACM International Conference on Mobile Software Engineering and Systems, 2017

SILC-FM: Subblocked InterLeaved Cache-Like Flat Memory Organization.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

POWSER: A novel user-experience based power management metric.
Proceedings of the Eighth International Green and Sustainable Computing Conference, 2017

Fine-Grain Program Snippets Generator for Mobile Core Design.
Proceedings of the Great Lakes Symposium on VLSI 2017, 2017

Exploring Heterogeneous-ISA Core Architectures for High-Performance and Energy-Efficient Mobile SoCs.
Proceedings of the Great Lakes Symposium on VLSI 2017, 2017

Sampling-based binary-level cross-platform performance estimation.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

High-level synthesis of approximate hardware under joint precision and voltage scaling.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

Statistical Pattern Based Modeling of GPU Memory Access Streams.
Proceedings of the 54th Annual Design Automation Conference, 2017

Proxy Benchmarks for Emerging Big-Data Workloads.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
Dynamic Core Allocation and Packet Scheduling in Multicore Network Processors.
IEEE Trans. Computers, 2016

Extended task queuing: active messages for heterogeneous systems.
Proceedings of the International Conference for High Performance Computing, 2016

Genesys: Automatically generating representative training sets for predictive benchmarking.
Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016

Simulator calibration for accelerator-rich architecture studies.
Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016

Statistical quality modeling of approximate hardware.
Proceedings of the 17th International Symposium on Quality Electronic Design, 2016

Prefetching Techniques for Near-memory Throughput Processors.
Proceedings of the 2016 International Conference on Supercomputing, 2016

Optimizing GPGPU Kernel Summation for Performance and Energy Efficiency.
Proceedings of the 45th International Conference on Parallel Processing Workshops, 2016

Proxy-Guided Load Balancing of Graph Processing Workloads on Heterogeneous Clusters.
Proceedings of the 45th International Conference on Parallel Processing, 2016

Accurate phase-level cross-platform power and performance estimation.
Proceedings of the 53rd Annual Design Automation Conference, 2016

Fine-grained power analysis of emerging graph processing workloads for cloud operations management.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

Identifying performance bottlenecks in Hive: Use of processor counters.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

POSTER: SILC-FM: Subblocked InterLeaved Cache-Like Flat Memory Organization.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Mechanistic Modeling of Architectural Vulnerability Factor.
ACM Trans. Comput. Syst., 2015

GPGPU-MiniBench: Accelerating GPGPU Micro-Architecture Simulation.
IEEE Trans. Computers, 2015

BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads.
CoRR, 2015

Data partitioning strategies for graph workloads on heterogeneous clusters.
Proceedings of the International Conference for High Performance Computing, 2015

i-MIRROR: A Software Managed Die-Stacked DRAM-Based Memory Subsystem.
Proceedings of the 27th International Symposium on Computer Architecture and High Performance Computing, 2015

Performance Characterization of Modern Databases on Out-of-Order CPUs.
Proceedings of the 27th International Symposium on Computer Architecture and High Performance Computing, 2015

Watt Watcher: Fine-Grained Power Estimation for Emerging Workloads.
Proceedings of the 27th International Symposium on Computer Architecture and High Performance Computing, 2015

Learning-based analytical cross-platform performance prediction.
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015

PowerTrain: A learning-based calibration of McPAT power models.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2015

GPGPU Benchmark Suites: How Well Do They Sample the Performance Spectrum?
Proceedings of the 44th International Conference on Parallel Processing, 2015

Learning-Based Power Modeling of System-Level Black-Box IPs.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015

Dynamic power and performance back-annotation for fast and accurate functional hardware simulation.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads.
Proceedings of the Big Data Benchmarks, Performance Optimization, and Emerging Hardware, 2015

2014
Cache Friendliness-Aware Management of Shared Last-Level Caches for High Performance Multi-Core Systems.
IEEE Trans. Computers, 2014

Automatic Generation of Miniaturized Synthetic Proxies for Target Applications to Efficiently Design Multicore Processors.
IEEE Trans. Computers, 2014

Predictive Heterogeneity-Aware Application Scheduling for Chip Multiprocessors.
IEEE Trans. Computers, 2014

FastSpot: Host-compiled thermal estimation for early design space exploration.
Proceedings of the Fifteenth International Symposium on Quality Electronic Design, 2014

Data analytics workloads: Characterization and similarity analysis.
Proceedings of the IEEE 33rd International Performance Computing and Communications Conference, 2014

Control flow behavior of cloud workloads.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

Performance analysis of HPC applications with irregular tree data structures.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

2013
Automating Stressmark Generation for Testing Processor Voltage Fluctuations.
IEEE Micro, 2013

Accelerating GPGPU architecture simulation.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2013

Flow Migration on Multicore Network Processors: Load Balancing While Minimizing Packet Reordering.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

Performance boosting under reliability and power constraints.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2013

Store-Load-Branch (SLB) predictor: A compiler assisted branch prediction for data dependent branches.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

2012
Complete System Power Estimation Using Processor Performance Events.
IEEE Trans. Computers, 2012

AUDIT: Stress Testing the Automatic Way.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Power and performance analysis of network traffic prediction techniques.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

A first-order mechanistic model for architectural vulnerability factor.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Performance impact of virtual machine placement in a datacenter.
Proceedings of the 31st IEEE International Performance Computing and Communications Conference, 2012

Compiler Support for Value-Based Indirect Branch Prediction.
Proceedings of the Compiler Construction - 21st International Conference, 2012

Efficient traffic aware power management in multicore communications processors.
Proceedings of the Symposium on Architecture for Networking and Communications Systems, 2012

2011
Coordinating DRAM and Last-Level-Cache Policies with the Virtual Write Queue.
IEEE Micro, 2011

Core-Level Activity Prediction for Multicore Power Management.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2011

Proprietary code to non-proprietary benchmarks: synthesis techniques for scalable benchmarks.
Proceedings of the ICPE'11, 2011

Modeling program resource demand using inherent program characteristics.
Proceedings of the SIGMETRICS 2011, 2011

Autocorrelation analysis: a new and improved method for measuring branch predictability.
Proceedings of the SIGMETRICS 2011, 2011

MAximum Multicore POwer (MAMPO): an automatic multithreaded synthetic power virus generation framework for multicore systems.
Proceedings of the Conference on High Performance Computing Networking, 2011

Minimalist open-page: a DRAM page-mode scheduling policy for the many-core era.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Automated di/dt stressmark generation for microprocessor power delivery networks.
Proceedings of the 2011 International Symposium on Low Power Electronics and Design, 2011

Hierarchically characterizing CUDA program behavior.
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

Autocorrelation analysis: A new and improved method for branch predictability characterization.
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

Predictive coordination of multiple on-chip resources for chip multiprocessors.
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

MCFQ: Leveraging Memory-level Parallelism and Application's Cache Friendliness for Efficient Management of Quasi-partitioned Last-level Caches.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Elastic Refresh: Techniques to Mitigate Refresh Penalties in High Density Memory.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

AVF Stressmark: Towards an Automated Methodology for Bounding the Worst-Case Vulnerability to Soft Errors.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

CantorSim: Simplifying Acceleration of Micro-architecture Simulations.
Proceedings of the MASCOTS 2010, 2010

Synthesizing memory-level parallelism aware miniature clones for SPEC CPU2006 and ImplantBench workloads.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010

The virtual write queue: coordinating DRAM and last-level cache policies.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Predictive Power Management for Multi-core Processors.
Proceedings of the Computer Architecture, 2010

A bandwidth-aware memory-subsystem resource management using non-invasive resource profilers for large CMP systems.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

Value Based BTB Indexing for indirect jump prediction.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

System-level max power (SYMPO): a systematic approach for escalating system-level power consumption using synthetic benchmarks.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
Hardware Acceleration for Media/Transaction Applications in Network Processors.
IEEE Trans. Very Large Scale Integr. Syst., 2009

Embedded Java benchmark analysis on the ARM processor.
Int. J. Embed. Syst., 2009

A Tale of Two Processors: Revisiting the RISC-CISC Debate.
Proceedings of the Computer Performance Evaluation and Benchmarking, 2009

Generation, Validation and Analysis of SPEC CPU2006 Simulation Points Based on Branch, Memory and TLB Characteristics.
Proceedings of the Computer Performance Evaluation and Benchmarking, 2009

ESKIMO: Energy savings using Semantic Knowledge of Inconsequential Memory Occupancy for DRAM subsystem.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

TSS: Applying two-stage sampling in micro-architecture simulations.
Proceedings of the 17th Annual Meeting of the IEEE/ACM International Symposium on Modelling, 2009

Bank-aware Dynamic Cache Partitioning for Multicore Architectures.
Proceedings of the ICPP 2009, 2009

Compiler Controlled Speculation for Power Aware ILP Extraction in Dataflow Architectures.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

Efficient program scheduling for heterogeneous multi-core processors.
Proceedings of the 46th Design Automation Conference, 2009

Loop-Aware Instruction Scheduling with Dynamic Contention Tracking for Tiled Dataflow Architectures.
Proceedings of the Compiler Construction, 18th International Conference, 2009

2008
Distilling the essence of proprietary workloads into miniature benchmarks.
ACM Trans. Archit. Code Optim., 2008

Analysing and improving clustering based sampling for microprocessor simulation.
Int. J. High Perform. Comput. Netw., 2008

On the representativeness of embedded Java benchmarks.
Proceedings of the 4th International Symposium on Workload Characterization (IISWC 2008), 2008

Energy-aware application scheduling on a heterogeneous multi-core system.
Proceedings of the 4th International Symposium on Workload Characterization (IISWC 2008), 2008

Analysis of dynamic power management on multi-core processors.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

A Performance Counter Based Workload Characterization on Blue Gene/P.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

Simulation points for SPEC CPU 2006.
Proceedings of the 26th International Conference on Computer Design, 2008

Automated microprocessor stressmark generation.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

Archer: A Community Distributed Computing Infrastructure for Computer Architecture Research and Education.
Proceedings of the Collaborative Computing: Networking, 2008

2007
OS-Aware Branch Prediction: Improving Microprocessor Control Flow Prediction for Operating Systems.
IEEE Trans. Computers, 2007

Applying Statistical Sampling for Fast and Efficient Simulation of Commercial Workloads.
IEEE Trans. Computers, 2007

Subsetting the SPEC CPU2006 benchmark suite.
SIGARCH Comput. Archit. News, 2007

Hardware Efficient Piecewise Linear Branch Predictor.
Proceedings of the 20th International Conference on VLSI Design (VLSI Design 2007), 2007

Complete System Power Estimation: A Trickle-Down Approach Based on Performance Events.
Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007

Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Exploring the Application Behavior Space Using Parameterized Synthetic Benchmarks.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006
Hybrid-Scheduling for Reduced Energy Consumption in High-Performance Processors.
IEEE Trans. Very Large Scale Integr. Syst., 2006

Architectural enhancements for network congestion control applications.
IEEE Trans. Very Large Scale Integr. Syst., 2006

Measuring Benchmark Similarity Using Inherent Program Characteristics.
IEEE Trans. Computers, 2006

Effective management of multiple configurable units using dynamic optimization.
ACM Trans. Archit. Code Optim., 2006

Operating system power minimization through run-time processor resource adaptation.
Microprocess. Microsystems, 2006

The Future of Simulation: A Field of Dreams.
Computer, 2006

Impact of virtual execution environments on processor energy consumption and hardware adaptation.
Proceedings of the 2nd International Conference on Virtual Execution Environments, 2006

Evaluating the efficacy of statistical simulation for design space exploration.
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

Automatic testcase synthesis and performance model validation for high performance PowerPC processors.
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

Power phase variation in a commercial server workload.
Proceedings of the 2006 International Symposium on Low Power Electronics and Design, 2006

OS-aware tuning: improving instruction cache energy efficiency on system workloads.
Proceedings of the 25th IEEE International Performance Computing and Communications Conference, 2006

Avoiding store misses to fully modified cache blocks.
Proceedings of the 25th IEEE International Performance Computing and Communications Conference, 2006

Evaluating Benchmark Subsetting Approaches.
Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

Performance Cloning: A Technique for Disseminating Proprietary Applications as Benchmarks.
Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

Performance prediction based on inherent program similarity.
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

2005
Reducing Server Data Traffic Using a Hierarchical Computation Model.
IEEE Trans. Parallel Distributed Syst., 2005

Implications of Executing Compression and Encryption Applications on General Purpose Processors.
IEEE Trans. Computers, 2005

Adapting branch-target buffer to improve the target predictability of Java code.
ACM Trans. Archit. Code Optim., 2005

SMA: A Self-Monitored Adaptive Cache Warm-Up Scheme for Microprocessor Simulation.
Int. J. Parallel Program., 2005

BLRL: Accurate and Efficient Warmup for Sampled Processor Simulation.
Comput. J., 2005

Analyzing and Improving Clustering Based Sampling for Microprocessor Simulation.
Proceedings of the 17th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2005), 2005

Measuring Program Similarity: Experiments with SPEC CPU Benchmark Suites.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Runtime identification of microprocessor energy saving opportunities.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

On sampling unit size in sampled microprocessor simulation.
Proceedings of the 24th IEEE International Performance Computing and Communications Conference, 2005

Low-power, low-complexity instruction issue using compiler assistance.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Improved automatic testcase synthesis for performance model validation.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Simulating Commercial Java Throughput Workloads: A Case Study.
Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

Effective Adaptive Computing Environment Management via Dynamic Optimization.
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

Architectural Support for Accelerating Congestion Control Applications in Network Processors.
Proceedings of the 16th IEEE International Conference on Application-Specific Systems, 2005

2004
Locality-Based Online Trace Compression.
IEEE Trans. Computers, 2004

More on finding a single number to indicate overall performance of a benchmark suite.
SIGARCH Comput. Archit. News, 2004

Scaling to the End of Silicon with EDGE Architectures.
Computer, 2004

Efficiently Evaluating Speedup Using Sampled Processor Simulation.
IEEE Comput. Archit. Lett., 2004

Improving Server Performance on Transaction Processing Workloads by Enhanced Data Placement.
Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2004), 2004

Self-Monitored Adaptive Cache Warm-Up for Microprocessor Simulation.
Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2004), 2004

Control Flow Modeling in Statistical Simulation for Accurate and Efficient Processor Design Studies.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Analysis of the Execution of a Next Generation Application on Superscalar and Grid Processors.
Proceedings of the 10th International Conference on Parallel and Distributed Systems, 2004

2003
Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements.
IEEE Trans. Computers, 2003

The Role of Return Value Prediction in Exploiting Speculative Method-Level Parallelism.
J. Instr. Level Parallelism, 2003

Benchmarking Internet Servers on Superscalar Machines.
Computer, 2003

Interface Design Techniques for Single-Chip Systems.
Proceedings of the 16th International Conference on VLSI Design (VLSI Design 2003), 2003

Run-time modeling and estimation of operating system power consumption.
Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2003

Exploiting compiler-generated schedules for energy savings in high-performance processors.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Routine based OS-aware microprocessor resource adaptation for run-time operating system power saving.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

On load latency in low-power caches.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Improving Dynamic Cluster Assignment for Clustered Trace Cache Processors.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

NpBench: A Benchmark Suite for Control plane and Data plane Applications for Network Processors.
Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

2002
Modeling and Evaluation of Control Flow Prediction Schemes Using Complete System Simulation and Java Workloads.
Proceedings of the 10th International Workshop on Modeling, 2002

Latency and energy aware value prediction for high-frequency processors.
Proceedings of the 16th international conference on Supercomputing, 2002

Using Complete Machine Simulation for Software Power Estimation: The SoftWatt Approach.
Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

Rehashable BTB: An Adaptive Branch Target Buffer to Improve the Target Predictability of Java Code.
Proceedings of the High Performance Computing, 2002

Understanding and improving operating system effects in control flow prediction.
Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

Implications of Programmable General Purpose Processors for Compression/Encryption Applications.
Proceedings of the 13th IEEE International Conference on Application-Specific Systems, 2002

2001
Java Runtime Systems: Characterization and Architectural Implications.
IEEE Trans. Computers, 2001

ADir_pNB: A Cost-Effective Way to Implement Full Map Directory-Based Cache Coherence Protocols.
IEEE Trans. Computers, 2001

MediaBreeze: a decoupled architecture for accelerating multimedia applications.
SIGARCH Comput. Archit. News, 2001

Workload characterization of multithreaded Java servers.
Proceedings of the 2001 IEEE International Symposium on Performance Analysis of Systems and Software, 2001

Understanding control flow transfer and its predictability in Java processing.
Proceedings of the 2001 IEEE International Symposium on Performance Analysis of Systems and Software, 2001

Improving Java performance using hardware translation.
Proceedings of the 15th international conference on Supercomputing, 2001

Cost-effective Hardware Acceleration of Multimedia Applications.
Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

2000
Data Placement Schemes to Reduce Conflicts in Interleaved Memories.
Comput. J., 2000

Issues in the design of store buffers in dynamically scheduled processors.
Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software, 2000

Allowing for ILP in an embedded Java processor.
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

Using complete system simulation to characterize SPECjvm98 benchmarks.
Proceedings of the 14th international conference on Supercomputing, 2000

Evaluating Signal Processing and Multimedia Applications on SIMD, VLIW and Superscalar Architectures.
Proceedings of the IEEE International Conference On Computer Design: VLSI In Computers & Processors, 2000

Architectural Issues in Java Runtime Systems.
Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

1999
Memory Chips with Adjustable Configurations.
VLSI Design, 1999

Annex cache: a cache assist to implement selective caching.
Microprocess. Microsystems, 1999

Formal Verification of a Snoop-Based Cache Coherence Protocol Using Symbolic Model Checking.
Proceedings of the 12th International Conference on VLSI Design (VLSI Design 1999), 1999

Contrasting branch characteristics and branch predictor performance of C++ and C programs.
Proceedings of the IEEE International Performance Computing and Communications Conference, 1999

Accurately modeling speculative instruction fetching in trace-driven simulation.
Proceedings of the IEEE International Performance Computing and Communications Conference, 1999

Exploiting SIMD parallelism in DSP and multimedia algorithms using the AltiVec technology.
Proceedings of the 13th international conference on Supercomputing, 1999

On the Use of Pseudorandom Sequences for High Speed Resource Allocators in Superscalar Processors.
Proceedings of the IEEE International Conference On Computer Design, 1999

Characterization of Java Applications at Bytecode and Ultra-SPARC Machine Code Levels.
Proceedings of the IEEE International Conference On Computer Design, 1999

Performance Evaluation of Configurable Hardware Features on the AMD-K5.
Proceedings of the IEEE International Conference On Computer Design, 1999

A Novel Low Power Energy Recovery Full Adder Cell.
Proceedings of the 9th Great Lakes Symposium on VLSI (GLS-VLSI '99), 1999

Performance Evaluation and Benchmarking of Native Signal Processing.
Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999

A Performance Study of Modern Web Server Applications.
Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999

1998
A dynamically reconfigurable interconnect for array processors.
IEEE Trans. Very Large Scale Integr. Syst., 1998

The undergraduate curriculum in the electrical and computer engineering department at the University of Texas at Austin.
Proceedings of the 1998 workshop on Computer architecture education, 1998

Novel Memory Bus Driver/Receiver Architecture for Higher Throughput.
Proceedings of the 11th International Conference on VLSI Design (VLSI Design 1998), 1998

Evaluating MMX Technology Using DSP and Multimedia Applications.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

Execution characteristics of object oriented programs on the UltraSPARC-II.
Proceedings of the 5th International Conference On High Performance Computing, 1998

Hybrid Tree: A Scalable Optoelectronic Interconnection Network for Parallel Computing.
Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences, 1998

Modeling and Analysis of The Difference-Bit Cache.
Proceedings of the 8th Great Lakes Symposium on VLSI (GLS-VLSI '98), 1998

1997
Design and Performance Evaluation of a Cache Assist to implement Selective Caching.
Proceedings of the 1997 International Conference on Computer Design: VLSI in Computers & Processors, 1997

1996
Performance Model for a Prioritized Multiple-Bus Multiprocessor System.
IEEE Trans. Computers, 1996

VaWiRAM: a variable width random access memory module.
Proceedings of the 9th International Conference on VLSI Design (VLSI Design 1996), 1996

Improving the parallelism and concurrency in decoupled architectures.
Proceedings of the Eighth IEEE Symposium on Parallel and Distributed Processing, 1996

1995
Design of a highly reconfigurable interconnect for array processors.
Proceedings of the 8th International Conference on VLSI Design (VLSI Design 1995), 1995

Program Balance and Its Impact on High Performance RISC Architectures.
Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture (HPCA 1995), 1995

A comparative evaluation of software techniques to hide memory latency.
Proceedings of the 28th Annual Hawaii International Conference on System Sciences (HICSS-28), 1995

1994
Memory Latency Effects in Decoupled Architectures.
IEEE Trans. Computers, 1994

Module Partitioning and Interlaced Data Placement Schemes to Reduce Conflicts in Interleaved Memories.
Proceedings of the 1994 International Conference on Parallel Processing, 1994

1992
Design and VLSI implementation of an access processor for a decoupled architecture.
Microprocess. Microsystems, 1992

Memory Latency Effects in Decoupled Architectures With a Single Data Memory Module.
Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, 1992

1991
Classification and Performance Evaluation of Instruction Buffering Techniques.
Proceedings of the 18th Annual International Symposium on Computer Architecture. Toronto, 1991

Effect of Hot Spots on Multiprocessor Systems Using Circuit Switched Interconnection Networks.
Proceedings of the International Conference on Parallel Processing, 1991

