Hsien-Hsin S. Lee

Orcid: 0000-0002-8926-8243

Affiliations:
  • Intel
  • Facebook (former)
  • Taiwan Semiconductor Manufacturing (former)
  • Georgia Institute of Technology, Atlanta GA, USA (former)
  • University of Michigan, USA (former)


According to our database, Hsien-Hsin S. Lee authored at least 136 papers between 1974 and 2024.

Awards

IEEE Fellow

IEEE Fellow 2017, "For contributions to 3D integrated circuits and computer architecture".

Bibliography

2024
Beyond Wires: The Future of Interconnects.
IEEE Micro, 2024

Computing With COOL Chips.
IEEE Micro, 2024

GPU-based Private Information Retrieval for On-Device Machine Learning Inference.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Exploring Memory-Oriented Design Optimization of Edge AI Hardware for Extended Reality Applications.
IEEE Micro, 2023

Architectural CO₂ Footprint Tool: Designing Sustainable Computer Systems With an Architectural Carbon Modeling Tool.
IEEE Micro, 2023

Information Flow Control in Machine Learning through Modular Model Architecture.
CoRR, 2023

Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference.
CoRR, 2023

Cocktail Party Attack: Breaking Aggregation-Based Privacy in Federated Learning Using Independent Component Analysis.
Proceedings of the International Conference on Machine Learning, 2023

MACTA: A Multi-agent Reinforcement Learning Approach for Cache Timing Attacks and Detection.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

AutoCAT: Reinforcement Learning for Automated Exploration of Cache-Timing Attacks.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

2022
Near-Memory Processing in Action: Accelerating Personalized Recommendation With AxDIMM.
IEEE Micro, 2022

Chasing Carbon: The Elusive Environmental Footprint of Computing.
IEEE Micro, 2022

Data Leakage via Access Patterns of Sparse Features in Deep Learning-based Recommendation Systems.
CoRR, 2022

DisaggRec: Architecting Disaggregated Systems for Large-Scale Personalized Recommendation.
CoRR, 2022

AutoCAT: Reinforcement Learning for Automated Exploration of Cache Timing-Channel Attacks.
CoRR, 2022

Memory-Oriented Design-Space Exploration of Edge-AI Hardware for XR Applications.
CoRR, 2022


Characterization of MPC-based Private Inference for Transformer-based Models.
Proceedings of the International IEEE Symposium on Performance Analysis of Systems and Software, 2022

ACT: designing sustainable computer systems with an architectural carbon modeling tool.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

2021
Special Issue on Commercial Products 2021.
IEEE Micro, 2021

SecNDP: Secure Near-Data Processing with Untrusted Memory.
IACR Cryptol. ePrint Arch., 2021

Sustainable AI: Environmental Implications, Challenges and Opportunities.
CoRR, 2021

RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Cheetah: Optimizing and Accelerating Homomorphic Encryption for Private Inference.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020
Cheetah: Optimizations and Methods for Privacy-Preserving Inference via Homomorphic Encryption.
CoRR, 2020

RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

The Architectural Implications of Facebook's DNN-Based Personalized Recommendation.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

2019
The Architectural Implications of Facebook's DNN-based Personalized Recommendation.
CoRR, 2019

2018
Automotive Computing.
IEEE Micro, 2018

2017
Fault clustering technique for 3D memory BISR.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

2015
Design and Analysis of 3D-MAPS (3D Massively Parallel Processor with Stacked Memory).
IEEE Trans. Computers, 2015

IC design challenges and opportunities for advanced process technology.
Proceedings of the VLSI Design, Automation and Test, 2015

COMPSAC 2015 Plenary Panel on "Rebooting Computing".
Proceedings of the 39th IEEE Annual Computer Software and Applications Conference, 2015

2014
GPUMech: GPU Performance Modeling Technique Based on Interval Analysis.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

TBPoint: Reducing Simulation Time for Large-Scale GPGPU Kernels.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

ATAC: Ambient Temperature-Aware Capping for Power Efficient Datacenters.
Proceedings of the ACM Symposium on Cloud Computing, 2014

Cache-conscious graph collaborative filtering on multi-socket multicore systems.
Proceedings of the Computing Frontiers Conference, CF'14, 2014

2013
Pragmatic Integration of an SRAM Row Cache in Heterogeneous 3-D DRAM Architecture Using TSV.
IEEE Trans. Very Large Scale Integr. Syst., 2013

An efficient scheduling scheme using estimated execution time for heterogeneous computing systems.
J. Supercomput., 2013

The quest for a new dimension of system integration.
Proceedings of the 2013 International Symposium on VLSI Design, Automation, and Test, 2013

Tri-level-cell phase change memory: toward an efficient and reliable memory system.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Reducing False Transactional Conflicts with Speculative Sub-Blocking State - An Empirical Study for ASF Transactional Memory System.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

2012
SimWare: A Holistic Warehouse-Scale Computer Simulator.
Computer, 2012


Adaptive Dynamic Frequency Scaling for Thermal-Aware 3D Multi-core Processors.
Proceedings of the Computational Science and Its Applications - ICCSA 2012, 2012

Migration energy-aware workload consolidation in enterprise clouds.
Proceedings of the 4th IEEE International Conference on Cloud Computing Technology and Science Proceedings, 2012

2011
Integrated microarchitectural floorplanning and run-time controller for inductive noise mitigation.
ACM Trans. Design Autom. Electr. Syst., 2011

Low-Power Clock Tree Design for Pre-Bond Testing of 3-D Stacked ICs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2011

Security Refresh: Protecting Phase-Change Memory against Malicious Wear Out.
IEEE Micro, 2011

Data Prefetching by Exploiting Global and Local Access Patterns.
J. Instr. Level Parallelism, 2011

Using Mathematical Modeling in Provisioning a Heterogeneous Cloud Computing Environment.
Computer, 2011

Symbiotic Scheduling for Shared Caches in Multi-core Systems Using Memory Footprint Signature.
Proceedings of the International Conference on Parallel Processing, 2011

Designing 3D test wrappers for pre-bond and post-bond test of 3D embedded cores.
Proceedings of the IEEE 29th International Conference on Computer Design, 2011

Ally: OS-Transparent Packet Inspection Using Sequestered Cores.
Proceedings of the 2011 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), 2011

Global Built-In Self-Repair for 3D memories with redundancy sharing and parallel testing.
Proceedings of the 2011 IEEE International 3D Systems Integration Conference (3DIC), Osaka, Japan, January 31, 2011

2010
Chameleon: Virtualizing idle acceleration cores of a heterogeneous multicore processor for caching and prefetching.
ACM Trans. Archit. Code Optim., 2010

A low-cost memory remapping scheme for address bus protection.
J. Parallel Distributed Comput., 2010

Architecture/OS Support for Embedded Multi-core Systems.
Comput. J., 2010

SAFER: Stuck-At-Fault Error Recovery for Memories.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Security refresh: prevent malicious wear-out and increase durability for phase-change memory with dynamically randomized address mapping.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

An optimized 3D-stacked memory architecture by exploiting excessive, high-density TSV bandwidth.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

Design and analysis of 3D-MAPS: A many-core 3D processor with stacked memory.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2010

COMPASS: a programmable data prefetcher using idle GPU shaders.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

2009
PROPHET: goal-oriented provisioning for highly tunable multicore processors in cloud computing.
ACM SIGOPS Oper. Syst. Rev., 2009

Test Challenges for 3D Integrated Circuits.
IEEE Des. Test Comput., 2009

High Performance Non-blocking Switch Design in 3D Die-Stacking Technology.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2009

Testing Circuit-Partitioned 3D IC Designs.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2009

Way guard: a segmented counting bloom filter approach to reducing energy for set-associative caches.
Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009

Pre-bond testable low-power clock tree design for 3D stacked ICs.
Proceedings of the 2009 International Conference on Computer-Aided Design, 2009

Thermal optimization in multi-granularity multi-core floorplanning.
Proceedings of the 14th Asia South Pacific Design Automation Conference, 2009

Architectural evaluation of 3D stacked RRAM caches.
Proceedings of the IEEE International Conference on 3D System Integration, 2009

2008
POD: A 3D-Integrated Broad-Purpose Acceleration Layer.
IEEE Micro, 2008

DLL-conscious instruction fetch optimization for SMT processors.
J. Syst. Archit., 2008

Extending Amdahl's Law for Energy-Efficient Computing in the Many-Core Era.
Computer, 2008

Kicking the tires of software transactional memory: why the going gets tough.
Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2008), 2008

Adaptive transaction scheduling for transactional memory systems.
Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2008), 2008

Improving TLB energy for java applications on JVM.
Proceedings of the 2008 International Conference on Embedded Computer Systems: Architectures, 2008

SHARK: Architectural support for autonomic protection against stealth by rootkit exploits.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Total Recall: A Debugging Framework for GPUs.
Proceedings of the EUROGRAPHICS/ACM SIGGRAPH Conference on Graphics Hardware 2008, 2008

Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors.
Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, 2008

A unified methodology for power supply noise reduction in modern microarchitecture design.
Proceedings of the 13th Asia South Pacific Design Automation Conference, 2008

2007
Memory-Centric Security Architecture.
Trans. High Perform. Embed. Archit. Compil., 2007

Multiobjective Microarchitectural Floorplanning for 2-D and 3-D ICs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2007

Reducing Cache Pollution via Dynamic Data Prefetch Filtering.
IEEE Trans. Computers, 2007

Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

A scanisland based design enabling prebond testability in die-stacked microprocessors.
Proceedings of the 2007 IEEE International Test Conference, 2007

Hierarchical Means: Single Number Benchmarking with Workload Cluster Analysis.
Proceedings of the IEEE 10th International Symposium on Workload Characterization, 2007

Virtual Exclusion: An architectural approach to reducing leakage energy in caches for multiprocessor systems.
Proceedings of the 13th International Conference on Parallel and Distributed Systems, 2007

Optimizing Katsevich image reconstruction algorithm on multicore processors.
Proceedings of the 13th International Conference on Parallel and Distributed Systems, 2007

An FPGA Approach to Quantifying Coherence Traffic Efficiency on Multiprocessor Systems.
Proceedings of the FPL 2007, 2007

Accelerating memory decryption and authentication with frequent value prediction.
Proceedings of the 4th Conference on Computing Frontiers, 2007

Noise-Direct: A Technique for Power Supply Noise Aware Floorplanning Using Microarchitecture Profiling.
Proceedings of the 12th Conference on Asia South Pacific Design Automation, 2007

2006
Profile-guided microarchitectural floor planning for deep submicron processor design.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2006

M-TREE: A high efficiency security architecture for protecting integrity and privacy of software.
J. Parallel Distributed Comput., 2006

Authentication Control Point and Its Implications For Secure Processor Design.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

A Floorplan-Aware Dynamic Inductive Noise Controller for Reliable Processor Design.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

An Integrated Framework for Dependable and Revivable Architectures Using Multicore Processors.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

Constructing a Non-Linear Model with Neural Networks for Workload Characterization.
Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

InfoShield: a security architecture for protecting information usage in memory.
Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

A Digital Rights Enabled Graphics Processing System.
Proceedings of the 21st ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, 2006

Microarchitectural floorplanning under performance and thermal tradeoff.
Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Reducing energy of virtual cache synonym lookup using bloom filters.
Proceedings of the 2006 International Conference on Compilers, 2006

Entropy-based low power data TLB design.
Proceedings of the 2006 International Conference on Compilers, 2006

Efficient System-on-Chip Energy Management with a Segmented Bloom Filter.
Proceedings of the Architecture of Computing Systems, 2006

A low-cost memory remapping scheme for address bus protection.
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

2005
Towards the issues in architectural support for protection of software execution.
SIGARCH Comput. Archit. News, 2005

Synonymous address compaction for energy reduction in data TLB.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

Wire-driven microarchitectural design space exploration.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2005), 2005

High Efficiency Counter Mode Security Architecture via Prediction and Precomputation.
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

An Intrusion-Tolerant and Self-Recoverable Network Service System Using A Security Enhanced Chip Multiprocessor.
Proceedings of the Second International Conference on Autonomic Computing (ICAC 2005), 2005

Cache coherence support for non-shared bus architecture on heterogeneous MPSoCs.
Proceedings of the 42nd Design Automation Conference, 2005

Owl: next generation system monitoring.
Proceedings of the Second Conference on Computing Frontiers, 2005

2004
Integrating Cache Coherence Protocols for Heterogeneous Multiprocessor Systems, Part 2.
IEEE Micro, 2004

Integrating Cache Coherence Protocols for Heterogeneous Multiprocessor Systems, Part 1.
IEEE Micro, 2004

CoolPression - a hybrid significance compression technique for reducing energy in caches.
Proceedings of the 2004 IEEE International SOC Conference, 2004

Attacks and risk analysis for hardware supported software copy protection systems.
Proceedings of the 2004 ACM Workshop on Digital Rights Management 2004, Washington, 2004

Supporting Cache Coherence in Heterogeneous Multiprocessor Systems.
Proceedings of the 2004 Design, 2004

Profile-guided microarchitectural floorplanning for deep submicron processor design.
Proceedings of the 41st Design Automation Conference, 2004

Hardware assisted control flow obfuscation for embedded processors.
Proceedings of the 2004 International Conference on Compilers, 2004

Choice Predictor for Free.
Proceedings of the Advances in Computer Systems Architecture, 9th Asia-Pacific Conference, 2004

Architectural Support for High Speed Protection of Memory Integrity and Confidentiality in Multiprocessor Systems.
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004), 29 September, 2004

2003
Energy-Efficient Network Memory for Ubiquitous Devices.
IEEE Micro, 2003

Energy efficient D-TLB and data cache using semantic-aware multilateral partitioning.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

A Hardware-based Cache Pollution Filtering Mechanism for Aggressive Prefetches.
Proceedings of the 32nd International Conference on Parallel Processing (ICPP 2003), 2003

Algorithm for Achieving Minimum Energy Consumption in CMOS Circuits Using Multiple Supply and Threshold Voltages at the Module Level.
Proceedings of the 2003 International Conference on Computer-Aided Design, 2003

Predicate-Aware Scheduling: A Technique for Reducing Resource Constraints.
Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003

2001
Improving energy and performance of data cache architectures by exploiting memory reference characteristics.
PhD thesis, 2001

Improving Bandwidth Utilization using Eager Writeback.
J. Instr. Level Parallelism, 2001

Stack Value File: Custom Microarchitecture for the Stack.
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

2000
Eager writeback - a technique for improving bandwidth utilization.
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

Quantifying instruction-level parallelism limits on an EPIC architecture.
Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software, 2000

Region-based caching: an energy-delay efficient memory architecture for embedded processors.
Proceedings of the 2000 International Conference on Compilers, 2000

1994
A Hierarchical Approach to Modeling and Improving the Performance of Scientific Applications on the KSR1.
Proceedings of the 1994 International Conference on Parallel Processing, 1994

1974
Redundancy Testing in Combinational Networks.
IEEE Trans. Computers, 1974

