Sheng Li

Orcid: 0000-0003-1068-5261

Affiliations:
  • Google, Mountain View, CA, USA
  • Intel Labs, Santa Clara, CA, USA (former)
  • Hewlett-Packard Labs (former)
  • University of Notre Dame, IN, USA (former)


According to our database1, Sheng Li authored at least 34 papers between 2007 and 2023.

Collaborative distances:
  • Dijkstra number2 of four.
  • Erdős number3 of four.

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

Online presence:

On csauthors.net:

Bibliography

2023
Experts Weights Averaging: A New General Training Scheme for Vision Transformers.
CoRR, 2023

FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search.
CoRR, 2023

Lightwave Fabrics: At-Scale Optical Circuit Switching for Datacenter and Machine Learning Systems.
Proceedings of the ACM SIGCOMM 2023 Conference, 2023

TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

TripLe: Revisiting Pretrained Model Reuse and Progressive Learning for Efficient Vision Transformer Scaling and Searching.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Hyperscale Hardware Optimized Neural Architecture Search.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2021
The Design Process for Google's Training Chips: TPUv2 and TPUv3.
IEEE Micro, 2021

Ten Lessons From Three Generations Shaped Google's TPUv4i : Industrial Product.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

NeuroMeter: An Integrated Power, Area, and Timing Modeling Framework for Machine Learning Accelerators Industry Track Paper.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Searching for Fast Model Families on Datacenter Accelerators.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
A domain-specific supercomputer for training deep neural networks.
Commun. ACM, 2020

Google's Training Chips Revealed: TPUv2 and TPUv3.
Proceedings of the IEEE Hot Chips 32 Symposium, 2020

2019
Parallelizing Word2Vec in Shared and Distributed Memory.
IEEE Trans. Parallel Distributed Syst., 2019

2017
Work as a team or individual: Characterizing the system-level impacts of main memory partitioning.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

2016
Full-Stack Architecting to Achieve a Billion-Requests-Per-Second Throughput on a Single Key-Value Store Server Platform.
ACM Trans. Comput. Syst., 2016

Achieving One Billion Key-Value Requests per Second on a Single Server.
IEEE Micro, 2016

Parallelizing Word2Vec in Multi-Core and Many-Core Architectures.
CoRR, 2016

Large Pages on Steroids: Small Ideas to Accelerate Big Memory Applications.
IEEE Comput. Archit. Lett., 2016

2015
Buri: Scaling Big-Memory Computing with Hardware-Based Memory Expansion.
ACM Trans. Archit. Code Optim., 2015

Architecting to achieve a billion requests per second throughput on a single key-value store server platform.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

History-Assisted Adaptive-Granularity Caches (HAAG$) for High Performance 3D DRAM Architectures.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

2013
The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing.
ACM Trans. Archit. Code Optim., 2013

Kiln: closing the performance gap between systems with and without persistence support.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

McSimA+: A manycore simulator with application-level+ simulation and detailed microarchitecture modeling.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

2012
MAGE: adaptive granularity and ECC for resilient and power efficient memory systems.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

2011
Lightweight Chip Multi-Threading (LCMT): Maximizing Fine-Grained Parallelism On-Chip.
IEEE Trans. Parallel Distributed Syst., 2011

System implications of memory reliability in exascale computing.
Proceedings of the Conference on High Performance Computing Networking, 2011

System-level integrated server architectures for scale-out datacenters.
Proceedings of the 44rd Annual IEEE/ACM International Symposium on Microarchitecture, 2011

CACTI-P: Architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

2009
McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures.
Proceedings of the 42st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

2008
Memory model effects on application performance for a lightweight multithreaded architecture.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Design of a mask-programmable memory/multiplier array using G4-FET technology.
Proceedings of the 45th Design Automation Conference, 2008

2007
A Heterogeneous Lightweight Multithreaded Architecture.
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007


  Loading...