Peipei Zhou

ORCID: 0000-0002-0493-1844

Affiliations:
  • University of Pittsburgh, PA, USA
  • University of California, Los Angeles, CA, USA (Ph.D.)


According to our database, Peipei Zhou authored at least 32 papers between 2014 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration.
CoRR, 2024

Towards Carbon Modeling of Cloud Servers with Accelerators.
CoRR, 2024

2023
Sustainable AI Processing at the Edge.
IEEE Micro, 2023

REFRESH FPGAs: Sustainable FPGA Chiplet Architectures.
CoRR, 2023

Challenges and Opportunities to Enable Large-Scale Computing via Heterogeneous Chiplets.
CoRR, 2023

Enabling On-Device Large Language Model Personalization with Self-Supervised Data Selection and Synthesis.
CoRR, 2023

AutoMM: Energy-Efficient Multi-Data-Type Matrix Multiply Design on Heterogeneous Programmable System-on-Chip.
CoRR, 2023

AIM: Accelerating Arbitrary-Precision Integer Multiplication on Heterogeneous Reconfigurable Computing Platform Versal ACAP.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture.
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023

High Performance, Low Power Matrix Multiply Design on ACAP: from Architecture, Design Challenges and DSE Perspectives.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks.
Proceedings of the ACM Turing Award Celebration Conference - China 2023, 2023

2022
EF-Train: Enable Efficient On-device CNN Training on FPGA through Data Reshaping for Online Adaptation or Personalization.
ACM Trans. Design Autom. Electr. Syst., 2022

Enabling Weakly Supervised Temporal Action Localization From On-Device Learning of the Video Stream.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Sustainable AI Processing at the Edge.
CoRR, 2022

H2H: heterogeneous model to heterogeneous system mapping with computation and communication awareness.
Proceedings of the 59th ACM/IEEE Design Automation Conference (DAC '22), San Francisco, California, USA, July 10, 2022

2021
Algorithm-hardware Co-design of Attention Mechanism on FPGA Devices.
ACM Trans. Embed. Comput. Syst., 2021

MOCHA: Multinode Cost Optimization in Heterogeneous Clouds with Accelerators.
Proceedings of the 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA '21), Virtual Event, USA, February 28, 2021

2020
Algorithm-Hardware Co-design for BQSR Acceleration in Genome Analysis ToolKit.
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020

2019
Modeling and Optimization for Customized Computing: Performance, Energy and Cost Perspective.
PhD thesis, 2019

Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

2018
Best-Effort FPGA Programming: A Few Steps Can Go a Long Way.
CoRR, 2018

Doppio: I/O-Aware Performance Analysis, Modeling and Optimization for In-memory Computing Framework.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018

SODA: stencil with optimized dataflow architecture.
Proceedings of the International Conference on Computer-Aided Design, 2018

An Optimal Microarchitecture for Stencil Computation with Data Reuse and Fine-Grained Parallelism: (Abstract Only).
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

ST-Accel: A High-Level Programming Platform for Streaming Applications on FPGA.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

Latte: Locality Aware Transformation for High-Level Synthesis.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

2017
Bandwidth Optimization Through On-Chip Memory Restructuring for HLS.
Proceedings of the 54th Annual Design Automation Conference, 2017

2016
ARAPrototyper: Enabling Rapid Prototyping and Evaluation for Accelerator-Rich Architectures.
CoRR, 2016

Caffeine: towards uniformed representation and acceleration for deep convolutional neural networks.
Proceedings of the 35th International Conference on Computer-Aided Design, 2016

ARAPrototyper: Enabling Rapid Prototyping and Evaluation for Accelerator-Rich Architecture (Abstract Only).
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

Energy Efficiency of Full Pipelining: A Case Study for Matrix Multiplication.
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016

2014
A Fully Pipelined and Dynamically Composable Architecture of CGRA.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014
