Peng Zhang

Affiliations:
  • Peking University, Advanced Institute of Information Technology, Hangzhou, China
  • Falcon Computing Solutions, Inc., Los Angeles, CA, USA
  • University of California, Los Angeles, CA, USA (former)


According to our database, Peng Zhang authored at least 40 papers between 2011 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
A hardware-friendly algorithm for LCU-level pipe-lined integer motion estimation.
Multim. Tools Appl., January, 2024

2023
A Reconfigurable Multiple Transform Selection Architecture for VVC.
IEEE Trans. Very Large Scale Integr. Syst., May, 2023

An Efficient Real-Time Hardware Architecture for Deblocking Filter in AVS3.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Scanline-based fast algorithm and pipelined hardware design of rate-distortion optimized quantization for AVS3.
Proceedings of the IEEE International Conference on Consumer Electronics, 2023

Architecture Design of AVS3 Fractional Motion Estimation for 4K UHD Video Coding.
Proceedings of the IEEE International Conference on Consumer Electronics, 2023

Fast Algorithm and VLSI Architecture Design of Rough Mode Decision for AVS3.
Proceedings of the IEEE International Conference on Consumer Electronics, 2023

An Improved Hardware Architecture for Integer-Pixel Motion Estimation in AVS3.
Proceedings of the IEEE International Conference on Consumer Electronics, 2023

2022
An Area-efficient Unified Transform Architecture for VVC.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2022

A 3.1 Gbin/s advanced entropy coding hardware design for AVS3.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2022

Efficient Algorithm and Hardware Architecture for Rate Estimation in Mode Decision of AVS3.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2022

A Parallel and Pipelined Hardware Architecture for Fractional-Pixel Motion Estimation in AVS3.
Proceedings of the IEEE International Conference on Consumer Electronics, 2022

A Fast CU Partition Decision Strategy for AVS3 Intra Coding.
Proceedings of the IEEE International Conference on Consumer Electronics, 2022

2021
A Multiplier-less Transform Architecture with the Diagonal Data Mapping Transpose Memory for The AVS3 Standard.
Proceedings of the 14th IEEE International Conference on ASIC, 2021

2019
Overcoming Data Transfer Bottlenecks in DNN Accelerators via Layer-Conscious Memory Management.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

2018
AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture.
CoRR, 2018

TGPA: tile-grained pipeline architecture for low latency CNN inference.
Proceedings of the International Conference on Computer-Aided Design, 2018

S2FA: an accelerator automation framework for heterogeneous computing in datacenters.
Proceedings of the 55th Annual Design Automation Conference, 2018

Automated accelerator generation and optimization with composable, parallel and pipeline architecture.
Proceedings of the 55th Annual Design Automation Conference, 2018

2017
HLScope+: Fast and accurate performance estimation for FPGA HLS.
Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017

Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs.
Proceedings of the 54th Annual Design Automation Conference, 2017

2016
An Optimal Microarchitecture for Stencil Computation Acceleration Based on Nonuniform Partitioning of Data Reuse Buffers.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

Software Infrastructure for Enabling FPGA-Based Accelerations in Data Centers: Invited Paper.
Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016

Source-to-Source Optimization for HLS.
Proceedings of the FPGAs for Software Programmers, 2016

2015
High efficiency VLSI implementation of an edge-directed video up-scaler using high level synthesis.
Proceedings of the IEEE International Conference on Consumer Electronics, 2015

Resource-Aware Throughput Optimization for High-Level Synthesis.
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

CMOST: a system-level FPGA compilation framework.
Proceedings of the 52nd Annual Design Automation Conference, 2015

2014
Combining computation and communication optimizations in system synthesis for streaming applications.
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

FPGA Acceleration for Simultaneous Medical Image Reconstruction and Segmentation.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

An Optimal Microarchitecture for Stencil Computation Acceleration Based on Non-Uniform Partitioning of Data Reuse Buffers.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

2013
Automatic multidimensional memory partitioning for FPGA-based accelerators (abstract only).
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Polyhedral-based data reuse optimization for configurable computing.
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Efficient system-level mapping from streaming applications to FPGAs (abstract only).
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Memory partitioning for multidimensional arrays in high-level synthesis.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

2012
Task-Level Data Model for Hardware Synthesis Based on Concurrent Collections.
J. Electr. Comput. Eng., 2012

A Study on the Impact of Compiler Optimizations on High-Level Synthesis.
Proceedings of the Languages and Compilers for Parallel Computing, 2012

Memory partitioning and scheduling co-optimization in behavioral synthesis.
Proceedings of the 2012 IEEE/ACM International Conference on Computer-Aided Design, 2012

Combining module selection and replication for throughput-driven streaming programs.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Optimizing memory hierarchy allocation with loop transformations for high-level synthesis.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

An integrated and automated memory optimization flow for FPGA behavioral synthesis.
Proceedings of the 17th Asia and South Pacific Design Automation Conference, 2012

2011
Combined loop transformation and hierarchy allocation for data reuse optimization.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

