Zhiyuan Shao

A survey on dynamic graph processing on GPUs: concepts, terminologies and systems.
Frontiers Comput. Sci., August, 2024

Evaluating RISC-V Vector Instruction Set Architecture Extension with Computer Vision Workloads.
J. Comput. Sci. Technol., July, 2023

Accelerating Backward Aggregation in GCN Training With Execution Path Preparing on GPUs.
IEEE Trans. Parallel Distributed Syst., 2022

Cross-Language Binary-Source Code Matching with Intermediate Representations.
Proceedings of the IEEE International Conference on Software Analysis, 2022

Towards Fast GPU-based Sparse DNN Inference: A Hybrid Compute Model.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2022

Efficient Graph Processing with Invalid Update Filtration.
IEEE Trans. Big Data, 2021

ScalaBFS: A Scalable BFS Accelerator on HBM-Enhanced FPGAs.
CoRR, 2021

ScalaBFS: A Scalable BFS Accelerator on FPGA-HBM Platform.
Proceedings of the 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA '21), Virtual Event, USA, February 28, 2021

Predicting Hepatoma-Related Genes Based on Representation Learning of PPI network and Gene Ontology Annotations.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2021

Processing Grid-format Real-world Graphs on DRAM-based FPGA Accelerators with Application-specific Caching Mechanisms.
ACM Trans. Reconfigurable Technol. Syst., 2020

Optimizing Memory Performance of Xilinx FPGAs under Vitis.
CoRR, 2020

Scaph: Scalable GPU-Accelerated Graph Processing with Value-Driven Differential Scheduling.
Proceedings of the 2020 USENIX Annual Technical Conference, 2020

Efficient Recommendation of De-Identification Policies Using MapReduce.
IEEE Trans. Big Data, 2019

BlockGraphChi: Enabling Block Update in Out-of-Core Graph Processing.
Int. J. Parallel Program., 2019

Improving Performance of Graph Processing on FPGA-DRAM Platform by Two-level Vertex Caching.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

Fast Maximal Clique Enumeration for Real-World Graphs.
Proceedings of the Database Systems for Advanced Applications, 2019

Scalable Data Race Detection for Lock-Intensive Programs with Pending Period Representation.
IEEE Trans. Parallel Distributed Syst., 2018

MomentSA: A Fast and Accurate Method for Stochastic Kronecker Graph Parameter Computing.
Proceedings of the 22nd IEEE International Conference on Computer Supported Cooperative Work in Design, 2018

FOG: A Fast Out-of-Core Graph Processing Framework.
Int. J. Parallel Program., 2017

A task-based approach for finding SCCs in real-world graphs on external memory.
Concurr. Comput. Pract. Exp., 2017

Data Race Detection by Understanding Synchronization Relationships of Thread Segments.
Proceedings of the 25th Euromicro International Conference on Parallel, 2017

Finding SCCs in Real-World Graphs on External Memory: A Task-Based Approach.
Proceedings of the 15th International Symposium on Parallel and Distributed Computing, 2016

Improving fairness of network bandwidth allocation for virtual machines in cloud environment.
Proceedings of the 2016 IEEE International Black Sea Conference on Communications and Networking, 2016

Is Your Graph Algorithm Eligible for Nondeterministic Execution?
Proceedings of the 44th International Conference on Parallel Processing, 2015

A segment-based sparse matrix-vector multiplication on CUDA.
Concurr. Comput. Pract. Exp., 2014

A GPU-based parallel method for evolutionary tree construction.
Comput. Electr. Eng., 2014

VSA: An offline scheduling analyzer for Xen virtual machine monitor.
Future Gener. Comput. Syst., 2013

FRESA: A Frequency-Sensitive Sampling-Based Approach for Data Race Detection.
Proceedings of the Network and Parallel Computing - 10th IFIP International Conference, 2013

RTRM: A Response Time-Based Replica Management Strategy for Cloud Storage System.
Proceedings of the Grid and Pervasive Computing - 8th International Conference, 2013

Parallelization Mechanisms of Neighbor-Joining for CUDA Enabled Devices.
Proceedings of the Seventh ChinaGrid Annual Conference, ChinaGrid 2012, Beijing, 2012

Implementing Smith-Waterman Algorithm with Two-Dimensional Cache on GPUs.
Proceedings of the 2012 Second International Conference on Cloud and Green Computing, 2012

Analyzing and Improving MPI Communication Performance in Overcommitted Virtualized Systems.
Proceedings of MASCOTS 2011, 2011

Optimization of Sparse Matrix-Vector Multiplication with Variant CSR on GPUs.
Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011

FTDS: Adjusting Virtual Computing Resources in Threshing Cases.
Proceedings of the 18th Euromicro Conference on Parallel, 2010

ClientVisor: leverage COTS OS functionalities for power management in virtualized desktop environment.
ACM SIGOPS Oper. Syst. Rev., 2009

ClientVisor: leverage COTS OS functionalities for power management in virtualized desktop environment.
Proceedings of the 5th International Conference on Virtual Execution Environments, 2009

Virtual Machine Resource Management for High Performance Computing Applications.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2009

A performance study of web server based on Hardware-assisted Virtual Machine.
Proceedings of the 7th IEEE/ACS International Conference on Computer Systems and Applications, 2009

ER-TCP: an efficient TCP fault-tolerance scheme for cluster computing.
J. Supercomput., 2008

Optimized Implementation of Ray Tracing on Cell Broadband Engine.
Proceedings of the 2008 International Conference on Multimedia and Ubiquitous Engineering (MUE 2008), 2008

Two-Level Parallel Implementation of FDTD Algorithm on CBE.
Proceedings of the IEEE International Conference on Networking, Sensing and Control, 2008

ChinaV: Building Virtualized Computing System.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications, 2008

FreeSpeech: A Novel Wireless Approach for Conference Projecting and Cooperating.
Proceedings of the Ubiquitous Intelligence and Computing, Third International Conference, 2006

Middleware Based High Performance and High Available Database Cluster.
Proceedings of the Grid and Cooperative Computing, 2006

AR-TCP: Actively Replicated TCP Connections for Cluster of Workstations.
Proceedings of the Japan-China Joint Workshop on Frontier of Computer Science and Technology, 2006

TCP-ABC: From Multiple TCP Connections to Atomic Broadcasting.
Proceedings of the Network and Parallel Computing, IFIP International Conference, 2005

ER-TCP: An Efficient Fault-Tolerance Scheme for TCP Connections.
Proceedings of the Parallel and Distributed Processing and Applications, 2005

HARTs: high availability cluster architecture with redundant TCP stacks.
Proceedings of the 22nd IEEE International Performance Computing and Communications Conference, 2003

Cluster Architecture with Lightweighted Redundant TCP Stacks.
Proceedings of the 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 2003