Youhui Zhang

ORCID: 0000-0003-2333-3580

According to our database, Youhui Zhang authored at least 81 papers between 1999 and 2024.

Bibliography

2024
A Row Decomposition-based Approach for Sparse Matrix Multiplication on GPUs.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

2023
ENLARGE: An Efficient SNN Simulation Framework on GPU Clusters.
IEEE Trans. Parallel Distributed Syst., September, 2023

MAICC: A Lightweight Many-core Architecture with In-Cache Computing for Multi-DNN Parallel Inference.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Multi-Objective Optimization for Floating Point Mix-Precision Tuning.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2023

FABLE: A Development and Computing Framework for Brain-inspired Learning Algorithms.
Proceedings of the International Joint Conference on Neural Networks, 2023

2022
Polyhedral-Based Compilation Framework for In-Memory Neural Network Accelerators.
ACM J. Emerg. Technol. Comput. Syst., 2022

Editorial: Machine learning for computational neural modeling and data analyses.
Frontiers Comput. Neurosci., 2022

EcoForecast: An interpretable data-driven approach for short-term macroeconomic forecasting using N-BEATS neural network.
Eng. Appl. Artif. Intell., 2022

A review of basic software for brain-inspired computing.
CCF Trans. High Perform. Comput., 2022

GaBAN: a generic and flexibly programmable vector neuro-processor on FPGA.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Accelerating Neural Network Training with Processing-in-Memory GPU.
Proceedings of the 22nd IEEE International Symposium on Cluster, 2022

2021
A Reduced Architecture for ReRAM-Based Neural Network Accelerator and Its Software Stack.
IEEE Trans. Computers, 2021

AIPerf: Automated machine learning as an AI-HPC benchmark.
Big Data Min. Anal., 2021

Regu2D: Accelerating Vectorization of SpMV on Intel Processors through 2D-partitioning and Regular Arrangement.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

2020
High Performance Simulation of Spiking Neural Network on GPGPUs.
IEEE Trans. Parallel Distributed Syst., 2020

ERA-LSTM: An Efficient ReRAM-Based Architecture for Long Short-Term Memory.
IEEE Trans. Parallel Distributed Syst., 2020

Brain-inspired global-local hybrid learning towards human-like intelligence.
CoRR, 2020

SuSy: A Programming Model for Productive Construction of High-Performance Systolic Arrays on FPGAs.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

2019
Programmable Neural Network Trojan for Pre-Trained Feature Extractor.
CoRR, 2019

A Unified Framework for Training, Mapping and Simulation of ReRAM-Based Convolutional Neural Network Acceleration.
IEEE Comput. Archit. Lett., 2019

Design Guidelines of RRAM based Neural-Processing-Unit: A Joint Device-Circuit-Algorithm Analysis.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018
Bridging the Gap Between Neural Networks and Neuromorphic Hardware with A Neural Network Compiler.
CoRR, 2018

TETRIS: TilE-matching the TRemendous Irregular Sparsity.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Bridge the Gap between Neural Networks and Neuromorphic Hardware with a Neural Network Compiler.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017
Service-Oriented Architecture on FPGA-Based MPSoC.
IEEE Trans. Parallel Distributed Syst., 2017

Parallel Turing Machine, a Proposal.
J. Comput. Sci. Technol., 2017

Hardware support for message-passing in chip multi-processors.
Int. J. High Perform. Comput. Netw., 2017

In-Place Irregular Computation for Message-Passing Chip-Multiprocessors.
Proceedings of the 46th International Conference on Parallel Processing Workshops, 2017

POSTER: Bridge the Gap Between Neural Networks and Neuromorphic Hardware.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
A Cloud Gaming System Based on User-Level Virtualization and Its Resource Scheduling.
IEEE Trans. Parallel Distributed Syst., 2016

Modelling Spiking Neural Network from the Architecture Evaluation Perspective.
J. Comput. Sci. Technol., 2016

NEUTRAMS: Neural network transformation and co-design under neuromorphic hardware constraints.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Optimized Mapping Spiking Neural Networks onto Network-on-Chip.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

Near Data Computation for Message-Passing Chip-Multiprocessors.
Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

Neural network transformation under hardware constraints.
Proceedings of the 2016 International Conference on Compilers, 2016

2015
Solving the Global Atmospheric Equations through Heterogeneous Reconfigurable Platforms.
ACM Trans. Reconfigurable Technol. Syst., 2015

Software-Based Lightweight Multithreading to Overlap Memory-Access Latencies of Commodity Processors.
Proceedings of the 44th International Conference on Parallel Processing, 2015

Position-aware thread-level speculative parallelization for large-scale chip-multiprocessor.
Proceedings of the 12th ACM International Conference on Computing Frontiers, 2015

2014
Customized Network-on-Chip for Message Reduction.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2014

An approach of processor core customization for stencil computation.
Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014

2013
Automatic software deployment using user-level virtualization for cloud-computing.
Future Gener. Comput. Syst., 2013

Employing intelligence in object-based storage devices to provide attribute-based file access.
Sci. China Inf. Sci., 2013

Software/Hardware Hybrid Network-on-Chip Simulation on FPGA.
Proceedings of the Network and Parallel Computing - 10th IFIP International Conference, 2013

Aegis: partitioning data block for efficient recovery of stuck-at-faults in phase change memory.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Cache Optimizations of Distributed Storage for Software Streaming Services.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2013

Accelerating solvers for global atmospheric equations through mixed-precision data flow engine.
Proceedings of the 23rd International Conference on Field programmable Logic and Applications, 2013

HW/SW approaches to accelerate GRAPES in an FU array.
Proceedings of the 2013 IEEE Symposium on Low-Power and High-Speed Chips, 2013

2012
Spatial Density Voronoi Diagram and Construction.
J. Comput., 2012

A Performance Model for Network-on-Chip Wormhole Routers.
J. Comput., 2012

2011
Model of Network-on-Chip routers and performance analysis.
IEICE Electron. Express, 2011

Employing Object-Based Storage Devices to Embed File Access Control in Storage.
Intell. Autom. Soft Comput., 2011

A user-space file system for on-demand legacy desktop software.
Sci. China Inf. Sci., 2011

Using User-Level Virtualization in Desktop Grid Clients for Application Delivery and Sandboxing.
Proceedings of the Fourth International Symposium on Parallel Architectures, 2011

2010
Converting Legacy Desktop Applications into On-Demand Personalized Software.
IEEE Trans. Serv. Comput., 2010

Efficient Monte Carlo-based options pricing on graphics processors and its optimizations.
Sci. China Inf. Sci., 2010

A Performance Analytical Approach Based on Queuing Model for Network-on-Chip.
Proceedings of the Third International Symposium on Parallel Architectures, 2010

2009
Codec-on-Demand Based on User-Level Virtualization.
IEICE Trans. Inf. Syst., 2009

Optimized mapping of pixels into memory for H.264/AVC decoding.
IEICE Electron. Express, 2009

2008
On Virtual-Machine-Based Windows File Reads: A Performance Study.
Proceedings of the PACIIA 2008, 2008

Portable Desktop Applications Based on P2P Transportation and Virtualization.
Proceedings of the 22nd Large Installation System Administration Conference, 2008

Portable desktop applications based on user-level virtualization.
Proceedings of the 13th Asia-Pacific Computer Systems Architecture Conference, 2008

Semantic Data De-duplication for archival storage systems.
Proceedings of the 13th Asia-Pacific Computer Systems Architecture Conference, 2008

IDRS: Combining File-level Intrusion Detection with Block-level Data Recovery based on iSCSI.
Proceedings of the Third International Conference on Availability, 2008

2006
Virtual-Machine-based Intrusion Detection on File-aware Block Level Storage.
Proceedings of the 18th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2006), 2006

Research on Object-Storage-Based Intrusion Detection.
Proceedings of the 12th International Conference on Parallel and Distributed Systems, 2006

Seamless Peripherals Integration for Network Computers based on the Reversed Server Message Block Protocol.
Proceedings of the 2006 International Conference on Networking and Services (ICNS 2006), 2006

2005
User-level checkpoint and recovery for LAM/MPI.
ACM SIGOPS Oper. Syst. Rev., 2005

Thckpt: Transparent Checkpointing of Linux Processes Under IA-64.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2005

A Checkpointing/Recovery System for MPI Applications on Cluster of IA-64 Computers.
Proceedings of the 34th International Conference on Parallel Processing Workshops (ICPP 2005 Workshops), 2005

Exploring Design Space Using Transaction Level Models.
Proceedings of the Advances in Computer Systems Architecture, 10th Asia-Pacific Conference, 2005

2004
The Flexible Replication Method in an Object-Oriented Data Storage System.
Proceedings of the Network and Parallel Computing, IFIP International Conference, 2004

Parallel Checkpoint/Recovery on Cluster of IA-64 Computers.
Proceedings of the Parallel and Distributed Processing and Applications, 2004

Using Model-Based Test Program Generator for Simulation Validation.
Proceedings of the Embedded Software and Systems, First International Conference, 2004

A JDO Storage Cluster Based on Object Devices.
Proceedings of the Grid and Cooperative Computing, 2004

An Object-Oriented Data Storage System on Network-Attached Object Devices.
Proceedings of the Advances in Computer Systems Architecture, 9th Asia-Pacific Conference, 2004

2003
User-level communication based cooperative caching.
ACM SIGOPS Oper. Syst. Rev., 2003

2002
A checkpoint-based high availability run-time system for Windows NT clusters.
ACM SIGOPS Oper. Syst. Rev., 2002

LND: A Reliable Multi-Tier Storage Device in NOW.
ACM SIGOPS Oper. Syst. Rev., 2002

2001
Transparent Checkpointing and Rollback Recovery Mechanism for Windows NT Applications.
ACM SIGOPS Oper. Syst. Rev., 2001

1999
Quasi-Asynchronous Migration: A Novel Migration Protocol for PVM Tasks.
ACM SIGOPS Oper. Syst. Rev., 1999
