Shouyi Yin
ORCID: 0000-0003-2309-572X
According to our database, Shouyi Yin authored at least 319 papers between 2005 and 2024.
Bibliography
2024
MulTCIM: Digital Computing-in-Memory-Based Multimodal Transformer Accelerator With Attention-Token-Bit Hybrid Sparsity.
IEEE J. Solid State Circuits, January, 2024
SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling.
CoRR, 2024
PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training.
CoRR, 2024
CoRR, 2024
Sci. China Inf. Sci., 2024
15.1 A 0.795fJ/bit Physically-Unclonable Function-Protected TCAM for a Software-Defined Networking Switch.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024
34.1 A 28nm 83.23TFLOPS/W POSIT-Based Compute-in-Memory Macro for High-Accuracy AI Applications.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024
20.2 A 28nm 74.34TFLOPS/W BF16 Heterogenous CIM-Based Accelerator Exploiting Denoising-Similarity for Diffusion Models.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024
Exploiting Similarity Opportunities of Emerging Vision AI Models on Hybrid Bonding Architecture.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024
A 28nm 314.6TFLOPS/W Reconfigurable Floating-Point Analog Compute-In-Memory Macro with Exponent Approximation and Two-Stage Sharing TD-ADC.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2024
RCPE: An Excellent Performance Training Processor with RISC-V based Compression Mechanism.
Proceedings of the 6th IEEE International Conference on AI Circuits and Systems, 2024
RTPE: A High Energy Efficiency Inference Processor with RISC-V based Transformation Mechanism.
Proceedings of the 6th IEEE International Conference on AI Circuits and Systems, 2024
2023
M2STaR: A Multimode Spatio-Temporal Redundancy Design for Fault-Tolerant Coarse-Grained Reconfigurable Architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., September, 2023
Reconfigurability, Why It Matters in AI Tasks Processing: A Survey of Reconfigurable AI Chips.
IEEE Trans. Circuits Syst. I Regul. Pap., March, 2023
TT@CIM: A Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity Optimization and Variable Precision Quantization.
IEEE J. Solid State Circuits, March, 2023
SPCIM: Sparsity-Balanced Practical CIM Accelerator With Optimized Spatial-Temporal Multi-Macro Utilization.
IEEE Trans. Circuits Syst. I Regul. Pap., January, 2023
IEEE Trans. Circuits Syst. I Regul. Pap., 2023
SDP: Co-Designing Algorithm, Dataflow, and Architecture for In-SRAM Sparse NN Acceleration.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2023
TAEM 2.0: A Faster Transfer-Aware Effective Loop Mapping for Heterogeneous Resources on CGRA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2023
RePQC: A 3.4-uJ/Op 48-kOPS Post-Quantum Crypto-Processor for Multiple-Mathematical Problems.
IEEE J. Solid State Circuits, 2023
An Energy-Efficient Transformer Processor Exploiting Dynamic Weak Relevances in Global Attention.
IEEE J. Solid State Circuits, 2023
ReDCIM: Reconfigurable Digital Computing-In-Memory Processor With Unified FP/INT Pipeline for Cloud AI Acceleration.
IEEE J. Solid State Circuits, 2023
TranCIM: Full-Digital Bitline-Transpose CIM-based Sparse Transformer Accelerator With Pipeline/Parallel Reconfigurable Modes.
IEEE J. Solid State Circuits, 2023
CoRR, 2023
Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane.
CoRR, 2023
CoRR, 2023
A 28nm 77.35TOPS/W Similar Vectors Traceable Transformer Processor with Principal-Component-Prior Speculating and Dynamic Bit-wise Stationary Computing.
Proceedings of the 2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2023
Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
CV-CIM: A 28nm XOR-Derived Similarity-Aware Computation-in-Memory for Cost-Volume Construction.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023
TensorCIM: A 28nm 3.7nJ/Gather and 8.3TFLOPS/W FP32 Digital-CIM Tensor Processor for MCM-CIM-Based Beyond-NN Acceleration.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023
MulTCIM: A 28nm 2.24µJ/Token Attention-Token-Bit Hybrid Sparse Digital CIM-Based Accelerator for Multimodal Transformers.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023
FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
RMP-MEM: A HW/SW Reconfigurable Multi-Port Memory Architecture for Multi-PEA Oriented CGRA.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
CPE: An Energy-Efficient Edge-Device Training with Multi-dimensional Compression Mechanism.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
A 28nm 49.7TOPS/W Sparse Transformer Processor with Random-Projection-Based Speculation, Multi-Stationary Dataflow, and Redundant Partial Product Elimination.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2023
CIMFormer: A 38.9TOPS/W-8b Systolic CIM-Array Based Transformer Processor with Token-Slimmed Attention Reformulating and Principal Possibility Gathering.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2023
TPE: A High-Performance Edge-Device Inference with Multi-level Transformational Mechanism.
Proceedings of the 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2023
A Systolic Computing-in-Memory Array based Accelerator with Predictive Early Activation for Spatiotemporal Convolutions.
Proceedings of the 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2023
2022
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2022
CFNTT: Scalable Radix-2/4 NTT Multiplication Architecture with an Efficient Conflict-free Memory Mapping Scheme.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2022
IEEE Trans. Circuits Syst. I Regul. Pap., 2022
GQNA: Generic Quantized DNN Accelerator With Weight-Repetition-Aware Activation Aggregating.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022
An Energy-Efficient Approximate Divider Based on Logarithmic Conversion and Piecewise Constant Approximation.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022
SWPU: A 126.04 TFLOPS/W Edge-Device Sparse DNN Training Processor With Dynamic Sub-Structured Weight Pruning.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022
PL-NPU: An Energy-Efficient Edge-Device DNN Training Processor With Posit-Based Logarithm-Domain Computing.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022
Dynamic-II Pipeline: Compiling Loops With Irregular Branches on Static-Scheduling CGRA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
BitCluster: Fine-Grained Weight Quantization for Load-Balanced Bit-Serial Neural Network Accelerators.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
Trainer: An Energy-Efficient Edge-Device Training Processor Supporting Dynamic Weight Pruning.
IEEE J. Solid State Circuits, 2022
A 12.1 TOPS/W Quantized Network Acceleration Processor With Effective-Weight-Based Convolution and Error-Compensation-Based Prediction.
IEEE J. Solid State Circuits, 2022
Guest Editorial Introduction to the Special Section on the 2021 Asian Solid-State Circuits Conference (A-SSCC).
IEEE J. Solid State Circuits, 2022
CoRR, 2022
FAQS: Communication-efficient Federate DNN Architecture and Quantization Co-Search for personalized Hardware-aware Preferences.
CoRR, 2022
An energy-efficient dynamically reconfigurable cryptographic engine with improved power/EM-side-channel-attack resistance.
Sci. China Inf. Sci., 2022
A 28nm 48KOPS 3.4µJ/Op Agile Crypto-Processor for Post-Quantum Cryptography on Multi-Mathematical Problems.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022
A 28nm 27.5TOPS/W Approximate-Computing-Based Transformer Processor with Asymptotic Sparsity Speculating and Out-of-Order Computing.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022
A 28nm 15.59µJ/Token Full-Digital Bitline-Transpose CIM-Based Sparse Transformer Accelerator with Pipeline/Parallel Reconfigurable Modes.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022
A 28nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 Reconfigurable Digital CIM Processor with Unified FP/INT Pipeline and Bitwise In-Memory Booth Multiplication for Cloud Deep Learning Acceleration.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022
CaSMap: agile mapper for reconfigurable spatial architectures by automatically clustering intermediate representations and scattering mapping process.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
Atomic Dataflow based Graph-Level Workload Orchestration for Scalable DNN Accelerators.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
MC-CIM: a reconfigurable computation-in-memory for efficient stereo matching cost computation.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022
Efficient access scheme for multi-bank based NTT architecture through conflict graph.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022
2021
IEEE Trans. Parallel Distributed Syst., 2021
A 460 GOPS/W Improved Mnemonic Descent Method-Based Hardwired Accelerator for Face Alignment.
IEEE Trans. Multim., 2021
IEEE Trans. Instrum. Meas., 2021
Efficient Comparison and Addition for FHE With Weighted Computational Complexity Model.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021
A Deflection-Based Deadlock Recovery Framework to Achieve High Throughput for Faulty NoCs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021
Jintide: Utilizing Low-Cost Reconfigurable External Monitors to Substantially Enhance Hardware Security of Large-Scale CPU Clusters.
IEEE J. Solid State Circuits, 2021
TIMAQ: A Time-Domain Computing-in-Memory-Based Processor Using Predictable Decomposed Convolution for Arbitrary Quantized DNNs.
IEEE J. Solid State Circuits, 2021
Erratum to "Evolver: a Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning".
IEEE J. Solid State Circuits, 2021
Evolver: A Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning.
IEEE J. Solid State Circuits, 2021
Fast substitution-box evaluation algorithm and its efficient masking scheme for block ciphers.
Sci. China Inf. Sci., 2021
A 28nm 276.55TFLOPS/W Sparse Deep-Neural-Network Training Processor with Implicit Redundancy Speculation and Batch Normalization Reformulation.
Proceedings of the 2021 Symposium on VLSI Circuits, Kyoto, Japan, June 13-19, 2021
A 6.54-to-26.03 TOPS/W Computing-In-Memory RNN Processor using Input Similarity Optimization and Attention-based Context-breaking with Output Speculation.
Proceedings of the 2021 Symposium on VLSI Circuits, Kyoto, Japan, June 13-19, 2021
9.2 A 28nm 12.1TOPS/W Dual-Mode CNN Processor Using Effective-Weight-Based Convolution and Error-Compensation-Based Prediction.
Proceedings of the IEEE International Solid-State Circuits Conference, 2021
15.4 A 5.99-to-691.1TOPS/W Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity-Based Optimization and Variable-Precision Quantization.
Proceedings of the IEEE International Solid-State Circuits Conference, 2021
ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-based Near-Memory Processing with Inter-DIMM Broadcast.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 26th International Conference on Automation and Computing, 2021
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021
ADROIT: An Adaptive Dynamic Refresh Optimization Framework for DRAM Energy Saving In DNN Training.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021
Towards a Secure Integrated Heterogeneous Platform via Cooperative CPU/GPU Encryption.
Proceedings of the 30th IEEE Asian Test Symposium, 2021
GLMSnet: Single Channel Speech Separation Framework in Noisy and Reverberant Environments.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
A Multiple-Precision Multiply and Accumulation Design with Multiply-Add Merged Strategy for AI Accelerating.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021
Combining Memory Partitioning and Subtask Generation for Parallel Data Access on CGRAs.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021
Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021
Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021
2020
IEEE Trans. Wirel. Commun., 2020
Energy- and Area-Efficient Recursive-Conjugate-Gradient-Based MMSE Detector for Massive MIMO Systems.
IEEE Trans. Signal Process., 2020
IEEE Trans. Parallel Distributed Syst., 2020
Pattern-Based Dynamic Compilation System for CGRAs With Online Configuration Transformation.
IEEE Trans. Parallel Distributed Syst., 2020
IEEE Trans. Circuits Syst. Video Technol., 2020
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2020
A 4K × 2K@60fps Multifunctional Video Display Processor for High Perceptual Image Quality.
IEEE Trans. Circuits Syst. I Regul. Pap., 2020
A 60 Gb/s-Level Coarse-Grained Reconfigurable Cryptographic Processor With Less Than 1-W Power.
IEEE Trans. Circuits Syst. II Express Briefs, 2020
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020
Enabling Latency-Aware Data Initialization for Integrated CPU/GPU Heterogeneous Platform.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020
NTTU: An Area-Efficient Low-Power NTT-Uncoupled Architecture for NTT-Based Multiplication.
IEEE Trans. Computers, 2020
A 2.92-Gb/s/W and 0.43-Gb/s/MG Flexible and Scalable CGRA-Based Baseband Processor for Massive MIMO Detection.
IEEE J. Solid State Circuits, 2020
A Survey of Coarse-Grained Reconfigurable Architecture and Design: Taxonomy, Challenges, and Applications.
ACM Comput. Surv., 2020
TFE: Energy-efficient Transferred Filter-based Engine to Compress and Accelerate Convolutional Neural Networks.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the ICDSP 2020: 4th International Conference on Digital Signal Processing, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020
STC: Significance-aware Transform-based Codec Framework for External Memory Access Reduction.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020
TAEM: Fast Transfer-Aware Effective Loop Mapping for Heterogeneous Resources on CGRA.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020
A Time-Domain Computing-in-Memory based Processor using Predictable Decomposed Convolution for Arbitrary Quantized DNNs.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2020
2019
Parana: A Parallel Neural Architecture Considering Thermal Problem of 3D Stacked Memory.
IEEE Trans. Parallel Distributed Syst., 2019
IEEE Trans. Multim., 2019
IEEE Trans. Circuits Syst. Video Technol., 2019
IEEE Trans. Circuits Syst. Video Technol., 2019
An Ultra-Low Power Binarized Convolutional Neural Network-Based Speech Recognition Processor With On-Chip Self-Learning.
IEEE Trans. Circuits Syst. I Regul. Pap., 2019
IEEE Trans. Circuits Syst. II Express Briefs, 2019
A High Throughput Acceleration for Hybrid Neural Networks With Efficient Resource Management on FPGA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019
A Lifetime Reliability-Constrained Runtime Mapping for Throughput Optimization in Many-Core Systems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019
A Binary-Feature-Based Object Recognition Accelerator With 22 M-Vector/s Throughput and 0.68 G-Vector/J Energy-Efficiency for Full-HD Resolution.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019
Low Area-Overhead Low-Entropy Masking Scheme (LEMS) Against Correlation Power Analysis Attack.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019
IEEE Trans. Computers, 2019
An Energy-Efficient Reconfigurable Processor for Binary-and Ternary-Weight Neural Networks With Flexible Data Bit Width.
IEEE J. Solid State Circuits, 2019
A 5.1pJ/Neuron 127.3µs/Inference RNN-based Speech Recognition Processor using 16 Computing-in-Memory SRAM Macros in 65nm CMOS.
Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019
Proceedings of the 17th IEEE International New Circuits and Systems Conference, 2019
An Energy-Efficient Architecture for Accelerating Inference of Memory-Augmented Neural Networks.
Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures, 2019
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019
Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI, 2019
Sandwich-RAM: An Energy-Efficient In-Memory BWN Architecture with Pulse-Width Modulation.
Proceedings of the IEEE International Solid-State Circuits Conference, 2019
Proceedings of the IEEE International Symposium on Circuits and Systems, 2019
ReDESK: A Reconfigurable Dataflow Engine for Sparse Kernels on Heterogeneous Platforms.
Proceedings of the International Conference on Computer-Aided Design, 2019
Jintide®: A Hardware Security Enhanced Server CPU with Xeon® Cores under Runtime Surveillance by an In-Package Dynamically Reconfigurable Processor.
Proceedings of the 2019 IEEE Hot Chips 31 Symposium (HCS), 2019
A Skyrmion Racetrack Memory based Computing In-memory Architecture for Binary Neural Convolutional Network.
Proceedings of the 2019 on Great Lakes Symposium on VLSI, 2019
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
2018
Bit-Level Disturbance-Aware Memory Partitioning for Parallel Data Access for MLC STT-RAM.
IEEE Trans. Very Large Scale Integr. Syst., 2018
Algorithm and Architecture of a Low-Complexity and High-Parallelism Preprocessing-Based K-Best Detector for Large-Scale MIMO Systems.
IEEE Trans. Signal Process., 2018
Triggered-Issuance and Triggered-Execution: A Control Paradigm to Minimize Pipeline Stalls in Distributed Controlled Coarse-Grained Reconfigurable Arrays.
IEEE Trans. Parallel Distributed Syst., 2018
IEEE Trans. Parallel Distributed Syst., 2018
A 1.58 Gbps/W 0.40 Gbps/mm² ASIC Implementation of MMSE Detection for 128×8 64-QAM Massive MIMO in 65 nm CMOS.
IEEE Trans. Circuits Syst. I Regul. Pap., 2018
A Fast and Power-Efficient Hardware Architecture for Visual Feature Detection in Affine-SIFT.
IEEE Trans. Circuits Syst. I Regul. Pap., 2018
HReA: An Energy-Efficient Embedded Dynamically Reconfigurable Fabric for 13-Dwarfs Processing.
IEEE Trans. Circuits Syst. II Express Briefs, 2018
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018
DRMaSV: Enhanced Capability Against Hardware Trojans in Coarse Grained Reconfigurable Architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018
CDPM: Context-Directed Pattern Matching Prefetching to Improve Coarse-Grained Reconfigurable Array Performance.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018
Anole: A Highly Efficient Dynamically Reconfigurable Crypto-Processor for Symmetric-Key Algorithms.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018
A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications.
IEEE J. Solid State Circuits, 2018
Optimization of Softmax Layer in Deep Neural Network Using Integral Stochastic Computation.
J. Low Power Electron., 2018
IEEE Comput. Archit. Lett., 2018
IEEE Access, 2018
A 141 µW, 2.46 pJ/Neuron Binarized Convolutional Neural Network Based Self-Learning Speech Recognition Processor in 28nm CMOS.
Proceedings of the 2018 IEEE Symposium on VLSI Circuits, 2018
An Ultra-High Energy-Efficient Reconfigurable Processor for Deep Neural Networks with Binary/Ternary Weights in 28nm CMOS.
Proceedings of the 2018 IEEE Symposium on VLSI Circuits, 2018
An Energy Efficient JPEG Encoder with Neural Network Based Approximation and Near-Threshold Computing.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2018
Proceedings of the IEEE International Symposium on Circuits and Systems, 2018
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018
Proceedings of the 23rd IEEE International Conference on Digital Signal Processing, 2018
An efficient kernel transformation architecture for binary- and ternary-weight neural network inference.
Proceedings of the 55th Annual Design Automation Conference, 2018
LCP: a layer clusters paralleling mapping method for accelerating inception and residual networks on FPGA.
Proceedings of the 55th Annual Design Automation Conference, 2018
Proceedings of the International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, 2018
A 2.69 Mbps/mW 1.09 Mbps/kGE Conjugate Gradient-based MMSE Detector for 64-QAM 128×8 Massive MIMO Systems.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2018
A 4K×2K@60fps Multi-format Multi-function Display Processor for High Perceptual Quality.
Proceedings of the 2018 IEEE Asia Pacific Conference on Circuits and Systems, 2018
2017
Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns.
IEEE Trans. Very Large Scale Integr. Syst., 2017
Low-Computing-Load, High-Parallelism Detection Method Based on Chebyshev Iteration for Massive MIMO Systems With VLSI Architecture.
IEEE Trans. Signal Process., 2017
Conflict-Free Loop Mapping for Coarse-Grained Reconfigurable Architecture with Multi-Bank Memory.
IEEE Trans. Parallel Distributed Syst., 2017
CIACP: A Correlation- and Iteration- Aware Cache Partitioning Mechanism to Improve Performance of Multiple Coarse-Grained Reconfigurable Arrays.
IEEE Trans. Parallel Distributed Syst., 2017
IEEE Trans. Parallel Distributed Syst., 2017
Exploration of Benes Network in Cryptographic Processors: A Random Infection Countermeasure for Block Ciphers Against Fault Attacks.
IEEE Trans. Inf. Forensics Secur., 2017
PMCC: Fast and Accurate System-Level Power Modeling for Processors on Heterogeneous SoC.
IEEE Trans. Circuits Syst. II Express Briefs, 2017
An AdaBoost-Based Face Detection System Using Parallel Configurable Architecture With Optimized Computation.
IEEE Syst. J., 2017
IET Image Process., 2017
IEEE Access, 2017
Proceedings of the Symposium on Applied Computing, 2017
Proceedings of the IEEE 6th Non-Volatile Memory Systems and Applications Symposium, 2017
Proceedings of the IEEE International Symposium on Circuits and Systems, 2017
Proceedings of the IEEE International Symposium on Circuits and Systems, 2017
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017
A Power Efficient Architecture with Optimized Parallel Memory Accessing for Feature Generation.
Proceedings of the on Great Lakes Symposium on VLSI 2017, 2017
Learning Convolutional Neural Networks for Data-Flow Graph Mapping on Spatial Programmable Architectures (Abstract Only).
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017
Joint Modulo Scheduling and Memory Partitioning with Multi-Bank Memory for High-Level Synthesis (Abstract Only).
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017
Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017
Proceedings of the 54th Annual Design Automation Conference, 2017
A Fast and Power Efficient Architecture to Parallelize LSTM based RNN for Cognitive Intelligence Applications.
Proceedings of the 54th Annual Design Automation Conference, 2017
Proceedings of the 54th Annual Design Automation Conference, 2017
Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017
2016
IEEE Trans. Very Large Scale Integr. Syst., 2016
IEEE Trans. Very Large Scale Integr. Syst., 2016
IEEE Trans. Very Large Scale Integr. Syst., 2016
A Configurable Parallel Hardware Architecture for Efficient Integral Histogram Image Computing.
IEEE Trans. Very Large Scale Integr. Syst., 2016
IEEE Trans. Very Large Scale Integr. Syst., 2016
Exploiting Parallelism of Imperfect Nested Loops on Coarse-Grained Reconfigurable Architectures.
IEEE Trans. Parallel Distributed Syst., 2016
TLIA: Efficient Reconfigurable Architecture for Control-Intensive Kernels with Triggered-Long-Instructions.
IEEE Trans. Parallel Distributed Syst., 2016
Against Double Fault Attacks: Injection Effort Model, Space and Time Randomization Based Countermeasures for Reconfigurable Array Architecture.
IEEE Trans. Inf. Forensics Secur., 2016
A 135-frames/s 1080p 87.5-mW Binary-Descriptor-Based Image Feature Extraction Accelerator.
IEEE Trans. Circuits Syst. Video Technol., 2016
IEEE Trans. Circuits Syst. II Express Briefs, 2016
Joint Modulo Scheduling and Vdd Assignment for Loop Mapping on Dual-Vdd CGRAs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016
An Implementation of Multiple-Standard Video Decoder on a Mixed-Grained Reconfigurable Computing Platform.
IEICE Trans. Inf. Syst., 2016
A fast face detection architecture for auto-focus in smart-phones and digital cameras.
Sci. China Inf. Sci., 2016
A Coarse-Grained Reconfigurable Architecture for Compute-Intensive MapReduce Acceleration.
IEEE Comput. Archit. Lett., 2016
Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures, 2016
Joint loop mapping and data placement for coarse-grained reconfigurable architecture with multi-bank memory.
Proceedings of the 35th International Conference on Computer-Aided Design, 2016
Proceedings of the 35th International Conference on Computer-Aided Design, 2016
Data cache prefetching via context directed pattern matching for coarse-grained reconfigurable arrays.
Proceedings of the 53rd Annual Design Automation Conference, 2016
Exploiting parallelism of imperfect nested loops with sibling inner loops on coarse-grained reconfigurable architectures.
Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016
2015
A Hybrid Reconfigurable Architecture and Design Methods Aiming at Control-Intensive Kernels.
IEEE Trans. Very Large Scale Integr. Syst., 2015
IEEE Trans. Very Large Scale Integr. Syst., 2015
Optimizing Spatial Mapping of Nested Loop for Coarse-Grained Reconfigurable Architectures.
IEEE Trans. Very Large Scale Integr. Syst., 2015
A Flexible Energy- and Reliability-Aware Application Mapping for NoC-Based Reconfigurable Architectures.
IEEE Trans. Very Large Scale Integr. Syst., 2015
ACM Trans. Reconfigurable Technol. Syst., 2015
Correction to "An Energy-Efficient Coarse-Grained Reconfigurable Processing Unit for Multiple-Standard Video Decoding".
IEEE Trans. Multim., 2015
An Energy-Efficient Coarse-Grained Reconfigurable Processing Unit for Multiple-Standard Video Decoding.
IEEE Trans. Multim., 2015
IEEE Trans. Consumer Electron., 2015
A Fast Integral Image Computing Hardware Architecture With High Power and Area Efficiency.
IEEE Trans. Circuits Syst. II Express Briefs, 2015
An Efficient Application Mapping Approach for the Co-Optimization of Reliability, Energy, and Performance in Reconfigurable NoC Architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015
Fast Traffic Sign Recognition with a Rotation Invariant Binary Pattern Based Feature.
Sensors, 2015
Sensors, 2015
Sensors, 2015
A 181 GOPS AKAZE Accelerator Employing Discrete-Time Cellular Neural Networks for Real-Time Feature Extraction.
Sensors, 2015
Configuration Approaches to Enhance Computing Efficiency of Coarse-Grained Reconfigurable Array.
J. Circuits Syst. Comput., 2015
IEICE Trans. Inf. Syst., 2015
The Implementation of Texture-Based Video Up-Scaling on Coarse-Grained Reconfigurable Architecture.
IEICE Trans. Inf. Syst., 2015
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2015
Reliability-aware mapping for various NoC topologies and routing algorithms under performance constraints.
Sci. China Inf. Sci., 2015
Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision, 2015
Proceedings of the 2015 IEEE International Symposium on Circuits and Systems, 2015
Proceedings of the IEEE International Conference on Consumer Electronics, 2015
Proceedings of the IEEE International Conference on Consumer Electronics, 2015
Proceedings of the IEEE International Conference on Consumer Electronics, 2015
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015
Cost-Effective Memory Architecture to Achieve Flexible Configuration and Efficient Data Transmission for Coarse-Grained Reconfigurable Array (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015
A Novel Composite Method to Accelerate Control Flow on Reconfigurable Architecture (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015
A Mixed-Grained Reconfigurable Computing Platform for Multiple-Standard Video Decoding (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015
Cooperatively managing dynamic writeback and insertion policies in a last-level DRAM cache.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015
Acceleration of control flows on reconfigurable architecture with a composite method.
Proceedings of the 52nd Annual Design Automation Conference, 2015
Proceedings of the 52nd Annual Design Automation Conference, 2015
A 127 fps in full hd accelerator based on optimized AKAZE with efficiency and effectiveness for image feature extraction.
Proceedings of the 52nd Annual Design Automation Conference, 2015
A 83fps 1080P resolution 354 mW silicon implementation for computing the improved robust feature in affine space.
Proceedings of the 2015 IEEE Custom Integrated Circuits Conference, 2015
Proceedings of the 20th Asia and South Pacific Design Automation Conference, 2015
A novel approach using a minimum cost maximum flow algorithm for fault-tolerant topology reconfiguration in NoC architectures.
Proceedings of the 20th Asia and South Pacific Design Automation Conference, 2015
2014
On-Chip Memory Hierarchy in One Coarse-Grained Reconfigurable Architecture to Compress Memory Space and to Reduce Reconfiguration Time and Data-Reference Time.
IEEE Trans. Very Large Scale Integr. Syst., 2014
IEEE Trans. Very Large Scale Integr. Syst., 2014
An uneven-dual-core processor based mobile platform for facilitating the collaboration among various embedded electronic devices.
IEEE Trans. Consumer Electron., 2014
A Multi-Modal Face Recognition Method Using Complete Local Derivative Patterns and Depth Maps.
Sensors, 2014
IEEE J. Solid State Circuits, 2014
Sci. China Inf. Sci., 2014
Row-based configuration mechanism for a 2-D processing element array in coarse-grained reconfigurable architecture.
Sci. China Inf. Sci., 2014
Implementation of AVS Jizhun decoder with HW/SW partitioning on a coarse-grained reconfigurable multimedia system.
Sci. China Inf. Sci., 2014
Implementation of multi-standard video decoder on a heterogeneous coarse-grained reconfigurable processor.
Sci. China Inf. Sci., 2014
Sci. China Inf. Sci., 2014
A fast and robust traffic sign recognition method using ring of RIBP histograms based feature.
Proceedings of the 2014 IEEE International Conference on Robotics and Biomimetics, 2014
Proceedings of the IEEE 57th International Midwest Symposium on Circuits and Systems, 2014
A 65 nm uneven-dual-core SoC based platform for multi-device collaborative computing.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2014
Proceedings of the IEEE International Symposium on Circuits and Systems, 2014
Proceedings of the IEEE International Symposium on Circuits and Systems, 2014
Proceedings of the 22nd International Conference on Pattern Recognition, 2014
Configuration approaches to improve computing efficiency of coarse-grained reconfigurable multimedia processor.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014
Teach Reconfigurable Computing using mixed-grained fabrics based hardware infrastructure.
Proceedings of the IEEE Frontiers in Education Conference, 2014
Exploiting Outer Loop Parallelism of Nested Loop on Coarse-Grained Reconfigurable Architectures.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014
Extending lifetime of battery-powered coarse-grained reconfigurable computing platforms.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014
2013
IEEE Trans. Circuits Syst. II Express Briefs, 2013
Sensors, 2013
A fault tolerant NoC architecture using quad-spare mesh topology and dynamic reconfiguration.
J. Syst. Archit., 2013
Int. J. Distributed Sens. Networks, 2013
Concurrent Detection and Recognition of Individual Object Based on Colour and p-SIFT Features.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013
IEICE Trans. Inf. Syst., 2013
Affine Transformations for Communication and Reconfiguration Optimization of Mapping Loop Nests on CGRAs.
IEICE Trans. Inf. Syst., 2013
The Organization of On-Chip Data Memory in One Coarse-Grained Reconfigurable Architecture.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013
Parallelization of Computing-Intensive Tasks of SIFT Algorithm on a Reconfigurable Architecture System.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013
An efficient VLSI architecture of speeded-up robust feature extraction for high resolution and high frame rate video.
Sci. China Inf. Sci., 2013
Hierarchical representation of on-chip context to reduce reconfiguration time and implementation area for coarse-grained reconfigurable architecture.
Sci. China Inf. Sci., 2013
Sci. China Inf. Sci., 2013
SPC: An Approach to Guarantee Performance in Cost Oriented Mapping Algorithm for NoC Architectures.
Proceedings of the IEEE Eighth International Conference on Networking, 2013
Battery-Aware MAC Analytical Modeling for Extending Lifetime of Low Duty-Cycled Wireless Sensor Network.
Proceedings of the IEEE Eighth International Conference on Networking, 2013
A VLSI architecture for enhancing the fault tolerance of NoC using quad-spare mesh topology and dynamic reconfiguration.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013
Affine transformations for communication and reconfiguration optimization of loops on CGRAs.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013
Implementation of multi-standard video decoding algorithms on a coarse-grained reconfigurable multimedia processor.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013
Mapping IDCT of MPEG2 on Coarse-Grained Reconfigurable Array for Matching 1080p Video Decoding.
Proceedings of the Advanced Technologies, Embedded and Multimedia for Human-centric Computing, 2013
Proceedings of the 50th Annual Design Automation Conference 2013, 2013
SURFEX: A 57fps 1080P resolution 220mW silicon implementation for simplified speeded-up robust feature with 65nm process.
Proceedings of the IEEE 2013 Custom Integrated Circuits Conference, 2013
An energy-efficient coarse-grained dynamically reconfigurable fabric for multiple-standard video decoding applications.
Proceedings of the IEEE 2013 Custom Integrated Circuits Conference, 2013
2012
IEICE Trans. Inf. Syst., 2012
IEICE Trans. Electron., 2012
IEICE Trans. Inf. Syst., 2012
Proceedings of the 2012 IEEE International Symposium on Circuits and Systems, 2012
2011
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011
2010
IEICE Trans. Inf. Syst., 2010
IEICE Trans. Commun., 2010
Parallelization of Computing-Intensive Tasks of the H.264 High Profile Decoding Algorithm on a Reconfigurable Multimedia System.
IEICE Trans. Inf. Syst., 2010
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010
Parallel implementation of computing-intensive decoding algorithms of H.264 on reconfigurable SoC.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010
Proceedings of the 2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops, 2010
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2010
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2010
2009
IEICE Trans. Electron., 2009
Sci. China Ser. F Inf. Sci., 2009
2006
Wirel. Commun. Mob. Comput., 2006
Prediction-based routing for real time communications in wireless multi-hop networks.
Proceedings of the 3rd International ICST Conference on Quality of Service in Heterogeneous Wired/Wireless Networks, 2006
2005
Proceedings of the IEEE Wireless Communications and Networking Conference, 2005
Proceedings of the Networking, 2005
Proceedings of IEEE International Conference on Communications, 2005