A 28nm 47.3TFLOPs/W 894mJ/Inference Visual Autoregressive Accelerator with Differential-Amplifier Speculation and Chain-Reaction-Like Parallel Generation.

Zhiheng Yue

Xujiang Xiang

Proceedings of the IEEE International Solid-State Circuits Conference, 2026

2.9 A 0.24mJ/Frame Quadratic Interpolation 4DGS Processor with Recursive Computation Reuse and Tree-Based Parallel-Rendering.

Proceedings of the IEEE International Solid-State Circuits Conference, 2026

A 28nm Speculative-Decoding LLM Processor Achieving 105-to-685µs/Token Latency for Billion-Parameter Models.

Proceedings of the IEEE International Solid-State Circuits Conference, 2026

HR-DCIM: High-Reliability Floating-Point Digital CIM Architecture With Unified Low-Cost Iterative Error Correction.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2026

Hetero-ChipletSim: Bridging Chiplet, Interconnect and Packaging Heterogeneity in Multi-Chiplet System Simulation.

Proceedings of the Design, Automation & Test in Europe Conference, 2026

MoBiLE: Efficient Mixture-of-Experts Inference on Consumer GPU with Mixture of Big Little Experts.

Proceedings of the 31st Asia and South Pacific Design Automation Conference, 2026

BitStopper: An Efficient Transformer Attention Accelerator via Stage-fusion and Early Termination.

Proceedings of the 31st Asia and South Pacific Design Automation Conference, 2026

LAPA: Log-Domain Prediction-Driven Dynamic Sparsity Accelerator for Transformer Model.

Proceedings of the 31st Asia and South Pacific Design Automation Conference, 2026

2025

Exploiting Fine-Grained Task-Level Parallelism for Variant Calling Acceleration.

IEEE Trans. Parallel Distributed Syst., November, 2025

BETA: A Bit-Grained Transformer Attention Accelerator With Efficient Early Termination.

IEEE Trans. Circuits Syst. II Express Briefs, October, 2025

A 28-nm 239-bp/μJ Agile Pangenome Analysis Accelerator for Multi-Scheme Read Mapping.

IEEE J. Solid State Circuits, October, 2025

From Quarter to All: Accelerating Speculative LLM Decoding via Floating-Point Exponent Remapping and Parameter Sharing.

CoRR, October, 2025

CAPSim: A Fast CPU Performance Simulator Using Attention-based Predictor.

CoRR, October, 2025

SSS-DIMM: Removing Redundant Data Movement in Trusted DIMM-Based Near-Memory-Processing Kernel Offloading via Secure Space Sharing.

IEEE Trans. Parallel Distributed Syst., August, 2025

A 28-nm Software-Defined Accelerator Chip With Circuit-Pipeline Scaling and Intrinsic Physical Unclonable Function Enabling Secure Configuration.

IEEE J. Solid State Circuits, August, 2025

An Energy-Efficient POSIT Compute-in-Memory Macro for High-Accuracy AI Applications.

IEEE J. Solid State Circuits, August, 2025

Raccoon: Lightweight Support for Comprehensive Control Flows in Reconfigurable Spatial Architectures.

IEEE Trans. Parallel Distributed Syst., June, 2025

PQPU: A 4.4-μJ/Op 69.4-kOPS Agile Post-Quantum Crypto-Processor Across Multiple Mathematical Problems.

IEEE J. Solid State Circuits, June, 2025

Dyn-Bitpool: A 28 nm 27 TOPS/W Two-Sided Sparse CIM Accelerator Featuring a Balanced Workload Scheme and High CIM Macro Utilization.

IEEE Trans. Circuits Syst. I Regul. Pap., May, 2025

BeaCIM: A Digital Compute-in-Memory DNN Processor With Bi-Directional Exponent Alignment for FP8 Training.

IEEE Trans. Circuits Syst. II Express Briefs, April, 2025

CV-CIM: A Hybrid Domain Xor-Derived Similarity-Aware Computation-in-Memory Supporting Cost-Volume Construction.

IEEE J. Solid State Circuits, February, 2025

TensorCIM: Digital Computing-in-Memory Tensor Processor With Multichip-Module-Based Architecture for Beyond-NN Acceleration.

IEEE J. Solid State Circuits, February, 2025

Rethinking Control Flow in Spatial Architectures: Insights Into Control Flow Plane Design.

IEEE Trans. Computers, January, 2025

A 28-nm 28.8-TOPS/W Attention-Based NN Processor With Correlative CIM Ring Architecture and Dataflow-Reshaped Digital-Assisted CIM Array.

IEEE J. Solid State Circuits, January, 2025

FalconSign: An Efficient and High-Throughput Hardware Architecture for Falcon Signature Generation.

IACR Trans. Cryptogr. Hardw. Embed. Syst., 2025

A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery Multiplier.

IACR Trans. Cryptogr. Hardw. Embed. Syst., 2025

Software-defined process-near-memory architecture using 3D hybrid bonding integration.

Sci. China Inf. Sci., 2025

3D-PATH: A Hierarchy LUT Processing-in-memory Accelerator with Thermal-aware Hybrid Bonding Integration.

Proceedings of the 58th IEEE/ACM International Symposium on Microarchitecture, 2025

MCBP: A Memory-Compute Efficient LLM Inference Accelerator Leveraging Bit-Slice-enabled Sparsity and Repetitiveness.

Proceedings of the 58th IEEE/ACM International Symposium on Microarchitecture, 2025

PointISA: ISA-Extensions for Efficient Point Cloud Analytics via Architecture and Algorithm Co-Design.

Proceedings of the 58th IEEE/ACM International Symposium on Microarchitecture, 2025

14.4 A 51.6TFLOPs/W Full-Datapath CIM Macro Approaching Sparsity Bound and <-30 Loss for Compound AI.

Proceedings of the IEEE International Solid-State Circuits Conference, 2025

23.8 An 88.36TOPS/W Bit-Level-Weight-Compressed Large-Language-Model Accelerator with Cluster-Aligned INT-FP-GEMM and Bi-Dimensional Workflow Reformulation.

Proceedings of the IEEE International Solid-State Circuits Conference, 2025

17.2 A 28nm 4.05µJ/Encryption 8.72kHMul/s Reconfigurable Multi-Scheme Fully Homomorphic Encryption Processor for Encrypted Client-Server Computing.

Proceedings of the IEEE International Solid-State Circuits Conference, 2025

WSC-LLM: Efficient LLM Service and Architecture Co-exploration for Wafer-scale Chips.

Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

P2P-Chiplet: Partition and Placement Co-Optimization for Multi-Chiplet Architecture.

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2025

Lincoln: Real-Time 50~100B LLM Inference on Consumer Devices with LPDDR-Interfaced, Compute-Enabled Flash Memory.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

EFFACT: A Highly Efficient Full-Stack FHE Acceleration Platform.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

ER-DCIM: Error-Resilient Digital CIM Architecture with Run-Time MAC-Cell Error Correction.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

Chameleon-SAT: An Adaptive Boolean Satisfiability Accelerator Using Mixed-Signal In-Memory Computing for Versatile SAT Problems.

Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025

Computing Efficiency Improvement for Multi-PEA CGRA with Built-in Control Design.

Proceedings of the 22nd ACM International Conference on Computing Frontiers, 2025

DIAG: A Refined Four-layer Agile Hardware Developing Flow for Generating Flexible Reconfigurable Architectures.

Proceedings of the 30th Asia and South Pacific Design Automation Conference, 2025

PAMA: Large-Scale GNN Acceleration with Pre-Aggregation in Multi-Node Architecture.

Proceedings of the 36th IEEE International Conference on Application-specific Systems, 2025

2024

Efficient Orchestrated AI Workflows Execution on Scale-Out Spatial Architecture.

IEEE Trans. Circuits Syst. Artif. Intell., December, 2024

Ayaka: A Versatile Transformer Accelerator With Low-Rank Estimation and Heterogeneous Dataflow.

IEEE J. Solid State Circuits, October, 2024

CIMFormer: A Systolic CIM-Array-Based Transformer Accelerator With Token-Pruning-Aware Attention Reformulating and Principal Possibility Gathering.

IEEE J. Solid State Circuits, October, 2024

A High-Performance Genomic Accelerator for Accurate Sequence-to-Graph Alignment Using Dynamic Programming Algorithm.

IEEE Trans. Parallel Distributed Syst., February, 2024

MulTCIM: Digital Computing-in-Memory-Based Multimodal Transformer Accelerator With Attention-Token-Bit Hybrid Sparsity.

IEEE J. Solid State Circuits, January, 2024

Breaking Ground: A New Area Record for Low-Latency First-Order Masked SHA-3 Advancing from the 4x Area Era to the 3x Area Era.

IACR Trans. Cryptogr. Hardw. Embed. Syst., 2024

UpWB: An Uncoupled Architecture Design for White-box Cryptography Using Vectorized Montgomery Multiplication.

IACR Trans. Cryptogr. Hardw. Embed. Syst., 2024

A Low-Latency High-Order Arithmetic to Boolean Masking Conversion.

IACR Cryptol. ePrint Arch., 2024

SWG: an architecture for sparse weight gradient computation.

Sci. China Inf. Sci., 2024

CATCAM: a 28 nm constant-time alteration TCAM enabling less than 50 ns update latency.

Sci. China Inf. Sci., 2024

A 52.01 TFLOPS/W Diffusion Model Processor with Inter-Time-Step Convolution-Attention-Redundancy Elimination and Bipolar Floating-Point Multiplication.

Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024

A 28nm 4170-TFLOPS/W/b and 195-TFLOPS/mm2/b Multiply-Free Fully-Digital Floating-Point Compute-In-Memory Macro with Mitchell's Approximation.

Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024

A 22nm 54.94TFLOPS/W Transformer Fine-Tuning Processor with Exponent-Stationary Re-Computing, Aggressive Linear Fitting, and Logarithmic Domain Multiplicating.

Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024

ETCIM: An Error-Tolerant Digital-CIM Processor with Redundancy-Free Repair and Run-Time MAC and Cell Error Correction.

Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024

Optimizing Vo-Viso: A Modified Methodology to Parallel Computing with Isolating Data in Memristor Arrays.

Proceedings of the Network and Parallel Computing, 2024

16.2 A 28nm 69.4kOPS 4.4μJ/Op Versatile Post-Quantum Crypto-Processor Across Multiple Mathematical Problems.

Proceedings of the IEEE International Solid-State Circuits Conference, 2024

15.1 A 0.795fJ/bit Physically-Unclonable Function-Protected TCAM for a Software-Defined Networking Switch.

Proceedings of the IEEE International Solid-State Circuits Conference, 2024

34.1 A 28nm 83.23TFLOPS/W POSIT-Based Compute-in-Memory Macro for High-Accuracy AI Applications.

Proceedings of the IEEE International Solid-State Circuits Conference, 2024

20.2 A 28nm 74.34TFLOPS/W BF16 Heterogenous CIM-Based Accelerator Exploiting Denoising-Similarity for Diffusion Models.

Proceedings of the IEEE International Solid-State Circuits Conference, 2024

Exploiting Similarity Opportunities of Emerging Vision AI Models on Hybrid Bonding Architecture.

Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition.

Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Sparse Polynomial Multiplication-Based High-Performance Hardware Implementation for CRYSTALS-Dilithium.

Proceedings of the IEEE International Symposium on Hardware Oriented Security and Trust, 2024

CAP: A General Purpose Computation-in-memory with Content Addressable Processing Paradigm.

Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

FQP: A Fibonacci Quantization Processor with Multiplication-Free Computing and Topological-Order Routing.

Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

GSPO: A Graph Substitution and Parallelization Joint Optimization Framework for DNN Inference.

Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

Dyn-Bitpool: A Two-sided Sparse CIM Accelerator Featuring a Balanced Workload Scheme and High CIM Macro Utilization.

Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

Research on Performance Optimization of Encryption Algorithms for Network Security Framework.

Proceedings of the 2024 3rd International Conference on Cyber Security, 2024

A 28nm 118.26TOPS/W Multi-Dimensional Fault-Tolerant AI Processor Enabling Voltage-Frequency Scaling Below Point-of-First-Failure.

Proceedings of the IEEE Asian Solid-State Circuits Conference, 2024

Harp: Leveraging Quasi-Sequential Characteristics to Accelerate Sequence-to-Graph Mapping of Long Reads.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

GRS: A General RISC-V SIMD Vector Acceleration Processor for Artificial Intelligence Applications.

Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2024

SSPE: A Device-edge SNN Inference Artificial Intelligence Processor in Supporting Smart Computing.

Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2024

RCPE: An Excellent Performance Training Processor with RISC-V based Compression Mechanism.

Proceedings of the 6th IEEE International Conference on AI Circuits and Systems, 2024

RTPE: A High Energy Efficiency Inference Processor with RISC-V based Transformation Mechanism.

Proceedings of the 6th IEEE International Conference on AI Circuits and Systems, 2024

2023

GEM: Ultra-Efficient Near-Memory Reconfigurable Acceleration for Read Mapping by Dividing and Predictive Scattering.

IEEE Trans. Parallel Distributed Syst., December, 2023

M2STaR: A Multimode Spatio-Temporal Redundancy Design for Fault-Tolerant Coarse-Grained Reconfigurable Architectures.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., September, 2023

Reconfigurability, Why It Matters in AI Tasks Processing: A Survey of Reconfigurable AI Chips.

IEEE Trans. Circuits Syst. I Regul. Pap., March, 2023

TT@CIM: A Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity Optimization and Variable Precision Quantization.

IEEE J. Solid State Circuits, March, 2023

SPCIM: Sparsity-Balanced Practical CIM Accelerator With Optimized Spatial-Temporal Multi-Macro Utilization.

IEEE Trans. Circuits Syst. I Regul. Pap., January, 2023

STAR: An STGCN ARchitecture for Skeleton-Based Human Action Recognition.

IEEE Trans. Circuits Syst. I Regul. Pap., 2023

SDP: Co-Designing Algorithm, Dataflow, and Architecture for In-SRAM Sparse NN Acceleration.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2023

TAEM 2.0: A Faster Transfer-Aware Effective Loop Mapping for Heterogeneous Resources on CGRA.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2023

RePQC: A 3.4-uJ/Op 48-kOPS Post-Quantum Crypto-Processor for Multiple-Mathematical Problems.

IEEE J. Solid State Circuits, 2023

An Energy-Efficient Transformer Processor Exploiting Dynamic Weak Relevances in Global Attention.

IEEE J. Solid State Circuits, 2023

ReDCIM: Reconfigurable Digital Computing-in-Memory Processor With Unified FP/INT Pipeline for Cloud AI Acceleration.

IEEE J. Solid State Circuits, 2023

TranCIM: Full-Digital Bitline-Transpose CIM-based Sparse Transformer Accelerator With Pipeline/Parallel Reconfigurable Modes.

IEEE J. Solid State Circuits, 2023

A Closer Look at the Chaotic Ring Oscillators based TRNG Design.

IACR Cryptol. ePrint Arch., 2023

Wafer-scale Computing: Advancements, Challenges, and Future Perspectives.

CoRR, 2023

WindMill: A Parameterized and Pluggable CGRA Implemented by DIAG Design Flow.

CoRR, 2023

Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane.

CoRR, 2023

A 28nm 77.35TOPS/W Similar Vectors Traceable Transformer Processor with Principal-Component-Prior Speculating and Dynamic Bit-wise Stationary Computing.

Proceedings of the 2023 IEEE Symposium on VLSI Technology and Circuits, 2023

CASA: An Energy-Efficient and High-Speed CAM-based SMEM Seeding Accelerator for Genome Alignment.

Konstantinos Mamouras

Shaojun Wei

Kaiyuan Yang

Leibo Liu

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

CV-CIM: A 28nm XOR-Derived Similarity-Aware Computation-in-Memory for Cost-Volume Construction.

Proceedings of the IEEE International Solid-State Circuits Conference, 2023

TensorCIM: A 28nm 3.7nJ/Gather and 8.3TFLOPS/W FP32 Digital-CIM Tensor Processor for MCM-CIM-Based Beyond-NN Acceleration.

Proceedings of the IEEE International Solid-State Circuits Conference, 2023

MulTCIM: A 28nm 2.24μJ/Token Attention-Token-Bit Hybrid Sparse Digital CIM-Based Accelerator for Multimodal Transformers.

Proceedings of the IEEE International Solid-State Circuits Conference, 2023

Shogun: A Task Scheduling Framework for Graph Mining Accelerators.

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction.

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

MapZero: Mapping for Coarse-grained Reconfigurable Architectures with Reinforcement Learning and Monte-Carlo Tree Search.

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Orinoco: Ordered Issue and Unordered Commit with Non-Collapsible Queues.

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

A Low-Randomness First-Order Masked Xoodyak.

Proceedings of the IEEE International Symposium on Hardware Oriented Security and Trust, 2023

Mckeycutter: A High-throughput Key Generator of Classic McEliece on Hardware.

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

RMP-MEM: A HW/SW Reconfigurable Multi-Port Memory Architecture for Multi-PEA Oriented CGRA.

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

CPE: An Energy-Efficient Edge-Device Training with Multi-dimensional Compression Mechanism.

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

A 28nm 49.7TOPS/W Sparse Transformer Processor with Random-Projection-Based Speculation, Multi-Stationary Dataflow, and Redundant Partial Product Elimination.

Proceedings of the IEEE Asian Solid-State Circuits Conference, 2023

CIMFormer: A 38.9TOPS/W-8b Systolic CIM-Array Based Transformer Processor with Token-Slimmed Attention Reformulating and Principal Possibility Gathering.

Proceedings of the IEEE Asian Solid-State Circuits Conference, 2023

TPE: A High-Performance Edge-Device Inference with Multi-level Transformational Mechanism.

Proceedings of the 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2023

A Systolic Computing-in-Memory Array based Accelerator with Predictive Early Activation for Spatiotemporal Convolutions.

Proceedings of the 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2023

2022

A Compact and High-Performance Hardware Architecture for CRYSTALS-Dilithium.

IACR Trans. Cryptogr. Hardw. Embed. Syst., 2022

CFNTT: Scalable Radix-2/4 NTT Multiplication Architecture with an Efficient Conflict-free Memory Mapping Scheme.

IACR Trans. Cryptogr. Hardw. Embed. Syst., 2022

BR-CIM: An Efficient Binary Representation Computation-In-Memory Design.

IEEE Trans. Circuits Syst. I Regul. Pap., 2022

GQNA: Generic Quantized DNN Accelerator With Weight-Repetition-Aware Activation Aggregating.

IEEE Trans. Circuits Syst. I Regul. Pap., 2022

An Energy-Efficient Approximate Divider Based on Logarithmic Conversion and Piecewise Constant Approximation.

IEEE Trans. Circuits Syst. I Regul. Pap., 2022

SWPU: A 126.04 TFLOPS/W Edge-Device Sparse DNN Training Processor With Dynamic Sub-Structured Weight Pruning.

IEEE Trans. Circuits Syst. I Regul. Pap., 2022

PL-NPU: An Energy-Efficient Edge-Device DNN Training Processor With Posit-Based Logarithm-Domain Computing.

IEEE Trans. Circuits Syst. I Regul. Pap., 2022

Dynamic-II Pipeline: Compiling Loops With Irregular Branches on Static-Scheduling CGRA.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

BitCluster: Fine-Grained Weight Quantization for Load-Balanced Bit-Serial Neural Network Accelerators.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Efficient FHE Radix-2 Arithmetic Operations Based on Redundant Encoding.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Trainer: An Energy-Efficient Edge-Device Training Processor Supporting Dynamic Weight Pruning.

IEEE J. Solid State Circuits, 2022

A 12.1 TOPS/W Quantized Network Acceleration Processor With Effective-Weight-Based Convolution and Error-Compensation-Based Prediction.

IEEE J. Solid State Circuits, 2022

Compact GF(2) systemizer and optimized constant-time hardware sorters for Key Generation in Classic McEliece.

IACR Cryptol. ePrint Arch., 2022

HQNAS: Auto CNN deployment framework for joint quantization and architecture search.

CoRR, 2022

FAQS: Communication-efficient Federate DNN Architecture and Quantization Co-Search for personalized Hardware-aware Preferences.

CoRR, 2022

An energy-efficient dynamically reconfigurable cryptographic engine with improved power/EM-side-channel-attack resistance.

Sci. China Inf. Sci., 2022

A 28nm 48KOPS 3.4µJ/Op Agile Crypto-Processor for Post-Quantum Cryptography on Multi-Mathematical Problems.

Proceedings of the IEEE International Solid-State Circuits Conference, 2022

A 28nm 27.5TOPS/W Approximate-Computing-Based Transformer Processor with Asymptotic Sparsity Speculating and Out-of-Order Computing.

Proceedings of the IEEE International Solid-State Circuits Conference, 2022

A 28nm 15.59µJ/Token Full-Digital Bitline-Transpose CIM-Based Sparse Transformer Accelerator with Pipeline/Parallel Reconfigurable Modes.

Proceedings of the IEEE International Solid-State Circuits Conference, 2022

A 28nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 Reconfigurable Digital CIM Processor with Unified FP/INT Pipeline and Bitwise In-Memory Booth Multiplication for Cloud Deep Learning Acceleration.

Proceedings of the IEEE International Solid-State Circuits Conference, 2022

CaSMap: agile mapper for reconfigurable spatial architectures by automatically clustering intermediate representations and scattering mapping process.

Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

A SHA-512 Hardware Implementation Based on Block RAM Storage Structure.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Dynamically Reconfigurable Memory Address Mapping for General-Purpose Graphics Processing Unit.

Proceedings of the 2022 IEEE International Conference on Integrated Circuits, 2022

Atomic Dataflow based Graph-Level Workload Orchestration for Scalable DNN Accelerators.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

Upward Packet Popup for Deadlock Freedom in Modular Chiplet-Based Systems.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

MC-CIM: a reconfigurable computation-in-memory for efficient stereo matching cost computation.

Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Mixed-granularity parallel coarse-grained reconfigurable architecture.

Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Efficient access scheme for multi-bank based NTT architecture through conflict graph.

Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Software Defined Chips - Volume I, 2

Springer, ISBN: 978-981-19-6993-5, 2022

2021

An Elastic Task Scheduling Scheme on Coarse-Grained Reconfigurable Architectures.

IEEE Trans. Parallel Distributed Syst., 2021

A 460 GOPS/W Improved Mnemonic Descent Method-Based Hardwired Accelerator for Face Alignment.

IEEE Trans. Multim., 2021

LWRpro: An Energy-Efficient Configurable Crypto-Processor for Module-LWR.

IEEE Trans. Circuits Syst. I Regul. Pap., 2021

Efficient Comparison and Addition for FHE With Weighted Computational Complexity Model.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

A Deflection-Based Deadlock Recovery Framework to Achieve High Throughput for Faulty NoCs.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

A Design Flow for Click-Based Asynchronous Circuits Design With Conventional EDA Tools.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

Jintide: Utilizing Low-Cost Reconfigurable External Monitors to Substantially Enhance Hardware Security of Large-Scale CPU Clusters.

IEEE J. Solid State Circuits, 2021

TIMAQ: A Time-Domain Computing-in-Memory-Based Processor Using Predictable Decomposed Convolution for Arbitrary Quantized DNNs.

IEEE J. Solid State Circuits, 2021

Erratum to "Evolver: a Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning".

IEEE J. Solid State Circuits, 2021

Evolver: A Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning.

IEEE J. Solid State Circuits, 2021

Fast substitution-box evaluation algorithm and its efficient masking scheme for block ciphers.

Sci. China Inf. Sci., 2021

A 28nm 276.55TFLOPS/W Sparse Deep-Neural-Network Training Processor with Implicit Redundancy Speculation and Batch Normalization Reformulation.

Proceedings of the 2021 Symposium on VLSI Circuits, Kyoto, Japan, June 13-19, 2021

A 6.54-to-26.03 TOPS/W Computing-In-Memory RNN Processor using Input Similarity Optimization and Attention-based Context-breaking with Output Speculation.

Proceedings of the 2021 Symposium on VLSI Circuits, Kyoto, Japan, June 13-19, 2021

9.2 A 28nm 12.1TOPS/W Dual-Mode CNN Processor Using Effective-Weight-Based Convolution and Error-Compensation-Based Prediction.

Proceedings of the IEEE International Solid-State Circuits Conference, 2021

15.4 A 5.99-to-691.1TOPS/W Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity-Based Optimization and Variable-Precision Quantization.

Proceedings of the IEEE International Solid-State Circuits Conference, 2021

ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-based Near-Memory Processing with Inter-DIMM Broadcast.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

FuseKNA: Fused Kernel Convolution based Accelerator for Deep Neural Networks.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

HeteroKV: A Scalable Line-rate Key-Value Store on Heterogeneous CPU-FPGA Platforms.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

ADROIT: An Adaptive Dynamic Refresh Optimization Framework for DRAM Energy Saving In DNN Training.

Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

A 28nm Configurable Asynchronous SNN Accelerator with Energy-Efficient Learning.

Proceedings of the 27th IEEE International Symposium on Asynchronous Circuits and Systems, 2021

A Multiple-Precision Multiply and Accumulation Design with Multiply-Add Merged Strategy for AI Accelerating.

Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021

Combining Memory Partitioning and Subtask Generation for Parallel Data Access on CGRAs.

Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021

HPPU: An Energy-Efficient Sparse DNN Training Processor with Hybrid Weight Pruning.

Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021

LPE: Logarithm Posit Processing Element for Energy-Efficient Edge-Device Training.

Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021

2020

Near-Optimal MIMO-SCMA Uplink Detection With Low-Complexity Expectation Propagation.

IEEE Trans. Wirel. Commun., 2020

Energy- and Area-Efficient Recursive-Conjugate-Gradient-Based MMSE Detector for Massive MIMO Systems.

IEEE Trans. Signal Process., 2020

Achieving Flexible Global Reconfiguration in NoCs Using Reconfigurable Rings.

IEEE Trans. Parallel Distributed Syst., 2020

Pattern-Based Dynamic Compilation System for CGRAs With Online Configuration Transformation.

IEEE Trans. Parallel Distributed Syst., 2020

A Multi-Task Hardwired Accelerator for Face Detection and Alignment.

IEEE Trans. Circuits Syst. Video Technol., 2020

Highly Efficient Architecture of NewHope-NIST on FPGA using Low-Complexity NTT/INTT.

IACR Trans. Cryptogr. Hardw. Embed. Syst., 2020

A 4K × 2K@60fps Multifunctional Video Display Processor for High Perceptual Image Quality.

IEEE Trans. Circuits Syst. I Regul. Pap., 2020

A 60 Gb/s-Level Coarse-Grained Reconfigurable Cryptographic Processor With Less Than 1-W Power.

IEEE Trans. Circuits Syst. II Express Briefs, 2020

Efficient Scheduling of Irregular Network Structures on CNN Accelerators.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Aggressive Fine-Grained Power Gating of NoC Buffers.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

NTTU: An Area-Efficient Low-Power NTT-Uncoupled Architecture for NTT-Based Multiplication.

IEEE Trans. Computers, 2020

A 2.92-Gb/s/W and 0.43-Gb/s/MG Flexible and Scalable CGRA-Based Baseband Processor for Massive MIMO Detection.

IEEE J. Solid State Circuits, 2020

A High-performance Hardware Implementation of Saber Based on Karatsuba Algorithm.

IACR Cryptol. ePrint Arch., 2020

A Survey of Coarse-Grained Reconfigurable Architecture and Design: Taxonomy, Challenges, and Applications.

ACM Comput. Surv., 2020

TFE: Energy-efficient Transferred Filter-based Engine to Compress and Accelerate Convolutional Neural Networks.

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

CATCAM: Constant-time Alteration Ternary CAM with Scalable In-Memory Architecture.

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

GraphABCD: Scaling Out Graph Analytics with Asynchronous Block Coordinate Descent.

Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

A Reconfigurable Branch Predictor for Spatial Computing Architectures.

Proceedings of the ICDSP 2020: 4th International Conference on Digital Signal Processing, 2020

PAGAN: A Phase-Adapted Generative Adversarial Networks for Speech Enhancement.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A High-performance Inference Accelerator Exploiting Patterned Sparsity in CNNs.

Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020

STC: Significance-aware Transform-based Codec Framework for External Memory Access Reduction.

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

CDRing: Reconfigurable Ring Architecture by Exploiting Cycle Decomposition of Torus Topology.

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

TAEM: Fast Transfer-Aware Effective Loop Mapping for Heterogeneous Resources on CGRA.

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

A Time-Domain Computing-in-Memory based Processor using Predictable Decomposed Convolution for Arbitrary Quantized DNNs.

Proceedings of the IEEE Asian Solid-State Circuits Conference, 2020

2019

Parana: A Parallel Neural Architecture Considering Thermal Problem of 3D Stacked Memory.

IEEE Trans. Parallel Distributed Syst., 2019

Face Alignment With Expression- and Pose-Based Adaptive Initialization.

IEEE Trans. Multim., 2019

Reconfigurable Architecture for Neural Approximation in Multimedia Computing.

IEEE Trans. Circuits Syst. Video Technol., 2019

A Face Alignment Accelerator Based on Optimized Coarse-to-Fine Shape Searching.

IEEE Trans. Circuits Syst. Video Technol., 2019

An Ultra-Low Power Binarized Convolutional Neural Network-Based Speech Recognition Processor With On-Chip Self-Learning.

IEEE Trans. Circuits Syst. I Regul. Pap., 2019

A Fast and Power-Efficient Hardware Architecture for Non-Maximum Suppression.

IEEE Trans. Circuits Syst. II Express Briefs, 2019

A High Throughput Acceleration for Hybrid Neural Networks With Efficient Resource Management on FPGA.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

A Lifetime Reliability-Constrained Runtime Mapping for Throughput Optimization in Many-Core Systems.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

A Binary-Feature-Based Object Recognition Accelerator With 22 M-Vector/s Throughput and 0.68 G-Vector/J Energy-Efficiency for Full-HD Resolution.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

Data-Flow Graph Mapping Optimization for CGRA With Deep Reinforcement Learning.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

Low Area-Overhead Low-Entropy Masking Scheme (LEMS) Against Correlation Power Analysis Attack.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

An STT-MRAM Based in Memory Architecture for Low Power Integral Computing.

IEEE Trans. Computers, 2019

An Energy-Efficient Reconfigurable Processor for Binary- and Ternary-Weight Neural Networks With Flexible Data Bit Width.

IEEE J. Solid State Circuits, 2019

A 5.1pJ/Neuron 127.3us/Inference RNN-based Speech Recognition Processor using 16 Computing-in-Memory SRAM Macros in 65nm CMOS.

Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019, 2019

MoNA: Mobile Neural Architecture with Reconfigurable Parallel Dimensions.

Proceedings of the 17th IEEE International New Circuits and Systems Conference, 2019

An Energy-Efficient Architecture for Accelerating Inference of Memory-Augmented Neural Networks.

Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures, 2019

FPGA-Accelerated Optimistic Concurrency Control for Transactional Memory.

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Towards Efficient Compact Network Training on Edge-Devices.

Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI, 2019

A Reliable Physical Unclonable Function Based on Differential Charging Capacitors.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2019

ReDESK: A Reconfigurable Dataflow Engine for Sparse Kernels on Heterogeneous Platforms.

Proceedings of the International Conference on Computer-Aided Design, 2019

Jintide®: A Hardware Security Enhanced Server CPU with Xeon® Cores under Runtime Surveillance by an In-Package Dynamically Reconfigurable Processor.

Proceedings of the 2019 IEEE Hot Chips 31 Symposium (HCS), 2019

A Skyrmion Racetrack Memory based Computing In-memory Architecture for Binary Neural Convolutional Network.

Proceedings of the 2019 Great Lakes Symposium on VLSI, 2019

Constructing Concurrent Data Structures on FPGA with Channels.

Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

A 1.17 TOPS/W, 150fps Accelerator for Multi-Face Detection and Alignment.

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

A General Pattern-Based Dynamic Compilation Framework for Coarse-Grained Reconfigurable Architectures.

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

L-MPC: A LUT based Multi-Level Prediction-Correction Architecture for Accelerating Binary-Weight Hourglass Network.

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

An Asynchronous Reconfigurable SNN Accelerator With Event-Driven Time Step Update.

Proceedings of the IEEE Asian Solid-State Circuits Conference, 2019

Small-Footprint Keyword Spotting with Graph Convolutional Network.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Massive MIMO Detection Algorithm and VLSI Architecture

Leibo Liu

Guiqiang Peng

Shaojun Wei

Springer, ISBN: 978-981-13-6361-0, 2019

2018

Bit-Level Disturbance-Aware Memory Partitioning for Parallel Data Access for MLC STT-RAM.

IEEE Trans. Very Large Scale Integr. Syst., 2018

Algorithm and Architecture of a Low-Complexity and High-Parallelism Preprocessing-Based K-Best Detector for Large-Scale MIMO Systems.

IEEE Trans. Signal Process., 2018

Triggered-Issuance and Triggered-Execution: A Control Paradigm to Minimize Pipeline Stalls in Distributed Controlled Coarse-Grained Reconfigurable Arrays.

IEEE Trans. Parallel Distributed Syst., 2018

Stress-Aware Loops Mapping on CGRAs with Dynamic Multi-Map Reconfiguration.

IEEE Trans. Parallel Distributed Syst., 2018

A 1.58 Gbps/W 0.40 Gbps/mm² ASIC Implementation of MMSE Detection for 128×8 64-QAM Massive MIMO in 65 nm CMOS.

IEEE Trans. Circuits Syst. I Regul. Pap., 2018

A Fast and Power-Efficient Hardware Architecture for Visual Feature Detection in Affine-SIFT.

IEEE Trans. Circuits Syst. I Regul. Pap., 2018

HReA: An Energy-Efficient Embedded Dynamically Reconfigurable Fabric for 13-Dwarfs Processing.

IEEE Trans. Circuits Syst. II Express Briefs, 2018

Memory Partitioning for Parallel Multipattern Data Access in Multiple Data Arrays.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

GNA: Reconfigurable and Efficient Architecture for Generative Network Acceleration.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

DRMaSV: Enhanced Capability Against Hardware Trojans in Coarse Grained Reconfigurable Architectures.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

CDPM: Context-Directed Pattern Matching Prefetching to Improve Coarse-Grained Reconfigurable Array Performance.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

Anole: A Highly Efficient Dynamically Reconfigurable Crypto-Processor for Symmetric-Key Algorithms.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications.

IEEE J. Solid State Circuits, 2018

Optimization of Softmax Layer in Deep Neural Network Using Integral Stochastic Computation.

J. Low Power Electron., 2018

FP-BNN: Binarized neural network on FPGA.

Neurocomputing, 2018

Breaking the Synchronization Bottleneck with Reconfigurable Transactional Execution.

IEEE Comput. Archit. Lett., 2018

Multi-Bank Memory Aware Force Directed Scheduling for High-Level Synthesis.

IEEE Access, 2018

A 141 µW, 2.46 pJ/Neuron Binarized Convolutional Neural Network Based Self-Learning Speech Recognition Processor in 28nm CMOS.

Proceedings of the 2018 IEEE Symposium on VLSI Circuits, 2018

An Ultra-High Energy-Efficient Reconfigurable Processor for Deep Neural Networks with Binary/Ternary Weights in 28nm CMOS.

Proceedings of the 2018 IEEE Symposium on VLSI Circuits, 2018

An Energy Efficient JPEG Encoder with Neural Network Based Approximation and Near-Threshold Computing.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2018

Bit-width Adaptive Accelerator Design for Convolution Neural Network.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2018

RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM.

Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Efficient Hardware Architecture of Softmax Layer in Deep Neural Network.

Proceedings of the 23rd IEEE International Conference on Digital Signal Processing, 2018

An efficient kernel transformation architecture for binary- and ternary-weight neural network inference.

Proceedings of the 55th Annual Design Automation Conference, 2018

LCP: a layer clusters paralleling mapping method for accelerating inception and residual networks on FPGA.

Proceedings of the 55th Annual Design Automation Conference, 2018

A Full Multicast Reconfigurable Non-blocking Permutation Network.

Proceedings of the International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, 2018

A 2.69 Mbps/mW 1.09 Mbps/kGE Conjugate Gradient-based MMSE Detector for 64-QAM 128×8 Massive MIMO Systems.

Proceedings of the IEEE Asian Solid-State Circuits Conference, 2018

An Asynchronous Energy-Efficient CNN Accelerator with Reconfigurable Architecture.

Proceedings of the IEEE Asian Solid-State Circuits Conference, 2018

A 4K×2K@60fps Multi-format Multi-function Display Processor for High Perceptual Quality.

Proceedings of the 2018 IEEE Asia Pacific Conference on Circuits and Systems, 2018

2017

Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns.

IEEE Trans. Very Large Scale Integr. Syst., 2017

Low-Computing-Load, High-Parallelism Detection Method Based on Chebyshev Iteration for Massive MIMO Systems With VLSI Architecture.

IEEE Trans. Signal Process., 2017

Conflict-Free Loop Mapping for Coarse-Grained Reconfigurable Architecture with Multi-Bank Memory.

IEEE Trans. Parallel Distributed Syst., 2017

CIACP: A Correlation- and Iteration-Aware Cache Partitioning Mechanism to Improve Performance of Multiple Coarse-Grained Reconfigurable Arrays.

IEEE Trans. Parallel Distributed Syst., 2017

A Multi-Objective Model Oriented Mapping Approach for NoC-based Computing Systems.

IEEE Trans. Parallel Distributed Syst., 2017

Exploration of Benes Network in Cryptographic Processors: A Random Infection Countermeasure for Block Ciphers Against Fault Attacks.

IEEE Trans. Inf. Forensics Secur., 2017

PMCC: Fast and Accurate System-Level Power Modeling for Processors on Heterogeneous SoC.

IEEE Trans. Circuits Syst. II Express Briefs, 2017

An AdaBoost-Based Face Detection System Using Parallel Configurable Architecture With Optimized Computation.

IEEE Syst. J., 2017

Implementation of in-loop filter for HEVC decoder on reconfigurable processor.

IET Image Process., 2017

Reconfigurable VLSI Architecture for Real-Time 2D-to-3D Conversion.

IEEE Access, 2017

Multi-CNN and decision tree based driving behavior evaluation.

Proceedings of the Symposium on Applied Computing, 2017

AEPE: An area and power efficient RRAM crossbar-based accelerator for deep CNNs.

Proceedings of the IEEE 6th Non-Volatile Memory Systems and Applications Symposium, 2017

DFGNet: Mapping dataflow graph onto CGRA by a deep learning approach.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2017

Memory partitioning-based modulo scheduling for high-level synthesis.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2017

Aggressive Pipelining of Irregular Applications on Reconfigurable Hardware.

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

A Power Efficient Architecture with Optimized Parallel Memory Accessing for Feature Generation.

Proceedings of the Great Lakes Symposium on VLSI 2017, 2017

Learning Convolutional Neural Networks for Data-Flow Graph Mapping on Spatial Programmable Architectures (Abstract Only).

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Joint Modulo Scheduling and Memory Partitioning with Multi-Bank Memory for High-Level Synthesis (Abstract Only).

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Bit-Width Based Resource Partitioning for CNN Acceleration on FPGA.

Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

Disturbance Aware Memory Partitioning for Parallel Data Access in STT-RAM.

Shouyi Yin

Zhicong Xie

Shaojun Wei

Proceedings of the 54th Annual Design Automation Conference, 2017

A 700fps Optimized Coarse-to-Fine Shape Searching Based Hardware Accelerator for Face Alignment.

Proceedings of the 54th Annual Design Automation Conference, 2017

A Fast and Power Efficient Architecture to Parallelize LSTM based RNN for Cognitive Intelligence Applications.

Peng Ouyang

Shouyi Yin

Shaojun Wei

Proceedings of the 54th Annual Design Automation Conference, 2017

Minimizing Pipeline Stalls in Distributed-Controlled Coarse-Grained Reconfigurable Arrays with Triggered Instruction Issue and Execution.

Proceedings of the 54th Annual Design Automation Conference, 2017

Stress-Aware Loops Mapping on CGRAs with Considering NBTI Aging Effect.

Jiangyuan Gu

Shouyi Yin

Shaojun Wei

Proceedings of the 54th Annual Design Automation Conference, 2017

Energy-aware loops mapping on multi-vdd CGRAs without performance degradation.

Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017

2016

Trigger-Centric Loop Mapping on CGRAs.

IEEE Trans. Very Large Scale Integr. Syst., 2016

Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures.

IEEE Trans. Very Large Scale Integr. Syst., 2016

CWFP: Novel Collective Writeback and Fill Policy for Last-Level DRAM Cache.

IEEE Trans. Very Large Scale Integr. Syst., 2016

A Configurable Parallel Hardware Architecture for Efficient Integral Histogram Image Computing.

IEEE Trans. Very Large Scale Integr. Syst., 2016

Improving Nested Loop Pipelining on Coarse-Grained Reconfigurable Architectures.

IEEE Trans. Very Large Scale Integr. Syst., 2016

Exploiting Parallelism of Imperfect Nested Loops on Coarse-Grained Reconfigurable Architectures.

IEEE Trans. Parallel Distributed Syst., 2016

TLIA: Efficient Reconfigurable Architecture for Control-Intensive Kernels with Triggered-Long-Instructions.

IEEE Trans. Parallel Distributed Syst., 2016

Against Double Fault Attacks: Injection Effort Model, Space and Time Randomization Based Countermeasures for Reconfigurable Array Architecture.

IEEE Trans. Inf. Forensics Secur., 2016

A 135-frames/s 1080p 87.5-mW Binary-Descriptor-Based Image Feature Extraction Accelerator.

IEEE Trans. Circuits Syst. Video Technol., 2016

A Fast and Power-Efficient Memory-Centric Architecture for Affine Computation.

IEEE Trans. Circuits Syst. II Express Briefs, 2016

Joint Modulo Scheduling and Vdd Assignment for Loop Mapping on Dual-Vdd CGRAs.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

A pipelined area-efficient and high-speed reconfigurable processor for floating-point FFT/IFFT and DCT/IDCT computations.

Microelectron. J., 2016

Temperature-aware multi-application mapping on network-on-chip based many-core systems.

Microprocess. Microsystems, 2016

An Implementation of Multiple-Standard Video Decoder on a Mixed-Grained Reconfigurable Computing Platform.

IEICE Trans. Inf. Syst., 2016

A fast face detection architecture for auto-focus in smart-phones and digital cameras.

Sci. China Inf. Sci., 2016

A Coarse-Grained Reconfigurable Architecture for Compute-Intensive MapReduce Acceleration.

IEEE Comput. Archit. Lett., 2016

Energy management on DVS based coarse-grained reconfigurable platform.

Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures, 2016

Temperature-aware task scheduling heuristics on Network-on-Chips.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2016

Joint loop mapping and data placement for coarse-grained reconfigurable architecture with multi-bank memory.

Proceedings of the 35th International Conference on Computer-Aided Design, 2016

Multibank memory optimization for parallel data access in multiple data arrays.

Proceedings of the 35th International Conference on Computer-Aided Design, 2016

Data cache prefetching via context directed pattern matching for coarse-grained reconfigurable arrays.

Proceedings of the 53rd Annual Design Automation Conference, 2016

Exploiting parallelism of imperfect nested loops with sibling inner loops on coarse-grained reconfigurable architectures.

Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016

2015

A Hybrid Reconfigurable Architecture and Design Methods Aiming at Control-Intensive Kernels.

IEEE Trans. Very Large Scale Integr. Syst., 2015

Energy Management on Battery-Powered Coarse-Grained Reconfigurable Platforms.

IEEE Trans. Very Large Scale Integr. Syst., 2015

Optimizing Spatial Mapping of Nested Loop for Coarse-Grained Reconfigurable Architectures.

IEEE Trans. Very Large Scale Integr. Syst., 2015

A Flexible Energy- and Reliability-Aware Application Mapping for NoC-Based Reconfigurable Architectures.

IEEE Trans. Very Large Scale Integr. Syst., 2015

A Low-Latency and Low-Power Hybrid Scheme for On-Chip Networks.

IEEE Trans. Very Large Scale Integr. Syst., 2015

Efficient Fault-Tolerant Topology Reconfiguration Using a Maximum Flow Algorithm.

ACM Trans. Reconfigurable Technol. Syst., 2015

Correction to "An Energy-Efficient Coarse-Grained Reconfigurable Processing Unit for Multiple-Standard Video Decoding".

IEEE Trans. Multim., 2015

An Energy-Efficient Coarse-Grained Reconfigurable Processing Unit for Multiple-Standard Video Decoding.

IEEE Trans. Multim., 2015

A real-time time-consistent 2D-to-3D video conversion system using color histogram.

IEEE Trans. Consumer Electron., 2015

A Fast Integral Image Computing Hardware Architecture With High Power and Area Efficiency.

IEEE Trans. Circuits Syst. II Express Briefs, 2015

An Efficient Application Mapping Approach for the Co-Optimization of Reliability, Energy, and Performance in Reconfigurable NoC Architectures.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015

Fast Traffic Sign Recognition with a Rotation Invariant Binary Pattern Based Feature.

Sensors, 2015

A Novel 2D-to-3D Video Conversion Method Using Time-Coherent Depth Maps.

Sensors, 2015

High-Performance Motion Estimation for Image Sensors with Video Compression.

Sensors, 2015

A 181 GOPS AKAZE Accelerator Employing Discrete-Time Cellular Neural Networks for Real-Time Feature Extraction.

Sensors, 2015

Configuration Approaches to Enhance Computing Efficiency of Coarse-Grained Reconfigurable Array.

J. Circuits Syst. Comput., 2015

Low-Power Loop Parallelization onto CGRA Utilizing Variable Dual VDD.

IEICE Trans. Inf. Syst., 2015

The Implementation of Texture-Based Video Up-Scaling on Coarse-Grained Reconfigurable Architecture.

IEICE Trans. Inf. Syst., 2015

Battery-Aware Loop Nests Mapping for CGRAs.

IEICE Trans. Inf. Syst., 2015

Mapping Multi-Level Loop Nests onto CGRAs Using Polyhedral Optimizations.

IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2015

Exploring partitioning methods for multicast in 3D bufferless Network on Chip.

IEICE Electron. Express, 2015

Mapping of Embedded Applications on Hybrid Networks-on-Chip with Multiple Switching Mechanisms.

IEEE Embed. Syst. Lett., 2015

Reliability-aware mapping for various NoC topologies and routing algorithms under performance constraints.

Sci. China Inf. Sci., 2015

A Multi-modal 2D + 3D Face Recognition Method with a Novel Local Feature Descriptor.

Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision, 2015

Partitioning Methods for Multicast in Bufferless 3D Network on Chip.

Proceedings of the Computer Engineering and Technology - 19th CCF Conference, 2015

Neural approximating architecture targeting multiple application domains.

Proceedings of the 2015 IEEE International Symposium on Circuits and Systems, 2015

Real-time time-consistent 2D-to-3D video conversion based on color histogram.

Proceedings of the IEEE International Conference on Consumer Electronics, 2015

Efficient lane detection system based on monocular camera.

Proceedings of the IEEE International Conference on Consumer Electronics, 2015

An automatic depth map generation method by image classification.

Proceedings of the IEEE International Conference on Consumer Electronics, 2015

Acceleration of Nested Conditionals on CGRAs via Trigger Scheme.

Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015

Cost-Effective Memory Architecture to Achieve Flexible Configuration and Efficient Data Transmission for Coarse-Grained Reconfigurable Array (Abstract Only).

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

A Novel Composite Method to Accelerate Control Flow on Reconfigurable Architecture (Abstract Only).

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

A Mixed-Grained Reconfigurable Computing Platform for Multiple-Standard Video Decoding (Abstract Only).

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

Cooperatively managing dynamic writeback and insertion policies in a last-level DRAM cache.

Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Joint affine transformation and loop pipelining for mapping nested loop on CGRAs.

Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

RNA: a reconfigurable architecture for hardware neural acceleration.

Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Acceleration of control flows on reconfigurable architecture with a composite method.

Proceedings of the 52nd Annual Design Automation Conference, 2015

Efficient memory partitioning for parallel data access in multidimensional arrays.

Proceedings of the 52nd Annual Design Automation Conference, 2015

A 127 fps in full hd accelerator based on optimized AKAZE with efficiency and effectiveness for image feature extraction.

Proceedings of the 52nd Annual Design Automation Conference, 2015

A 83fps 1080P resolution 354 mW silicon implementation for computing the improved robust feature in affine space.

Proceedings of the 2015 IEEE Custom Integrated Circuits Conference, 2015

Scheduling stream programs with improving arithmetic unit usage on NoC-based VLIW multi-core architectures.

Proceedings of the 12th ACM International Conference on Computing Frontiers, 2015

Battery-aware mapping optimization of loop nests for CGRAs.

Proceedings of the 20th Asia and South Pacific Design Automation Conference, 2015

A novel approach using a minimum cost maximum flow algorithm for fault-tolerant topology reconfiguration in NoC architectures.

Proceedings of the 20th Asia and South Pacific Design Automation Conference, 2015

2014

On-Chip Memory Hierarchy in One Coarse-Grained Reconfigurable Architecture to Compress Memory Space and to Reduce Reconfiguration Time and Data-Reference Time.

IEEE Trans. Very Large Scale Integr. Syst., 2014

SimRPU: A Simulation Environment for Reconfigurable Architecture Exploration.

IEEE Trans. Very Large Scale Integr. Syst., 2014

Software/Hardware Parallel Long-Period Random Number Generation Framework Based on the WELL Method.

IEEE Trans. Very Large Scale Integr. Syst., 2014

Compiler-Assisted Leakage- and Temperature-Aware Instruction-Level VLIW Scheduling.

IEEE Trans. Very Large Scale Integr. Syst., 2014

A High-Utilization Scheduling Scheme of Stream Programs on Clustered VLIW Stream Architectures.

IEEE Trans. Parallel Distributed Syst., 2014

A Multi-Modal Face Recognition Method Using Complete Local Derivative Patterns and Depth Maps.

Sensors, 2014

A 1/2.5 inch VGA 400 fps CMOS Image Sensor With High Sensitivity for Machine Vision.

IEEE J. Solid State Circuits, 2014

Hybrid circuit-switched network for on-chip communication in large-scale chip-multiprocessors.

J. Parallel Distributed Comput., 2014

MapReduce inspired loop mapping for coarse-grained reconfigurable architecture.

Sci. China Inf. Sci., 2014

Row-based configuration mechanism for a 2-D processing element array in coarse-grained reconfigurable architecture.

Sci. China Inf. Sci., 2014

Implementation of AVS Jizhun decoder with HW/SW partitioning on a coarse-grained reconfigurable multimedia system.

Sci. China Inf. Sci., 2014

Implementation of multi-standard video decoder on a heterogeneous coarse-grained reconfigurable processor.

Sci. China Inf. Sci., 2014

Optimization of speeded-up robust feature algorithm for hardware implementation.

Sci. China Inf. Sci., 2014

A fast and robust traffic sign recognition method using ring of RIBP histograms based feature.

Proceedings of the 2014 IEEE International Conference on Robotics and Biomimetics, 2014

Low-power loop pipelining mapping onto CGRA utilizing variable dual VDD.

Proceedings of the IEEE 57th International Midwest Symposium on Circuits and Systems, 2014

A 65 nm uneven-dual-core SoC based platform for multi-device collaborative computing.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2014

A parallel hardware architecture for fast integral image computing.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2014

Map-reduce inspired loop parallelization on CGRA.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2014

A FAST Extreme Illumination Robust Feature in Affine Space.

Proceedings of the 22nd International Conference on Pattern Recognition, 2014

Configuration approaches to improve computing efficiency of coarse-grained reconfigurable multimedia processor.

Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

Teach Reconfigurable Computing using mixed-grained fabrics based hardware infrastructure.

Proceedings of the IEEE Frontiers in Education Conference, 2014

Exploiting Outer Loop Parallelism of Nested Loop on Coarse-Grained Reconfigurable Architectures.

[BibT_eX]

[DOI]

Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

Extending lifetime of battery-powered coarse-grained reconfigurable computing platforms.

[BibT_eX]

[DOI]

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

2013

Low-Power Reconfigurable Processor Utilizing Variable Dual VDD.

[BibT_eX]

[DOI]

IEEE Trans. Circuits Syst. II Express Briefs, 2013

A fault tolerant NoC architecture using quad-spare mesh topology and dynamic reconfiguration.

[BibT_eX]

[DOI]

J. Syst. Archit., 2013

Energy-efficient stream task scheduling scheme for embedded multimedia applications on multi-issued stream architectures.

[BibT_eX]

[DOI]

J. Syst. Archit., 2013

Calibration Techniques for Low-Power Wireless Multiband Transceiver.

[BibT_eX]

[DOI]

Int. J. Distributed Sens. Networks, 2013

Concurrent Detection and Recognition of Individual Object Based on Colour and p-SIFT Features.

[BibT_eX]

[DOI]

IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013

Battery-Aware Task Mapping for Coarse-Grained Reconfigurable Architecture.

[BibT_eX]

[DOI]

IEICE Trans. Inf. Syst., 2013

Affine Transformations for Communication and Reconfiguration Optimization of Mapping Loop Nests on CGRAs.

[BibT_eX]

[DOI]

IEICE Trans. Inf. Syst., 2013

The Organization of On-Chip Data Memory in One Coarse-Grained Reconfigurable Architecture.

[BibT_eX]

[DOI]

IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013

Parallelization of Computing-Intensive Tasks of SIFT Algorithm on a Reconfigurable Architecture System.

[BibT_eX]

[DOI]

IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013

Hardware Software Co-design of H.264 Baseline Encoder on Coarse-Grained Dynamically Reconfigurable Computing System-on-Chip.

[BibT_eX]

[DOI]

IEICE Trans. Inf. Syst., 2013

An efficient VLSI architecture of speeded-up robust feature extraction for high resolution and high frame rate video.

[BibT_eX]

[DOI]

Sci. China Inf. Sci., 2013

Hierarchical representation of on-chip context to reduce reconfiguration time and implementation area for coarse-grained reconfigurable architecture.

[BibT_eX]

[DOI]

Sci. China Inf. Sci., 2013

ReSSIM: a mixed-level simulator for dynamic coarse-grained reconfigurable processor.

[BibT_eX]

[DOI]

Sci. China Inf. Sci., 2013

SPC: An Approach to Guarantee Performance in Cost Oriented Mapping Algorithm for NoC Architectures.

[BibT_eX]

[DOI]

Proceedings of the IEEE Eighth International Conference on Networking, 2013

Battery-Aware MAC Analytical Modeling for Extending Lifetime of Low Duty-Cycled Wireless Sensor Network.

[BibT_eX]

[DOI]

Proceedings of the IEEE Eighth International Conference on Networking, 2013

Compiler-assisted leakage energy optimization of media applications on stream architectures.

[BibT_eX]

[DOI]

Proceedings of the International Symposium on Quality Electronic Design, 2013

An inductive-coupling interconnected application-specific 3D NoC design.

[BibT_eX]

[DOI]

Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

A VLSI architecture for enhancing the fault tolerance of NoC using quad-spare mesh topology and dynamic reconfiguration.

[BibT_eX]

[DOI]

Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Affine transformations for communication and reconfiguration optimization of loops on CGRAs.

[BibT_eX]

[DOI]

Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Implementation of multi-standard video decoding algorithms on a coarse-grained reconfigurable multimedia processor.

[BibT_eX]

[DOI]

Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Mapping IDCT of MPEG2 on Coarse-Grained Reconfigurable Array for Matching 1080p Video Decoding.

[BibT_eX]

[DOI]

Proceedings of the Advanced Technologies, Embedded and Multimedia for Human-centric Computing, 2013

Polyhedral model based mapping optimization of loop nests for CGRAs.

[BibT_eX]

[DOI]

Proceedings of the 50th Annual Design Automation Conference 2013, 2013

SURFEX: A 57fps 1080P resolution 220mW silicon implementation for simplified speeded-up robust feature with 65nm process.

[BibT_eX]

[DOI]

Proceedings of the IEEE 2013 Custom Integrated Circuits Conference, 2013

An energy-efficient coarse-grained dynamically reconfigurable fabric for multiple-standard video decoding applications.

[BibT_eX]

[DOI]

Proceedings of the IEEE 2013 Custom Integrated Circuits Conference, 2013

A power-efficient network-on-chip for multi-core stream processors.

[BibT_eX]

[DOI]

Proceedings of the IEEE 10th International Conference on ASIC, 2013

2012

Configuration Context Reduction for Coarse-Grained Reconfigurable Architecture.

[BibT_eX]

[DOI]

IEICE Trans. Inf. Syst., 2012

Hybrid Wired/Wireless On-Chip Network Design for Application-Specific SoC.

[BibT_eX]

[DOI]

IEICE Trans. Electron., 2012

Multi-Battery Scheduling for Battery-Powered DVS Systems.

[BibT_eX]

[DOI]

IEICE Trans. Commun., 2012

Mapping Optimization of Affine Loop Nests for Reconfigurable Computing Architecture.

[BibT_eX]

[DOI]

IEICE Trans. Inf. Syst., 2012

Reconfiguration Process Optimization of Dynamically Coarse Grain Reconfigurable Architecture for Multimedia Applications.

[BibT_eX]

[DOI]

IEICE Trans. Inf. Syst., 2012

Reducing configuration contexts for coarse-grained reconfigurable architecture.

[BibT_eX]

[DOI]

Proceedings of the 2012 IEEE International Symposium on Circuits and Systems, 2012

Low Power Schedule Algorithm for Embedded Multimedia Applications Basing on Imagine-L Processor.

[BibT_eX]

[DOI]

Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

Software/hardware framework for generating parallel Gaussian random numbers based on the Monty Python method.

[BibT_eX]

[DOI]

Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

An Efficient Hardware Random Number Generator Based on the MT Method.

[BibT_eX]

[DOI]

Proceedings of the 12th IEEE International Conference on Computer and Information Technology, 2012

2011

A high efficient baseband transceiver for IEEE 802.15.4 LR-WPAN systems.

[BibT_eX]

[DOI]

Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

Performance evaluation modeling for reconfigurable processor.

[BibT_eX]

[DOI]

Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

An energy efficiency task scheduling algorithm for streaming applications on multiprocessor SoC.

[BibT_eX]

[DOI]

Shan Cao

Zhaolin Li

Shaojun Wei

Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

2010

A Cycle-Accurate Simulator for a Reconfigurable Multi-Media System.

[BibT_eX]

[DOI]

IEICE Trans. Inf. Syst., 2010

CropNET: A Wireless Multimedia Sensor Network for Agricultural Monitoring.

[BibT_eX]

[DOI]

IEICE Trans. Commun., 2010

Parallelization of Computing-Intensive Tasks of the H.264 High Profile Decoding Algorithm on a Reconfigurable Multimedia System.

[BibT_eX]

[DOI]

IEICE Trans. Inf. Syst., 2010

A reconfigurable multi-processor SoC for media applications.

[BibT_eX]

[DOI]

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

A VLSI design of sensor node for wireless image sensor network.

[BibT_eX]

[DOI]

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

Parallel implementation of computing-intensive decoding algorithms of H.264 on reconfigurable SoC.

[BibT_eX]

[DOI]

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

Reconfigurable computing - evolution of Von Neumann architecture.

[BibT_eX]

[DOI]

Shaojun Wei

Proceedings of the International Conference on Field-Programmable Technology, 2010

Battery aware tasks allocating algorithm for multi-battery operated system.

[BibT_eX]

[DOI]

Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2010

Mixed-level modeling for network on chip infrastructure in SoC design.

[BibT_eX]

[DOI]

Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2010

2009

Compiler Framework for Reconfigurable Computing Architecture.

[BibT_eX]

[DOI]

IEICE Trans. Electron., 2009

Buffer planning for application-specific networks-on-chip design.

[BibT_eX]

[DOI]

Shouyi Yin

Leibo Liu

Shaojun Wei

Sci. China Ser. F Inf. Sci., 2009

2008

Key technologies of system on chip design.

[BibT_eX]

[DOI]

Shaojun Wei

Sci. China Ser. F Inf. Sci., 2008

2007

Battery-Aware Variable Voltage Scheduling on Real-Time Multiprocessor Platforms.

[BibT_eX]

[DOI]

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2007), 2007

2006

On handling the fixed-outline constraints of floorplanning using less flexibility first principles.

[BibT_eX]

[DOI]

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2006), 2006

2003

Emerging markets: design goes global.

[BibT_eX]

[DOI]

Proceedings of the 40th Design Automation Conference, 2003

Shaojun Wei
