Wenqi Lou

Orcid: 0000-0002-2240-6672

According to our database, Wenqi Lou authored at least 38 papers between 2017 and 2026.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2026
LORA: A Latency-Oriented Recurrent Architecture for Large Language Model on Multi-FPGA Platform With Communication Optimization.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., July, 2026

UniSparTa: A Unified Sparse Tensor Program Tuning Framework.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., July, 2026

Scheduling Cause-Effect Chains without Timing Anomalies in End-to-End Latency.
CoRR, April, 2026

TETRIS: A Novel FPGA Virtualization Framework for Fine-grained Sharing via Hierarchical Reconfiguration.
ACM Trans. Reconfigurable Technol. Syst., March, 2026

Hermes: A Unified High-Performance NTT Architecture with Hybrid Dataflow.
CoRR, March, 2026

MoE-Sched: Enabling Efficient FPGA Deployment of Mixture-of-Experts Vision Transformers via Coordinated Scheduling.
IEEE Trans. Very Large Scale Integr. Syst., January, 2026

UniCoX: A Unified Cost Model for Tensorized Program Tuning Across Ubiquitous Accelerators.
IEEE Trans. Computers, January, 2026

A Timing-Anomaly Free Dynamic Scheduling on Heterogeneous Systems.
CoRR, January, 2026

Reducing End-to-End Latency of Cause-Effect Chains with Shared Cache Analysis.
CoRR, January, 2026

Window-Diffusion: Accelerating Diffusion Language Model Inference with Windowed Token Pruning and Caching.
CoRR, January, 2026

Crystal-KV: Efficient KV Cache Management for Chain-of-Thought LLMs via Answer-First Principle.
CoRR, January, 2026

CloserToMe: A Unified Framework for Accurate and Transferable Latency Prediction Across Heterogeneous Devices.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
ActionFlow: A Pipelined Action Acceleration for Vision Language Models on Edge.
CoRR, December, 2025

Picasso: Analyzing Prompt Design for Text-to-Image Generative Diffusion Models from a Temporal-Spatial Perspective.
ACM Trans. Multim. Comput. Commun. Appl., November, 2025

QLlama: An FPGA-Based Microscaling Quantization Accelerator for Energy-Efficient Llama2 Inference.
IEEE Embed. Syst. Lett., October, 2025

Optimizing utilization in logical execution time system with preserved externally-observable timed I/O semantics.
J. Syst. Archit., 2025

TSI: A Time-Semantic Instruction Set for Deterministic Data-Flow Execution in Real-Time Embedded Systems.
Proceedings of the IEEE Real-Time Systems Symposium, 2025

UbiMoE: A Ubiquitous Mixture-of-Experts Vision Transformer Accelerator With Hybrid Computation Pattern on FPGA.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2025

Automated FPGA Accelerator Generation Framework for Transformers with Dataflow Optimization.
Proceedings of the 54th International Conference on Parallel Processing, 2025

Spectral Enhanced Tuning: An Efficient Plug-and-Play Framework for Frequency-Aware Dehazing.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA.
Proceedings of the Euro-Par 2025: Parallel Processing, 2025

UniCoS: A Unified Neural and Accelerator Co-Search Framework for CNNs and ViTs.
Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025

2024
FlexBCM: Hybrid Block-Circulant Neural Network and Accelerator Co-Search on FPGAs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2024

Unleashing Network/Accelerator Co-Exploration Potential on FPGAs: A Deeper Joint Search.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., October, 2024

MFNAS: Multi-fidelity Exploration in Neural Architecture Search with Stable Zero-Shot Proxy.
Proceedings of the PRICAI 2024: Trends in Artificial Intelligence, 2024

Fine-Grained Shared Cache Interference Analysis Using Basic Block's Execution Time.
Proceedings of the 42nd IEEE International Conference on Computer Design, 2024

UniCoMo: A Unified Learning-Based Cost Model for Tensorized Program Tuning.
Proceedings of the 42nd IEEE International Conference on Computer Design, 2024

AutoSparse: A Source-to-Source Format and Schedule Auto-Tuning Framework for Sparse Tensor Program.
Proceedings of the 42nd IEEE International Conference on Computer Design, 2024

Enhancing Long Sequence Input Processing in FPGA-Based Transformer Accelerators through Attention Fusion.
Proceedings of the Great Lakes Symposium on VLSI 2024, 2024

Beyond Training: A Zero-Shot Framework to Neural Architecture and Accelerator Co-Exploration.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

2023
hAP: A Spatial-von Neumann Heterogeneous Automata Processor with Optimized Resource and IO Overhead on FPGA.
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023

NAF: Deeper Network/Accelerator Co-Exploration for Customizing CNNs on FPGA.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

2022
OctCNN: A High Throughput FPGA Accelerator for CNNs Using Octave Convolution Algorithm.
IEEE Trans. Computers, 2022

TCL-Net: A Lightweight and Efficient Dehazing Network with Frequency-Domain Fusion and Multi-Angle Attention.
Proceedings of the Computer Vision - ACCV 2024, 2022

2021
Neural Network Instruction Set Extension and Code Mapping Mechanism.
Int. J. Softw. Informatics, 2021

2020
OctCNN: An Energy-Efficient FPGA Accelerator for CNNs using Octave Convolution Algorithm.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

2019
RV-CNN: Flexible and Efficient Instruction Set for CNNs Based on RISC-V Processors.
Proceedings of the Advanced Parallel Processing Technologies, 2019

2017
Reconfigurable Hardware Accelerators: Opportunities, Trends, and Challenges.
CoRR, 2017

