Wenqi Lou

Orcid: 0000-0002-2240-6672

According to our database¹, Wenqi Lou authored at least 38 papers between 2017 and 2026.

Collaborative distances:

Dijkstra number² of five.
Erdős number³ of four.

Bibliography

2026

LORA: A Latency-Oriented Recurrent Architecture for Large Language Model on Multi-FPGA Platform With Communication Optimization.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., July, 2026

UniSparTa: A Unified Sparse Tensor Program Tuning Framework.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., July, 2026

Scheduling Cause-Effect Chains without Timing Anomalies in End-to-End Latency.

CoRR, April, 2026

TETRIS: A Novel FPGA Virtualization Framework for Fine-grained Sharing via Hierarchical Reconfiguration.

ACM Trans. Reconfigurable Technol. Syst., March, 2026

Hermes: A Unified High-Performance NTT Architecture with Hybrid Dataflow.

CoRR, March, 2026

MoE-Sched: Enabling Efficient FPGA Deployment of Mixture-of-Experts Vision Transformers via Coordinated Scheduling.

IEEE Trans. Very Large Scale Integr. Syst., January, 2026

UniCoX: A Unified Cost Model for Tensorized Program Tuning Across Ubiquitous Accelerators.

IEEE Trans. Computers, January, 2026

A Timing-Anomaly Free Dynamic Scheduling on Heterogeneous Systems.

CoRR, January, 2026

Reducing End-to-End Latency of Cause-Effect Chains with Shared Cache Analysis.

CoRR, January, 2026

Window-Diffusion: Accelerating Diffusion Language Model Inference with Windowed Token Pruning and Caching.

CoRR, January, 2026

Crystal-KV: Efficient KV Cache Management for Chain-of-Thought LLMs via Answer-First Principle.

CoRR, January, 2026

CloserToMe: A Unified Framework for Accurate and Transferable Latency Prediction Across Heterogeneous Devices.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

ActionFlow: A Pipelined Action Acceleration for Vision Language Models on Edge.

CoRR, December, 2025

Picasso: Analyzing Prompt Design for Text-to-Image Generative Diffusion Models from a Temporal-Spatial Perspective.

ACM Trans. Multim. Comput. Commun. Appl., November, 2025

QLlama: An FPGA-Based Microscaling Quantization Accelerator for Energy-Efficient Llama2 Inference.

IEEE Embed. Syst. Lett., October, 2025

Optimizing utilization in logical execution time system with preserved externally-observable timed I/O semantics.

J. Syst. Archit., 2025

TSI: A Time-Semantic Instruction Set for Deterministic Data-Flow Execution in Real-Time Embedded Systems.

Proceedings of the IEEE Real-Time Systems Symposium, 2025

UbiMoE: A Ubiquitous Mixture-of-Experts Vision Transformer Accelerator With Hybrid Computation Pattern on FPGA.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2025

Automated FPGA Accelerator Generation Framework for Transformers with Dataflow Optimization.

Proceedings of the 54th International Conference on Parallel Processing, 2025

Spectral Enhanced Tuning: An Efficient Plug-and-Play Framework for Frequency-Aware Dehazing.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA.

Proceedings of the Euro-Par 2025: Parallel Processing, 2025

UniCoS: A Unified Neural and Accelerator Co-Search Framework for CNNs and ViTs.

Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025

2024

FlexBCM: Hybrid Block-Circulant Neural Network and Accelerator Co-Search on FPGAs.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2024

Unleashing Network/Accelerator Co-Exploration Potential on FPGAs: A Deeper Joint Search.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., October, 2024

MFNAS: Multi-fidelity Exploration in Neural Architecture Search with Stable Zero-Shot Proxy.

Proceedings of the PRICAI 2024: Trends in Artificial Intelligence, 2024

Fine-Grained Shared Cache Interference Analysis Using Basic Block's Execution Time.

Proceedings of the 42nd IEEE International Conference on Computer Design, 2024

UniCoMo: A Unified Learning-Based Cost Model for Tensorized Program Tuning.

Proceedings of the 42nd IEEE International Conference on Computer Design, 2024

AutoSparse: A Source-to-Source Format and Schedule Auto-Tuning Framework for Sparse Tensor Program.

Proceedings of the 42nd IEEE International Conference on Computer Design, 2024

Enhancing Long Sequence Input Processing in FPGA-Based Transformer Accelerators through Attention Fusion.

Proceedings of the Great Lakes Symposium on VLSI 2024, 2024

Beyond Training: A Zero-Shot Framework to Neural Architecture and Accelerator Co-Exploration.

Proceedings of the IEEE International Conference on Cluster Computing, 2024

2023

hAP: A Spatial-von Neumann Heterogeneous Automata Processor with Optimized Resource and IO Overhead on FPGA.

Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023

NAF: Deeper Network/Accelerator Co-Exploration for Customizing CNNs on FPGA.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

2022

OctCNN: A High Throughput FPGA Accelerator for CNNs Using Octave Convolution Algorithm.

IEEE Trans. Computers, 2022

TCL-Net: A Lightweight and Efficient Dehazing Network with Frequency-Domain Fusion and Multi-Angle Attention.

Cheng Tang, Wenqi Lou

Proceedings of the Computer Vision - ACCV 2024, 2022

2021

Neural Network Instruction Set Extension and Code Mapping Mechanism.

Int. J. Softw. Informatics, 2021

2020

OctCNN: An Energy-Efficient FPGA Accelerator for CNNs using Octave Convolution Algorithm.

Proceedings of the IEEE International Conference on Cluster Computing, 2020

2019

RV-CNN: Flexible and Efficient Instruction Set for CNNs Based on RISC-V Processors.

Proceedings of the Advanced Parallel Processing Technologies, 2019

2017

Reconfigurable Hardware Accelerators: Opportunities, Trends, and Challenges.

CoRR, 2017
