Jun Liu

Orcid: 0009-0003-8280-9072

Affiliations:
  • Shanghai Jiao Tong University, Qingyuan Research Institute, Shanghai, China


According to our database1, Jun Liu authored at least 20 papers between 2021 and 2026.

Collaborative distances:
  • Dijkstra number2 of four.
  • Erdős number3 of four.

Timeline

Legend:

Book  In proceedings  Article  PhD thesis  Dataset  Other 

Links

Online presence:

On csauthors.net:

Bibliography

2026
DynSplit-KV: Dynamic Semantic Splitting for KVCache Compression in Efficient Long-Context LLM Inference.
CoRR, February, 2026

2025
TB-STC: Transposable Block-wise N: M Structured Sparse Tensor Core.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

FlightVGM: Efficient Video Generation Model Inference with Online Sparsification and Hybrid Precision on FPGAs.
Proceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2025

Harnessing Conventional Video Processing Insights for Emerging 3D Video Generation Models: A Comprehensive Attention-aware Way.
Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025

SG-Filter: Enhancing Similar Text Retrieval via Hierarchical Summarized-Semantic Index and Adaptive Filtering.
Proceedings of the 34th ACM International Conference on Information and Knowledge Management, 2025

Accelerator for LLM-Enhanced GNN with Product Quantization and Unified Indexing.
Proceedings of the 30th Asia and South Pacific Design Automation Conference, 2025

ViDA: Video Diffusion Transformer Acceleration with Differential Approximation and Adaptive Dataflow.
Proceedings of the 30th Asia and South Pacific Design Automation Conference, 2025

2024
Efficient and Effective Retrieval of Dense-Sparse Hybrid Vectors using Graph-based Approximate Nearest Neighbor Search.
CoRR, 2024

Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective.
CoRR, 2024

FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Fast and Efficient 2-bit LLM Inference on GPU: 2/4/16-bit in a Weight Matrix with Asynchronous Dequantization.
Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024

MARCA: Mamba Accelerator with Reconfigurable Architecture.
Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024

FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs.
Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024

2023
FlashDecoding++: Faster Large Language Model Inference on GPUs.
CoRR, 2023

DF-GAS: a Distributed FPGA-as-a-Service Architecture towards Billion-Scale Graph-based Approximate Nearest Neighbor Search.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

TSTC: Two-Level Sparsity Tensor Core Enabling both Algorithm Flexibility and Hardware Efficiency.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

Processing-In-Hierarchical-Memory Architecture for Billion-Scale Approximate Nearest Neighbor Search.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022
A Unified FPGA Virtualization Framework for General-Purpose Deep Neural Networks in the Cloud.
ACM Trans. Reconfigurable Technol. Syst., 2022

Optimizing Graph-based Approximate Nearest Neighbor Search: Stronger and Smarter.
Proceedings of the 23rd IEEE International Conference on Mobile Data Management, 2022

2021
3M-AI: A Multi-task and Multi-core Virtualization Framework for Multi-FPGA AI Systems in the Cloud.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021


  Loading...