Shijie Cao

ORCID: 0009-0000-2001-3763

According to our database, Shijie Cao authored at least 34 papers between 2013 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.


Bibliography

2025
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning.
CoRR, August, 2025

ReflecSched: Solving Dynamic Flexible Job-Shop Scheduling via LLM-Powered Hierarchical Reflection.
CoRR, August, 2025

Fine-Grained Structured Sparse Computing for FPGA-Based AI Inference.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., July, 2025

Data Efficacy for Language Model Training.
CoRR, June, 2025

SeerAttention-R: Sparse Attention Adaptation for Long Reasoning.
CoRR, June, 2025

Rectified Sparse Attention.
CoRR, June, 2025

BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache.
CoRR, March, 2025

ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM.
CoRR, March, 2025

CoCoEvo: Co-Evolution of Programs and Test Cases to Enhance Code Generation.
CoRR, February, 2025

Dissecting Bit-Level Scaling Laws in Quantizing Vision Generative Models.
CoRR, January, 2025

LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge.
Proceedings of the Twentieth European Conference on Computer Systems, 2025

Bitnet.cpp: Efficient Edge Inference for Ternary LLMs.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach.
CoRR, 2024

SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs.
CoRR, 2024

LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration.
CoRR, 2024

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

AFPQ: Asymmetric Floating Point Quantization for LLMs.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Inverse model and adaptive neighborhood search based cooperative optimizer for energy-efficient distributed flexible job shop scheduling.
Swarm Evol. Comput., December, 2023

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.
CoRR, 2023

NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors.
Proceedings of the 21st Annual International Conference on Mobile Systems, 2023

Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

Accurate and Structured Pruning for Efficient Automatic Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adam Accumulation to Reduce Memory Footprints of Both Activations and Gradients for Large-Scale DNN Training.
Proceedings of the ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland, 2023

2021
Dense-to-Sparse Gate for Mixture-of-Experts.
CoRR, 2021

Building a COVID-19 Literature Knowledge Graph Based on PubMed.
Proceedings of 2021 International Conference on Medical Imaging and Computer-Aided Diagnosis, 2021

2019
FlexSaaS: A Reconfigurable Accelerator for Web Search Selection.
ACM Trans. Reconfigurable Technol. Syst., 2019

Efficient and Effective Sparse LSTM on FPGA with Bank-Balanced Sparsity.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Balanced Sparsity for Efficient DNN Inference on GPU.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2013
Information Technology Education Based on Cloud Computing.
Proceedings of the Information Computing and Applications - 4th International Conference, 2013
