Shijie Cao

ORCID: 0009-0000-2001-3763

According to our database, Shijie Cao authored at least 34 papers between 2013 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.


Bibliography

2025
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning.
CoRR, August, 2025

ReflecSched: Solving Dynamic Flexible Job-Shop Scheduling via LLM-Powered Hierarchical Reflection.
CoRR, August, 2025

Fine-Grained Structured Sparse Computing for FPGA-Based AI Inference.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., July, 2025

Data Efficacy for Language Model Training.
CoRR, June, 2025

SeerAttention-R: Sparse Attention Adaptation for Long Reasoning.
CoRR, June, 2025

Rectified Sparse Attention.
CoRR, June, 2025

BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache.
CoRR, March, 2025

ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM.
CoRR, March, 2025

CoCoEvo: Co-Evolution of Programs and Test Cases to Enhance Code Generation.
CoRR, February, 2025

Dissecting Bit-Level Scaling Laws in Quantizing Vision Generative Models.
CoRR, January, 2025

LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge.
Proceedings of the Twentieth European Conference on Computer Systems, 2025

Bitnet.cpp: Efficient Edge Inference for Ternary LLMs.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach.
CoRR, 2024

SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs.
CoRR, 2024

LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration.
CoRR, 2024

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

AFPQ: Asymmetric Floating Point Quantization for LLMs.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Inverse model and adaptive neighborhood search based cooperative optimizer for energy-efficient distributed flexible job shop scheduling.
Swarm Evol. Comput., December, 2023

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.
CoRR, 2023

NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors.
Proceedings of the 21st Annual International Conference on Mobile Systems, 2023

Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

Accurate and Structured Pruning for Efficient Automatic Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adam Accumulation to Reduce Memory Footprints of Both Activations and Gradients for Large-Scale DNN Training.
Proceedings of the ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland, 2023

2021
Dense-to-Sparse Gate for Mixture-of-Experts.
CoRR, 2021

Building a COVID-19 Literature Knowledge Graph Based on PubMed.
Proceedings of 2021 International Conference on Medical Imaging and Computer-Aided Diagnosis, 2021

2019
FlexSaaS: A Reconfigurable Accelerator for Web Search Selection.
ACM Trans. Reconfigurable Technol. Syst., 2019

Efficient and Effective Sparse LSTM on FPGA with Bank-Balanced Sparsity.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Balanced Sparsity for Efficient DNN Inference on GPU.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2013
Information Technology Education Based on Cloud Computing.
Proceedings of the Information Computing and Applications - 4th International Conference, 2013
