Jiecao Yu

Orcid: 0000-0003-2085-0312

According to our database, Jiecao Yu authored at least 16 papers between 2017 and 2026.

Bibliography

2026
LoKA: Low-precision Kernel Applications for Recommendation Models At Scale.
CoRR, May, 2026

Scaling Multi-Node Mixture-of-Experts Inference Using Expert Activation Patterns.
CoRR, April, 2026

Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction.
CoRR, March, 2026

2025
Fast and Simplex: 2-Simplicial Attention in Triton.
CoRR, July, 2025

Accelerating Transformer Inference and Training with 2:4 Activation Sparsity.
CoRR, March, 2025

Scaling Llama 3 Training with Efficient Parallelism Strategies.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

2023
BitSET: Bit-Serial Early Termination for Computation Reduction in Convolutional Neural Networks.
ACM Trans. Embed. Comput. Syst., October, 2023

2021
First-Generation Inference Accelerator Deployment at Facebook.
CoRR, 2021

Alternate Model Growth and Pruning for Efficient Training of Recommendation Systems.
Proceedings of the 20th IEEE International Conference on Machine Learning and Applications, 2021

Compute-Capable Block RAMs for Efficient Deep Learning Acceleration on FPGAs.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

2020
Adaptive Dense-to-Sparse Paradigm for Pruning Online Recommendation System with Non-Stationary Data.
CoRR, 2020

2019
Efficient Deep Neural Network Computation on Processors.
PhD thesis, 2019

TF-Net: Deploying Sub-Byte Deep Neural Networks on Microcontrollers.
ACM Trans. Embed. Comput. Syst., 2019

Spatial-Winograd Pruning Enabling Sparse Winograd Convolution.
CoRR, 2019

Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

2017
Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

