Shiyang Chen

Orcid: 0000-0003-2626-7865

Affiliations:
  • Rutgers University, New Brunswick, NJ, USA
  • Stevens Institute of Technology, Hoboken, NJ, USA (former)


According to our database, Shiyang Chen authored at least 21 papers between 2021 and 2025.

Bibliography

2025
Deal: Distributed End-to-End GNN Inference for All Nodes.
CoRR, March, 2025

KVDirect: Distributed Disaggregated LLM Inference.
CoRR, January, 2025

2024
TeGraph+: Scalable Temporal Graph Processing Enabling Flexible Edge Modifications.
IEEE Trans. Parallel Distributed Syst., August, 2024

TEA+: A Novel Temporal Graph Random Walk Engine with Hybrid Storage Architecture.
ACM Trans. Archit. Code Optim., June, 2024

Kernel fusion in atomistic spin dynamics simulations on Nvidia GPUs using tensor core.
J. Comput. Sci., 2024

PrisonBreak: Jailbreaking Large Language Models with Fewer Than Twenty-Five Targeted Bit-flips.
CoRR, 2024

FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design.
CoRR, 2024

Quant-LLM: Accelerating the Serving of Large Language Models via FP6-Centric Algorithm-System Co-Design on Modern GPUs.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024

2023
Motif-Based Graph Representation Learning with Application to Chemical Molecules.
Informatics, March, 2023

ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks.
CoRR, 2023

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies.
CoRR, 2023

Tango: rethinking quantization for graph neural network training on GPUs.
CoRR, 2023

PeeK: A Prune-Centric Approach for K Shortest Path Computation.
Proceedings of the International Conference for High Performance Computing, 2023

TANGO: re-thinking quantization for graph neural network training on GPUs.
Proceedings of the International Conference for High Performance Computing, 2023

2022
A length adaptive algorithm-hardware co-design of transformer on FPGA through sparse attention and dynamic pipelining.
Proceedings of the 59th ACM/IEEE Design Automation Conference (DAC '22), San Francisco, CA, USA, July 2022

Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm.
CoRR, 2021

Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search.
CoRR, 2021

E.T.: re-thinking self-attention for transformer models on GPUs.
Proceedings of the International Conference for High Performance Computing, 2021

Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search (Special Session Paper).
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

HMC-TRAN: A Tensor-core Inspired Hierarchical Model Compression for Transformer-based DNNs on GPU.
Proceedings of the Great Lakes Symposium on VLSI (GLSVLSI '21), 2021
