SASDenSebLE: A Compact Vision Transformer Inference Architecture With Saturation-Approximate Softmax Dataflow Enabling Sequence-Parallelism Boosted Layer-Fusion Execution.

Liu He

Yujin Wang

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2025

Enhancing Memory Efficiency in Large Language Model Training Through Chronos-aware Pipeline Parallelism.

CoRR, March, 2025

MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

HyC-LoRA: Memory Efficient LoRA Fine-tuning with Hybrid Activation Compression.

Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

CCE: A 28nm Content Creation Engine with Asymmetric Computing, Semantic-Driven Instruction Generation and Collision-Free Outlier Mapper for Video Generation.

Proceedings of the IEEE Custom Integrated Circuits Conference, 2025

Pro-Cache-CIM: A 28nm 69.4TOPS/W Product-Cache-based Digital-Compute-in-Memory Macro Leveraging Data Locality Pattern in Vision AI Tasks.

Proceedings of the IEEE Custom Integrated Circuits Conference, 2025

2024

Hecaton: Training and Finetuning Large Language Models with Scalable Chiplet Systems.

CoRR, 2024

A 28nm 4.35TOPS/mm2 Transformer Accelerator with Basis-vector Based Ultra Storage Compression, Decomposed Computation and Unified LUT-Assisted Cores.

Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024

Exploring Approximation and Dataflow Co-Optimization for Scalable Transformer Inference Architecture on the Edge.

Proceedings of the 37th IEEE International System-on-Chip Conference, 2024

34.7 A 28nm 2.4Mb/mm² 6.9-16.3TOPS/mm² eDRAM-LUT-Based Digital-Computing-in-Memory Macro with In-Memory Encoding and Refreshing.
