Yufei Ding
ORCID: 0000-0002-8716-5793
Affiliations:
- University of California San Diego, Department of Computer Science & Engineering, San Diego, CA, USA
- University of California at Santa Barbara, Department of Computer Science, Santa Barbara, CA, USA (2017 - 2023)
- North Carolina State University, Raleigh, NC, USA (PhD 2014)
- College of William and Mary, Williamsburg, VA, USA (until 2011)
According to our database, Yufei Ding authored at least 145 papers between 2013 and 2025.
Bibliography
2025
HedraRAG: Coordinating LLM Generation and Database Retrieval in Heterogeneous RAG Serving.
CoRR, July, 2025
CoRR, July, 2025
KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads.
CoRR, May, 2025
OneAdapt: Adaptive Compilation for Resource-Constrained Photonic One-Way Quantum Computing.
CoRR, April, 2025
Proceedings of the 2025 USENIX Annual Technical Conference, 2025
Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025
Mercury: Unlocking Multi-GPU Operator Optimization for LLMs via Remote Memory Scheduling.
Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025
KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads.
Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025
Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025
SwitchQNet: Optimizing Distributed Quantum Computing for Quantum Data Centers with Switch Networks.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025
TRACI: Network Acceleration of Input-Dynamic Communication for Large-Scale Deep Learning Recommendation Model.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025
Mutual Effort for Efficiency: A Similarity-based Token Pruning for Vision Transformers in Self-Supervised Learning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Push Multicast: A Speculative and Coherent Interconnect for Mitigating Manycore CPU Communication Bottleneck.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025
QECC-Synth: A Layout Synthesizer for Quantum Error Correction Codes on Sparse Architectures.
Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025
Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025
2024
CoRR, 2024
OPER: Optimality-Guided Embedding Table Parallelization for Large-scale Recommendation Model.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024
RecFlex: Enabling Feature Heterogeneity-Aware Optimization for Deep Recommendation Models with Flexible Schedules.
Proceedings of the International Conference for High Performance Computing, 2024
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024
Soter: Analytical Tensor-Architecture Modeling and Automatic Tensor Program Tuning for Spatial Accelerators.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
RAP: Resource-aware Automated GPU Sharing for Multi-GPU Recommendation Model Training and Input Preprocessing.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
ZENO: A Type-based Optimization Framework for Zero Knowledge Neural Network Inference.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
2023
ACM Trans. Archit. Code Optim., September, 2023
IEEE Trans. Neural Networks Learn. Syst., June, 2023
Exploring Adversarial Attack in Spiking Neural Networks With Spike-Compatible Gradient.
IEEE Trans. Neural Networks Learn. Syst., May, 2023
A Geometrical Approach to Evaluate the Adversarial Robustness of Deep Neural Networks.
ACM Trans. Multim. Comput. Commun. Appl., 2023
SDP: Co-Designing Algorithm, Dataflow, and Architecture for In-SRAM Sparse NN Acceleration.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2023
ReDCIM: Reconfigurable Digital Computing-In-Memory Processor With Unified FP/INT Pipeline for Cloud AI Acceleration.
IEEE J. Solid State Circuits, 2023
TranCIM: Full-Digital Bitline-Transpose CIM-based Sparse Transformer Accelerator With Pipeline/Parallel Reconfigurable Modes.
IEEE J. Solid State Circuits, 2023
Proceedings of the 2023 USENIX Annual Technical Conference, 2023
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023
MGG: Accelerating Graph Neural Networks with Fine-Grained Intra-Kernel Communication-Computation Pipelining on Multi-GPU Platforms.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
Q-BEEP: Quantum Bayesian Error Mitigation Employing Poisson Modeling over the Hamming Spectrum.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
ECSSD: Hardware/Data Layout Co-Designed In-Storage-Computing Architecture for Extreme Classification.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
Proceedings of the 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Network, 2023
2022
STPAcc: Structural TI-Based Pruning for Accelerating Distance-Related Algorithms on CPU-FPGA Platforms.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
IEEE Trans. Computers, 2022
IEEE Trans. Computers, 2022
CoRR, 2022
Empowering GNNs with Fine-grained Communication-Computation Pipelining on Multi-GPU Platforms.
CoRR, 2022
CollComm: Enabling Efficient Collective Quantum Communication Based on EPR buffering.
CoRR, 2022
GMI-DRL: Empowering Multi-GPU Deep Reinforcement Learning with GPU Spatial Multiplexing.
CoRR, 2022
IEEE Comput. Archit. Lett., 2022
Proceedings of the 2022 USENIX Annual Technical Conference, 2022
Proceedings of the SC22: International Conference for High Performance Computing, 2022
EL-Rec: Efficient Large-Scale Recommendation Model Training via Tensor-Train Embedding Table.
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Understanding GNN Computational Graph: A Coordinated Computation, IO, and Memory Perspective.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022
AutoComm: A Framework for Enabling Efficient Communication in Distributed Quantum Programs.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022
A 28nm 15.59µJ/Token Full-Digital Bitline-Transpose CIM-Based Sparse Transformer Accelerator with Pipeline/Parallel Reconfigurable Modes.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022
A 28nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 Reconfigurable Digital CIM Processor with Unified FP/INT Pipeline and Bitwise In-Memory Booth Multiplication for Cloud Deep Learning Acceleration.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022
A synthesis framework for stitching surface code with superconducting quantum devices.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
INSPIRE: in-storage private information retrieval via protocol and architecture co-design.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
Shfl-BW: accelerating deep neural network inference with tensor-core aware weight pruning.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022
Paulihedral: a generalized block-wise compiler optimization framework for Quantum simulation kernels.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022
2021
Effective and Efficient Batch Normalization Using a Few Uncorrelated Data for Statistics Estimation.
IEEE Trans. Neural Networks Learn. Syst., 2021
IACR Cryptol. ePrint Arch., 2021
TC-GNN: Accelerating Sparse Graph Neural Network Computation Via Dense Tensor Core on GPUs.
CoRR, 2021
Mitigating Noise-Induced Gradient Vanishing in Variational Quantum Algorithm Training.
CoRR, 2021
Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction.
CoRR, 2021
Proceedings of the 2021 USENIX Annual Technical Conference, 2021
APNN-TC: accelerating arbitrary precision neural networks on ampere GPU tensor cores.
Proceedings of the International Conference for High Performance Computing, 2021
Efficient tensor core-based GPU kernels for structured sparsity under reduced precision.
Proceedings of the International Conference for High Performance Computing, 2021
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021
Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation, 2021
Proceedings of the NANOCOM '21: The Eighth Annual ACM International Conference on Nanoscale Computing and Communication, Virtual Event, Italy, September 7, 2021
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021
Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021
UAG: Uncertainty-aware Attention Graph Neural Network for Defending Adversarial Attacks.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
Proc. ACM Program. Lang., 2020
Tianjic: A Unified and Scalable Chip Bridging Spike-Based and Continuous Neural Computation.
IEEE J. Solid State Circuits, 2020
A novel ensemble pruning approach based on information exchange glowworm swarm optimization and complementarity measure.
J. Intell. Fuzzy Syst., 2020
CoRR, 2020
Scalable Adversarial Attack on Graph Neural Networks with Alternating Direction Method of Multipliers.
CoRR, 2020
CoRR, 2020
Domain-adversarial multi-task framework for novel therapeutic property prediction of compounds.
Bioinform., 2020
Proceedings of the Network and Parallel Computing, 2020
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
iPIM: Programmable In-Memory Image Processing Accelerator Using Near-Bank Architecture.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020
SGQuant: Squeezing the Last Bit on Graph Neural Networks with Specialized Quantization.
Proceedings of the 32nd IEEE International Conference on Tools with Artificial Intelligence, 2020
Proceedings of the 37th International Conference on Machine Learning, 2020
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
2019
DASM: Data-Streaming-Based Computing in Nonvolatile Memory Architecture for Embedded System.
IEEE Trans. Very Large Scale Integr. Syst., 2019
CoRR, 2019
AccD: A Compiler-based Framework for Accelerating Distance-related Algorithms on CPU-FPGA Platforms.
CoRR, 2019
SANQ: A Simulation Framework for Architecting Noisy Intermediate-Scale Quantum Computing System.
CoRR, 2019
Neural Network Model Extraction Attacks in Edge Devices by Hearing Architectural Hints.
CoRR, 2019
Proceedings of the 31st IEEE International Conference on Tools with Artificial Intelligence, 2019
Proceedings of the 7th International Conference on Learning Representations, 2019
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019
2018
Domain-Adversarial Multi-Task Framework for Novel Therapeutic Property Prediction of Compounds.
CoRR, 2018
CoRR, 2018
Challenges Towards Deploying Data Intensive Scientific Applications on Extreme Heterogeneity Supercomputers.
CoRR, 2018
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018
2017
Proc. ACM Program. Lang., 2017
Generalizations of the theory and deployment of triangular inequality for compiler-based strength reduction.
Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2017
Sweet KNN: An Efficient KNN on GPU through Reconciliation between Redundancy Removal and Regularity.
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017
2015
TOP: A Framework for Enabling Algorithmic Optimizations for Distance-Related Problems.
Proc. VLDB Endow., 2015
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015
Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup.
Proceedings of the 32nd International Conference on Machine Learning, 2015
2014
Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, 2014
Finding the limit: examining the potential and complexity of compilation scheduling for JIT-based runtime systems.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014
2013
Profmig: A framework for flexible migration of program profiles across software versions.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013