Yufei Ding

Orcid: 0000-0002-8716-5793

Affiliations:
  • University of California at Santa Barbara, CA, USA


According to our database, Yufei Ding authored at least 120 papers between 2013 and 2024.

Bibliography

2024
MECH: Multi-Entry Communication Highway for Superconducting Quantum Chiplets.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

RAP: Resource-aware Automated GPU Sharing for Multi-GPU Recommendation Model Training and Input Preprocessing.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

ZENO: A Type-based Optimization Framework for Zero Knowledge Neural Network Inference.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
MPU: Memory-centric SIMT Processor via In-DRAM Near-bank Computing.
ACM Trans. Archit. Code Optim., September, 2023

Comprehensive SNN Compression Using ADMM Optimization and Activity Regularization.
IEEE Trans. Neural Networks Learn. Syst., June, 2023

Exploring Adversarial Attack in Spiking Neural Networks With Spike-Compatible Gradient.
IEEE Trans. Neural Networks Learn. Syst., May, 2023

A Geometrical Approach to Evaluate the Adversarial Robustness of Deep Neural Networks.
ACM Trans. Multim. Comput. Commun. Appl., 2023

SDP: Co-Designing Algorithm, Dataflow, and Architecture for In-SRAM Sparse NN Acceleration.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2023

SPG: Structure-Private Graph Database via SqueezePIR.
Proc. VLDB Endow., 2023

ReDCIM: Reconfigurable Digital Computing-In-Memory Processor With Unified FP/INT Pipeline for Cloud AI Acceleration.
IEEE J. Solid State Circuits, 2023

TranCIM: Full-Digital Bitline-Transpose CIM-based Sparse Transformer Accelerator With Pipeline/Parallel Reconfigurable Modes.
IEEE J. Solid State Circuits, 2023

TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs.
Proceedings of the 2023 USENIX Annual Technical Conference, 2023

QASMTrans: A QASM Quantum Transpiler Framework for NISQ Devices.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Dynamic N:M Fine-Grained Structured Sparse Attention Mechanism.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

MGG: Accelerating Graph Neural Networks with Fine-Grained Intra-Kernel Communication-Computation Pipelining on Multi-GPU Platforms.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

QuComm: Optimizing Collective Communication for Distributed Quantum Computing.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

OneQ: A Compilation Framework for Photonic One-Way Quantum Computation.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Q-BEEP: Quantum Bayesian Error Mitigation Employing Poisson Modeling over the Hamming Spectrum.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

ECSSD: Hardware/Data Layout Co-Designed In-Storage-Computing Architecture for Extreme Classification.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

On Adversarial Robustness of Point Cloud Semantic Segmentation.
Proceedings of the 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Network, 2023

2022
STPAcc: Structural TI-Based Pruning for Accelerating Distance-Related Algorithms on CPU-FPGA Platforms.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Rubik: A Hierarchical Architecture for Efficient Graph Neural Network Training.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Dynamic Sparse Attention for Scalable Transformer Acceleration.
IEEE Trans. Computers, 2022

A Systematic View of Model Leakage Risks in Deep Neural Network Systems.
IEEE Trans. Computers, 2022

Quantum and Post-Moore's Law Computing.
IEEE Internet Comput., 2022

Enabling Data Movement and Computation Pipelining in Deep Learning Compiler.
CoRR, 2022

Empowering GNNs with Fine-grained Communication-Computation Pipelining on Multi-GPU Platforms.
CoRR, 2022

CollComm: Enabling Efficient Collective Quantum Communication Based on EPR buffering.
CoRR, 2022

GMI-DRL: Empowering Multi-GPU Deep Reinforcement Learning with GPU Spatial Multiplexing.
CoRR, 2022

Heuristic Adaptability to Input Dynamics for SpMM on GPUs.
CoRR, 2022

MPU-Sim: A Simulator for In-DRAM Near-Bank Processing Architectures.
IEEE Comput. Archit. Lett., 2022

Faith: An Efficient Framework for Transformer Verification on GPUs.
Proceedings of the 2022 USENIX Annual Technical Conference, 2022

LightSeq2: Accelerated Training for Transformer-Based Models on GPUs.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

EL-Rec: Efficient Large-Scale Recommendation Model Training via Tensor-Train Embedding Table.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

QGTC: accelerating quantized graph neural networks via GPU tensor core.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

Biologically Inspired Dynamic Thresholds for Spiking Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Understanding GNN Computational Graph: A Coordinated Computation, IO, and Memory Perspective.
Proceedings of Machine Learning and Systems 2022, 2022

AutoComm: A Framework for Enabling Efficient Communication in Distributed Quantum Programs.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

A 28nm 15.59µJ/Token Full-Digital Bitline-Transpose CIM-Based Sparse Transformer Accelerator with Pipeline/Parallel Reconfigurable Modes.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022

A 28nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 Reconfigurable Digital CIM Processor with Unified FP/INT Pipeline and Bitwise In-Memory Booth Multiplication for Cloud Deep Learning Acceleration.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022

A synthesis framework for stitching surface code with superconducting quantum devices.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

EQC: ensembled quantum computing for variational quantum algorithms.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

INSPIRE: in-storage private information retrieval via protocol and architecture co-design.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Shfl-BW: accelerating deep neural network inference with tensor-core aware weight pruning.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Heuristic adaptability to input dynamics for SpMM on GPUs.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

DOTA: detect and omit weak attentions for scalable transformer acceleration.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

Paulihedral: a generalized block-wise compiler optimization framework for Quantum simulation kernels.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021
Effective and Efficient Batch Normalization Using a Few Uncorrelated Data for Statistics Estimation.
IEEE Trans. Neural Networks Learn. Syst., 2021

Reuse-centric k-means configuration.
Inf. Syst., 2021

ZEN: Efficient Zero-Knowledge Proofs for Neural Networks.
IACR Cryptol. ePrint Arch., 2021

Attacking Point Cloud Segmentation with Color-only Perturbation.
CoRR, 2021

TC-GNN: Accelerating Sparse Graph Neural Network Computation Via Dense Tensor Core on GPUs.
CoRR, 2021

Towards Efficient Ansatz Architecture for Variational Quantum Algorithms.
CoRR, 2021

Mapping Surface Code to Superconducting Quantum Processors.
CoRR, 2021

QECV: Quantum Error Correction Verification.
CoRR, 2021

Mitigating Noise-Induced Gradient Vanishing in Variational Quantum Algorithm Training.
CoRR, 2021

QGTC: Accelerating Quantized GNN via GPU Tensor Core.
CoRR, 2021

Transformer Acceleration with Dynamic Sparse Attention.
CoRR, 2021

Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction.
CoRR, 2021

MPU: Towards Bandwidth-abundant SIMT Processor via Near-bank Computing.
CoRR, 2021

Palleon: A Runtime System for Efficient Video Processing toward Dynamic Class Skew.
Proceedings of the 2021 USENIX Annual Technical Conference, 2021

APNN-TC: accelerating arbitrary precision neural networks on ampere GPU tensor cores.
Proceedings of the International Conference for High Performance Computing, 2021

Efficient tensor core-based GPU kernels for structured sparsity under reduced precision.
Proceedings of the International Conference for High Performance Computing, 2021

EGEMM-TC: accelerating scientific computing on tensor cores with extended precision.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs.
Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation, 2021

On the Co-Design of Quantum Software and Hardware.
Proceedings of the NANOCOM '21: The Eighth Annual ACM International Conference on Nanoscale Computing and Communication, Virtual Event, Italy, September 7, 2021

ENMC: Extreme Near-Memory Classification via Approximate Screening.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Improving Streaming Graph Processing Performance using Input Knowledge.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

DSXplore: Optimizing Convolutional Neural Networks via Sliding-Channel Convolutions.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Overcoming the Memory Hierarchy Inefficiencies in Graph Processing Applications.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

Saga: Sparse Adversarial Attack on EEG-Based Brain Computer Interface.
Proceedings of the IEEE International Conference on Acoustics, 2021

An Efficient Quantitative Approach for Optimizing Convolutional Neural Networks.
Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

TiAcc: Triangle-inequality based Hardware Accelerator for K-means on FPGAs.
Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021

UAG: Uncertainty-aware Attention Graph Neural Network for Defending Adversarial Attacks.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Projection-based runtime assertions for testing and debugging Quantum programs.
Proc. ACM Program. Lang., 2020

Rethinking the performance comparison between SNNs and ANNs.
Neural Networks, 2020

Tianjic: A Unified and Scalable Chip Bridging Spike-Based and Continuous Neural Computation.
IEEE J. Solid State Circuits, 2020

A novel ensemble pruning approach based on information exchange glowworm swarm optimization and complementarity measure.
J. Intell. Fuzzy Syst., 2020

Rubik: A Hierarchical Architecture for Efficient Graph Learning.
CoRR, 2020

Uncertainty-aware Attention Graph Neural Network for Defending Adversarial Attacks.
CoRR, 2020

Scalable Adversarial Attack on Graph Neural Networks with Alternating Direction Method of Multipliers.
CoRR, 2020

Optimizing Convolutional Neural Network Architecture via Information Field.
CoRR, 2020

GNNAdvisor: An Efficient Runtime System for GNN Acceleration on GPUs.
CoRR, 2020

Domain-adversarial multi-task framework for novel therapeutic property prediction of compounds.
Bioinform., 2020

A Close Look at Multi-tenant Parallel CNN Inference for Autonomous Driving.
Proceedings of the Network and Parallel Computing, 2020

DUET: Boosting Deep Neural Network Efficiency on Dual-Module Architecture.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

iPIM: Programmable In-Memory Image Processing Accelerator Using Near-Bank Architecture.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

SGQuant: Squeezing the Last Bit on Graph Neural Networks with Specialized Quantization.
Proceedings of the 32nd IEEE International Conference on Tools with Artificial Intelligence, 2020

Boosting Deep Neural Network Efficiency with Dual-Module Inference.
Proceedings of the 37th International Conference on Machine Learning, 2020

Eliminating Redundant Computation in Noisy Quantum Computing Simulation.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Towards Efficient Superconducting Quantum Processor Architecture Design.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

DeepSniffer: A DNN Model Extraction Framework Based on Learning Architectural Hints.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

Weighted-Sampling Audio Adversarial Example Attack.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
DASM: Data-Streaming-Based Computing in Nonvolatile Memory Architecture for Embedded System.
IEEE Trans. Very Large Scale Integr. Syst., 2019

Poq: Projection-based Runtime Assertions for Debugging on a Quantum Computer.
CoRR, 2019

AccD: A Compiler-based Framework for Accelerating Distance-related Algorithms on CPU-FPGA Platforms.
CoRR, 2019

SANQ: A Simulation Framework for Architecting Noisy Intermediate-Scale Quantum Computing System.
CoRR, 2019

Neural Network Model Extraction Attacks in Edge Devices by Hearing Architectural Hints.
CoRR, 2019

Adversarial attack on Speech-to-Text Recognition Models.
CoRR, 2019

Reconciling Feature-Reuse and Overfitting in DenseNet with Specialized Dropout.
Proceedings of the 31st IEEE International Conference on Tools with Artificial Intelligence, 2019

Dynamic Sparse Graph for Efficient Deep Learning.
Proceedings of the 7th International Conference on Learning Representations, 2019

KPynq: A Work-Efficient Triangle-Inequality Based K-Means on FPGA.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

Tackling the Qubit Mapping Problem for NISQ-Era Quantum Devices.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018
Penetrating the Fog: the Path to Efficient CNN Models.
CoRR, 2018

Domain-Adversarial Multi-Task Framework for Novel Therapeutic Property Prediction of Compounds.
CoRR, 2018

Reconciling Feature-Reuse and Overfitting in DenseNet with Specialized Dropout.
CoRR, 2018

In-memory multiplication engine with SOT-MRAM based stochastic computing.
CoRR, 2018

SECS: Efficient Deep Stream Processing via Class Skew Dichotomy.
CoRR, 2018

Challenges Towards Deploying Data Intensive Scientific Applications on Extreme Heterogeneity Supercomputers.
CoRR, 2018

Reuse-Centric K-Means Configuration.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

2017
GLORE: generalized loop redundancy elimination upon LER-notation.
Proc. ACM Program. Lang., 2017

Generalizations of the theory and deployment of triangular inequality for compiler-based strength reduction.
Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2017

Sweet KNN: An Efficient KNN on GPU through Reconciliation between Redundancy Removal and Regularity.
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017

2015
TOP: A Framework for Enabling Algorithmic Optimizations for Distance-Related Problems.
Proc. VLDB Endow., 2015

Autotuning algorithmic choice for input sensitivity.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup.
Proceedings of the 32nd International Conference on Machine Learning, 2015

2014
Call sequence prediction through probabilistic calling automata.
Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, 2014

Finding the limit: examining the potential and complexity of compilation scheduling for JIT-based runtime systems.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

2013
Profmig: A framework for flexible migration of program profiles across software versions.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

