Mingyu Gao

Orcid: 0000-0001-8433-7281

Affiliations:
  • Tsinghua University, Institute for Interdisciplinary Information Sciences, Beijing, China
  • Stanford University, Stanford, CA, USA


According to our database, Mingyu Gao authored at least 61 papers between 2015 and 2025.

Bibliography

2025
SSS-DIMM: Removing Redundant Data Movement in Trusted DIMM-Based Near-Memory-Processing Kernel Offloading via Secure Space Sharing.
IEEE Trans. Parallel Distributed Syst., August, 2025

PUSHtap: PIM-based In-Memory HTAP with Unified Data Storage Format.
CoRR, August, 2025

Raccoon: Lightweight Support for Comprehensive Control Flows in Reconfigurable Spatial Architectures.
IEEE Trans. Parallel Distributed Syst., June, 2025

Femur: A Flexible Framework for Fast and Secure Querying from Public Key-Value Store.
Proc. ACM Manag. Data, June, 2025

Twilight: Adaptive Attention Sparsity with Hierarchical Top-p Pruning.
CoRR, February, 2025

A comprehensive survey of hardware-based security techniques from an architectural perspective.
J. Syst. Archit., 2025

A Critique on Average-Case Noise Analysis in RLWE-Based Homomorphic Encryption.
IACR Cryptol. ePrint Arch., 2025

HotRAP: Hot Record Retention and Promotion for LSM-trees with Tiered Storage.
Proceedings of the 2025 USENIX Annual Technical Conference, 2025

HYTE: Flexible Tiling for Sparse Accelerators via Hybrid Static-Dynamic Approaches.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

ANSMET: Approximate Nearest Neighbor Search with Near-Memory Processing and Hybrid Early Termination.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

Lincoln: Real-Time 50~100B LLM Inference on Consumer Devices with LPDDR-Interfaced, Compute-Enabled Flash Memory.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

Adyna: Accelerating Dynamic Neural Networks with Adaptive Scheduling.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

EFFACT: A Highly Efficient Full-Stack FHE Acceleration Platform.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

SoMa: Identifying, Exploring, and Understanding the DRAM Communication Scheduling Space for DNN Accelerators.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

A Unified Vector Processing Unit for Fully Homomorphic Encryption.
Proceedings of the Design, Automation & Test in Europe Conference, 2025

UniZK: Accelerating Zero-Knowledge Proof with Unified Hardware and Flexible Kernel Mapping.
Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025

KAPLA: Scalable NN Accelerator Dataflow Design Space Structuring and Fast Exploring.
Proceedings of the 30th Asia and South Pacific Design Automation Conference, 2025

2024
PimPam: Efficient Graph Pattern Matching on Real Processing-in-Memory Hardware.
Proc. ACM Manag. Data, 2024

FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving.
CoRR, 2024

A system capable of verifiably and privately screening global DNA synthesis.
CoRR, 2024

ETCIM: An Error-Tolerant Digital-CIM Processor with Redundancy-Free Repair and Run-Time MAC and Cell Error Correction.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024

Bulkor: Enabling Bulk Loading for Path ORAM.
Proceedings of the IEEE Symposium on Security and Privacy, 2024

Hydrogen: Contention-Aware Hybrid Memory for Heterogeneous CPU-GPU Architectures.
Proceedings of the International Conference for High Performance Computing, 2024

LightZone: Lightweight Hardware-Assisted In-Process Isolation for ARM64.
Proceedings of the 25th International Middleware Conference, 2024

Stream-Based Data Placement for Near-Data Processing with Extended Memory.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

UM-PIM: DRAM-based PIM with Uniform & Shared Memory Space.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

NDPBridge: Enabling Cross-Bank Coordination in Near-DRAM-Bank Processing Architectures.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Seesaw: Compensating for Nonlinear Reduction with Linear Computations for Private Inference.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

Salus: A Practical Trusted Execution Environment for CPU-FPGA Heterogeneous Cloud Platforms.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Trimma: Trimming Metadata Storage and Latency for Hybrid Memory Systems.
Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, 2024

2023
Optimizing DNNs With Partially Equivalent Transformations and Automated Corrections.
IEEE Trans. Computers, December, 2023

SODA: A Set of Fast Oblivious Algorithms in Distributed Secure Data Analytics.
Proc. VLDB Endow., 2023

FLARE: A Fast, Secure, and Memory-Efficient Distributed Analytics Framework (Flavor: Systems).
Proc. VLDB Endow., 2023

When Tree Meets Hash: Reducing Random Reads for Index Structures on Persistent Memories.
Proc. ACM Manag. Data, 2023

KAPLA: Pragmatic Representation and Fast Solving of Scalable NN Accelerator Dataflow.
CoRR, 2023

Honeycomb: Secure and Efficient GPU Executions via Static Validation.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

SAM: A Scalable Accelerator for Number Theoretic Transform Using Multi-Dimensional Decomposition.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

Baryon: Efficient Hybrid Memory Management with Compression and Sub-Blocking.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

ABNDP: Co-optimizing Data Access and Load Balance in Near-Data Processing.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

GZKP: A GPU Accelerated Zero-Knowledge Proof System.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

Spada: Accelerating Sparse Matrix Multiplication with Adaptive Dataflow.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

Secure MLaaS with Temper: Trusted and Efficient Model Partitioning and Enclave Reuse.
Proceedings of the Annual Computer Security Applications Conference, 2023

2022
RePAST: A ReRAM-based PIM Accelerator for Second-order Training of DNN.
CoRR, 2022

PPMLAC: high performance chipset architecture for secure multi-party computation.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

ShEF: shielded enclaves for cloud FPGAs.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

FINGERS: exploiting fine-grained parallelism in graph mining accelerators.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021
PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections.
Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation, 2021

PipeZK: Accelerating Zero-Knowledge Proof with a Pipelined Architecture.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

2020
Improving the Accuracy, Scalability, and Performance of Graph Neural Networks with Roc.
Proceedings of the Third Conference on Machine Learning and Systems, 2020

Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019
Optimizing DNN Computation with Relaxed Graph Substitutions.
Proceedings of the Second Conference on Machine Learning and Systems, SysML 2019, 2019

TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018
DNN Dataflow Choice Is Overrated.
CoRR, 2018

GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017
DRAF: A Low-Power DRAM-Based Reconfigurable Acceleration Fabric.
IEEE Micro, 2017

3D nanosystems enable embedded abundant-data computing: special session paper.
Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion, 2017

TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016
HRL: Efficient and flexible reconfigurable logic for near-data processing.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

2015
Energy-Efficient Abundant-Data Computing: The N3XT 1,000x.
Computer, 2015

Practical Near-Data Processing for In-Memory Analytics Frameworks.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

