Xulong Tang

2026

CoRR, March, 2026

Personalized Dance Synthesis Based on Physical and Cognitive Intensities.

Proceedings of the IEEE Conference on Virtual Reality and 3D User Interfaces, 2026

Rethinking the Potential of Layer Freezing for DNN Training Efficiency.

Proceedings of the Great Lakes Symposium on VLSI 2026, 2026

2025

FlowerDance: MeanFlow for Efficient and Refined 3D Dance Generation.

CoRR, November, 2025

Rethinking the Potential of Layer Freezing for Efficient DNN Training.

CoRR, August, 2025

MEGADance: Mixture-of-Experts Architecture for Genre-Aware 3D Dance Generation.

CoRR, May, 2025

MatchDance: Collaborative Mamba-Transformer Architecture Matching for High-Quality 3D Dance Synthesis.

CoRR, May, 2025

SELECT: A Submodular Approach for Active LiDAR Semantic Segmentation.

Ruiyu Mao

Sarthak Kumar Maharana

Yunhui Guo

CoRR, May, 2025

Special Issue on Top Picks From the 2024 Computer Architecture Conferences.

Jun Yang

IEEE Micro, 2025

WAGES: Workload-Aware GPU Sharing System for Energy-Efficient Serverless LLM Serving.

Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2025

CoheDancers: Enhancing Interactive Group Dance Generation through Music-Driven Coherence Decomposition.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

FastTree: Optimizing Attention Kernel and Runtime for Tree-Structured LLM Inference.

Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

Reinforcement Learning-Guided Graph State Generation in Photonic Quantum Computers.

Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

CIExplorer: Microarchitecture-Aware Exploration for Tightly Integrated Custom Instruction.

Proceedings of the 39th ACM International Conference on Supercomputing, 2025

MemFreezing: A Novel Adversarial Attack on Temporal Graph Neural Networks under Limited Future Knowledge.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Mutual Effort for Efficiency: A Similarity-based Token Pruning for Vision Transformers in Self-Supervised Learning.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

STMC: Small-Tile Multiple-Copy Compilation for Reliable Measurement-Based Quantum Computing.

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2025

OASIS: Object-Aware Page Management for Multi-GPU Systems.

Yueqi Wang

Bingyao Li

Mohamed Tarek Ibn Ziad

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

Pruner: A Draft-then-Verify Exploration Mechanism to Accelerate Tensor Program Tuning.

Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025

Cascade: A Dependency-aware Efficient Training Framework for Temporal Graph Neural Network.

Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025

A Computation and Energy Efficient Hardware Architecture for SSL Acceleration.

Proceedings of the 30th Asia and South Pacific Design Automation Conference, 2025

2024

CoheDancers: Enhancing Interactive Group Dance Generation through Music-Driven Coherence Decomposition.

CoRR, 2024

The Stabilizer Bootstrap of Quantum Machine Learning with up to 10000 qubits.

CoRR, 2024

Improving GPU Multi-Tenancy Through Dynamic Multi-Instance GPU Reconfiguration.

CoRR, 2024

Improving Multi-Instance GPU Efficiency via Sub-Entry Sharing TLB Design.

CoRR, 2024

EdgeOL: Efficient in-situ Online Learning on Edge Devices.

CoRR, 2024

BeatDance: A Beat-Based Model-Agnostic Contrastive Learning Framework for Music-Dance Retrieval.

Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

CoDancers: Music-Driven Coherent Group Dance Generation with Choreographic Unit.

Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

STAR: Sub-Entry Sharing-Aware TLB for Multi-Instance GPU.

Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Waxing-and-Waning: a Generic Similarity-based Framework for Efficient Self-Supervised Learning.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

GRIT: Enhancing Multi-GPU Performance with Fine-Grained Dynamic Page Placement.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

FCM: A Fusion-aware Wire Cutting Approach for Measurement-based Quantum Computing.

Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

LOTUS: learning-based online thermal and latency variation management for two-stage detectors on edge devices.

Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

QRCC: Evaluating Large Quantum Circuits on Small Quantum Computers through Integrated Qubit Reuse and Circuit Cutting.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

FMCC: Flexible Measurement-based Quantum Computation over Cluster State.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023

Sustainable AI Processing at the Edge.

IEEE Micro, 2023

Minimizing Photonic Cluster State Depth in Measurement-Based Quantum Computing.

CoRR, 2023

Integrated Qubit Reuse and Circuit Cutting for Large Quantum Circuit Evaluation.

CoRR, 2023

SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

IDYLL: Enhancing Page Translation in Multi-GPUs via Light Weight PTE Invalidations.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

SmartFRZ: An Efficient Training Framework using Attention-Based Layer Freezing.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

FlexGM: An Adaptive Runtime System to Accelerate Graph Matching Networks on GPUs.

Proceedings of the 41st IEEE International Conference on Computer Design, 2023

AB-ORAM: Constructing Adjustable Buckets for Space Reduction in Ring ORAM.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Trans-FW: Short Circuiting Page Table Walk in Multi-GPU Systems via Remote Forwarding.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

CEGMA: Coordinated Elastic Graph Matching Acceleration for Graph Matching Networks.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

EP-ORAM: Efficient NVM-Friendly Path Eviction for Ring ORAM in Hybrid Memory.

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Orchestrated Scheduling and Partitioning for Improved Address Translation in GPUs.

Bingyao Li

Yueqi Wang

Kaushik Parasuram Seshadreesan

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Orchestrating Measurement-Based Quantum Computation over Photonic Quantum Processors.

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022

An efficient segmented quantization for graph neural networks.

CCF Trans. High Perform. Comput., December, 2022

Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework.

ACM Trans. Embed. Comput. Syst., September, 2022

Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration.

ACM Trans. Design Autom. Electr. Syst., 2022

Demystifying Arch-hints for Model Extraction: An Attack in Unified Memory System.

CoRR, 2022

Sustainable AI Processing at the Edge.

CoRR, 2022

Optimizing Data Layout for Training Deep Neural Networks.

Proceedings of the Companion of The Web Conference 2022, Virtual Event / Lyon, France, April 25, 2022

Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Enhancing GPU Performance via Neighboring Directory Table Based Inter-TLB Sharing.

Proceedings of the IEEE 40th International Conference on Computer Design, 2022

Fine-Granular Computation and Data Layout Reorganization for Improving Locality.

Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

Q-GPU: A Recipe of Optimizations for Quantum Circuit Simulation Using GPUs.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

You Already Have It: A Generator-Free Low-Precision DNN Training Framework Using Stochastic Rounding.

Proceedings of the Computer Vision - ECCV 2022, 2022

2021

Algorithm-hardware Co-design of Attention Mechanism on FPGA Devices.

ACM Trans. Embed. Comput. Syst., 2021

A Survey of Large-Scale Deep Learning Serving System Optimization: Challenges and Opportunities.

CoRR, 2021

Parallelizing DNN Training on GPUs: Challenges and Opportunities.

Weizheng Xu

Proceedings of the Companion of The Web Conference 2021, 2021

Mix and Match: Reorganizing Tasks for Enhancing Data Locality.

Proceedings of the SIGMETRICS '21: ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2021

Work in Progress: Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework.

Proceedings of the 27th IEEE Real-Time and Embedded Technology and Applications Symposium, 2021

Compiler support for near data computing.

Jihyun Ryoo

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Distance-in-time versus distance-in-space.

Proceedings of the PLDI '21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2021

Fluid: a framework for approximate concurrency via controlled dependency relaxation.

Danfeng Zhang

Proceedings of the PLDI '21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2021

Characterizing AI Model Inference Applications Running in the SGX Environment.

Proceedings of the IEEE International Conference on Networking, Architecture and Storage, 2021

Improving Address Translation in Multi-GPUs via Sharing and Spilling aware TLB Design.

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Automated Runtime-Aware Scheduling for Multi-Tenant DNN Inference on GPU.

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

ScaleDNN: Data Movement Aware DNN Training on Multi-GPU.

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

Towards a Secure Integrated Heterogeneous Platform via Cooperative CPU/GPU Encryption.

Proceedings of the 30th IEEE Asian Test Symposium, 2021

A Compression-Compilation Co-Design Framework Towards Real-Time Object Detection on Mobile Devices.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Enabling Latency-Aware Data Initialization for Integrated CPU/GPU Heterogeneous Platform.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Exploration of Input Patterns for Enhancing the Performance of Liquid State Machines.

CoRR, 2020

Enhancing Address Translations in Throughput Processors via Compression.

Ziyu Zhang

Weizheng Xu

Rami G. Melhem

Jun Yang

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

Quantifying Data Locality in Dynamic Parallelism in GPUs.

Chita R. Das

Proceedings of the Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems, 2019

Computing with Near Data.

Hui Zhao

Myoungsoo Jung

Proceedings of the Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems, 2019

Architecture-Aware Approximate Computing.

Orhan Kislal

Meenakshi Arunachalam

Proceedings of the Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems, 2019

Co-optimizing memory-level parallelism and cache-level parallelism.

Meenakshi Arunachalam

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

Opportunistic computing in GPU architectures.

Anand Sivasubramaniam

Chita R. Das

Proceedings of the 46th International Symposium on Computer Architecture, 2019

Architecture-Centric Bottleneck Analysis for Deep Neural Network Applications.

Proceedings of the 26th IEEE International Conference on High Performance Computing, Data, and Analytics, 2019

2018

Oversubscribed Command Queues in GPUs.

Proceedings of the 11th Workshop on General Purpose Processing using GPUs, 2018

Enhancing computation-to-core assignment with physical location information.

Orhan Kislal

Jagadish Kotra

Myoungsoo Jung

Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

Quantifying and Optimizing Data Access Parallelism on Manycores.

Proceedings of the 26th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, 2018

2017

Data movement aware computation partitioning.

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

DEMM: A Dynamic Energy-Saving Mechanism for Multicore Memories.

Proceedings of the 25th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, 2017

Controlled Kernel Launch for Dynamic Parallelism in GPUs.

Mohamed Assem Ibrahim

Mahmut T. Kandemir

Chita R. Das

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

POSTER: Location-Aware Computation Mapping for Manycore Processors.

Orhan Kislal

Jagadish Kotra