Xulong Tang

ORCID: 0000-0002-3385-2053

According to our database, Xulong Tang authored at least 64 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
EdgeOL: Efficient in-situ Online Learning on Edge Devices.
CoRR, 2024

GRIT: Enhancing Multi-GPU Performance with Fine-Grained Dynamic Page Placement.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

2023
Sustainable AI Processing at the Edge.
IEEE Micro, 2023

Minimizing Photonic Cluster State Depth in Measurement-Based Quantum Computing.
CoRR, 2023

Integrated Qubit Reuse and Circuit Cutting for Large Quantum Circuit Evaluation.
CoRR, 2023

BeatDance: A Beat-Based Model-Agnostic Contrastive Learning Framework for Music-Dance Retrieval.
CoRR, 2023

SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

IDYLL: Enhancing Page Translation in Multi-GPUs via Light Weight PTE Invalidations.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

SmartFRZ: An Efficient Training Framework using Attention-Based Layer Freezing.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

FlexGM: An Adaptive Runtime System to Accelerate Graph Matching Networks on GPUs.
Proceedings of the 41st IEEE International Conference on Computer Design, 2023

AB-ORAM: Constructing Adjustable Buckets for Space Reduction in Ring ORAM.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Trans-FW: Short Circuiting Page Table Walk in Multi-GPU Systems via Remote Forwarding.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

CEGMA: Coordinated Elastic Graph Matching Acceleration for Graph Matching Networks.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

EP-ORAM: Efficient NVM-Friendly Path Eviction for Ring ORAM in Hybrid Memory.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Orchestrated Scheduling and Partitioning for Improved Address Translation in GPUs.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Orchestrating Measurement-Based Quantum Computation over Photonic Quantum Processors.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022
An efficient segmented quantization for graph neural networks.
CCF Trans. High Perform. Comput., December, 2022

Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework.
ACM Trans. Embed. Comput. Syst., September, 2022

Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration.
ACM Trans. Design Autom. Electr. Syst., 2022

Demystifying Arch-hints for Model Extraction: An Attack in Unified Memory System.
CoRR, 2022

Sustainable AI Processing at the Edge.
CoRR, 2022

Optimizing Data Layout for Training Deep Neural Networks.
Proceedings of the Companion of The Web Conference 2022, Virtual Event / Lyon, France, April 25, 2022

Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Enhancing GPU Performance via Neighboring Directory Table Based Inter-TLB Sharing.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

Fine-Granular Computation and Data Layout Reorganization for Improving Locality.
Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

Q-GPU: A Recipe of Optimizations for Quantum Circuit Simulation Using GPUs.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

You Already Have It: A Generator-Free Low-Precision DNN Training Framework Using Stochastic Rounding.
Proceedings of Computer Vision - ECCV 2022, 2022

2021
Algorithm-hardware Co-design of Attention Mechanism on FPGA Devices.
ACM Trans. Embed. Comput. Syst., 2021

Mix and Match: Reorganizing Tasks for Enhancing Data Locality.
Proc. ACM Meas. Anal. Comput. Syst., 2021

A Survey of Large-Scale Deep Learning Serving System Optimization: Challenges and Opportunities.
CoRR, 2021

Parallelizing DNN Training on GPUs: Challenges and Opportunities.
Proceedings of the Companion of The Web Conference 2021, 2021

Work in Progress: Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework.
Proceedings of the 27th IEEE Real-Time and Embedded Technology and Applications Symposium, 2021

Compiler support for near data computing.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Distance-in-time versus distance-in-space.
Proceedings of the PLDI '21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2021

Fluid: a framework for approximate concurrency via controlled dependency relaxation.
Proceedings of the PLDI '21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2021

Characterizing AI Model Inference Applications Running in the SGX Environment.
Proceedings of the IEEE International Conference on Networking, Architecture and Storage, 2021

Improving Address Translation in Multi-GPUs via Sharing and Spilling aware TLB Design.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Automated Runtime-Aware Scheduling for Multi-Tenant DNN Inference on GPU.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

ScaleDNN: Data Movement Aware DNN Training on Multi-GPU.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

Towards a Secure Integrated Heterogeneous Platform via Cooperative CPU/GPU Encryption.
Proceedings of the 30th IEEE Asian Test Symposium, 2021

A Compression-Compilation Co-Design Framework Towards Real-Time Object Detection on Mobile Devices.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Enabling Latency-Aware Data Initialization for Integrated CPU/GPU Heterogeneous Platform.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Exploration of Input Patterns for Enhancing the Performance of Liquid State Machines.
CoRR, 2020

Enhancing Address Translations in Throughput Processors via Compression.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
Architecture-Aware Approximate Computing.
Proc. ACM Meas. Anal. Comput. Syst., 2019

Co-optimizing memory-level parallelism and cache-level parallelism.
Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

Opportunistic computing in GPU architectures.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

Architecture-Centric Bottleneck Analysis for Deep Neural Network Applications.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

2018
Quantifying Data Locality in Dynamic Parallelism in GPUs.
Proc. ACM Meas. Anal. Comput. Syst., 2018

Computing with Near Data.
Proc. ACM Meas. Anal. Comput. Syst., 2018

Oversubscribed Command Queues in GPUs.
Proceedings of the 11th Workshop on General Purpose Processing using GPUs, 2018

Enhancing computation-to-core assignment with physical location information.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

Quantifying and Optimizing Data Access Parallelism on Manycores.
Proceedings of the 26th IEEE International Symposium on Modeling, 2018

2017
Data movement aware computation partitioning.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

DEMM: A Dynamic Energy-Saving Mechanism for Multicore Memories.
Proceedings of the 25th IEEE International Symposium on Modeling, 2017

Controlled Kernel Launch for Dynamic Parallelism in GPUs.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

POSTER: Location-Aware Computation Mapping for Manycore Processors.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
Improving bank-level parallelism for irregular applications.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

μC-States: Fine-grained GPU Datapath Power Management.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Memory Row Reuse Distance and its Role in Optimizing Application Performance.
Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2015

Optimizing off-chip accesses in multicores.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

2012
FlexBFS: a parallelism-aware implementation of breadth-first search on GPU.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012