Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021

A Fast Lock for Explicit Message Passing Architectures.

IEEE Trans. Computers, 2021

3-D Partitioning for Large-Scale Graph Processing.

IEEE Trans. Computers, 2021

Kudu: An Efficient and Scalable Distributed Graph Pattern Mining Engine.

Jingji Chen

Xuehai Qian

CoRR, 2021

Graph processing and machine learning architectures with emerging memory technologies: a survey.

Xuehai Qian

Sci. China Inf. Sci., 2021

ESCALATE: Boosting the Efficiency of Sparse CNN Accelerator with Kernel Decomposition.

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

GoSPA: An Energy-efficient High-performance Globally Optimized SParse Convolutional Neural Network Accelerator.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

A Lightweight Isolation Mechanism for Secure Branch Predictors.

Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

2020

iCELIA: A Full-Stack Framework for STT-MRAM-Based Deep Learning Acceleration.

IEEE Trans. Parallel Distributed Syst., 2020

Efficient Performance Estimation and Work-Group Size Pruning for OpenCL Kernels on GPUs.

IEEE Trans. Parallel Distributed Syst., 2020

Guest Editors' Introduction to the Special Issue on Machine Learning Architectures and Accelerators.

Xuehai Qian

Yanzhi Wang

Avinash Karanth

IEEE Trans. Computers, 2020

IntersectX: An Accelerator for Graph Mining.

Gengyu Rao

Jingji Chen

Xuehai Qian

CoRR, 2020

Low-Cost Floating-Point Processing in ReRAM for Scientific Computing.

CoRR, 2020

DwarvesGraph: A High-Performance Graph Mining System with Pattern Decomposition.

Jingji Chen

Xuehai Qian

CoRR, 2020

ReversiSpec: Reversible Coherence Protocol for Defending Transient Attacks.

You Wu

Xuehai Qian

CoRR, 2020

A Comprehensive Evaluation of RDMA-enabled Concurrency Control Protocols.

Chao Wang

Kezhao Huang

Xuehai Qian

CoRR, 2020

SympleGraph: distributed graph processing with precise loop-carried dependency guarantee.

Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2020

AccQOC: Accelerating Quantum Optimal Control Based Pulse Generation.

Jinglei Cheng

Haoqing Deng

Xuehai Qian

Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerators.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

TUPIM: A Transparent and Universal Processing-in-Memory Architecture for Unmodified Binaries.

Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020

DNNGuard: An Elastic Heterogeneous DNN Accelerator Architecture against Adversarial Attacks.

Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

Capuchin: Tensor-based GPU Memory Management for Deep Learning.

Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning.

Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

AsymNVM: An Efficient Framework for Implementing Persistent Data Structures on Asymmetric NVM Architecture.

Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

Prague: High-Performance Heterogeneity-Aware Asynchronous Decentralized Training.

Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019

Clip: A Disk I/O Focused Parallel Out-of-Core Graph Processing System.

IEEE Trans. Parallel Distributed Syst., 2019

Distributed Graph Processing System and Processing-in-memory Architecture with Precise Loop-carried Dependency Guarantee.

ACM Trans. Comput. Syst., 2019

HEIF: Highly Efficient Stochastic Computing-Based Inference Framework for Deep Neural Networks.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

Heterogeneity-Aware Asynchronous Decentralized Training.

CoRR, 2019

A Stochastic-Computing based Deep Learning Framework using Adiabatic Quantum-Flux-Parametron Superconducting Technology.

CoRR, 2019

Non-structured DNN Weight Pruning Considered Harmful.

CoRR, 2019

ReBNN: in-situ acceleration of binarized neural networks in ReRAM using complementary resistive cell.

CCF Trans. High Perform. Comput., 2019

PIMSim: A Flexible and Detailed Processing-in-Memory Simulator.

IEEE Comput. Archit. Lett., 2019

GraphQ: Scalable PIM-Based Graph Processing.

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

TPShare: a time-space sharing scheduling abstraction for shared cloud via vertical labels.

Proceedings of the 46th International Symposium on Computer Architecture, 2019

TIE: energy-efficient tensor train-based inference engine for deep neural network.

Proceedings of the 46th International Symposium on Computer Architecture, 2019

A stochastic-computing based deep learning framework using adiabatic quantum-flux-parametron superconducting technology.

Proceedings of the 46th International Symposium on Computer Architecture, 2019

SpeedyBox: Low-Latency NFV Service Chains with Cross-NF Runtime Consolidation.

Proceedings of the 39th IEEE International Conference on Distributed Computing Systems, 2019

A Hybrid Framework for Fast and Accurate GPU Performance Estimation through Source-Level Analysis and Trace-Based Simulation.

Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array.

Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs.

Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

pLock: A Fast Lock for Architectures with Explicit Inter-core Message Passing.

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Methods of Multipliers.

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

Hop: Heterogeneity-aware Decentralized Training.

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018

DudeTx: Durable Transactions Made Decoupled.

ACM Trans. Storage, 2018

ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers.

CoRR, 2018

An Efficient Framework for Implementing Persist Data Structures on Remote NVM.

CoRR, 2018

vSensor: leveraging fixed-workload snippets of programs for performance variance detection.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

CSE: Parallel Finite State Machines with Convergence Set Enumeration.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

CounterMiner: Mining Big Performance Data from Hardware Counters.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

PermDNN: Efficient Compressed DNN Architecture with Permuted Diagonal Matrices.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

G-TSC: Timestamp Based Coherence for GPUs.

Abdulaziz Tabbakh

Xuehai Qian

Murali Annavaram

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

GraphR: Accelerating Graph Processing Using ReRAM.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

ReRAM-based accelerator for deep learning.

Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

Wonderland: A Novel Abstraction-Based Out-Of-Core Graph Processing System.

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing.

Zhibin Yu

Zhendong Bei

Xuehai Qian

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

VIBNN: Hardware Acceleration of Bayesian Neural Networks.

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

Neu-NoC: A high-efficient interconnection network for accelerated neuromorphic systems.

Proceedings of the 23rd Asia and South Pacific Design Automation Conference, 2018

Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework.

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices.

CoRR, 2017

Squeezing out All the Value of Loaded Data: An Out-of-core Graph Processing System with Reduced Disk I/O.

Proceedings of the 2017 USENIX Annual Technical Conference, 2017

CirCNN: accelerating and compressing deep neural networks using block-circulant weight matrices.

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Power Efficient Sharing-Aware GPU Data Management.

Abdulaziz Tabbakh

Murali Annavaram

Xuehai Qian

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning.

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing.

Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

DudeTM: Building Durable Transactions with Decoupling for Persistent Memory.

Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016

OPR: deterministic group replay for one-sided communication.

Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Exploring the Hidden Dimension in Graph Processing.

Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, 2016

SReplay: Deterministic Sub-Group Replay for One-Sided Communication.

Proceedings of the 2016 International Conference on Supercomputing, 2016

2015

Improving multiprocessor performance with fine-grain coherence bypass.

Sci. China Inf. Sci., 2015

2014

OmniOrder: Directory-based conflict serialization of transactions.

Xuehai Qian

Benjamín Sahelices

Josep Torrellas

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Pacifier: Record and replay for relaxed-consistency multiprocessors with distributed directory protocol.

Xuehai Qian

Benjamín Sahelices

Depei Qian

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

2013

Scalable and flexible bulk architecture

Xuehai Qian

PhD thesis, 2013

BulkCommit: scalable and fast commit of atomic blocks in a lazy multiprocessor environment.

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Rainbow: Efficient memory dependence recording with high replay parallelism for relaxed memory model.

Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Volition: scalable and precise sequential consistency violation detection.

Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013

2012

BulkSMT: Designing SMT processors for atomic-block execution.

Xuehai Qian

Benjamín Sahelices

Josep Torrellas

Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

2010

ScalableBulk: Scalable Cache Coherence for Atomic Blocks in a Lazy Environment.

Xuehai Qian

Wonsun Ahn

Josep Torrellas

Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

2007

Design and Implementation of Floating Point Stack on General RISC Architecture.

Proceedings of the 15th Euromicro International Conference on Parallel, 2007

Circuit implementation of floating point range reduction for trigonometric functions.

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2007), 2007

Optimized Register Renaming Scheme for Stack-Based x86 Operations.

Proceedings of the Architecture of Computing Systems, 2007

Xuehai Qian
