Xu Liu

Orcid: 0000-0002-1487-963X

Affiliations:
  • North Carolina State University, Raleigh, NC, USA
  • College of William and Mary, Williamsburg, VA, USA (former)
  • Rice University, Department of Computer Science, Houston, TX, USA (PhD 2014)


According to our database, Xu Liu authored at least 75 papers between 2008 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
EasyView: Bringing Performance Profiles into Integrated Development Environments.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

DrPy: Pinpointing Inefficient Memory Usage in Multi-Layer Python Applications.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

2023
AutoST: Training-free Neural Architecture Search for Spiking Transformers.
CoRR, 2023

DrGPU: A Top-Down Profiler for GPU Applications.
Proceedings of the 2023 ACM/SPEC International Conference on Performance Engineering, 2023

DroidPerf: Profiling Memory Objects on Android Devices.
Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, 2023

DJXPerf: Identifying Memory Inefficiencies via Object-Centric Profiling for Java.
Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 2023

2022
BinGo: Pinpointing Concurrency Bugs in Go via Binary Analysis.
CoRR, 2022

Graph Neural Networks Based Memory Inefficiency Detection Using Selective Sampling.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

OJXPERF: Featherlight Object Replica Detection for Java Programs.
Proceedings of the 44th IEEE/ACM International Conference on Software Engineering, 2022

ValueExpert: exploring value patterns in GPU-accelerated applications.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021
NumaPerf: Predictive and Full NUMA Profiling.
CoRR, 2021

Toward efficient interactions between Python and native libraries.
Proceedings of the ESEC/FSE '21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021

NumaPerf: predictive NUMA profiling.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

2020
Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect.
IEEE Trans. Parallel Distributed Syst., 2020

Efficient Abortable-locking Protocol for Multi-level NUMA Systems: Design and Correctness.
ACM Trans. Parallel Comput., 2020

GVProf: a value profiler for GPU-based clusters.
Proceedings of the International Conference for High Performance Computing, 2020

DrCCTProf: a fine-grained call path profiler for ARM-based clusters.
Proceedings of the International Conference for High Performance Computing, 2020

ZeroSpy: exploring software inefficiency with redundant zeros.
Proceedings of the International Conference for High Performance Computing, 2020

ScalAna: automating scaling loss detection with graph analysis.
Proceedings of the International Conference for High Performance Computing, 2020

Identifying scalability bottlenecks for large-scale parallel programs with graph analysis.
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

What every scientific programmer should know about compiler optimizations?
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

ATMem: adaptive data placement in graph applications on heterogeneous memories.
Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

2019
Intelligent-Unrolling: Exploiting Regular Patterns in Irregular Applications.
CoRR, 2019

Pinpointing performance inefficiencies in Java.
Proceedings of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019

Pinpointing performance inefficiencies via lightweight variance profiling.
Proceedings of the International Conference for High Performance Computing, 2019

Lightweight hardware transactional memory profiling.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

Redundant loads: a software inefficiency indicator.
Proceedings of the 41st International Conference on Software Engineering, 2019

Can we trust profiling results?: understanding and fixing the inaccuracy in modern profilers.
Proceedings of the ACM International Conference on Supercomputing, 2019

CPpf: a prefetch aware LLC partitioning approach.
Proceedings of the 48th International Conference on Parallel Processing, 2019

Featherlight Reuse-Distance Measurement.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

Transforming Query Sequences for High-Throughput B+ Tree Processing on Many-Core Processors.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

2018
LWPTool: A Lightweight Profiler to Guide Data Layout Optimization.
IEEE Trans. Parallel Distributed Syst., 2018

NUMA-Caffe: NUMA-Aware Deep Learning Neural Networks.
ACM Trans. Archit. Code Optim., 2018

Start Late or Finish Early: A Distributed Graph Processing System with Redundancy Reduction.
Proc. VLDB Endow., 2018

An Evaluation of Vectorization and Cache Reuse Tradeoffs on Modern CPUs.
Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores, 2018

Featherlight on-the-fly false-sharing detection.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite.
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

ProfDP: A Lightweight Profiler to Guide Data Placement in Heterogeneous Memory Systems.
Proceedings of the 32nd International Conference on Supercomputing, 2018

Towards Efficient SpMV on Sunway Manycore Architectures.
Proceedings of the 32nd International Conference on Supercomputing, 2018

CVR: efficient vectorization of SpMV on x86 processors.
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

CUDAAdvisor: LLVM-based runtime profiling for modern GPUs.
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

Lightweight detection of cache conflicts.
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

Watching for Software Inefficiencies with Witch.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017
An Efficient Abortable-locking Protocol for Multi-level NUMA Systems.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

DR-BW: Identifying Bandwidth Contention in NUMA Architectures with Supervised Learning.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

FLEP: Enabling Flexible and Efficient Preemption on GPUs.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

REDSPY: Exploring Value Locality in Software.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

Locality-Aware CTA Clustering for Modern GPUs.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016
Correctness of Hierarchical MCS Locks with Timeout.
CoRR, 2016

Characterizing emerging heterogeneous memory.
Proceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management, Santa Barbara, CA, USA, June 14, 2016

HIPS Introduction and Committees.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

SMT-Aware Instantaneous Footprint Optimization.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

Understanding Data Analytics Workloads on Intel(R) Xeon Phi(R).
Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

StructSlim: a lightweight profiler to guide structure splitting.
Proceedings of the 2016 International Symposium on Code Generation and Optimization, 2016

Cheetah: detecting false sharing efficiently and effectively.
Proceedings of the 2016 International Symposium on Code Generation and Optimization, 2016

memif: Towards Programming Heterogeneous Memory Asynchronously.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

2015
ScaAnalyzer: a tool to identify memory scalability bottlenecks in parallel programs.
Proceedings of the International Conference for High Performance Computing, 2015

Characterizing Data Analytics Workloads on Intel Xeon Phi.
Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015

Towards Hybrid Programming in Big Data.
Proceedings of the 7th USENIX Workshop on Hot Topics in Cloud Computing, 2015

Runtime Value Numbering: A Profiling Technique to Pinpoint Redundant Computations.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
A tool to analyze the performance of multithreaded programs on NUMA architectures.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Call Paths for Pin Tools.
Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014

ArrayTool: a lightweight profiler to guide array regrouping.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
A data-centric profiler for parallel programs.
Proceedings of the International Conference for High Performance Computing, 2013

OMPT: An OpenMP Tools Application Programming Interface for Performance Analysis.
Proceedings of the OpenMP in the Era of Low Power Devices and Accelerators, 2013

Pinpointing data locality bottlenecks with low overhead.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

A new approach for performance analysis of openMP programs.
Proceedings of the International Conference on Supercomputing, 2013

Evaluating task scheduling in hadoop-based cloud systems.
Proceedings of the 2013 IEEE International Conference on Big Data (IEEE BigData 2013), 2013

2011
Automatic performance debugging of SPMD-style parallel programs.
J. Parallel Distributed Comput., 2011

Towards quantitative analysis of data intensive computing: a case study of Hadoop.
Proceedings of the 8th International Conference on Autonomic Computing, 2011

Pinpointing data locality problems using data-centric analysis.
Proceedings of the CGO 2011, 2011

2010
Automatic Performance Debugging of SPMD Parallel Programs.
CoRR, 2010

2009
Similarity Analysis in Automatic Performance Debugging of SPMD Parallel Programs.
CoRR, 2009

2008
A Fast-Start, Fault-Tolerant MPI Launcher on Dawning Supercomputers.
Proceedings of the Ninth International Conference on Parallel and Distributed Computing, 2008

A Dynamic Provisioning Framework for Multi-tier Internet Applications in Virtualized Data Center.
Proceedings of the Ninth International Conference on Parallel and Distributed Computing, 2008

