Xu Liu

According to our database, Xu Liu authored at least 53 papers between 2008 and 2020.

Bibliography

2020
Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect.
IEEE Trans. Parallel Distrib. Syst., 2020

2019
Intelligent-Unrolling: Exploiting Regular Patterns in Irregular Applications.
CoRR, 2019

Pinpointing performance inefficiencies in Java.
Proceedings of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019

Pinpointing performance inefficiencies via lightweight variance profiling.
Proceedings of the International Conference for High Performance Computing, 2019

Lightweight hardware transactional memory profiling.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

Redundant loads: a software inefficiency indicator.
Proceedings of the 41st International Conference on Software Engineering, 2019

Can we trust profiling results?: understanding and fixing the inaccuracy in modern profilers.
Proceedings of the ACM International Conference on Supercomputing, 2019

Distributed Direction-Optimizing Label Propagation for Community Detection.
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

Featherlight Reuse-Distance Measurement.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

Transforming Query Sequences for High-Throughput B+ Tree Processing on Many-Core Processors.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

2018
LWPTool: A Lightweight Profiler to Guide Data Layout Optimization.
IEEE Trans. Parallel Distrib. Syst., 2018

NUMA-Caffe: NUMA-Aware Deep Learning Neural Networks.
TACO, 2018

Start Late or Finish Early: A Distributed Graph Processing System with Redundancy Reduction.
PVLDB, 2018

An Evaluation of Vectorization and Cache Reuse Tradeoffs on Modern CPUs.
Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores, 2018

Featherlight on-the-fly false-sharing detection.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite.
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

ProfDP: A Lightweight Profiler to Guide Data Placement in Heterogeneous Memory Systems.
Proceedings of the 32nd International Conference on Supercomputing, 2018

Towards Efficient SpMV on Sunway Manycore Architectures.
Proceedings of the 32nd International Conference on Supercomputing, 2018

CVR: efficient vectorization of SpMV on x86 processors.
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

CUDAAdvisor: LLVM-based runtime profiling for modern GPUs.
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

Lightweight detection of cache conflicts.
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

Watching for Software Inefficiencies with Witch.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017
An Efficient Abortable-locking Protocol for Multi-level NUMA Systems.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

DR-BW: Identifying Bandwidth Contention in NUMA Architectures with Supervised Learning.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

FLEP: Enabling Flexible and Efficient Preemption on GPUs.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

REDSPY: Exploring Value Locality in Software.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

Locality-Aware CTA Clustering for Modern GPUs.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016
Correctness of Hierarchical MCS Locks with Timeout.
CoRR, 2016

Characterizing emerging heterogeneous memory.
Proceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management, Santa Barbara, CA, USA, June 14, 2016

HIPS Introduction and Committees.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

SMT-Aware Instantaneous Footprint Optimization.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

Understanding Data Analytics Workloads on Intel(R) Xeon Phi(R).
Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

StructSlim: a lightweight profiler to guide structure splitting.
Proceedings of the 2016 International Symposium on Code Generation and Optimization, 2016

Cheetah: detecting false sharing efficiently and effectively.
Proceedings of the 2016 International Symposium on Code Generation and Optimization, 2016

memif: Towards Programming Heterogeneous Memory Asynchronously.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

2015
ScaAnalyzer: a tool to identify memory scalability bottlenecks in parallel programs.
Proceedings of the International Conference for High Performance Computing, 2015

Characterizing Data Analytics Workloads on Intel Xeon Phi.
Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015

Towards Hybrid Programming in Big Data.
Proceedings of the 7th USENIX Workshop on Hot Topics in Cloud Computing, 2015

Runtime Value Numbering: A Profiling Technique to Pinpoint Redundant Computations.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
A tool to analyze the performance of multithreaded programs on NUMA architectures.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Call Paths for Pin Tools.
Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014

ArrayTool: a lightweight profiler to guide array regrouping.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
A data-centric profiler for parallel programs.
Proceedings of the International Conference for High Performance Computing, 2013

OMPT: An OpenMP Tools Application Programming Interface for Performance Analysis.
Proceedings of the OpenMP in the Era of Low Power Devices and Accelerators, 2013

Pinpointing data locality bottlenecks with low overhead.
Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

A new approach for performance analysis of openMP programs.
Proceedings of the International Conference on Supercomputing, 2013

Evaluating task scheduling in hadoop-based cloud systems.
Proceedings of the 2013 IEEE International Conference on Big Data, 2013

2011
Automatic performance debugging of SPMD-style parallel programs.
J. Parallel Distributed Comput., 2011

Pinpointing data locality problems using data-centric analysis.
Proceedings of the CGO 2011, 2011

2010
Automatic Performance Debugging of SPMD Parallel Programs.
CoRR, 2010

2009
Similarity Analysis in Automatic Performance Debugging of SPMD Parallel Programs.
CoRR, 2009

2008
A Fast-Start, Fault-Tolerant MPI Launcher on Dawning Supercomputers.
Proceedings of the Ninth International Conference on Parallel and Distributed Computing, 2008

A Dynamic Provisioning Framework for Multi-tier Internet Applications in Virtualized Data Center.
Proceedings of the Ninth International Conference on Parallel and Distributed Computing, 2008

