Xiaoyi Lu

Orcid: 0000-0001-7581-8905

According to our database, Xiaoyi Lu authored at least 141 papers between 2008 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Compression Analysis for BlueField-2/-3 Data Processing Units: Lossy and Lossless Perspectives.
IEEE Micro, 2024

High-Speed Data Communication With Advanced Networks in Large Language Model Training.
IEEE Micro, 2024

2023
xCCL: A Survey of Industry-Led Collective Communication Libraries for Deep Learning.
J. Comput. Sci. Technol., February, 2023

SBGT: Scaling Bayesian-based Group Testing for Disease Surveillance.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Performance Characterization of Large Language Models on High-Speed Interconnects.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2023

Characterizing Lossy and Lossless Compression on Emerging BlueField DPU Architectures.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2023

2022
A Study of Database Performance Sensitivity to Experiment Settings.
Proc. VLDB Endow., 2022

Arcadia: A Fast and Reliable Persistent Memory Replicated Log.
CoRR, 2022

NVMe-oAF: Towards Adaptive NVMe-oF for IO-Intensive Workloads on HPC Cloud.
Proceedings of the HPDC '22: The 31st International Symposium on High-Performance Parallel and Distributed Computing, Minneapolis, MN, USA, 27 June 2022, 2022

HiBGT: High-Performance Bayesian Group Testing for COVID-19.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

Benchmarking Object Detection Models with Mummy Nuts Datasets.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2022

2021
Towards Offloadable and Migratable Microservices on Disaggregated Architectures: Vision, Challenges, and Research Roadmap.
CoRR, 2021

HatRPC: hint-accelerated thrift RPC over RDMA.
Proceedings of the International Conference for High Performance Computing, 2021

NVMe-CR: A Scalable Ephemeral Storage Runtime for Checkpoint/Restart with NVMe-over-Fabrics.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Characterizing and Accelerating End-to-End EdgeAI Inference Systems for Object Detection Applications.
Proceedings of the 6th IEEE/ACM Symposium on Edge Computing, 2021

DStore: A Fast, Tailless, and Quiescent-Free Object Store for PMEM.
Proceedings of the HPDC '21: The 30th International Symposium on High-Performance Parallel and Distributed Computing, 2021

Global Adaptive Optimization Parameters For Robust Pupil Location.
Proceedings of the 17th International Conference on Computational Intelligence and Security CIS 2021, 2021

2020
Understanding the Idiosyncrasies of Real Persistent Memory.
Proc. VLDB Endow., 2020

CirroData: Yet Another SQL-on-Hadoop Data Analytics Engine with High Performance.
J. Comput. Sci. Technol., 2020

On mass conservation and solvability of the discretized variable-density zero-Mach Navier-Stokes equations.
J. Comput. Phys., 2020

INEC: fast and coherent in-network erasure coding.
Proceedings of the International Conference for High Performance Computing, 2020

RDMP-KV: designing remote direct memory persistence based key-value stores with PMEM.
Proceedings of the International Conference for High Performance Computing, 2020

Workshop 7: HPBDC High-Performance Big Data and Cloud Computing.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Impact of Commodity Networks on Storage Disaggregation with NVMe-oF.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2020

2019
Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast.
IEEE Trans. Parallel Distributed Syst., 2019

Performance analysis of deep learning workloads using roofline trajectories.
CCF Trans. High Perform. Comput., 2019

TriEC: tripartite graph based erasure coding NIC offload.
Proceedings of the International Conference for High Performance Computing, 2019

Introduction to HPBDC 2019.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

C-GDR: High-Performance Container-Aware GPUDirect MPI Communication Schemes on RDMA Networks.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

SimdHT-Bench: Characterizing SIMD-Aware Hash Table Designs on Emerging CPU Architectures.
Proceedings of the IEEE International Symposium on Workload Characterization, 2019

UMR-EC: A Unified and Multi-Rail Erasure Coding Library for High-Performance Distributed Storage Systems.
Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, 2019

SCOR-KV: SIMD-Aware Client-Centric and Optimistic RDMA-Based Key-Value Store for Emerging CPU Architectures.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

Early Experience in Benchmarking Edge AI Processors with Object Detection Workloads.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2019

2018
DLoBD: A Comprehensive Study of Deep Learning over Big Data Stacks on HPC Clusters.
IEEE Trans. Multi Scale Comput. Syst., 2018

Networking and communication challenges for post-exascale systems.
Frontiers Inf. Technol. Electron. Eng., 2018

MR-Advisor: A comprehensive tuning, profiling, and prediction tool for MapReduce execution frameworks on HPC clusters.
J. Parallel Distributed Comput., 2018

Designing a Micro-Benchmark Suite to Evaluate gRPC for TensorFlow: Early Experiences.
CoRR, 2018

Analyzing, Modeling, and Provisioning QoS for NVMe SSDs.
Proceedings of the 11th IEEE/ACM International Conference on Utility and Cloud Computing, 2018

Multi-Threading and Lock-Free MPI RMA Based Graph Processing on KNL and POWER Architectures.
Proceedings of the 25th European MPI Users' Group Meeting, 2018

Introduction to HPBDC 2018.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Accelerating TensorFlow with Adaptive RDMA-Based gRPC.
Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training.
Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

Cutting the Tail: Designing High Performance Message Brokers to Reduce Tail Latencies in Stream Processing.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

High-Performance Multi-Rail Erasure Coding Library over Modern Data Center Architectures: Early Experiences.
Proceedings of the ACM Symposium on Cloud Computing, 2018

Spark-uDAPL: Cost-Saving Big Data Analytics on Microsoft Azure Cloud with RDMA Networks.
Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

A Survey on Deep Learning Benchmarks: Do We Still Need New Ones?
Proceedings of the Benchmarking, Measuring, and Optimizing, 2018

EC-Bench: Benchmarking Onload and Offload Erasure Coders on Modern Hardware Architectures.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2018

HPC AI500: A Benchmark Suite for HPC AI Systems.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2018

2017
A Comprehensive Study of MapReduce Over Lustre for Intermediate Data Placement and Shuffle Strategies on HPC Clusters.
IEEE Trans. Parallel Distributed Syst., 2017

Scalable and Distributed Key-Value Store-based Data Management Using RDMA-Memcached.
IEEE Data Eng. Bull., 2017

Designing Locality and NUMA Aware MPI Runtime for Nested Virtualization based HPC Cloud with SR-IOV Enabled InfiniBand.
Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2017

Is Singularity-based Container Technology Ready for Running MPI Applications on HPC Clouds?
Proceedings of the 10th International Conference on Utility and Cloud Computing, 2017

HPC Meets Cloud: Building Efficient Clouds for HPC, Big Data, and Deep Learning Middleware and Applications.
Proceedings of the 10th International Conference on Utility and Cloud Computing, 2017

Scalable reduction collectives with data partitioning-based multi-leader design.
Proceedings of the International Conference for High Performance Computing, 2017

Research on Millimeter Wave Communication Interference Suppression of UAV Based on Beam Optimization.
Proceedings of the Machine Learning and Intelligent Communications, 2017

High-Performance Virtual Machine Migration Framework for MPI Applications on SR-IOV Enabled InfiniBand Clusters.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Introduction to HPBDC Workshop.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning.
Proceedings of the 46th International Conference on Parallel Processing, 2017

High-Performance and Resilient Key-Value Store with Online Erasure Coding for Big Data Workloads.
Proceedings of the 37th IEEE International Conference on Distributed Computing Systems, 2017

Characterizing Deep Learning over Big Data (DLoBD) Stacks on RDMA-Capable Networks.
Proceedings of the 25th IEEE Annual Symposium on High-Performance Interconnects, 2017

Designing Registration Caching Free High-Performance MPI Library with Implicit On-Demand Paging (ODP) of InfiniBand.
Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

MPI-LiFE: Designing High-Performance Linear Fascicle Evaluation of Brain Connectome with MPI.
Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

A Scalable Network-Based Performance Analysis Tool for MPI on Large-Scale HPC Systems.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Swift-X: Accelerating OpenStack Swift with RDMA for Building an Efficient HPC Cloud.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

NVMD: Non-volatile memory assisted design for accelerating MapReduce and DAG execution frameworks on HPC systems.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

Performance characterization and acceleration of big data workloads on OpenPOWER system.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

Characterizing and accelerating indexing techniques on distributed ordered tables.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

Characterization of Big Data Stream Processing Pipeline: A Case Study using Flink and Kafka.
Proceedings of the Fourth IEEE/ACM International Conference on Big Data Computing, 2017

Building Efficient HPC Cloud with SR-IOV-Enabled InfiniBand: The MVAPICH2 Approach.
Proceedings of the Research Advances in Cloud Computing, 2017

2016
Characterizing and benchmarking stand-alone Hadoop MapReduce on modern HPC clusters.
J. Supercomput., 2016

Experiences and Benefits of Running RDMA Hadoop and Spark on SDSC Comet.
Proceedings of the XSEDE16 Conference on Diversity, 2016

INAM2: InfiniBand Network Analysis and Monitoring with MPI.
Proceedings of the High Performance Computing - 31st International Conference, 2016

Can Non-volatile Memory Benefit MapReduce Applications on HPC Clusters?
Proceedings of the 1st Joint International Workshop on Parallel Data Storage and data Intensive Scalable Computing Systems, 2016

Designing MPI library with on-demand paging (ODP) of infiniband: challenges and benefits.
Proceedings of the International Conference for High Performance Computing, 2016

MR-Advisor: A Comprehensive Tuning Tool for Advising HPC Users to Accelerate MapReduce Applications on Supercomputers.
Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016

Performance Characterization of Hypervisor- and Container-Based Virtualization for HPC on SR-IOV Enabled InfiniBand Clusters.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

High-Performance Hybrid Key-Value Store on Modern Clusters with RDMA Interconnects and SSDs: Non-blocking Extensions, Designs, and Benefits.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

HPBDC Introduction and Committees.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

High Performance Design for HDFS with Byte-Addressability of NVM and RDMA.
Proceedings of the 2016 International Conference on Supercomputing, 2016

High Performance MPI Library for Container-Based HPC Cloud on InfiniBand Clusters.
Proceedings of the 45th International Conference on Parallel Processing, 2016

Mizan-RMA: Accelerating Mizan Graph Processing Framework with MPI RMA.
Proceedings of the 23rd IEEE International Conference on High Performance Computing, 2016

Slurm-V: Extending Slurm for Building Efficient HPC Cloud with SR-IOV and IVShmem.
Proceedings of the Euro-Par 2016: Parallel Processing, 2016

Impact of HPC Cloud Networking Technologies on Accelerating Hadoop RPC and HBase.
Proceedings of the 2016 IEEE International Conference on Cloud Computing Technology and Science, 2016

Designing Virtualization-Aware and Automatic Topology Detection Schemes for Accelerating Hadoop on SR-IOV-Enabled Clouds.
Proceedings of the 2016 IEEE International Conference on Cloud Computing Technology and Science, 2016

Boldio: A hybrid and resilient burst-buffer over lustre for accelerating big data I/O.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

High-performance design of apache spark with RDMA and its benefits on various workloads.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

Efficient data access strategies for Hadoop and Spark on HPC cluster with heterogeneous storage.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

Performance characterization of hadoop workloads on SR-IOV-enabled virtualized InfiniBand clusters.
Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, 2016

2015
Accelerating Iterative Big Data Computing Through MPI.
J. Comput. Sci. Technol., 2015

Accelerating k-NN Algorithm with Hybrid MPI and OpenSHMEM.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

Can RDMA benefit online data processing workloads on memcached and MySQL?
Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

High-Performance Design of YARN MapReduce on Modern HPC Clusters with Lustre and RDMA.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

High-Performance Coarray Fortran Support with MVAPICH2-X: Initial Experience and Evaluation.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Accelerating I/O Performance of Big Data Analytics on HPC Clusters through RDMA-Based Key-Value Store.
Proceedings of the 44th International Conference on Parallel Processing, 2015

Accelerating Apache Hive with MPI for Data Warehouse Systems.
Proceedings of the 35th IEEE International Conference on Distributed Computing Systems, 2015

High Performance OpenSHMEM Strided Communication Support with InfiniBand UMR.
Proceedings of the 22nd IEEE International Conference on High Performance Computing, 2015

High-Performance and Scalable Design of MPI-3 RMA on Xeon Phi Clusters.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

High Performance MPI Datatype Support with User-Mode Memory Registration: Challenges, Designs, and Benefits.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

MVAPICH2 over OpenStack with SR-IOV: An Efficient Approach to Build HPC Clouds.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Triple-H: A Hybrid Approach to Accelerate HDFS on HPC Clusters with Heterogeneous Storage Architecture.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

A Plugin-Based Approach to Exploit RDMA Benefits for Apache and Enterprise HDFS.
Proceedings of the Big Data Benchmarks, Performance Optimization, and Emerging Hardware, 2015

Modeling and Designing Fault-Tolerance Mechanisms for MPI-Based MapReduce Data Computing Framework.
Proceedings of the First IEEE International Conference on Big Data Computing Service and Applications, 2015

Benchmarking key-value stores on high-performance storage and interconnects for web-scale workloads.
Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015

Performance characterization and acceleration of in-memory file systems for Hadoop and Spark applications on HPC clusters.
Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015

2014
On Big Data Benchmarking.
CoRR, 2014

A Micro-benchmark Suite for Evaluating Hadoop MapReduce on High-Performance Networks.
Proceedings of the Big Data Benchmarks, Performance Optimization, and Emerging Hardware, 2014

Performance Benefits of DataMPI: A Case Study with BigDataBench.
Proceedings of the Big Data Benchmarks, Performance Optimization, and Emerging Hardware, 2014

On Big Data Benchmarking.
Proceedings of the Big Data Benchmarks, Performance Optimization, and Emerging Hardware, 2014

Initial study of multi-endpoint runtime for MPI+OpenMP hybrid programming model on multi-core systems.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Scalable MiniMD Design with Hybrid MPI and OpenSHMEM.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

Designing Scalable Out-of-core Sorting with Hybrid MPI+PGAS Programming Models.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

Performance Characterization of Hadoop and Data MPI Based on Amdahl's Second Law.
Proceedings of the 9th IEEE International Conference on Networking, 2014

DataMPI: Extending MPI to Hadoop-Like Big Data Computing.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

HOMR: a hybrid approach to exploit maximum overlapping in MapReduce over high performance interconnects.
Proceedings of the 2014 International Conference on Supercomputing, 2014

Performance Modeling for RDMA-Enhanced Hadoop MapReduce.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

HAND: A Hybrid Approach to Accelerate Non-contiguous Data Movement Using MPI Datatypes on GPU Clusters.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

SOR-HDFS: a SEDA-based approach to maximize overlapping in RDMA-enhanced HDFS.
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

Accelerating Spark with RDMA for Big Data Processing: Early Experiences.
Proceedings of the 22nd IEEE Annual Symposium on High-Performance Interconnects, 2014

High performance MPI library over SR-IOV enabled infiniband clusters.
Proceedings of the 21st International Conference on High Performance Computing, 2014

Can Inter-VM Shmem Benefit MPI Applications on SR-IOV Based Virtualized Infiniband Clusters?
Proceedings of the Euro-Par 2014 Parallel Processing, 2014

MapReduce over Lustre: Can RDMA-Based Approach Benefit?
Proceedings of the Euro-Par 2014 Parallel Processing, 2014

Scalable Graph500 design with MPI-3 RMA.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

High performance OpenSHMEM for Xeon Phi clusters: Extensions, runtime designs and application co-design.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

In-memory I/O and replication for HDFS with Memcached: Early experiences.
Proceedings of the 2014 IEEE International Conference on Big Data (IEEE BigData 2014), 2014

2013
A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks.
Proceedings of the Advancing Big Data Benchmarks, 2013

High-Performance RDMA-based Design of Hadoop MapReduce over InfiniBand.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

High-Performance Design of Hadoop RPC with RDMA over InfiniBand.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

Can Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects?
Proceedings of the IEEE 21st Annual Symposium on High-Performance Interconnects, 2013

Tutorials.
Proceedings of the IEEE 21st Annual Symposium on High-Performance Interconnects, 2013

A scalable and portable approach to accelerate hybrid HPL on heterogeneous CPU-GPU clusters.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

Does RDMA-based enhanced Hadoop MapReduce need a new performance model?
Proceedings of the ACM Symposium on Cloud Computing, SOCC '13, 2013

SR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012
A Micro-benchmark Suite for Evaluating HDFS Operations on Modern Clusters.
Proceedings of the Specifying Big Data Benchmarks, 2012

2011
Vega LingCloud: A Resource Single Leasing Point System to Support Heterogeneous Application Modes on Shared Infrastructure.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2011

Can MPI Benefit Hadoop and MapReduce Applications?
Proceedings of the 2011 International Conference on Parallel Processing Workshops, 2011

2010
Investigating, Modeling, and Ranking Interface Complexity of Web Services on the World Wide Web.
Proceedings of the 6th World Congress on Services, 2010

JAMILA: A Usable Batch Job Management System to Coordinate Heterogeneous Clusters and Diverse Applications over Grid or Cloud Infrastructure.
Proceedings of the Network and Parallel Computing, IFIP International Conference, 2010

VegaWarden: A Uniform User Management System for Cloud Applications.
Proceedings of the Fifth International Conference on Networking, Architecture, and Storage, 2010

2009
A Model of Message-Based Debugging Facilities for Web or Grid Services.
Proceedings of the 2009 IEEE Congress on Services, Part I, 2009

ICOMC: Invocation Complexity Of Multi-Language Clients for Classified Web Services and its Impact on Large Scale SOA Applications.
Proceedings of the 2009 International Conference on Parallel and Distributed Computing, 2009

2008
An Experimental Analysis for Memory Usage of GOS Core.
Proceedings of the Ninth International Conference on Parallel and Distributed Computing, 2008
