Proceedings of the PLDI '22: 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, San Diego, CA, USA, June 13, 2022

Efficiently emulating high-bitwidth computation with low-bitwidth hardware.

Zixuan Ma

Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Message from the High Performance Computing and Communications 2022 Program Chairs.

Yunquan Zhang

Jidong Zhai

Rajiv Ranjan

Proceedings of the 24th IEEE Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, 2022

GraphQ IR: Unifying the Semantic Parsing of Graph Query Languages with One Intermediate Representation.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures.

Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

Suppressing ZZ crosstalk of Quantum computers through pulse and scheduling co-optimization.

Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021

TADOC: Text analytics directly on compression.

VLDB J., 2021

Critique of "Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility" by SCC Team From Tsinghua University.

IEEE Trans. Parallel Distributed Syst., 2021

An Efficient Parallel Secure Machine Learning Framework on GPUs.

IEEE Trans. Parallel Distributed Syst., 2021

Collaborative Heterogeneity-Aware OS Scheduler for Asymmetric Multicore Processors.

IEEE Trans. Parallel Distributed Syst., 2021

Guest Editorial.

Pavan Balaji

Jidong Zhai

Min Si

IEEE Trans. Parallel Distributed Syst., 2021

Automatic Irregularity-Aware Fine-Grained Workload Partitioning on Integrated Architectures.

IEEE Trans. Knowl. Data Eng., 2021

A Fast Lock for Explicit Message Passing Architectures.

IEEE Trans. Computers, 2021

Preface.

J. Comput. Sci. Technol., 2021

FastMoE: A Fast Mixture-of-Expert Training System.

CoRR, 2021

AIPerf: Automated machine learning as an AI-HPC benchmark.

Big Data Min. Anal., 2021

Understanding and bridging the gaps in current GNN performance optimizations.

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections.

Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation, 2021

HyQuas: hybrid partitioner based quantum circuit simulation system on GPU.

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

G-TADOC: Enabling Efficient GPU-Based Text Analytics without Decompression.

Proceedings of the 37th IEEE International Conference on Data Engineering, 2021

Mitigating Crosstalk in Quantum Computers through Commutativity-Based Instruction Reordering.

Lei Xie

Jidong Zhai

Weimin Zheng

Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

Accelerating GPU Message Communication for Autonomous Navigation Systems.

Proceedings of the IEEE International Conference on Cluster Computing, 2021

2020

Message Passing Optimization in Robot Operating System.

Int. J. Parallel Program., 2020

GraphPi: high performance graph pattern matching through effective redundancy elimination.

Proceedings of the International Conference for High Performance Computing, 2020

ScalAna: automating scaling loss detection with graph analysis.

Proceedings of the International Conference for High Performance Computing, 2020

Identifying scalability bottlenecks for large-scale parallel programs with graph analysis.

Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

Payment Behavior Prediction and Statistical Analysis for Shared Parking Lots.

Proceedings of the Network and Parallel Computing, 2020

PewLSTM: Periodic LSTM with Weather-Aware Gating Mechanism for Parking Behavior Prediction.

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Edge-Stream: a Stream Processing Approach for Distributed Applications on a Hierarchical Edge-computing System.

Proceedings of the 5th IEEE/ACM Symposium on Edge Computing, 2020

Memory-Centric Communication Mechanism for Real-time Autonomous Navigation Applications.

Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

ParSecureML: An Efficient Parallel Secure Machine Learning Framework on GPUs.

Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

Enabling Efficient Random Access to Hierarchically-Compressed Data.

Proceedings of the 36th IEEE International Conference on Data Engineering, 2020

Elan: Towards Generic and Efficient Elastic Training for Deep Learning.

Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020

GOPipe: A Granularity-Oblivious Programming Framework for Pipelined Stencil Executions on GPU.

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

Privacy Regulation Aware Process Mapping in Geo-Distributed Cloud Data Centers.

IEEE Trans. Parallel Distributed Syst., 2019

Student Cluster Competition 2018, Team Tsinghua University: Reproducing performance of multi-physics simulations of the Tsunamigenic 2004 Sumatra megathrust earthquake on the Intel Skylake Architecture.

Parallel Comput., 2019

Guest Editorial: Special Issue on Network and Parallel Computing for Emerging Architectures and Applications.

Int. J. Parallel Program., 2019

Performance evaluation and analysis of sparse matrix and graph kernels on heterogeneous processors.

CCF Trans. High Perform. Comput., 2019

Spread-n-share: improving application performance and cluster throughput with resource-aware job placement.

Proceedings of the International Conference for High Performance Computing, 2019

End-to-end I/O Monitoring on a Leading Supercomputer.

Proceedings of the 16th USENIX Symposium on Networked Systems Design and Implementation, 2019

Statistical Analysis and Prediction of Parking Behavior.

Proceedings of the Network and Parallel Computing, 2019

Automatic, Application-Aware I/O Forwarding Resource Allocation.

Proceedings of the 17th USENIX Conference on File and Storage Technologies, 2019

HiWayLib: A Software Framework for Enabling High Performance Communications for Heterogeneous Pipeline Computations.

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

pLock: A Fast Lock for Architectures with Explicit Inter-core Message Passing.

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018

An Efficient In-Memory Checkpoint Method and its Practice on Fault-Tolerant HPL.

IEEE Trans. Parallel Distributed Syst., 2018

An adaptive breadth-first search algorithm on integrated architectures.

J. Supercomput., 2018

Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights.

Proc. VLDB Endow., 2018

Student cluster competition 2017, team Tsinghua University: Reproducing vectorization of the Tersoff multi-body potential on the Intel Skylake and NVIDIA Volta architectures.

Parallel Comput., 2018

A vision of post-exascale programming.

Jidong Zhai

Wen-Guang Chen

Frontiers Inf. Technol. Electron. Eng., 2018

Spindle: Informed Memory Access Monitoring.

Proceedings of the 2018 USENIX Annual Technical Conference, 2018

vSensor: leveraging fixed-workload snippets of programs for performance variance detection.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

CSE: Parallel Finite State Machines with Convergence Set Enumeration.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

BitFlow: Exploiting Vector Parallelism for Binary Neural Networks on CPU.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Zwift: A Programming Framework for High Performance Text Analytics on Compressed Data.

Proceedings of the 32nd International Conference on Supercomputing, 2018

2017

Understanding Co-Running Behaviors on Integrated CPU/GPU Architectures.

IEEE Trans. Parallel Distributed Syst., 2017

Efficient process mapping in geo-distributed cloud data centers.

Proceedings of the International Conference for High Performance Computing, 2017

Self-Checkpoint: An In-Memory Checkpoint Method Using Less Space and Its Practice on Fault-Tolerant HPL.

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

VersaPipe: a versatile programming framework for pipelined computing on GPU.

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Scalable Graph Traversal on Sunway TaihuLight with Ten Million Cores.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Algorithm-Directed Crash Consistence in Non-volatile Memory for HPC.

Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

FinePar: irregularity-aware fine-grained workload partitioning on integrated architectures.

Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017

2016

Building Semi-Elastic Virtual Clusters for Cost-Effective HPC Cloud Resource Provisioning.

IEEE Trans. Parallel Distributed Syst., 2016

Performance Prediction for Large-Scale Parallel Applications Using Representative Replay.

IEEE Trans. Computers, 2016

A survey of cloud resource management for complex engineering applications.

Frontiers Comput. Sci., 2016

Characterizing and optimizing TPC-C workloads on large-scale systems using SSD arrays.

Sci. China Inf. Sci., 2016

A Fast Tridiagonal Solver for Intel MIC Architecture.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

2015

Automatic Cloud I/O Configurator for I/O Intensive Parallel Applications.

IEEE Trans. Parallel Distributed Syst., 2015

Optimizing seam carving on multi-GPU systems for real-time content-aware image resizing.

J. Supercomput., 2015

To Co-run, or Not to Co-run: A Performance Study on Integrated Architectures.

Proceedings of the 23rd IEEE International Symposium on Modeling, 2015

A Power-Conserving Online Scheduling Scheme for Video Streaming Services.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

2014

CYPRESS: Combining Static and Dynamic Analysis for Top-Down Communication Trace Compression.

Proceedings of the International Conference for High Performance Computing, 2014

Optimizing Seam Carving on multi-GPU systems for real-time image resizing.

Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

2013

Cost-effective cloud HPC resource provisioning by building semi-elastic virtual clusters.

Proceedings of the International Conference for High Performance Computing, 2013

ACIC: automatic cloud I/O configurator for HPC applications.

Proceedings of the International Conference for High Performance Computing, 2013

ACIC: automatic cloud I/O configurator for parallel applications.

Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

2012

Employing Checkpoint to Improve Job Scheduling in Large-Scale Systems.

Proceedings of the Job Scheduling Strategies for Parallel Processing, 2012

2011

Efficiently Acquiring Communication Traces for Large-Scale Parallel Applications.

IEEE Trans. Parallel Distributed Syst., 2011

Cloud versus in-house cluster: evaluating Amazon cluster compute instances for running MPI applications.

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

One optimized I/O configuration per HPC application: leveraging the configurability of cloud.

Proceedings of the APSys '11 Asia Pacific Workshop on Systems, 2011

2010

PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node.

Jidong Zhai

Wenguang Chen

Weimin Zheng

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

2009

LogGPO: An accurate communication model for performance prediction of MPI programs.

Sci. China Ser. F Inf. Sci., 2009

FACT: fast communication trace collection for parallel applications through program slicing.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Process Mapping for MPI Collective Communications.

Proceedings of the Euro-Par 2009 Parallel Processing, 2009

Jidong Zhai