Jidong Zhai

ORCID: 0000-0002-7656-6428

According to our database, Jidong Zhai authored at least 125 papers between 2009 and 2024.

Bibliography

2024
Compressed Data Direct Computing for Databases.
IEEE Trans. Knowl. Data Eng., May, 2024

FreshGNN: Reducing Memory Access via Stable Historical Embeddings for Graph Neural Network Training.
Proc. VLDB Endow., February, 2024

FastDecode: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines.
CoRR, 2024

POSTER: Pattern-Aware Sparse Communication for Scalable Recommendation Model Training.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

WiseGraph: Optimizing GNN with Joint Workload Partition of Graph and Operations.
Proceedings of the Nineteenth European Conference on Computer Systems, 2024

Optimal Kernel Orchestration for Tensor Programs with Korch.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Optimizing DNNs With Partially Equivalent Transformations and Automated Corrections.
IEEE Trans. Computers, December, 2023

Enabling Efficient Random Access to Hierarchically Compressed Text Data on Diverse GPU Platforms.
IEEE Trans. Parallel Distributed Syst., October, 2023

BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach.
Proc. ACM Manag. Data, September, 2023

Critique of "A Parallel Framework for Constraint-Based Bayesian Network Learning via Markov Blanket Discovery" by SCC Team From Tsinghua University.
IEEE Trans. Parallel Distributed Syst., June, 2023

Unified Programming Models for Heterogeneous High-Performance Computers.
J. Comput. Sci. Technol., February, 2023

Special issue on new trends in high-performance computing: Software systems and applications.
Softw. Pract. Exp., 2023

CompressGraph: Efficient Parallel Graph Analytics with Rule-Based Compression.
Proc. ACM Manag. Data, 2023

PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR.
CoRR, 2023

ReFresh: Reducing Memory Access from Exploiting Stable Historical Embeddings for Graph Neural Network Training.
CoRR, 2023

SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization.
Proceedings of the 2023 USENIX Annual Technical Conference, 2023

GraphSet: High Performance Graph Mining through Equivalent Set Transformations.
Proceedings of the International Conference for High Performance Computing, 2023

EINNET: Optimizing Tensor Programs with Derivation-Based Transformations.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Cocktailer: Analyzing and Optimizing Dynamic Control Flow in Deep Learning.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

GLM-130B: An Open Bilingual Pre-trained Model.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Joint Geometrical and Statistical Domain Adaptation for Cross-domain Code Vulnerability Detection.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Unveiling the Black Box of PLMs with Semantic Anchors: Towards Interpretable Neural Semantic Parsing.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Payment behavior prediction on shared parking lots with TR-GCN.
VLDB J., 2022

Critique of "MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization" by SCC Team From Tsinghua University.
IEEE Trans. Parallel Distributed Syst., 2022

POCLib: A High-Performance Framework for Enabling Near Orthogonal Processing on Compression.
IEEE Trans. Parallel Distributed Syst., 2022

Detecting Performance Variance for Parallel Applications Without Source Code.
IEEE Trans. Parallel Distributed Syst., 2022

Leveraging Code Snippets to Detect Variations in the Performance of HPC Systems.
IEEE Trans. Parallel Distributed Syst., 2022

Guest Editorial.
IEEE Trans. Parallel Distributed Syst., 2022

Exploring Data Analytics Without Decompression on Embedded GPU Systems.
IEEE Trans. Parallel Distributed Syst., 2022

Exploring Query Processing on CPU-GPU Integrated Edge Device.
IEEE Trans. Parallel Distributed Syst., 2022

Periodic Weather-Aware LSTM With Event Mechanism for Parking Behavior Prediction.
IEEE Trans. Knowl. Data Eng., 2022

Zoro: A robotic middleware combining high performance and high reliability.
J. Parallel Distributed Comput., 2022

GLM-130B: An Open Bilingual Pre-trained Model.
CoRR, 2022

Guiding the PLMs with Semantic Anchors as Intermediate Supervision: Towards Interpretable Semantic Parsing.
CoRR, 2022

OLLIE: Derivation-based Tensor Program Optimizer.
CoRR, 2022

GraphQ IR: Unifying Semantic Parsing of Graph Query Language with Intermediate Representation.
CoRR, 2022

CompressDB: Enabling Efficient Compressed Data Direct Processing for Various Databases.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

UniQ: A Unified Programming Model for Efficient Quantum Circuit Simulation.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Vapro: performance variance detection and diagnosis for production-run parallel applications.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

BaGuaLu: targeting brain scale pretrained models with over 37 million cores.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

PerFlow: a domain specific framework for automatic performance analysis of parallel applications.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

FasterMoE: modeling and optimizing training of large-scale dynamic pre-trained models.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

FreeTensor: a free-form DSL with holistic optimizations for irregular tensor programs.
Proceedings of the PLDI '22: 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, San Diego, CA, USA, June 13, 2022

Efficiently emulating high-bitwidth computation with low-bitwidth hardware.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Message from the High Performance Computing and Communications 2022 Program Chairs.
Proceedings of the 24th IEEE Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, 2022

GraphQ IR: Unifying the Semantic Parsing of Graph Query Languages with One Intermediate Representation.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

Suppressing ZZ crosstalk of Quantum computers through pulse and scheduling co-optimization.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021
TADOC: Text analytics directly on compression.
VLDB J., 2021

Critique of "Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility" by SCC Team From Tsinghua University.
IEEE Trans. Parallel Distributed Syst., 2021

An Efficient Parallel Secure Machine Learning Framework on GPUs.
IEEE Trans. Parallel Distributed Syst., 2021

Collaborative Heterogeneity-Aware OS Scheduler for Asymmetric Multicore Processors.
IEEE Trans. Parallel Distributed Syst., 2021

Guest Editorial.
IEEE Trans. Parallel Distributed Syst., 2021

Automatic Irregularity-Aware Fine-Grained Workload Partitioning on Integrated Architectures.
IEEE Trans. Knowl. Data Eng., 2021

A Fast Lock for Explicit Message Passing Architectures.
IEEE Trans. Computers, 2021

Preface.
J. Comput. Sci. Technol., 2021

FastMoE: A Fast Mixture-of-Expert Training System.
CoRR, 2021

AIPerf: Automated machine learning as an AI-HPC benchmark.
Big Data Min. Anal., 2021

Understanding and bridging the gaps in current GNN performance optimizations.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections.
Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation, 2021

HyQuas: hybrid partitioner based quantum circuit simulation system on GPU.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

G-TADOC: Enabling Efficient GPU-Based Text Analytics without Decompression.
Proceedings of the 37th IEEE International Conference on Data Engineering, 2021

Mitigating Crosstalk in Quantum Computers through Commutativity-Based Instruction Reordering.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

Accelerating GPU Message Communication for Autonomous Navigation Systems.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

2020
Message Passing Optimization in Robot Operating System.
Int. J. Parallel Program., 2020

GraphPi: high performance graph pattern matching through effective redundancy elimination.
Proceedings of the International Conference for High Performance Computing, 2020

ScalAna: automating scaling loss detection with graph analysis.
Proceedings of the International Conference for High Performance Computing, 2020

Identifying scalability bottlenecks for large-scale parallel programs with graph analysis.
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

Payment Behavior Prediction and Statistical Analysis for Shared Parking Lots.
Proceedings of the Network and Parallel Computing, 2020

PewLSTM: Periodic LSTM with Weather-Aware Gating Mechanism for Parking Behavior Prediction.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Edge-Stream: a Stream Processing Approach for Distributed Applications on a Hierarchical Edge-computing System.
Proceedings of the 5th IEEE/ACM Symposium on Edge Computing, 2020

Memory-Centric Communication Mechanism for Real-time Autonomous Navigation Applications.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

ParSecureML: An Efficient Parallel Secure Machine Learning Framework on GPUs.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

Enabling Efficient Random Access to Hierarchically-Compressed Data.
Proceedings of the 36th IEEE International Conference on Data Engineering, 2020

Elan: Towards Generic and Efficient Elastic Training for Deep Learning.
Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020

GOPipe: A Granularity-Oblivious Programming Framework for Pipelined Stencil Executions on GPU.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
Privacy Regulation Aware Process Mapping in Geo-Distributed Cloud Data Centers.
IEEE Trans. Parallel Distributed Syst., 2019

Student Cluster Competition 2018, Team Tsinghua University: Reproducing performance of multi-physics simulations of the Tsunamigenic 2004 Sumatra megathrust earthquake on the Intel Skylake Architecture.
Parallel Comput., 2019

Guest Editorial: Special Issue on Network and Parallel Computing for Emerging Architectures and Applications.
Int. J. Parallel Program., 2019

Performance evaluation and analysis of sparse matrix and graph kernels on heterogeneous processors.
CCF Trans. High Perform. Comput., 2019

Spread-n-share: improving application performance and cluster throughput with resource-aware job placement.
Proceedings of the International Conference for High Performance Computing, 2019

End-to-end I/O Monitoring on a Leading Supercomputer.
Proceedings of the 16th USENIX Symposium on Networked Systems Design and Implementation, 2019

Statistical Analysis and Prediction of Parking Behavior.
Proceedings of the Network and Parallel Computing, 2019

Automatic, Application-Aware I/O Forwarding Resource Allocation.
Proceedings of the 17th USENIX Conference on File and Storage Technologies, 2019

HiWayLib: A Software Framework for Enabling High Performance Communications for Heterogeneous Pipeline Computations.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

pLock: A Fast Lock for Architectures with Explicit Inter-core Message Passing.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018
An Efficient In-Memory Checkpoint Method and its Practice on Fault-Tolerant HPL.
IEEE Trans. Parallel Distributed Syst., 2018

An adaptive breadth-first search algorithm on integrated architectures.
J. Supercomput., 2018

Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights.
Proc. VLDB Endow., 2018

Student cluster competition 2017, team Tsinghua University: Reproducing vectorization of the tersoff multi-body potential on the Intel Skylake and NVIDIA Volta architectures.
Parallel Comput., 2018

A vision of post-exascale programming.
Frontiers Inf. Technol. Electron. Eng., 2018

Spindle: Informed Memory Access Monitoring.
Proceedings of the 2018 USENIX Annual Technical Conference, 2018

vSensor: leveraging fixed-workload snippets of programs for performance variance detection.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

CSE: Parallel Finite State Machines with Convergence Set Enumeration.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

BitFlow: Exploiting Vector Parallelism for Binary Neural Networks on CPU.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Zwift: A Programming Framework for High Performance Text Analytics on Compressed Data.
Proceedings of the 32nd International Conference on Supercomputing, 2018

2017
Understanding Co-Running Behaviors on Integrated CPU/GPU Architectures.
IEEE Trans. Parallel Distributed Syst., 2017

Efficient process mapping in geo-distributed cloud data centers.
Proceedings of the International Conference for High Performance Computing, 2017

Self-Checkpoint: An In-Memory Checkpoint Method Using Less Space and Its Practice on Fault-Tolerant HPL.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Versapipe: a versatile programming framework for pipelined computing on GPU.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Scalable Graph Traversal on Sunway TaihuLight with Ten Million Cores.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Algorithm-Directed Crash Consistence in Non-volatile Memory for HPC.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

FinePar: irregularity-aware fine-grained workload partitioning on integrated architectures.
Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017

2016
Building Semi-Elastic Virtual Clusters for Cost-Effective HPC Cloud Resource Provisioning.
IEEE Trans. Parallel Distributed Syst., 2016

Performance Prediction for Large-Scale Parallel Applications Using Representative Replay.
IEEE Trans. Computers, 2016

A survey of cloud resource management for complex engineering applications.
Frontiers Comput. Sci., 2016

Characterizing and optimizing TPC-C workloads on large-scale systems using SSD arrays.
Sci. China Inf. Sci., 2016

A Fast Tridiagonal Solver for Intel MIC Architecture.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

2015
Automatic Cloud I/O Configurator for I/O Intensive Parallel Applications.
IEEE Trans. Parallel Distributed Syst., 2015

Optimizing seam carving on multi-GPU systems for real-time content-aware image resizing.
J. Supercomput., 2015

To Co-run, or Not to Co-run: A Performance Study on Integrated Architectures.
Proceedings of the 23rd IEEE International Symposium on Modeling, 2015

A Power-Conserving Online Scheduling Scheme for Video Streaming Services.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

2014
CYPRESS: Combining Static and Dynamic Analysis for Top-Down Communication Trace Compression.
Proceedings of the International Conference for High Performance Computing, 2014

Optimizing Seam Carving on multi-GPU systems for real-time image resizing.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

2013
Cost-effective cloud HPC resource provisioning by building semi-elastic virtual clusters.
Proceedings of the International Conference for High Performance Computing, 2013

ACIC: automatic cloud I/O configurator for HPC applications.
Proceedings of the International Conference for High Performance Computing, 2013

ACIC: automatic cloud I/O configurator for parallel applications.
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

2012
Employing Checkpoint to Improve Job Scheduling in Large-Scale Systems.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2012

2011
Efficiently Acquiring Communication Traces for Large-Scale Parallel Applications.
IEEE Trans. Parallel Distributed Syst., 2011

Cloud versus in-house cluster: evaluating Amazon cluster compute instances for running MPI applications.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

One optimized I/O configuration per HPC application: leveraging the configurability of cloud.
Proceedings of the APSys '11 Asia Pacific Workshop on Systems, 2011

2010
PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

2009
LogGPO: An accurate communication model for performance prediction of MPI programs.
Sci. China Ser. F Inf. Sci., 2009

FACT: fast communication trace collection for parallel applications through program slicing.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Process Mapping for MPI Collective Communications.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

