Wenguang Chen

ORCID: 0000-0002-4281-1018

According to our database, Wenguang Chen authored at least 161 papers between 2003 and 2024.

Bibliography

2024
A Context-Sensitive Pointer Analysis Framework for Rust and Its Application to Call Graph Construction.
Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, 2024

2023
A Comprehensive Survey on Distributed Training of Graph Neural Networks.
Proc. IEEE, December, 2023

Mat2Stencil: A Modular Matrix-Based DSL for Explicit and Implicit Matrix-Free PDE Solvers on Structured Grid.
Proc. ACM Program. Lang., October, 2023

Toward 6G TKμ Extreme Connectivity: Architecture, Key Technologies and Experiments.
IEEE Wirel. Commun., June, 2023

TriCache: A User-Transparent Block Cache Enabling High-Performance Out-of-Core Processing with In-Memory Programs.
ACM Trans. Storage, May, 2023

Gas Plume Target Detection in Multibeam Water Column Image Using Deep Residual Aggregation Structure and Attention Mechanism.
Remote. Sens., 2023

BumbleBee: Secure Two-party Inference Framework for Large Transformers.
IACR Cryptol. ePrint Arch., 2023

PUMA: Secure Inference of LLaMA-7B in Five Minutes.
CoRR, 2023

An Improved IPOS LLC Resonant Converter With Sub-Module Output Voltage Sharing in Low Ripple High-Voltage Applications.
IEEE Access, 2023

A Physics-guided NN-based Approach for Tropical Cyclone Intensity Estimation.
Proceedings of the 2023 SIAM International Conference on Data Mining, 2023

GraphSet: High Performance Graph Mining through Equivalent Set Transformations.
Proceedings of the International Conference for High Performance Computing, 2023

Canvas: Isolated and Adaptive Swapping for Multi-Applications on Remote Memory.
Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, 2023

GLM-130B: An Open Bilingual Pre-trained Model.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Lasa: Abstraction and Specialization for Productive and Performant Linear Algebra on FPGAs.
Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2023

G-Sparse: Compiler-Driven Acceleration for Generalized Sparse Computation for Graph Neural Networks on Modern GPUs.
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023

2022
Detecting Performance Variance for Parallel Applications Without Source Code.
IEEE Trans. Parallel Distributed Syst., 2022

Leveraging Code Snippets to Detect Variations in the Performance of HPC Systems.
IEEE Trans. Parallel Distributed Syst., 2022

TC-Stream: Large-Scale Graph Triangle Counting on a Single Machine Using GPUs.
IEEE Trans. Parallel Distributed Syst., 2022

GLM-130B: An Open Bilingual Pre-trained Model.
CoRR, 2022

Toward 6G TKμ Extreme Connectivity: Architecture, Key Technologies and Experiments.
CoRR, 2022

Mixed-Precision Inference Quantization: Radically Towards Faster inference speed, Lower Storage requirement, and Lower Loss.
CoRR, 2022

Programming Matrices as Staged Sparse Rows to Generate Efficient Matrix-free Differential Equation Solver.
CoRR, 2022

Canvas: Isolated and Adaptive Swapping for Multi-Applications on Remote Memory.
CoRR, 2022

Quantization in Layer's Input is Matter.
CoRR, 2022

Scaling Graph 500 SSSP to 140 Trillion Edges with over 40 Million Cores.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Vapro: performance variance detection and diagnosis for production-run parallel applications.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

BaGuaLu: targeting brain scale pretrained models with over 37 million cores.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

Scaling graph traversal to 281 trillion edges with 40 million cores.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

2021
TADOC: Text analytics directly on compression.
VLDB J., 2021

Automatic Irregularity-Aware Fine-Grained Workload Partitioning on Integrated Architectures.
IEEE Trans. Knowl. Data Eng., 2021

A Fast Lock for Explicit Message Passing Architectures.
IEEE Trans. Computers, 2021

Taking the Pulse of Financial Activities with Online Graph Processing.
ACM SIGOPS Oper. Syst. Rev., 2021

Chukonu: A Fully-Featured Big Data Processing System by Efficiently Integrating a Native Compute Engine into Spark.
Proc. VLDB Endow., 2021

GraphTheta: A Distributed Graph Neural Network Learning System With Flexible Training Strategy.
CoRR, 2021

Processing extreme-scale graphs on China's supercomputers.
Commun. ACM, 2021

AIPerf: Automated machine learning as an AI-HPC benchmark.
Big Data Min. Anal., 2021

LotusSQL: SQL engine for high-performance big data systems.
Big Data Min. Anal., 2021

RisGraph: A Real-Time Streaming System for Evolving Graphs to Support Sub-millisecond Per-update Analysis at Millions Ops/s.
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

DFOGraph: an I/O- and communication-efficient system for distributed fully-out-of-core graph processing.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Encouraging Compiler Optimization Practice for Undergraduate Students through Competition.
Proceedings of the ITiCSE '21: 26th ACM Conference on Innovation and Technology in Computer Science Education V.1, Virtual Event, Germany, June 26, 2021

Sparker: Efficient Reduction for More Scalable Machine Learning with Spark.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

2020
SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs.
IEEE Trans. Parallel Distributed Syst., 2020

Survey of external memory large-scale graph processing on a multi-core system.
J. Supercomput., 2020

LiveGraph: A Transactional Graph Storage System with Purely Sequential Adjacency List Scans.
Proc. VLDB Endow., 2020

RisGraph: A Real-Time Streaming System for Evolving Graphs.
CoRR, 2020

2019
ANG: a combination of Apriori and graph computing techniques for frequent itemsets mining.
J. Supercomput., 2019

LiveGraph: A Transactional Graph Storage System with Purely Sequential Adjacency List Scans.
CoRR, 2019

Auxo: a temporal graph management system.
Big Data Min. Anal., 2019

Spread-n-share: improving application performance and cluster throughput with resource-aware job placement.
Proceedings of the International Conference for High Performance Computing, 2019

Pimiento: A Vertex-Centric Graph-Processing Framework on a Single Machine.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2019

T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

HiWayLib: A Software Framework for Enabling High Performance Communications for Heterogeneous Pipeline Computations.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

pLock: A Fast Lock for Architectures with Explicit Inter-core Message Passing.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018
Performance Evaluation and Optimization of HBM-Enabled GPU for Data-Intensive Applications.
IEEE Trans. Very Large Scale Integr. Syst., 2018

An Efficient In-Memory Checkpoint Method and its Practice on Fault-Tolerant HPL.
IEEE Trans. Parallel Distributed Syst., 2018

Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights.
Proc. VLDB Endow., 2018

Bridging the Gap Between Neural Networks and Neuromorphic Hardware with A Neural Network Compiler.
CoRR, 2018

The future of artificial intelligence in China.
Commun. ACM, 2018

Will supercomputers be super-data and super-AI machines?
Commun. ACM, 2018

Welcome to the China region special section.
Commun. ACM, 2018

Spindle: Informed Memory Access Monitoring.
Proceedings of the 2018 USENIX Annual Technical Conference, 2018

ShenTu: processing multi-trillion edge graphs on millions of cores in seconds.
Proceedings of the International Conference for High Performance Computing, 2018

vSensor: leveraging fixed-workload snippets of programs for performance variance detection.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Zwift: A Programming Framework for High Performance Text Analytics on Compressed Data.
Proceedings of the 32nd International Conference on Supercomputing, 2018

Bridge the Gap between Neural Networks and Neuromorphic Hardware with a Neural Network Compiler.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017
Understanding Co-Running Behaviors on Integrated CPU/GPU Architectures.
IEEE Trans. Parallel Distributed Syst., 2017

Congestion control and energy-balanced scheme based on the hierarchy for WSNs.
IET Wirel. Sens. Syst., 2017

Self-Checkpoint: An In-Memory Checkpoint Method Using Less Space and Its Practice on Fault-Tolerant HPL.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Versapipe: a versatile programming framework for pipelined computing on GPU.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Scalable Graph Traversal on Sunway TaihuLight with Ten Million Cores.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

What Decides the Dropout in MOOCs?
Proceedings of the Database Systems for Advanced Applications, 2017

FinePar: irregularity-aware fine-grained workload partitioning on integrated architectures.
Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017

POSTER: Bridge the Gap Between Neural Networks and Neuromorphic Hardware.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
Building Semi-Elastic Virtual Clusters for Cost-Effective HPC Cloud Resource Provisioning.
IEEE Trans. Parallel Distributed Syst., 2016

DRDDR: a lightweight method to detect data races in Linux kernel.
J. Supercomput., 2016

Performance Prediction for Large-Scale Parallel Applications Using Representative Replay.
IEEE Trans. Computers, 2016

WarpLDA: a Cache Efficient O(1) Algorithm for Latent Dirichlet Allocation.
Proc. VLDB Endow., 2016

NestedMP: Enabling cache-aware thread mapping for nested parallel shared memory applications.
Parallel Comput., 2016

Data adapter for querying and transformation between SQL and NoSQL database.
Future Gener. Comput. Syst., 2016

A survey of cloud resource management for complex engineering applications.
Frontiers Comput. Sci., 2016

Characterizing and optimizing TPC-C workloads on large-scale systems using SSD arrays.
Sci. China Inf. Sci., 2016

Refactoring and optimizing the community atmosphere model (CAM) on the sunway taihulight supercomputer.
Proceedings of the International Conference for High Performance Computing, 2016

Gemini: A Computation-Centric Distributed Graph Processing System.
Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, 2016

NEUTRAMS: Neural network transformation and co-design under neuromorphic hardware constraints.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Neural network transformation under hardware constraints.
Proceedings of the 2016 International Conference on Compilers, 2016

2015
Automatic Cloud I/O Configurator for I/O Intensive Parallel Applications.
IEEE Trans. Parallel Distributed Syst., 2015

ImmortalGraph: A System for Storage and Analysis of Temporal Graphs.
ACM Trans. Storage, 2015

Extending Conditional Dependencies with Built-in Predicates.
IEEE Trans. Knowl. Data Eng., 2015

Optimizing seam carving on multi-GPU systems for real-time content-aware image resizing.
J. Supercomput., 2015

WarpLDA: a Simple and Efficient O(1) Algorithm for Latent Dirichlet Allocation.
CoRR, 2015

GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning.
Proceedings of the 2015 USENIX Annual Technical Conference, 2015

BiFennel: Fast Bipartite Graph Partitioning Algorithm for Big Data.
Proceedings of the 2015 IEEE International Conference on Smart City/SocialCom/SustainCom/DataCom/SC2 2015, 2015

To Co-run, or Not to Co-run: A Performance Study on Integrated Architectures.
Proceedings of the 23rd IEEE International Symposium on Modeling, 2015

AsHES Introduction and Committees.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Cost-Effective Resource Configuration for Cloud Video Streaming Services.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

A Power-Conserving Online Scheduling Scheme for Video Streaming Services.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

Distributed Metaserver Mechanism and Recovery Mechanism Support in Quantcast File System.
Proceedings of the 39th IEEE Annual Computer Software and Applications Conference, 2015

Weibo, and a Tale of Two Worlds.
Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2015

2014
CYPRESS: Combining Static and Dynamic Analysis for Top-Down Communication Trace Compression.
Proceedings of the International Conference for High Performance Computing, 2014

Cybertron: pushing the limit on I/O reduction in data-parallel programs.
Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, 2014

Nondeterminism in MapReduce considered harmful? an empirical study on non-commutative aggregators in MapReduce programs.
Proceedings of the 36th International Conference on Software Engineering, 2014

NestedMP: Taming Complex Configuration Space of Degree of Parallelism for Nested-Parallel Programs.
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

Optimizing Seam Carving on multi-GPU systems for real-time image resizing.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

Chronos: a graph engine for temporal graph analysis.
Proceedings of the Ninth Eurosys Conference 2014, 2014

Kernel data race detection using debug register in Linux.
Proceedings of the 2014 IEEE Symposium on Low-Power and High-Speed Chips, 2014

2013
Taming Hardware Event Samples for Precise and Versatile Feedback Directed Optimizations.
IEEE Trans. Computers, 2013

Improving cis-regulatory elements modeling by consensus scaffolded mixture models.
Sci. China Inf. Sci., 2013

Cost-effective cloud HPC resource provisioning by building semi-elastic virtual clusters.
Proceedings of the International Conference for High Performance Computing, 2013

ACIC: automatic cloud I/O configurator for HPC applications.
Proceedings of the International Conference for High Performance Computing, 2013

ACIC: automatic cloud I/O configurator for parallel applications.
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

Shall I Use Heterogeneous Data Centers? - A Case Study on Video on Demand Systems.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

2012
SMILE: streaming management of applications and data for mobile terminals.
Int. J. Cloud Comput., 2012

CUDA-Zero: a framework for porting shared memory GPU applications to multi-GPUs.
Sci. China Inf. Sci., 2012

Acolyte: An In-Memory Social Network Query System.
Proceedings of the Web Information Systems Engineering - WISE 2012, 2012

Employing Checkpoint to Improve Job Scheduling in Large-Scale Systems.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2012

Parameter estimation of Conditional Random Fields model based on cloud computing.
Proceedings of the 2012 IEEE International Conference on Granular Computing, 2012

2011
Efficiently Acquiring Communication Traces for Large-Scale Parallel Applications.
IEEE Trans. Parallel Distributed Syst., 2011

ASLOP: A field-access affinity-based structure data layout optimizer.
Sci. China Inf. Sci., 2011

Cloud versus in-house cluster: evaluating Amazon cluster compute instances for running MPI applications.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

An SSA-based algorithm for optimal speculative code motion under an execution profile.
Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, 2011

RACEZ: a lightweight and non-invasive race detection tool for production applications.
Proceedings of the 33rd International Conference on Software Engineering, 2011

One optimized I/O configuration per HPC application: leveraging the configurability of cloud.
Proceedings of the APSys '11 Asia Pacific Workshop on Systems, 2011

OpenMDSP: Extending OpenMP to Program Multi-Core DSP.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Do I use the wrong definition?: DeFuse: definition-use invariants for detecting concurrency and sequential bugs.
Proceedings of the 25th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2010

How OpenMP Applications Get More Benefit from Many-Core Era.
Proceedings of the Beyond Loop Level Parallelism in OpenMP: Accelerators, 2010

Taming hardware event samples for FDO compilation.
Proceedings of the CGO 2010, 2010

MapCG: writing parallel program portable between CPU and GPU.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
Incorporating cardinality constraints and synonym rules into conditional functional dependencies.
Inf. Process. Lett., 2009

LogGPO: An accurate communication model for performance prediction of MPI programs.
Sci. China Ser. F Inf. Sci., 2009

FACT: fast communication trace collection for parallel applications through program slicing.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

MPIWiz: subgroup reproducible replay of mpi applications.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Improving Dense Linear Equation Solver on Hybrid CPU-GPU System.
Proceedings of the 10th International Symposium on Pervasive Systems, 2009

Process Mapping for MPI Collective Communications.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

Analyses and Validation of Conditional Dependencies with Built-in Predicates.
Proceedings of the Database and Expert Systems Applications, 20th International Conference, 2009

Extracting Maximal Degenerate Motifs Based on a Suffix Tree.
Proceedings of the International Conference on Bioinformatics & Computational Biology, 2009

Cache Sharing Management for Performance Fairness in Chip Multiprocessors.
Proceedings of the PACT 2009, 2009

2008
Exploring the Emerging Applications for Transactional Memory.
Proceedings of the Ninth International Conference on Parallel and Distributed Computing, 2008

CprFS: a user-level file system to support consistent file states for checkpoint and restart.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Maotai: View-Oriented Parallel Programming on CMT Processors.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

Parallelization and Characterization of Probabilistic Latent Semantic Analysis.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

Parallelization of spectral clustering algorithm on multi-core processors and GPGPU.
Proceedings of the 13th Asia-Pacific Computer Systems Architecture Conference, 2008

2007
OpenUH: an optimizing, portable OpenMP compiler.
Concurr. Comput. Pract. Exp., 2007

PBB: a parallel bioinformatics benchmark suite for shared memory multiprocessors.
Proceedings of the CHINA HPC 2007, 2007

Performance Evaluation of View-Oriented Parallel Programming on Cluster of Computers.
Proceedings of the High Performance Computing and Communications, 2007

History Based User Interest Modeling in WWW Access.
Proceedings of the Human-Computer Interaction. HCI Intelligent Multimodal Interaction Environments, 2007

Revisit of View-Oriented Parallel Programming.
Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

2006
Parallelization of module network structure learning and performance tuning on SMP.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Parallel implementation and performance characterization of MUSCLE.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Tree partition based parallel frequent pattern mining on shared memory systems.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

MPIPP: an automatic profile-guided parallel process placement toolset for SMP clusters and multiclusters.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Distributed File Streamer: A Framework for Distributed Application Data Coupling.
Proceedings of the 7th IEEE/ACM International Conference on Grid Computing (GRID 2006), 2006

VODCA: View-Oriented, Distributed, Cluster-Based Approach to Parallel Computing.
Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

2005
Thckpt: Transparent Checkpointing of Linux Processes Under IA-64.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2005

Parallel Implementation of SEMPHY - a Structural EM Algorithm for Phylogenetic Reconstruction.
Proceedings of the Parallel Computing: Current & Future Issues of High-End Computing, 2005

Parallel Module Network Learning on Distributed Memory Multiprocessors.
Proceedings of the 34th International Conference on Parallel Processing Workshops (ICPP 2005 Workshops), 2005

A Dynamic Energy Conservation Scheme for Clusters in Computing Centers.
Proceedings of the Embedded Software and Systems, Second International Conference, 2005

Hierarchical Parallel Simulated Annealing and Its Applications.
Proceedings of the Distributed and Parallel Computing, 2005

2004
Parallelization of Bayesian Network based SNPs Pattern Analysis and Performance Characterization on SMP/HT.
Proceedings of the 10th International Conference on Parallel and Distributed Systems, 2004

A Single Thread Discrete Event Simulation Toolkit for Java: STSimJ.
Proceedings of the Computational Science, 2004

2003
On the Malicious Participants Problem in Computational Grid.
Proceedings of the Grid and Cooperative Computing, Second International Workshop, 2003
