Wenguang Chen

According to our database, Wenguang Chen authored at least 106 papers between 2003 and 2018.

Bibliography

2018
Performance Evaluation and Optimization of HBM-Enabled GPU for Data-Intensive Applications.
IEEE Trans. VLSI Syst., 2018

An Efficient In-Memory Checkpoint Method and its Practice on Fault-Tolerant HPL.
IEEE Trans. Parallel Distrib. Syst., 2018

Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights.
PVLDB, 2018

Bridging the Gap Between Neural Networks and Neuromorphic Hardware with A Neural Network Compiler.
CoRR, 2018

Spindle: Informed Memory Access Monitoring.
Proceedings of the 2018 USENIX Annual Technical Conference, 2018

vSensor: leveraging fixed-workload snippets of programs for performance variance detection.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Zwift: A Programming Framework for High Performance Text Analytics on Compressed Data.
Proceedings of the 32nd International Conference on Supercomputing, 2018

Bridge the Gap between Neural Networks and Neuromorphic Hardware with a Neural Network Compiler.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017
Understanding Co-Running Behaviors on Integrated CPU/GPU Architectures.
IEEE Trans. Parallel Distrib. Syst., 2017

Congestion control and energy-balanced scheme based on the hierarchy for WSNs.
IET Wireless Sensor Systems, 2017

Self-Checkpoint: An In-Memory Checkpoint Method Using Less Space and Its Practice on Fault-Tolerant HPL.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Versapipe: a versatile programming framework for pipelined computing on GPU.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Scalable Graph Traversal on Sunway TaihuLight with Ten Million Cores.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Performance evaluation and optimization of HBM-Enabled GPU for data-intensive applications.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

What Decides the Dropout in MOOCs?
Proceedings of the Database Systems for Advanced Applications, 2017

FinePar: irregularity-aware fine-grained workload partitioning on integrated architectures.
Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017

SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

POSTER: Bridge the Gap Between Neural Networks and Neuromorphic Hardware.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
Building Semi-Elastic Virtual Clusters for Cost-Effective HPC Cloud Resource Provisioning.
IEEE Trans. Parallel Distrib. Syst., 2016

DRDDR: a lightweight method to detect data races in Linux kernel.
The Journal of Supercomputing, 2016

Performance Prediction for Large-Scale Parallel Applications Using Representative Replay.
IEEE Trans. Computers, 2016

WarpLDA: a Cache Efficient O(1) Algorithm for Latent Dirichlet Allocation.
PVLDB, 2016

NestedMP: Enabling cache-aware thread mapping for nested parallel shared memory applications.
Parallel Computing, 2016

Data adapter for querying and transformation between SQL and NoSQL database.
Future Generation Comp. Syst., 2016

A survey of cloud resource management for complex engineering applications.
Frontiers Comput. Sci., 2016

SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs.
CoRR, 2016

Characterizing and optimizing TPC-C workloads on large-scale systems using SSD arrays.
SCIENCE CHINA Information Sciences, 2016

Refactoring and optimizing the Community Atmosphere Model (CAM) on the Sunway TaihuLight supercomputer.
Proceedings of the International Conference for High Performance Computing, 2016

Gemini: A Computation-Centric Distributed Graph Processing System.
Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, 2016

NEUTRAMS: Neural network transformation and co-design under neuromorphic hardware constraints.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Neural network transformation under hardware constraints.
Proceedings of the 2016 International Conference on Compilers, 2016

2015
Automatic Cloud I/O Configurator for I/O Intensive Parallel Applications.
IEEE Trans. Parallel Distrib. Syst., 2015

ImmortalGraph: A System for Storage and Analysis of Temporal Graphs.
TOS, 2015

Extending Conditional Dependencies with Built-in Predicates.
IEEE Trans. Knowl. Data Eng., 2015

Optimizing seam carving on multi-GPU systems for real-time content-aware image resizing.
The Journal of Supercomputing, 2015

WarpLDA: a Simple and Efficient O(1) Algorithm for Latent Dirichlet Allocation.
CoRR, 2015

GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning.
Proceedings of the 2015 USENIX Annual Technical Conference, 2015

BiFennel: Fast Bipartite Graph Partitioning Algorithm for Big Data.
Proceedings of the 2015 IEEE International Conference on Smart City/SocialCom/SustainCom 2015, 2015

To Co-run, or Not to Co-run: A Performance Study on Integrated Architectures.
Proceedings of the 23rd IEEE International Symposium on Modeling, 2015

AsHES Introduction and Committees.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Cost-Effective Resource Configuration for Cloud Video Streaming Services.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

A Power-Conserving Online Scheduling Scheme for Video Streaming Services.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

Distributed Metaserver Mechanism and Recovery Mechanism Support in Quantcast File System.
Proceedings of the 39th IEEE Annual Computer Software and Applications Conference, 2015

Weibo, and a Tale of Two Worlds.
Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2015

2014
CYPRESS: Combining Static and Dynamic Analysis for Top-Down Communication Trace Compression.
Proceedings of the International Conference for High Performance Computing, 2014

Cybertron: pushing the limit on I/O reduction in data-parallel programs.
Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, 2014

Nondeterminism in MapReduce considered harmful? an empirical study on non-commutative aggregators in MapReduce programs.
Proceedings of the 36th International Conference on Software Engineering, 2014

NestedMP: Taming Complex Configuration Space of Degree of Parallelism for Nested-Parallel Programs.
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

Optimizing Seam Carving on multi-GPU systems for real-time image resizing.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

Chronos: a graph engine for temporal graph analysis.
Proceedings of the Ninth Eurosys Conference 2014, 2014

Kernel data race detection using debug register in Linux.
Proceedings of the 2014 IEEE Symposium on Low-Power and High-Speed Chips, 2014

2013
Taming Hardware Event Samples for Precise and Versatile Feedback Directed Optimizations.
IEEE Trans. Computers, 2013

Improving cis-regulatory elements modeling by consensus scaffolded mixture models.
SCIENCE CHINA Information Sciences, 2013

Cost-effective cloud HPC resource provisioning by building semi-elastic virtual clusters.
Proceedings of the International Conference for High Performance Computing, 2013

ACIC: automatic cloud I/O configurator for HPC applications.
Proceedings of the International Conference for High Performance Computing, 2013

ACIC: automatic cloud I/O configurator for parallel applications.
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

Shall I Use Heterogeneous Data Centers? - A Case Study on Video on Demand Systems.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

2012
SMILE: streaming management of applications and data for mobile terminals.
IJCC, 2012

CUDA-Zero: a framework for porting shared memory GPU applications to multi-GPUs.
SCIENCE CHINA Information Sciences, 2012

Acolyte: An In-Memory Social Network Query System.
Proceedings of the Web Information Systems Engineering - WISE 2012, 2012

Employing Checkpoint to Improve Job Scheduling in Large-Scale Systems.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2012

Parameter estimation of Conditional Random Fields model based on cloud computing.
Proceedings of the 2012 IEEE International Conference on Granular Computing, 2012

2011
Efficiently Acquiring Communication Traces for Large-Scale Parallel Applications.
IEEE Trans. Parallel Distrib. Syst., 2011

ASLOP: A field-access affinity-based structure data layout optimizer.
SCIENCE CHINA Information Sciences, 2011

Cloud versus in-house cluster: evaluating Amazon cluster compute instances for running MPI applications.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

An SSA-based algorithm for optimal speculative code motion under an execution profile.
Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, 2011

RACEZ: a lightweight and non-invasive race detection tool for production applications.
Proceedings of the 33rd International Conference on Software Engineering, 2011

One optimized I/O configuration per HPC application: leveraging the configurability of cloud.
Proceedings of the APSys '11 Asia Pacific Workshop on Systems, 2011

OpenMDSP: Extending OpenMP to Program Multi-Core DSP.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Do I use the wrong definition?: DeFuse: definition-use invariants for detecting concurrency and sequential bugs.
Proceedings of the 25th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2010

How OpenMP Applications Get More Benefit from Many-Core Era.
Proceedings of the Beyond Loop Level Parallelism in OpenMP: Accelerators, 2010

Taming hardware event samples for FDO compilation.
Proceedings of the CGO 2010, 2010

MapCG: writing parallel program portable between CPU and GPU.
Proceedings of the 19th International Conference on Parallel Architecture and Compilation Techniques, 2010

2009
Incorporating cardinality constraints and synonym rules into conditional functional dependencies.
Inf. Process. Lett., 2009

LogGPO: An accurate communication model for performance prediction of MPI programs.
Science in China Series F: Information Sciences, 2009

FACT: fast communication trace collection for parallel applications through program slicing.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

MPIWiz: subgroup reproducible replay of MPI applications.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Improving Dense Linear Equation Solver on Hybrid CPU-GPU System.
Proceedings of the 10th International Symposium on Pervasive Systems, 2009

Process Mapping for MPI Collective Communications.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

Analyses and Validation of Conditional Dependencies with Built-in Predicates.
Proceedings of the Database and Expert Systems Applications, 20th International Conference, 2009

Extracting Maximal Degenerate Motifs Based on a Suffix Tree.
Proceedings of the International Conference on Bioinformatics & Computational Biology, 2009

Cache Sharing Management for Performance Fairness in Chip Multiprocessors.
Proceedings of the PACT 2009, 2009

2008
Exploring the Emerging Applications for Transactional Memory.
Proceedings of the Ninth International Conference on Parallel and Distributed Computing, 2008

CprFS: a user-level file system to support consistent file states for checkpoint and restart.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Maotai: View-Oriented Parallel Programming on CMT Processors.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

Parallelization and Characterization of Probabilistic Latent Semantic Analysis.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

Parallelization of spectral clustering algorithm on multi-core processors and GPGPU.
Proceedings of the 13th Asia-Pacific Computer Systems Architecture Conference, 2008

2007
OpenUH: an optimizing, portable OpenMP compiler.
Concurrency and Computation: Practice and Experience, 2007

PBB: a parallel bioinformatics benchmark suite for shared memory multiprocessors.
Proceedings of the CHINA HPC 2007, 2007

Performance Evaluation of View-Oriented Parallel Programming on Cluster of Computers.
Proceedings of the High Performance Computing and Communications, 2007

History Based User Interest Modeling in WWW Access.
Proceedings of the Human-Computer Interaction. HCI Intelligent Multimodal Interaction Environments, 2007

Revisit of View-Oriented Parallel Programming.
Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

2006
Parallelization of module network structure learning and performance tuning on SMP.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Parallel implementation and performance characterization of MUSCLE.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Tree partition based parallel frequent pattern mining on shared memory systems.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

MPIPP: an automatic profile-guided parallel process placement toolset for SMP clusters and multiclusters.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

VODCA: View-Oriented, Distributed, Cluster-Based Approach to Parallel Computing.
Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

2005
Thckpt: Transparent Checkpointing of Linux Processes Under IA-64.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2005

Parallel Implementation of SEMPHY - a Structural EM Algorithm for Phylogenetic Reconstruction.
Proceedings of the Parallel Computing: Current & Future Issues of High-End Computing, 2005

Parallel Module Network Learning on Distributed Memory Multiprocessors.
Proceedings of the 34th International Conference on Parallel Processing Workshops (ICPP 2005 Workshops), 2005

A Dynamic Energy Conservation Scheme for Clusters in Computing Centers.
Proceedings of the Embedded Software and Systems, Second International Conference, 2005

Hierarchical Parallel Simulated Annealing and Its Applications.
Proceedings of the Distributed and Parallel Computing, 2005

2004
Parallelization of Bayesian Network based SNPs Pattern Analysis and Performance Characterization on SMP/HT.
Proceedings of the 10th International Conference on Parallel and Distributed Systems, 2004

A Single Thread Discrete Event Simulation Toolkit for Java: STSimJ.
Proceedings of the Computational Science, 2004

2003
On the Malicious Participants Problem in Computational Grid.
Proceedings of the Grid and Cooperative Computing, Second International Workshop, 2003

