Guojing Cong

According to our database, Guojing Cong authored at least 62 papers between 2004 and 2020.

Bibliography

2020
CASTELO: Clustered Atom Subtypes aidEd Lead Optimization - a combined machine learning and molecular modeling method.
CoRR, 2020

Fast Training of Deep Neural Networks for Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Partial data permutation for training deep neural networks.
Proceedings of the 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, 2020

2019
Fast neural network training on a cluster of GPUs for action recognition with high accuracy.
J. Parallel Distributed Comput., 2019

A Distributed Hierarchical SGD Algorithm with Sparse Global Reduction.
CoRR, 2019

Video Action Recognition With an Additional End-to-End Trained Temporal Stream.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2019


Reducing global reductions in large-scale distributed training.
Proceedings of the 48th International Conference on Parallel Processing, 2019

Accelerating Data Loading in Deep Neural Network Training.
Proceedings of the 26th IEEE International Conference on High Performance Computing, Data, and Analytics, 2019

2018
Accelerating Deep Neural Network Training for Action Recognition on a Cluster of GPUs.
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

On the Convergence Properties of a K-step Averaging Stochastic Gradient Descent Algorithm for Nonconvex Optimization.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

2017
Foreword to the special issue of the 18th IEEE international conference on computational science and engineering (CSE2015).
Concurr. Comput. Pract. Exp., 2017

Accelerating deep neural network learning for speech recognition on a cluster of GPUs.
Proceedings of the Machine Learning on HPC Environments, 2017

An Efficient, Distributed Stochastic Gradient Descent Algorithm for Deep-Learning Applications.
Proceedings of the 46th International Conference on Parallel Processing, 2017

A Hierarchical, Bulk-Synchronous Stochastic Gradient Descent Algorithm for Deep-Learning Applications on GPU Clusters.
Proceedings of the 16th IEEE International Conference on Machine Learning and Applications, 2017

2016
Practical Efficiency of Asynchronous Stochastic Gradient Descent.
Proceedings of the 2nd Workshop on Machine Learning in HPC Environments, 2016

Composable Locality Optimizations for Accelerating Parallel Forest Computations.
Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

2015
Parallelism-centric optimization and performance study of a finance aggregation engine on modern NUMA systems.
Proceedings of the 8th Workshop on High Performance Computational Finance, 2015

Memory Centric Computation (Mc2) for Large-Scale Graph Processing.
Proceedings of the 27th International Symposium on Computer Architecture and High Performance Computing, 2015

Parallel Strategies for Solving Large Unit Commitment Problems in the California ISO Planning Model.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Accelerating Minimum Spanning Forest Computations on Multicore Platforms.
Proceedings of the Euro-Par 2015: Parallel Processing Workshops, 2015

2014
A Synchronous Parallel Max-Flow Algorithm for Real-World Networks.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

Fast Parallel Connected Components Algorithms on GPUs.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

2013
Maximizing the performance of irregular applications on multithreaded, NUMA systems.
Proceedings of the 3rd Workshop on Irregular Applications - Architectures and Algorithms, 2013

Mapping applications for high performance on multithreaded, NUMA systems.
Proceedings of the Computing Frontiers Conference, 2013

2012
A Systematic Approach toward Automated Performance Analysis and Tuning.
IEEE Trans. Parallel Distributed Syst., 2012

Application data prefetching on the IBM Blue Gene/Q supercomputer.
Proceedings of the SC Conference on High Performance Computing Networking, Storage and Analysis, 2012

A static analysis tool using a three-step approach for data races in HPC programs.
Proceedings of the 10th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging, 2012

An Efficient Framework for Multi-dimensional Tuning of High Performance Computing Applications.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Optimizing Large-scale Graph Analysis on Multithreaded, Multicore Platforms.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Tool-assisted Optimization of Shared-memory Accesses in UPC Applications.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

2011
Hybrid Programming With SIMPLE.
Encyclopedia of Parallel Computing, 2011

SWARM: A Parallel Programming Framework for Multicore Processors.
Encyclopedia of Parallel Computing, 2011

Spanning Tree, Minimum Weight.
Encyclopedia of Parallel Computing, 2011

Graph Algorithms.
Encyclopedia of Parallel Computing, 2011

Optimizing Large-Scale Graph Analysis on a Multi-threaded, Multi-core Platform.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

2010
Workload performance characterization of DARPA HPCS benchmarks.
Concurr. Comput. Pract. Exp., 2010

Fast PGAS Implementation of Distributed Graph Algorithms.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2010

Application tuning through bottleneck-driven refactoring.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Guided Performance Analysis Combining Profile and Trace Tools.
Proceedings of the Euro-Par 2010 Parallel Processing Workshops, 2010

2009
Towards a framework for automated performance tuning.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

A Holistic Approach towards Automated Performance Analysis and Tuning.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

Improving Memory Access Locality for Large-Scale Graph Analysis Applications.
Proceedings of the 22nd International Conference on Parallel and Distributed Computing and Communication Systems, 2009

2008
A scalable, asynchronous spanning tree algorithm on a cluster of SMPs.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

A framework for automated performance bottleneck detection.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Solving Large, Irregular Graph Problems Using Adaptive Work-Stealing.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

2007
Design of Multithreaded Algorithms for Combinatorial Problems.
Handbook of Parallel Computing: Models, Algorithms and Applications, 2007

Efficient Parallel Graph Algorithms for Multicore and Multiprocessors.
Handbook of Parallel Computing: Models, Algorithms and Applications, 2007

A productivity centered application performance tuning framework.
Proceedings of the 2nd International Conference on Performance Evaluation Methodologies and Tools, 2007

A Productivity Centered Tools Framework for Application Performance Tuning.
Proceedings of the Fourth International Conference on the Quantitative Evaluation of Systems (QEST 2007), 2007

Techniques for Designing Efficient Parallel Graph Algorithms for SMPs and Multicore Processors.
Parallel and Distributed Processing and Applications, 2007

A Selective Profiling Tool: Towards Automatic Performance Tuning.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

2006
Designing irregular parallel algorithms with mutual exclusion and lock-free protocols.
J. Parallel Distributed Comput., 2006

Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs.
J. Parallel Distributed Comput., 2006

A Study on the Locality Behavior of Minimum Spanning Tree Algorithms.
Proceedings of the International Conference on High Performance Computing (HiPC 2006), 2006

2005
A fast, parallel spanning tree algorithm for symmetric multiprocessors (SMPs).
J. Parallel Distributed Comput., 2005

An Experimental Study of Parallel Biconnected Components Algorithms on Symmetric Multiprocessors (SMPs).
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

On the Architectural Requirements for Efficient Execution of Graph Algorithms.
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

An Empirical Analysis of Parallel Random Permutation Algorithms on SMPs.
Proceedings of the ISCA 18th International Conference on Parallel and Distributed Computing Systems, 2005

2004
A Fast, Parallel Spanning Tree Algorithm for Symmetric Multiprocessors.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

The Euler Tour Technique and Parallel Rooted Spanning Tree.
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

Lock-Free Parallel Algorithms: An Experimental Study.
Proceedings of the International Conference on High Performance Computing (HiPC 2004), 2004

