Bo Wu

ORCID: 0009-0001-1696-4272

Affiliations:
  • Colorado School of Mines, Golden, CO, USA


According to our database, Bo Wu authored at least 46 papers between 2011 and 2023.

Bibliography

2023
MemPerf: Profiling Allocator-Induced Performance Slowdowns.
Proc. ACM Program. Lang., October, 2023

NUMAlloc: A Faster NUMA Memory Allocator.
Proceedings of the 2023 ACM SIGPLAN International Symposium on Memory Management, 2023

2022
Compressing Pre-trained Transformers via Low-Bit NxM Sparsity for Natural Language Understanding.
CoRR, 2022

DGSM: A GPU-Based Subgraph Isomorphism framework with DFS exploration.
Proceedings of the IEEE/ACM Redefining Scalability for Diversely Heterogeneous Architectures Workshop, 2022

SampleMine: A Framework for Applying Random Sampling to Subgraph Pattern Mining through Loop Perforation.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
Automatic Irregularity-Aware Fine-Grained Workload Partitioning on Integrated Architectures.
IEEE Trans. Knowl. Data Eng., 2021

GraphZero: A High-Performance Subgraph Matching System.
ACM SIGOPS Oper. Syst. Rev., 2021

An Efficient Graph Mining System for Large Patterns.
CoRR, 2021

ELIχR: Eliminating Computation Redundancy in CNN-Based Video Processing.
Proceedings of the IEEE/ACM Redefining Scalability for Diversely Heterogeneous Architectures Workshop, 2021

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Dryadic: Flexible and Fast Graph Pattern Matching at Scale.
Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021

2019
GraphZero: Breaking Symmetry for Efficient Graph Mining.
CoRR, 2019

AutoMine: harmonizing high-level abstraction and high performance for graph mining.
Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019

FLARE: Flexibly Sharing Commodity GPUs to Enforce QoS and Improve Utilization.
Proceedings of the Languages and Compilers for Parallel Computing, 2019

Laius: Towards latency awareness and improved utilization of spatial multitasking accelerators in datacenters.
Proceedings of the ACM International Conference on Supercomputing, 2019

GRNN: Low-Latency and Scalable RNN Inference on GPUs.
Proceedings of the Fourteenth EuroSys Conference 2019, Dresden, Germany, March 25-28, 2019, 2019

2018
Resolving the GPU responsiveness dilemma through program transformations.
Frontiers Comput. Sci., 2018

ApproxG: Fast Approximate Parallel Graphlet Counting Through Accuracy Control.
Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2018

GraphPhi: efficient parallel graph processing on emerging throughput-oriented architectures.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017
Optimizing Data Placement on GPU Memory: A Portable Approach.
IEEE Trans. Computers, 2017

Understanding co-run performance on CPU-GPU integrated processors: observations, insights, directions.
Frontiers Comput. Sci., 2017

Co-Run Scheduling with Power Cap on Integrated CPU-GPU Systems.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Cookie-based amplification repression protocol.
Proceedings of the 36th IEEE International Performance Computing and Communications Conference, 2017

Enabling scalability-sensitive speculative parallelization for FSM computations.
Proceedings of the International Conference on Supercomputing, 2017

FinePar: irregularity-aware fine-grained workload partitioning on integrated architectures.
Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017

FLEP: Enabling Flexible and Efficient Preemption on GPUs.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

Graphie: Large-Scale Asynchronous Graph Traversals on Just a GPU.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
Examining and Reducing the Influence of Sampling Errors on Feedback-Driven Optimizations.
ACM Trans. Archit. Code Optim., 2016

2015
Enabling Portable Optimizations of Data Placement on GPU.
IEEE Micro, 2015

ScaAnalyzer: a tool to identify memory scalability bottlenecks in parallel programs.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2015

Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Software Engagement with Sleeping CPUs.
Proceedings of the 15th Workshop on Hot Topics in Operating Systems, 2015

2014
Call sequence prediction through probabilistic calling automata.
Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, 2014

PORPLE: An Extensible Optimizer for Portable Data Placement on GPU.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Understanding Co-run Degradations on Integrated Heterogeneous Processors.
Proceedings of the Languages and Compilers for Parallel Computing, 2014

Challenging the "embarrassingly sequential": parallelizing finite state machine-based computations through principled speculation.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

SM-centric transformation: circumventing hardware restrictions for flexible GPU scheduling.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

Software-level scheduling to exploit non-uniformly shared data cache on GPGPU.
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, 2013

Simple Profile Rectifications Go a Long Way - Statistically Exploring and Alleviating the Effects of Sampling Errors for Program Optimizations.
Proceedings of the ECOOP 2013 - Object-Oriented Programming, 2013

ProfMig: A framework for flexible migration of program profiles across software versions.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

Exploring hybrid memory for GPU energy efficiency through software-hardware co-design.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012
Exploiting inter-sequence correlations for program behavior prediction.
Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, 2012

One stone two birds: synchronization relaxation and redundancy removal in GPU-CPU translation.
Proceedings of the International Conference on Supercomputing, 2012

Speculative parallelization needs rigor: probabilistic analysis for optimal speculation of finite-state machine applications.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Enhancing Data Locality for Dynamic Simulations through Asynchronous Data Transformations and Adaptive Control.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

