Bo Wu

ORCID: 0009-0001-1696-4272

Affiliations:

Colorado School of Mines, Golden, CO, USA

According to our database, Bo Wu authored at least 46 papers between 2011 and 2023.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2023

MemPerf: Profiling Allocator-Induced Performance Slowdowns.

Proc. ACM Program. Lang., October, 2023

NUMAlloc: A Faster NUMA Memory Allocator.

Proceedings of the 2023 ACM SIGPLAN International Symposium on Memory Management, 2023

2022

Compressing Pre-trained Transformers via Low-Bit NxM Sparsity for Natural Language Understanding.

CoRR, 2022

DGSM: A GPU-Based Subgraph Isomorphism framework with DFS exploration.

Wei Han, Connor Holmes, Bo Wu

Proceedings of the IEEE/ACM Redefining Scalability for Diversely Heterogeneous Architectures Workshop, 2022

SampleMine: A Framework for Applying Random Sampling to Subgraph Pattern Mining through Loop Perforation.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021

Automatic Irregularity-Aware Fine-Grained Workload Partitioning on Integrated Architectures.

IEEE Trans. Knowl. Data Eng., 2021

GraphZero: A High-Performance Subgraph Matching System.

ACM SIGOPS Oper. Syst. Rev., 2021

An Efficient Graph Mining System for Large Patterns.

Peng Jiang, Rujia Wang, Bo Wu

CoRR, 2021

ELIχR: Eliminating Computation Redundancy in CNN-Based Video Processing.

Proceedings of the IEEE/ACM Redefining Scalability for Diversely Heterogeneous Architectures Workshop, 2021

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Dryadic: Flexible and Fast Graph Pattern Matching at Scale.

Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021

2019

GraphZero: Breaking Symmetry for Efficient Graph Mining.

CoRR, 2019

AutoMine: harmonizing high-level abstraction and high performance for graph mining.

Daniel Mawhirter, Bo Wu

Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019

FLARE: Flexibly Sharing Commodity GPUs to Enforce QoS and Improve Utilization.

Proceedings of the Languages and Compilers for Parallel Computing, 2019

Laius: Towards latency awareness and improved utilization of spatial multitasking accelerators in datacenters.

Daniel Edward Mawhirter, Bo Wu, Chao Li, Minyi Guo

Proceedings of the ACM International Conference on Supercomputing, 2019

GRNN: Low-Latency and Scalable RNN Inference on GPUs.

Proceedings of the Fourteenth EuroSys Conference 2019, Dresden, Germany, March 25-28, 2019, 2019

2018

Resolving the GPU responsiveness dilemma through program transformations.

Frontiers Comput. Sci., 2018

ApproxG: Fast Approximate Parallel Graphlet Counting Through Accuracy Control.

Proceedings of the 18th IEEE/ACM International Symposium on Cluster, 2018

Graphphi: efficient parallel graph processing on emerging throughput-oriented architectures.

Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017

Optimizing Data Placement on GPU Memory: A Portable Approach.

IEEE Trans. Computers, 2017

Understanding co-run performance on CPU-GPU integrated processors: observations, insights, directions.

Frontiers Comput. Sci., 2017

Co-Run Scheduling with Power Cap on Integrated CPU-GPU Systems.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Cookie-based amplification repression protocol.

Proceedings of the 36th IEEE International Performance Computing and Communications Conference, 2017

Enabling scalability-sensitive speculative parallelization for FSM computations.

Proceedings of the International Conference on Supercomputing, 2017

FinePar: irregularity-aware fine-grained workload partitioning on integrated architectures.

Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017

FLEP: Enabling Flexible and Efficient Preemption on GPUs.

Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

Graphie: Large-Scale Asynchronous Graph Traversals on Just a GPU.

Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016

Examining and Reducing the Influence of Sampling Errors on Feedback-Driven Optimizations.

ACM Trans. Archit. Code Optim., 2016

2015

Enabling Portable Optimizations of Data Placement on GPU.

IEEE Micro, 2015

ScaAnalyzer: a tool to identify memory scalability bottlenecks in parallel programs.

Xu Liu, Bo Wu

Proceedings of the International Conference for High Performance Computing, 2015

Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations.

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Software Engagement with Sleeping CPUs.

Proceedings of the 15th Workshop on Hot Topics in Operating Systems, 2015

2014

Call sequence prediction through probabilistic calling automata.

Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, 2014

PORPLE: An Extensible Optimizer for Portable Data Placement on GPU.

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Understanding Co-run Degradations on Integrated Heterogeneous Processors.

Proceedings of the Languages and Compilers for Parallel Computing, 2014

Challenging the "embarrassingly sequential": parallelizing finite state machine-based computations through principled speculation.

Zhijia Zhao, Bo Wu, Xipeng Shen

Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

SM-centric transformation: circumventing hardware restrictions for flexible GPU scheduling.

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

Software-level scheduling to exploit non-uniformly shared data cache on GPGPU.

Bo Wu, Weilin Wang, Xipeng Shen

Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, 2013

Simple Profile Rectifications Go a Long Way - Statistically Exploring and Alleviating the Effects of Sampling Errors for Program Optimizations.

Proceedings of the ECOOP 2013 - Object-Oriented Programming, 2013

Profmig: A framework for flexible migration of program profiles across software versions.

Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

Exploring hybrid memory for GPU energy efficiency through software-hardware co-design.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012

Exploiting inter-sequence correlations for program behavior prediction.

Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2012

One stone two birds: synchronization relaxation and redundancy removal in GPU-CPU translation.

Ziyu Guo, Bo Wu, Xipeng Shen

Proceedings of the International Conference on Supercomputing, 2012

Speculative parallelization needs rigor: probabilistic analysis for optimal speculation of finite-state machine applications.

Zhijia Zhao, Bo Wu, Xipeng Shen

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

Enhancing Data Locality for Dynamic Simulations through Asynchronous Data Transformations and Adaptive Control.

Bo Wu, Eddy Z. Zhang, Xipeng Shen

Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011
