Xiaobing Feng

ORCID: 0000-0003-2909-7750

Affiliations:

Chinese Academy of Sciences, Institute of Computing Technology, State Key Lab of Computer Architecture, Beijing, China
University of Chinese Academy of Sciences

According to our database¹, Xiaobing Feng authored at least 94 papers between 2004 and 2024.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2024

Fast Convolution Meets Low Precision: Exploring Efficient Quantized Winograd Convolution on Modern CPUs.

ACM Trans. Archit. Code Optim., March, 2024

2023

Automatic Target Description File Generation.

J. Comput. Sci. Technol., December, 2023

VTensor: Using Virtual Tensors to Build a Layout-Oblivious AI Programming Framework.

J. Comput. Sci. Technol., September, 2023

Facilitating hardware-aware neural architecture search with learning-based predictive models.

J. Syst. Archit., April, 2023

Portable and Scalable All-Electron Quantum Perturbation Simulations on Exascale Supercomputers.

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2023

Honeycomb: Secure and Efficient GPU Executions via Static Validation.

Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

OPTango: Multi-central Representation Learning against Innumerable Compiler Optimization for Binary Diffing.

Proceedings of the 34th IEEE International Symposium on Software Reliability Engineering, 2023

Occamy: Elastically Sharing a SIMD Co-processor across Multiple CPU Cores.

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022

CloudRaid: Detecting Distributed Concurrency Bugs via Log Mining and Enhancement.

IEEE Trans. Software Eng., 2022

Scaling Poisson Solvers on Many Cores via MMEwald.

IEEE Trans. Parallel Distributed Syst., 2022

An Application-oblivious Memory Scheduling System for DNN Accelerators.

ACM Trans. Archit. Code Optim., 2022

Optimizing deep neural networks on intelligent edge accelerators via flexible-rate filter pruning.

J. Syst. Archit., 2022

2021

Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories.

ACM Trans. Comput. Syst., 2021

Compiler-assisted Operator Template Library for DNN Accelerators.

Int. J. Parallel Program., 2021

Pinpointing the Memory Behaviors of DNN Training.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Understanding the Runtime Overheads of Deep Learning Inference on Edge Devices.

Proceedings of the 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), New York City, NY, USA, September 30, 2021

LoWino: Towards Efficient Low-Precision Winograd Convolutions on Modern CPUs.

Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

Unleashing the Low-Precision Computation Potential of Tensor Cores on GPUs.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020

ParaML: A Polyvalent Multicore Accelerator for Machine Learning.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Fusion-Catalyzed Pruning for Optimizing Deep Learning on Intelligent Edge Devices.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

DNNTune: Automatic Benchmarking DNN Models for Mobile-cloud Computing.

ACM Trans. Archit. Code Optim., 2020

Referee: A Pattern-Guided Approach for Auto Design in Compiler-Based Analyzers.

Proceedings of the 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, 2020

Compiler-Assisted Operator Template Library for DNN Accelerators.

Proceedings of the Network and Parallel Computing, 2020

Characterizing the I/O Pipeline in the Deployment of CNNs on Commercial Accelerators.

Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2020

Lance: efficient low-precision quantized winograd convolution for neural networks based on graphics processing units.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Accelerating Deep Learning Inference with Cross-Layer Data Reuse on GPUs.

Proceedings of the Euro-Par 2020: Parallel Processing, 2020

VTensor: Using Virtual Tensors to Build a Layout-oblivious AI Programming Framework.

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

Bandwidth-Aware Loop Tiling for DMA-Supported Scratchpad Memory.

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

Cacheap: Portable and Collaborative I/O Optimization for Graph Processing.

J. Comput. Sci. Technol., 2019

ElasticActor: An Actor System with Automatic Granularity Adjustment.

Int. J. Parallel Program., 2019

Understanding Node Change Bugs for Distributed Systems.

Proceedings of the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering, 2019

CrashTuner: detecting crash-recovery bugs in cloud systems via meta-info analysis.

Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019

Exploiting the input sparsity to accelerate deep neural networks: poster.

Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

Panthera: holistic memory management for big data processing over hybrid memories.

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

Accelerating GPU Computing at Runtime with Binary Optimization.

Guangli Li, Lei Liu, Xiaobing Feng

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

PPOpenCL: a performance-portable OpenCL compiler with host and kernel thread code fusion.

Proceedings of the 28th International Conference on Compiler Construction, 2019

XDN: Towards Efficient Inference of Residual Neural Networks on Cambricon Chips.

Proceedings of the Benchmarking, Measuring, and Optimizing, 2019

Acorns: A Framework for Accelerating Deep Neural Networks with Input Sparsity.

Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018

Using Local Clocks to Reproduce Concurrency Bugs.

IEEE Trans. Software Eng., 2018

NVM Streaker: a fast and reconfigurable performance simulator for non-volatile memory-based memory architecture.

J. Supercomput., 2018

RARE: An Efficient Static Fault Detection Framework for Definition-Use Faults in Large Programs.

IEEE Access, 2018

CloudRaid: hunting concurrency bugs in the cloud via log-mining.

Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2018

Lazygraph: lazy data coherency for replicas in distributed graph-parallel computation.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

On Retargeting the AI Programming Framework to New Hardwares.

Proceedings of the Network and Parallel Computing, 2018

Background Subtraction on Depth Videos with Convolutional Neural Networks.

Proceedings of the 2018 International Joint Conference on Neural Networks, 2018

Characterizing DNN Models for Edge-Cloud Computing.

Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

Revisiting Loop Tiling for Datacenters: Live and Let Live.

Proceedings of the 32nd International Conference on Supercomputing, 2018

Auto-tuning Neural Network Quantization Framework for Collaborative Inference Between the Cloud and Edge.

Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2018, 2018

Fast CNN Pruning via Redundancy-Aware Training.

Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2018, 2018

May-happen-in-parallel analysis with static vector clocks.

Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

2017

Locating Software Faults Based on Minimum Debugging Frontier Set.

IEEE Trans. Software Eng., 2017

An Accelerator for High Efficient Vision Processing.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017

Parallel Incremental Frequent Itemset Mining for Large Data.

Yu-Geng Song, Hui-Min Cui, Xiaobing Feng

J. Comput. Sci. Technol., 2017

Two-Level Task Scheduling for Irregular Applications on GPU Platform.

Int. J. Parallel Program., 2017

2016

Predicting Cross-Core Performance Interference on Multicore Processors with Regression Analysis.

IEEE Trans. Parallel Distributed Syst., 2016

Pragma Directed Shared Memory Centric Optimizations on GPUs.

J. Comput. Sci. Technol., 2016

Articulation points guided redundancy elimination for betweenness centrality.

Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Efficient Management for Hybrid Memory in Managed Language Runtime.

Proceedings of the Network and Parallel Computing, 2016

2015

WiseThrottling: a new asynchronous task scheduler for mitigating I/O bottleneck in large-scale datacenter servers.

J. Supercomput., 2015

Practical Iterative Optimization for the Data Center.

ACM Trans. Archit. Code Optim., 2015

ShiDianNao: shifting vision processing closer to the sensor.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

ReCBuLC: Reproducing Concurrency Bugs Using Local Clocks.

Proceedings of the 37th IEEE/ACM International Conference on Software Engineering, 2015

Hadoop+: Modeling and Evaluating the Heterogeneity for MapReduce Applications in Heterogeneous Clusters.

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

PuDianNao: A Polyvalent Machine Learning Accelerator.

Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2014

Dynamic I/O-Aware Scheduling for Batch-Mode Applications on Chip Multiprocessor Systems of Cluster Platforms.

J. Comput. Sci. Technol., 2014

Group Orbit Optimization: A Unified Approach to Data Normalization.

Shuchang Zhou, Zhihua Zhang, Xiaobing Feng

CoRR, 2014

Concurrency bug localization using shared memory access pairs.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Localization of concurrency bugs using shared memory access pairs.

Proceedings of the ACM/IEEE International Conference on Automated Software Engineering, 2014

A collaborative divide-and-conquer K-means clustering algorithm for processing large data.

Proceedings of the Computing Frontiers Conference, CF'14, 2014

2013

Layout-oblivious compiler optimization for matrix computations.

ACM Trans. Archit. Code Optim., 2013

Effective fault localization based on minimum debugging frontier set.

Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

An empirical model for predicting cross-core performance interference on multicore processors.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012

Extendable pattern-oriented optimization directives.

ACM Trans. Archit. Code Optim., 2012

A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs.

J. Comput. Sci. Technol., 2012

Can We Make It Faster? Efficient May-Happen-in-Parallel Analysis Revisited.

Proceedings of the 13th International Conference on Parallel and Distributed Computing, Applications and Technologies, 2012

A Highly Parallel Reuse Distance Analysis Algorithm on GPUs.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Layout-oblivious optimization for matrix computations.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Making it practical and effective: fast and precise may-happen-in-parallel analysis.

Congming Chen, Wei Huo, Xiaobing Feng

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

Dependence-based multi-level tracing and replay for wireless sensor networks debugging.

Proceedings of the ACM SIGPLAN/SIGBED 2011 Conference on Languages, Compilers and Tools for Embedded Systems, 2011

Automatic Library Generation for BLAS3 on GPUs.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Parallelizing a machine translation decoder for multicore computer.

Proceedings of the Seventh International Conference on Natural Computation, 2011

2010

Landing Stencil Code on Godson-T.

J. Comput. Sci. Technol., 2010

Continuous speculative program parallelization in software.

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Software-Hardware Cooperative DRAM Bank Partitioning for Chip Multiprocessors.

Proceedings of the Network and Parallel Computing, IFIP International Conference, 2010

Level by level: making flow- and context-sensitive pointer analysis scalable for millions of lines of code.

Proceedings of the CGO 2010, 2010

An adaptive task creation strategy for work-stealing scheduling.

Proceedings of the CGO 2010, 2010

2009

PARBLO: Page-Allocation-Based DRAM Row Buffer Locality Optimization.

J. Comput. Sci. Technol., 2009

Detecting and Eliminating Potential Violations of Sequential Consistency for Concurrent C/C++ Programs.

Proceedings of the CGO 2009, 2009

2008

Exploiting idle register classes for fast spill destination.

Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Global Tiling for Communication Minimal Parallelization on Distributed Memory Systems.

Proceedings of the Euro-Par 2008, 2008

2006

Library Function Disposing Approach in Binary Translation.

J. Comput. Res. Dev., 2006

Global Partial Replicate Computation Partitioning.

J. Comput. Res. Dev., 2006

2005

Integrating Parallelizing Compilation Technologies for SMP Clusters.

J. Comput. Sci. Technol., 2005

2004

An Overview of the Open Research Compiler.

Proceedings of the Languages and Compilers for High Performance Computing, 2004
