Xipeng Shen

According to our database, Xipeng Shen authored at least 146 papers between 2001 and 2021.

Bibliography

2021
An Automatic Synthesizer of Advising Tools for High Performance Computing.
IEEE Trans. Parallel Distributed Syst., 2021

Faster SAT Solving for Software with Repeated Structures (with Case Studies on Software Test Suite Minimization).
CoRR, 2021

Exploring deep reuse in winograd CNN inference.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Understanding and bridging the gaps in current GNN performance optimizations.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Deep NLP-based co-evolvement for synthesizing code analysis from natural language.
Proceedings of the CC '21: 30th ACM SIGPLAN International Conference on Compiler Construction, 2021

2020
Enabling Runtime SpMV Format Selection through an Overhead Conscious Method.
IEEE Trans. Parallel Distributed Syst., 2020

DIAC: An Inter-app Conflicts Detector for Open IoT Systems.
ACM Trans. Embed. Comput. Syst., 2020

Achieving Real-Time LiDAR 3D Object Detection on a Mobile Device.
CoRR, 2020

TADOC: Text Analytics Directly on Compression.
CoRR, 2020

Simpler Hyperparameter Optimization for Software Analytics: Why, How, When?
CoRR, 2020

Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices.
CoRR, 2020

CoCoPIE: Making Mobile AI Sweet As PIE - Compression-Compilation Co-Design Goes a Long Way.
CoRR, 2020

Special Issue: Graph Computing.
Concurr. Comput. Pract. Exp., 2020

HISyn: human learning-inspired natural language programming.
Proceedings of the ESEC/FSE '20: 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2020

FLEET: Flexible Efficient Ensemble Training for Heterogeneous Deep Neural Networks.
Proceedings of Machine Learning and Systems 2020, 2020

Hardware-Based Domain Virtualization for Intra-Process Isolation of Persistent Memory Objects.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

HARP: holistic analysis for refactoring Python-based analytics programs.
Proceedings of the ICSE '20: 42nd International Conference on Software Engineering, Seoul, South Korea, 27 June, 2020

MKPipe: a compiler framework for optimizing multi-kernel workloads in OpenCL for FPGA.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Enabling Efficient Random Access to Hierarchically-Compressed Data.
Proceedings of the 36th IEEE International Conference on Data Engineering, 2020

MERR: Improving Security of Persistent Memory Objects via Efficient Memory Exposure Reduction and Randomization.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

GOPipe: A Granularity-Oblivious Programming Framework for Pipelined Stencil Executions on GPU.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
How to "DODGE" Complex Software Analytics?
CoRR, 2019

Wootz: a compiler-based framework for fast CNN pruning via composability.
Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

In-Place Zero-Space Memory Protection for CNN.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

IA-graph based inter-app conflicts detection in open IoT systems.
Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, 2019

Deep reuse: streamline CNN inference on the fly via coarse-grained computation reuse.
Proceedings of the ACM International Conference on Supercomputing, 2019

Adaptive Deep Reuse: Accelerating CNN Training on the Fly.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

Streamline Density Peak Clustering for Practical Adoptions.
Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019

HiWayLib: A Software Framework for Enabling High Performance Communications for Heterogeneous Pipeline Computations.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018
Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights.
Proc. VLDB Endow., 2018

LCD: A Fast Contrastive Divergence Based Algorithm for Restricted Boltzmann Machine.
Neural Networks, 2018

Editorial for the Special Issue on In-Memory Computing.
J. Parallel Distributed Comput., 2018

Resolving the GPU responsiveness dilemma through program transformations.
Frontiers Comput. Sci., 2018

Hyperparameter Optimization for Effort Estimation.
CoRR, 2018

Why Software Effort Estimation Needs SBSE.
CoRR, 2018

Exploring flexible communications for streamlining DNN ensemble training pipelines.
Proceedings of the International Conference for High Performance Computing, 2018

Bridging the gap between deep learning and sparse matrix format selection.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Footprint modeling of cache associativity and granularity.
Proceedings of the International Symposium on Memory Systems, 2018

Overhead-Conscious Format Selection for SpMV-Based Applications.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Taming the "Monster": Overcoming Program Optimization Challenges on SW26010 Through Precise Performance Modeling.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Zwift: A Programming Framework for High Performance Text Analytics on Compressed Data.
Proceedings of the 32nd International Conference on Supercomputing, 2018

LEEM: Lean Elastic EM for Gaussian Mixture Model via Bounds-Based Filtering.
Proceedings of the IEEE International Conference on Data Mining, 2018

Reuse-Centric K-Means Configuration.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

FALCON: A Fast Drop-In Replacement of Citation KNN for Multiple Instance Learning.
Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018

Rethinking compilers in the rise of machine learning and AI (keynote).
Proceedings of the 27th International Conference on Compiler Construction, 2018

2017
Optimizing Data Placement on GPU Memory: A Portable Approach.
IEEE Trans. Computers, 2017

GLORE: generalized loop redundancy elimination upon LER-notation.
Proc. ACM Program. Lang., 2017

Understanding co-run performance on CPU-GPU integrated processors: observations, insights, directions.
Frontiers Comput. Sci., 2017

Egeria: a framework for automatic synthesis of HPC advising tools through multi-layered natural language processing.
Proceedings of the International Conference for High Performance Computing, 2017

POSTER: An Infrastructure for HPC Knowledge Sharing and Reuse.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

EffiSha: A Software Framework for Enabling Efficient Preemptive Scheduling of GPU.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Generalizations of the theory and deployment of triangular inequality for compiler-based strength reduction.
Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2017

Versapipe: a versatile programming framework for pipelined computing on GPU.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Efficient support of position independence on non-volatile memory.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Bridging the gap between memory performance and massive parallelism: the critical role of programming systems innovations (keynote).
Proceedings of the 2017 ACM SIGPLAN International Symposium on Memory Management, 2017

Co-Run Scheduling with Power Cap on Integrated CPU-GPU Systems.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Sweet KNN: An Efficient KNN on GPU through Reconciliation between Redundancy Removal and Regularity.
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017

POSTER: Bridging the Gap Between Deep Learning and Sparse Matrix Format Selection.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

POSTER: Cutting the Fat: Speeding Up RBM for Fast Deep Learning Through Generalized Redundancy Elimination.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
Examining and Reducing the Influence of Sampling Errors on Feedback-Driven Optimizations.
ACM Trans. Archit. Code Optim., 2016

Tuning for software analytics: Is it really necessary?
Inf. Softw. Technol., 2016

Data-centric combinatorial optimization of parallel code.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Coherence-Free Multiview: Enabling Reference-Discerning Data Placement on GPU.
Proceedings of the 2016 International Conference on Supercomputing, 2016

Towards Ontology-Based Program Analysis.
Proceedings of the 30th European Conference on Object-Oriented Programming, 2016

The workshop on compiler-driven performance.
Proceedings of the 26th Annual International Conference on Computer Science and Software Engineering, 2016

OpenCL-based erasure coding on heterogeneous architectures.
Proceedings of the 27th IEEE International Conference on Application-specific Systems, 2016

2015
TOP: A Framework for Enabling Algorithmic Optimizations for Distance-Related Problems.
Proc. VLDB Endow., 2015

Enabling Portable Optimizations of Data Placement on GPU.
IEEE Micro, 2015

Enhancing domain specific language implementations through ontology.
Proceedings of the 5th International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, 2015

Autotuning algorithmic choice for input sensitivity.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

Free launch: optimizing GPU dynamic kernel launches through thread reuse.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Software Engagement with Sleeping CPUs.
Proceedings of the 15th Workshop on Hot Topics in Operating Systems, 2015

14th compiler-driven performance workshop.
Proceedings of 25th Annual International Conference on Computer Science and Software Engineering, 2015

On-the-Fly Principled Speculation for FSM Parallelization.
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2014
Space-efficient multi-versioning for input-adaptive feedback-driven program optimizations.
Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, 2014

Call sequence prediction through probabilistic calling automata.
Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, 2014

PORPLE: An Extensible Optimizer for Portable Data Placement on GPU.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Understanding Co-run Degradations on Integrated Heterogeneous Processors.
Proceedings of the Languages and Compilers for Parallel Computing, 2014

Localization of concurrency bugs using shared memory access pairs.
Proceedings of the ACM/IEEE International Conference on Automated Software Engineering, 2014

SatScore: uncovering and avoiding a principled pitfall in responsiveness measurements of app launches.
Proceedings of the 2014 ACM Conference on Ubiquitous Computing, UbiComp '14, Seattle, WA, 2014

Challenging the "embarrassingly sequential": parallelizing finite state machine-based computations through principled speculation.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

Finding the limit: examining the potential and complexity of compilation scheduling for JIT-based runtime systems.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

SM-centric transformation: circumventing hardware restrictions for flexible GPU scheduling.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
HPar: A practical parallel parser for HTML-taming HTML complexities for parallel parsing.
ACM Trans. Archit. Code Optim., 2013

An Infrastructure for Tackling Input-Sensitivity of GPU Program Optimizations.
Int. J. Parallel Program., 2013

Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

Software-level scheduling to exploit non-uniformly shared data cache on GPGPU.
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, 2013

A Versatile Performance and Energy Simulation Tool for Composite GPU Global Memory.
Proceedings of the 2013 IEEE 21st International Symposium on Modelling, 2013

Simple Profile Rectifications Go a Long Way - Statistically Exploring and Alleviating the Effects of Sampling Errors for Program Optimizations.
Proceedings of the ECOOP 2013 - Object-Oriented Programming, 2013

Profmig: A framework for flexible migration of program profiles across software versions.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

Exploring hybrid memory for GPU energy efficiency through software-hardware co-design.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012
The Significance of CMP Cache Sharing on Contemporary Multithreaded Applications.
IEEE Trans. Parallel Distributed Syst., 2012

A study towards optimal data layout for GPU computing.
Proceedings of the 2012 ACM SIGPLAN workshop on Memory Systems Performance and Correctness: held in conjunction with PLDI '12, 2012

Exploiting inter-sequence correlations for program behavior prediction.
Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2012

Optimal Co-Scheduling to Minimize Makespan on Chip Multiprocessors.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2012

One stone two birds: synchronization relaxation and redundancy removal in GPU-CPU translation.
Proceedings of the International Conference on Supercomputing, 2012

Speculative parallelization needs rigor: probabilistic analysis for optimal speculation of finite-state machine applications.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
The Complexity of Optimal Job Co-Scheduling on Chip Multiprocessors and Heuristics-Based Solutions.
IEEE Trans. Parallel Distributed Syst., 2011

A step towards transparent integration of input-consciousness into dynamic program optimizations.
Proceedings of the 26th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2011

Fine-Grained Treatment to Synchronizations in GPU-to-CPU Translation.
Proceedings of the Languages and Compilers for Parallel Computing, 2011

On-the-fly elimination of dynamic irregularities for GPU computing.
Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, 2011

Enhancing Data Locality for Dynamic Simulations through Asynchronous Data Transformations and Adaptive Control.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

Correctly Treating Synchronizations in Compiling Fine-Grained SPMD-Threaded Programs for CPU.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

An input-centric paradigm for program dynamic optimizations.
Proceedings of the 25th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2010

LU Decomposition on Cell Broadband Engine: An Empirical Study to Exploit Heterogeneous Chip Multiprocessors.
Proceedings of the Network and Parallel Computing, IFIP International Conference, 2010

Array Regrouping on CMP with Non-uniform Cache Sharing.
Proceedings of the Languages and Compilers for Parallel Computing, 2010

Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping.
Proceedings of the 24th International Conference on Supercomputing, 2010

Combining Locality Analysis with Online Proactive Job Co-scheduling in Chip Multiprocessors.
Proceedings of the High Performance Embedded Architectures and Compilers, 2010

Exploiting statistical correlations for proactive prediction of program behaviors.
Proceedings of the CGO 2010, 2010

Is Reuse Distance Applicable to Data Locality Analysis on Chip Multiprocessors?
Proceedings of the Compiler Construction, 19th International Conference, 2010

2009
Program locality analysis using reuse distance.
ACM Trans. Program. Lang. Syst., 2009

The study and handling of program inputs in the selection of garbage collectors.
ACM SIGOPS Oper. Syst. Rev., 2009

Influence of program inputs on the selection of garbage collectors.
Proceedings of the 5th International Conference on Virtual Execution Environments, 2009

A cross-input adaptive framework for GPU program optimizations.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Speculation with Little Wasting: Saving Cost in Software Speculation through Transparent Learning.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

Cross-Input Learning and Discriminative Prediction in Evolvable Virtual Machines.
Proceedings of the CGO 2009, 2009

A study on optimally co-scheduling jobs of different lengths on chip multiprocessors.
Proceedings of the 6th Conference on Computing Frontiers, 2009

2008
Scalable Implementation of Efficient Locality Approximation.
Proceedings of the Languages and Compilers for Parallel Computing, 2008

Adaptive speculation in behavior-oriented parallelization.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Adaptive Software Speculation for Enhancing the Cost-Efficiency of Behavior-Oriented Parallelization.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

Exploration of the Influence of Program Inputs on CMP Co-scheduling.
Proceedings of the Euro-Par 2008, 2008

Analysis and approximation of optimal co-scheduling on chip multiprocessors.
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007
Miss Rate Prediction Across Program Inputs and Cache Configurations.
IEEE Trans. Computers, 2007

Predicting locality phases for dynamic memory optimization.
J. Parallel Distributed Comput., 2007

Locality approximation using time.
Proceedings of the 34th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2007

Software behavior oriented parallelization.
Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, 2007

Modeling Relations between Inputs and Dynamic Behavior for General Programs.
Proceedings of the Languages and Compilers for Parallel Computing, 2007

A Key-based Adaptive Transactional Memory Executor.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Analysis of input-dependent program behavior using active profiling.
Proceedings of the Workshop on Experimental Computer Science, 2007

Bridging Inputs and Program Dynamic Behavior.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006
Program-level adaptive memory management.
Proceedings of the 5th International Symposium on Memory Management, 2006

2005
Parallelization of Utility Programs Based on Behavior Phase Analysis.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

Lightweight reference affinity analysis.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Gated memory control for memory monitoring, leak detection and garbage collection.
Proceedings of the 2005 workshop on Memory System Performance, 2005

2004
Learning multi-label scene classification.
Pattern Recognit., 2004

Multilabel machine learning and its application to semantic scene classification.
Proceedings of the Storage and Retrieval Methods and Applications for Multimedia 2004, 2004

Array regrouping and structure splitting using whole-program reference affinity.
Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation 2004, 2004

Phase-Based Miss Rate Prediction Across Program Inputs.
Proceedings of the Languages and Compilers for High Performance Computing, 2004

Adaptive Data Partition for Sorting Using Probability Distribution.
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

Locality phase prediction.
Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004

2003
A Hierarchical Model of Reference Affinity.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

2001
The study of the effect of training set on statistical language modeling.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Study and auto-detection of stress based on tonal pitch range in Mandarin.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

