Proceedings of the 24th IEEE Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, 2022

A High-performance SpMV Accelerator on HBM-equipped FPGAs.

Tao Li, Li Shen, Shangshang Yao

2021

GraphPEG: Accelerating Graph Processing on GPUs.

ACM Trans. Archit. Code Optim., 2021

Reducing TLB Miss Penalty on GPUs via Unified Multi-level PWB and PWC.

Proceedings of the 12th International Symposium on Parallel Architectures, 2021

An Efficient Hybrid Parallel Compression Approximate Multiplier.

Proceedings of the 39th IEEE International Conference on Computer Design, 2021

A Multi-precision Quantized Super-Resolution Model Framework.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2021

Multi-level PWB and PWC for Reducing TLB Miss Overheads on GPUs.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2021

2020

Transparent partial page migration between CPU and GPU.

Frontiers Comput. Sci., 2020

A Multi-model Super-Resolution Training and Reconstruction Framework.

Proceedings of the Network and Parallel Computing, 2020

A Unified Page Walk Buffer and Page Walk Cache.

Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2020

Customizing Super-Resolution Framework According to Image Features.

Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2020

High-Performance Computing and Engineering Educational Development and Practice.

Juan Chen, John Impagliazzo, Li Shen

Proceedings of the IEEE Frontiers in Education Conference, 2020

2019

A statistic approach for power analysis of integrated GPU.

Soft Comput., 2019

MMSR: A Multi-model Super Resolution Framework.

Proceedings of the Network and Parallel Computing, 2019

A Lightweight Method for Handling Control Divergence in GPGPUs.

YaoHua Yang, Shiqing Zhang, Li Shen

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2019

Improve Student Performance Using Moderated Two-Stage Projects.

Proceedings of the ACM Conference on Global Computing Education, 2019

2018

Accelerating Deep Learning with a Parallel Mechanism Using CPU + MIC.

Sijiang Fan, Jiawei Fei, Li Shen

Int. J. Parallel Program., 2018

Resolving the GPU responsiveness dilemma through program transformations.

Frontiers Comput. Sci., 2018

Design of Practical Experiences to Improve Student Understanding of Efficiency and Scalability Issues in High Performance Computing: (Abstract Only).

Proceedings of the 49th ACM Technical Symposium on Computer Science Education, 2018

GPU Memory Management Solution Supporting Incomplete Pages.

Proceedings of the Network and Parallel Computing, 2018

Adaptive VC Partitioning for NoCs in GPGPUs.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2018

Efficient Data Communication between CPU and GPU through Transparent Partial-Page Migration.

Proceedings of the 20th IEEE International Conference on High Performance Computing and Communications; 16th IEEE International Conference on Smart City; 4th IEEE International Conference on Data Science and Systems, 2018

Control Divergence Optimization through Partial Warp Regrouping in GPGPUs.

YaoHua Yang, Shiqing Zhang, Li Shen

Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence, 2018

Parallel programming course development based on parallel computational thinking.

Proceedings of ACM Turing Celebration Conference - China, 2018

Design of paper CPU project to improve student understanding of CPU working principle.

Juan Chen, Li Shen

Proceedings of ACM Turing Celebration Conference - China, 2018

2017

Improving the Efficiency of GPGPU Work-Queue Through Data Awareness.

ACM Trans. Archit. Code Optim., 2017

线程级猜测并行系统代码自动生成工具的设计与实现 (Design and Implementation of Automatic Code Generator for TLS System).

Jialong Wang, Yanhong Liu, Li Shen

计算机科学 (Computer Science), 2017

Understanding co-run performance on CPU-GPU integrated processors: observations, insights, directions.

Frontiers Comput. Sci., 2017

Spark-SIFT: A Spark-Based Large-Scale Image Feature Extract System.

Xinming Zhang, YaoHua Yang, Li Shen

Proceedings of the 13th International Conference on Semantics, Knowledge and Grids, 2017

Unleashing the power of GPU for physically-based rendering via dynamic ray shuffling.

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Parallel Computing in DNNs Using CPU and MIC.

Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

OTR: A Fine-Grained Dynamic Power Scaling Pipeline Based on Trace.

Co-Run Scheduling with Power Cap on Integrated CPU-GPU Systems.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

A Novel Statistical Power Model for Integrated GPU with Optimization.

Proceedings of the Data Science, 2017

POSTER: DaQueue: A Data-Aware Work-Queue Design for GPGPUs.

Ya-Shuai Lü, Libo Huang, Li Shen

Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

A Software-Hardware Co-designed Methodology for Efficient Thread Level Speculation.

Proceedings of the 2017 IEEE International Conference on Computer and Information Technology, 2017

2016

GPU平台上面向性能和功耗的分支优化 (Branch Divergence Optimization for Performance and Power Consumption on GPU Platform).

计算机科学 (Computer Science), 2016

面向Cassandra数据库的高效动态数据管理机制 (Efficient and Dynamic Data Management System for Cassandra Database).

计算机科学 (Computer Science), 2016

Optimization Strategies Oriented to Loop Characteristics in Software Thread Level Speculation Systems.

Li Shen, Fan Xu, Zhiying Wang

J. Comput. Sci. Technol., 2016

Fast Task Submission in Software Thread Level Speculation Systems.

Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, 2016

Dynamic Power-Performance Adjustment on Clustered Multi-Threading Processors.

Proceedings of the IEEE International Conference on Networking, 2016

An implementation of analytical power model on integrated GPU.

Proceedings of the International Symposium on Integrated Circuits, 2016

A lightweight instruction-set simulator for teaching of dynamic instruction scheduling.

Wenjie Liu, Li Shen, Zhiying Wang

Proceedings of the 11th International Conference on Computer Science & Education, 2016

A Hybrid Power-Performance Adjustment Strategy for Clustered Multi-threading Architecture.

Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

2014

Novel Flow Control for Fully Adaptive Routing in Cache-Coherent NoCs.

Sheng Ma, Zhiying Wang, Natalie D. Enright Jerger, Li Shen, Nong Xiao

IEEE Trans. Parallel Distributed Syst., 2014

MAC or Non-MAC: not a Problem.

J. Circuits Syst. Comput., 2014

Binary compatibility for embedded systems using greedy subgraph mapping.

Sci. China Inf. Sci., 2014

Implementing a Leading Loads Performance Predictor on Commodity Processors.

Proceedings of the 2014 USENIX Annual Technical Conference, 2014

PPEP: Online Performance, Power, and Energy Prediction Framework and DVFS Space Exploration.

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Understanding Co-run Degradations on Integrated Heterogeneous Processors.

Proceedings of the Languages and Compilers for Parallel Computing, 2014

Improving Speculation Accuracy with Inter-thread Fetching Value Prediction.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2014

Dynamic Power Estimation with Hardware Performance Counters Support on Multi-core Platform.

Proceedings of the Advanced Computer Architecture - 10th Annual Conference, 2014

Customized Core Layout: A Case Study on Dual-Core Dynamic Binary Translation System.

Proceedings of the 14th IEEE International Conference on Computer and Information Technology, 2014

2013

Region-Based Way-Partitioning on L1 Data Cache for Low Power.

Zhong Zheng, Zhiying Wang, Li Shen

IEICE Trans. Inf. Syst., 2013

HEUSPEC: A Software Speculation Parallel Model.

Proceedings of the 42nd International Conference on Parallel Processing, 2013

DCP: Improving the Throughput of Asynchronous Pipeline by Dual Control Path.

Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

2012

Low-Cost Binary128 Floating-Point FMA Unit Design with SIMD Support.

IEEE Trans. Computers, 2012

Dynamic Optimization on Multi-core Platform.

Jiahui Wen, Li Shen, Zhiying Wang

Proceedings of the 11th IEEE International Conference on Trust, 2012

2011

GSM: An Efficient Code Generation Algorithm for Dynamic Binary Translator.

Proceedings of the Fourth International Symposium on Parallel Architectures, 2011

Characterizing Fine-Grain Parallelism on Modern Multicore Platform.

Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011

A novel shared-buffer router for network-on-chip based on Hierarchical Bit-line Buffer.

Proceedings of the IEEE 29th International Conference on Computer Design, 2011

A specialized low-cost vectorized loop buffer for embedded processors.

Proceedings of the Design, Automation and Test in Europe, 2011

2010

Permutation optimization for SIMD devices.

Libo Huang, Li Shen, Zhiying Wang

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

SIF: Overcoming the limitations of SIMD devices via implicit permutation.

Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

A Dynamic Binary Translation Framework Based on Page Fault Mechanism in Linux Kernel.

Fan Xu, Li Shen, Zhiying Wang

Proceedings of the 10th IEEE International Conference on Computer and Information Technology, 2010

2009

Optimal subgraph covering for customisable VLIW processors.

IET Comput. Digit. Tech., 2009

Using Pcache to Speedup Interpretation in Dynamic Binary Translation.

Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2009

A Light-weight Code Cache Design for Dynamic Binary Translation.

Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

Dynamically utilizing computation accelerators for extensible processors in a software approach.

Proceedings of the 7th International Conference on Hardware/Software Codesign and System Synthesis, 2009

A Hardware Approach for Reducing Interpretation Overhead.

Proceedings of the Ninth IEEE International Conference on Computer and Information Technology, 2009

2008

A New CORDIC Algorithm and Software Implementation Based on Synchronized Data Triggering Architecture.

Proceedings of the 2008 International Conference on Multimedia and Ubiquitous Engineering (MUE 2008), 2008

A Novel Hardware Assisted Full Virtualization Technique.

Proceedings of the 9th International Conference for Young Computer Scientists, 2008

DBTIM: An Advanced Hardware Assisted Full Virtualization Architecture.

Proceedings of the 2008 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC 2008), 2008

Customizing computation accelerators for extensible multi-issue processors with effective optimization techniques.

Proceedings of the 45th Design Automation Conference, 2008

2007

Hardware Support for Arithmetic Units of Processor with Multimedia Extension.

Proceedings of the 2007 International Conference on Multimedia and Ubiquitous Engineering (MUE 2007), 2007

A New Architecture For Multiple-Precision Floating-Point Multiply-Add Fused Unit Design.

Proceedings of the 18th IEEE Symposium on Computer Arithmetic (ARITH-18 2007), 2007

2004

A New Technique for Program Code Compression in Embedded Microprocessor.

Proceedings of the Embedded Software and Systems, First International Conference, 2004

2003

Predicate Analysis Based on Path Information.

Li Shen, Zhiying Wang, Jianzhuang Lu

Proceedings of the Advanced Parallel Programming Technologies, 5th International Workshop, 2003
