Li Shen

Affiliations:
  • National University of Defense Technology, School of Computer, Changsha, Hunan, China


According to our database, Li Shen authored at least 89 papers between 2003 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
MPRTA: An Efficient Multilevel Parallel Mobile Accelerator for High-Performance Ray Tracing.
IEEE Trans. Very Large Scale Integr. Syst., February, 2024

Local Sample-Weighted Multiple Kernel Clustering With Consensus Discriminative Graph.
IEEE Trans. Neural Networks Learn. Syst., February, 2024

A Low-Cost Floating-Point Dot-Product-Dual-Accumulate Architecture for HPC-Enabled AI.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., February, 2024

2023
MMsRT: A Hardware Architecture for Ray Tracing in the Mobile Domain.
J. Circuits Syst. Comput., July, 2023

Dense Crosstalk Feature Aggregation for Classification and Localization in Object Detection.
IEEE Trans. Circuits Syst. Video Technol., June, 2023

Late Fusion Multiple Kernel Clustering With Local Kernel Alignment Maximization.
IEEE Trans. Multim., 2023

One-step Multi-view Clustering with Diverse Representation.
CoRR, 2023

ImprLM: An Improved Logarithmic Multiplier Design Approach via Iterative Linear-Compensation and Modified Dynamic Segment.
Proceedings of the 41st IEEE International Conference on Computer Design, 2023

2022
Compressed page walk cache.
Frontiers Comput. Sci., 2022

Stride Equality Prediction for Value Speculation.
IEEE Comput. Archit. Lett., 2022

DGEMM Optimization Oriented to ARM SVE Instruction Set Architecture.
Proceedings of the 28th IEEE International Conference on Parallel and Distributed Systems, 2022

RTA: an Efficient SIMD Architecture for Ray Tracing.
Proceedings of the 24th IEEE International Conference on High Performance Computing & Communications; 8th International Conference on Data Science & Systems; 20th International Conference on Smart City; 8th International Conference on Dependability in Sensor, 2022

A High-performance SpMV Accelerator on HBM-equipped FPGAs.
Proceedings of the 24th IEEE International Conference on High Performance Computing & Communications; 8th International Conference on Data Science & Systems; 20th International Conference on Smart City; 8th International Conference on Dependability in Sensor, 2022

2021
GraphPEG: Accelerating Graph Processing on GPUs.
ACM Trans. Archit. Code Optim., 2021

Reducing TLB Miss Penalty on GPUs via Unified Multi-level PWB and PWC.
Proceedings of the 12th International Symposium on Parallel Architectures, 2021

An Efficient Hybrid Parallel Compression Approximate Multiplier.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021

A Multi-precision Quantized Super-Resolution Model Framework.
Proceedings of Algorithms and Architectures for Parallel Processing, 2021

Multi-level PWB and PWC for Reducing TLB Miss Overheads on GPUs.
Proceedings of Algorithms and Architectures for Parallel Processing, 2021

2020
Transparent partial page migration between CPU and GPU.
Frontiers Comput. Sci., 2020

A Multi-model Super-Resolution Training and Reconstruction Framework.
Proceedings of Network and Parallel Computing, 2020

A Unified Page Walk Buffer and Page Walk Cache.
Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2020

Customizing Super-Resolution Framework According to Image Features.
Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2020

High-Performance Computing and Engineering Educational Development and Practice.
Proceedings of the IEEE Frontiers in Education Conference, 2020

2019
A statistic approach for power analysis of integrated GPU.
Soft Comput., 2019

MMSR: A Multi-model Super Resolution Framework.
Proceedings of Network and Parallel Computing, 2019

A Lightweight Method for Handling Control Divergence in GPGPUs.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2019

Improve Student Performance Using Moderated Two-Stage Projects.
Proceedings of the ACM Conference on Global Computing Education, 2019

2018
Accelerating Deep Learning with a Parallel Mechanism Using CPU + MIC.
Int. J. Parallel Program., 2018

Resolving the GPU responsiveness dilemma through program transformations.
Frontiers Comput. Sci., 2018

Design of Practical Experiences to Improve Student Understanding of Efficiency and Scalability Issues in High Performance Computing: (Abstract Only).
Proceedings of the 49th ACM Technical Symposium on Computer Science Education, 2018

GPU Memory Management Solution Supporting Incomplete Pages.
Proceedings of Network and Parallel Computing, 2018

Adaptive VC Partitioning for NoCs in GPGPUs.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2018

Efficient Data Communication between CPU and GPU through Transparent Partial-Page Migration.
Proceedings of the 20th IEEE International Conference on High Performance Computing and Communications; 16th IEEE International Conference on Smart City; 4th IEEE International Conference on Data Science and Systems, 2018

Control Divergence Optimization through Partial Warp Regrouping in GPGPUs.
Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence, 2018

Parallel programming course development based on parallel computational thinking.
Proceedings of ACM Turing Celebration Conference - China, 2018

Design of paper CPU project to improve student understanding of CPU working principle.
Proceedings of ACM Turing Celebration Conference - China, 2018

2017
Improving the Efficiency of GPGPU Work-Queue Through Data Awareness.
ACM Trans. Archit. Code Optim., 2017

Design and Implementation of Automatic Code Generator for TLS System (线程级猜测并行系统代码自动生成工具的设计与实现).
Computer Science (计算机科学), 2017

Understanding co-run performance on CPU-GPU integrated processors: observations, insights, directions.
Frontiers Comput. Sci., 2017

Spark-SIFT: A Spark-Based Large-Scale Image Feature Extract System.
Proceedings of the 13th International Conference on Semantics, Knowledge and Grids, 2017

Unleashing the power of GPU for physically-based rendering via dynamic ray shuffling.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Parallel Computing in DNNs Using CPU and MIC.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

OTR: A Fine-Grained Dynamic Power Scaling Pipeline Based on Trace.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

Co-Run Scheduling with Power Cap on Integrated CPU-GPU Systems.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

A Novel Statistical Power Model for Integrated GPU with Optimization.
Proceedings of Data Science, 2017

POSTER: DaQueue: A Data-Aware Work-Queue Design for GPGPUs.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

A Software-Hardware Co-designed Methodology for Efficient Thread Level Speculation.
Proceedings of the 2017 IEEE International Conference on Computer and Information Technology, 2017

2016
Branch Divergence Optimization for Performance and Power Consumption on GPU Platform (GPU平台上面向性能和功耗的分支优化).
Computer Science (计算机科学), 2016

Efficient and Dynamic Data Management System for Cassandra Database (面向Cassandra数据库的高效动态数据管理机制).
Computer Science (计算机科学), 2016

Optimization Strategies Oriented to Loop Characteristics in Software Thread Level Speculation Systems.
J. Comput. Sci. Technol., 2016

Fast Task Submission in Software Thread Level Speculation Systems.
Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, 2016

Dynamic Power-Performance Adjustment on Clustered Multi-Threading Processors.
Proceedings of the IEEE International Conference on Networking, 2016

An implementation of analytical power model on integrated GPU.
Proceedings of the International Symposium on Integrated Circuits, 2016

A lightweight instruction-set simulator for teaching of dynamic instruction scheduling.
Proceedings of the 11th International Conference on Computer Science & Education, 2016

A Hybrid Power-Performance Adjustment Strategy for Clustered Multi-threading Architecture.
Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

2014
Novel Flow Control for Fully Adaptive Routing in Cache-Coherent NoCs.
IEEE Trans. Parallel Distributed Syst., 2014

MAC or Non-MAC: Not a Problem.
J. Circuits Syst. Comput., 2014

Binary compatibility for embedded systems using greedy subgraph mapping.
Sci. China Inf. Sci., 2014

Implementing a Leading Loads Performance Predictor on Commodity Processors.
Proceedings of the 2014 USENIX Annual Technical Conference, 2014

PPEP: Online Performance, Power, and Energy Prediction Framework and DVFS Space Exploration.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Understanding Co-run Degradations on Integrated Heterogeneous Processors.
Proceedings of Languages and Compilers for Parallel Computing, 2014

Improving Speculation Accuracy with Inter-thread Fetching Value Prediction.
Proceedings of Algorithms and Architectures for Parallel Processing, 2014

Dynamic Power Estimation with Hardware Performance Counters Support on Multi-core Platform.
Proceedings of Advanced Computer Architecture: 10th Annual Conference, 2014

Customized Core Layout: A Case Study on Dual-Core Dynamic Binary Translation System.
Proceedings of the 14th IEEE International Conference on Computer and Information Technology, 2014

2013
Region-Based Way-Partitioning on L1 Data Cache for Low Power.
IEICE Trans. Inf. Syst., 2013

HEUSPEC: A Software Speculation Parallel Model.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

DCP: Improving the Throughput of Asynchronous Pipeline by Dual Control Path.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

2012
Low-Cost Binary128 Floating-Point FMA Unit Design with SIMD Support.
IEEE Trans. Computers, 2012

Dynamic Optimization on Multi-core Platform.
Proceedings of the 11th IEEE International Conference on Trust, 2012

2011
GSM: An Efficient Code Generation Algorithm for Dynamic Binary Translator.
Proceedings of the Fourth International Symposium on Parallel Architectures, 2011

Characterizing Fine-Grain Parallelism on Modern Multicore Platform.
Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011

A novel shared-buffer router for network-on-chip based on Hierarchical Bit-line Buffer.
Proceedings of the IEEE 29th International Conference on Computer Design, 2011

A specialized low-cost vectorized loop buffer for embedded processors.
Proceedings of the Design, Automation and Test in Europe, 2011

2010
Permutation optimization for SIMD devices.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

SIF: Overcoming the limitations of SIMD devices via implicit permutation.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

A Dynamic Binary Translation Framework Based on Page Fault Mechanism in Linux Kernel.
Proceedings of the 10th IEEE International Conference on Computer and Information Technology, 2010

2009
Optimal subgraph covering for customisable VLIW processors.
IET Comput. Digit. Tech., 2009

Using Pcache to Speedup Interpretation in Dynamic Binary Translation.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2009

A Light-weight Code Cache Design for Dynamic Binary Translation.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

Dynamically utilizing computation accelerators for extensible processors in a software approach.
Proceedings of the 7th International Conference on Hardware/Software Codesign and System Synthesis, 2009

A Hardware Approach for Reducing Interpretation Overhead.
Proceedings of the Ninth IEEE International Conference on Computer and Information Technology, 2009

2008
A New CORDIC Algorithm and Software Implementation Based on Synchronized Data Triggering Architecture.
Proceedings of the 2008 International Conference on Multimedia and Ubiquitous Engineering (MUE 2008), 2008

A Novel Hardware Assisted Full Virtualization Technique.
Proceedings of the 9th International Conference for Young Computer Scientists, 2008

DBTIM: An Advanced Hardware Assisted Full Virtualization Architecture.
Proceedings of the 2008 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC 2008), 2008

Customizing computation accelerators for extensible multi-issue processors with effective optimization techniques.
Proceedings of the 45th Design Automation Conference, 2008

2007
Hardware Support for Arithmetic Units of Processor with Multimedia Extension.
Proceedings of the 2007 International Conference on Multimedia and Ubiquitous Engineering (MUE 2007), 2007

A New Architecture For Multiple-Precision Floating-Point Multiply-Add Fused Unit Design.
Proceedings of the 18th IEEE Symposium on Computer Arithmetic (ARITH-18 2007), 2007

2004
A New Technique for Program Code Compression in Embedded Microprocessor.
Proceedings of Embedded Software and Systems: First International Conference, 2004

2003
Predicate Analysis Based on Path Information.
Proceedings of Advanced Parallel Programming Technologies: 5th International Workshop, 2003
