Proceedings of the Euro-Par 2023: Parallel Processing - 29th International Conference on Parallel and Distributed Computing, Limassol, Cyprus, August 28, 2023

2022

A fast unsmoothed aggregation algebraic multigrid framework for the large-scale simulation of incompressible flow.

Han Shao

Libo Huang

Dominik L. Michels

ACM Trans. Graph., 2022

Multi-Lane Detection and Tracking Using Temporal-Spatial Model and Particle Filtering.

IEEE Trans. Intell. Transp. Syst., 2022

RV16: An Ultra-Low-Cost Embedded RISC-V Processor Core.

J. Comput. Sci. Technol., 2022

Lifelong Generative Learning via Knowledge Reconstruction.

CoRR, 2022

Stride Equality Prediction for Value Speculation.

IEEE Comput. Archit. Lett., 2022

SADD: A Novel Systolic Array Accelerator with Dynamic Dataflow for Sparse GEMM in Deep Learning.

Proceedings of the Network and Parallel Computing, 2022

Optimizing Winograd Convolution on GPUs via Partial Kernel Fusion.

Proceedings of the Network and Parallel Computing, 2022

TJ4DRadSet: A 4D Radar Dataset for Autonomous Driving.

Proceedings of the 25th IEEE International Conference on Intelligent Transportation Systems, 2022

MMTP: Multi-Modal Trajectory Prediction with Interaction Attention and Adaptive Task Weighting.

Proceedings of the 25th IEEE International Conference on Intelligent Transportation Systems, 2022

Efficient Multiple-Precision and Mixed-Precision Floating-Point Fused Multiply-Accumulate Unit for HPC and AI Applications.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2022

PipeFB: An Optimized Pipeline Parallelism Scheme to Reduce the Peak Memory Usage.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2022

RTA: an Efficient SIMD Architecture for Ray Tracing.

Proceedings of the 24th IEEE Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, 2022

2021

Ships, splashes, and waves on a vast ocean.

ACM Trans. Graph., 2021

GraphPEG: Accelerating Graph Processing on GPUs.

ACM Trans. Archit. Code Optim., 2021

Dynamic Hand Gesture Recognition in In-Vehicle Environment Based on FMCW Radar and Transformer.

Sensors, 2021

A Joint 2D-3D Complementary Network for Stereo Matching.

Sensors, 2021

Fast and Accurate Lane Detection via Graph Structure and Disentangled Representation Learning.

Sensors, 2021

Radar Transformer: An Object Classification Network Based on 4D MMW Imaging Radar.

Sensors, 2021

Robust Target Detection and Tracking Algorithm Based on Roadside Radar and Camera.

Sensors, 2021

Fast Convolution based on Winograd Minimum Filtering: Introduction and Development.

Gan Tong

Libo Huang

CoRR, 2021

Multi-Scale Cost Volumes Cascade Network for Stereo Matching.

Proceedings of the IEEE International Conference on Robotics and Automation, 2021

Multi-Scale Cascade Disparity Refinement Stereo Network.

Proceedings of the IEEE International Conference on Acoustics, 2021

Unsupervised Hard Case Extraction Based on Image Perceptual Hash Encoding.

Proceedings of the CONF-CDS 2021: The 2nd International Conference on Computing and Data Science, 2021

2020

Surface-only ferrofluids.

Libo Huang

Dominik L. Michels

ACM Trans. Graph., 2020

A quantitative evaluation of unified memory in GPUs.

J. Supercomput., 2020

HPE: Hierarchical Page Eviction Policy for Unified Memory in GPUs.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

DancerFly: An Order-Aware Network-on-Chip Router On-the-Fly Mitigating Multi-path Packet Reordering.

Int. J. Parallel Program., 2020

Coordinated Page Prefetch and Eviction for Memory Oversubscription Management in GPUs.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

Spike Sorting Based On Low-Rank And Sparse Representation.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2020

2019

Coordinated DMA: Improving the DRAM Access Efficiency for Matrix Multiplication.

IEEE Trans. Parallel Distributed Syst., 2019

On the accurate large-scale simulation of ferrofluids.

Libo Huang

Torsten Hädrich

Dominik L. Michels

ACM Trans. Graph., 2019

Efficient architectural exploration of TAGE branch predictor for embedded processors.

Microelectron. J., 2019

SIMD stealing: Architectural support for efficient data parallel execution on multicores.

Microprocess. Microsystems, 2019

MT-DMA: A DMA Controller Supporting Efficient Matrix Transposition for Digital Signal Processing.

IEEE Access, 2019

Hierarchical Page Eviction Policy for Unified Memory in GPUs.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

An Efficient Direct Memory Access (DMA) Controller for Scientific Computing Accelerators.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2019

Improving the DRAM Access Efficiency for Matrix Multiplication on Multicore Accelerators.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

MOOBench: towards massive open online workbench.

Proceedings of the ACM Turing Celebration Conference - China, 2019

2018

Moving from exascale to zettascale computing: challenges and techniques.

Frontiers Inf. Technol. Electron. Eng., 2018

CHAM: Improving Prefetch Efficiency Using a Composite Hierarchy-Aware Method.

J. Circuits Syst. Comput., 2018

FC-AMAT: factor-based C-AMAT analysis in memory system measurement.

Innov. Syst. Softw. Eng., 2018

The Design of NoC-Side Memory Access Scheduling for Energy-Efficient GPGPUs.

Int. J. Parallel Program., 2018

DyCache: Dynamic Multi-Grain Cache Management for Irregular Memory Accesses on GPU.

IEEE Access, 2018

Accelerating BFS via Data Structure-Aware Prefetching on GPU.

IEEE Access, 2018

Evaluating Memory Performance of Emerging Scale-Out Applications Using C-AMAT.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018

HMCSP: Reducing Transaction Latency of CSR-based SPMV in Hybrid Memory Cube.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018

Improving Branch Prediction Accuracy on Multi-Core Architectures for Big Data.

Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2018

Adaptive VC Partitioning for NoCs in GPGPUs.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2018

VISU: A Simple and Efficient Cache Coherence Protocol Based on Self-updating.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2018

Peer-Formulated Assignment Method for Experimental Projects in CS courses.

Proceedings of the IEEE Frontiers in Education Conference, 2018

CMH: compression management for improving capacity in the hybrid memory cube.

Proceedings of the 15th ACM International Conference on Computing Frontiers, 2018

HASS: High Accuracy Spike Sorting with Wavelet Package Decomposition and Mutual Information.

Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2018

2017

Improving the Efficiency of GPGPU Work-Queue Through Data Awareness.

ACM Trans. Archit. Code Optim., 2017

Factor-Based C-AMAT Analysis for Memory Optimization.

Proceedings of the Verification and Evaluation of Computer and Communication Systems, 2017

Motivating Students through Peer-Formulated Assignments in CS Experimental Courses.

Proceedings of the 18th Annual Conference on Information Technology Education and the 6th Annual Conference on Research in Information Technology, 2017

Improving Branch Prediction for Thread Migration on Multi-core Architectures.

Proceedings of the Network and Parallel Computing, 2017

SimpleBP: A Lightweight Branch Prediction Simulator for Effective Design Exploration.

Proceedings of the 2017 International Conference on Networking, Architecture, and Storage, 2017

Branch Prediction Migration for Multi-Core Architectures.

Proceedings of the 2017 International Conference on Networking, Architecture, and Storage, 2017

BPSim: An integrated missrate, area, and power simulator for branch predictor.

Chaobing Zhou

Libo Huang

Qiang Dou

Proceedings of the 6th International Conference on Modern Circuits and Systems Technologies, 2017

Unleashing the power of GPU for physically-based rendering via dynamic ray shuffling.

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Effective Optimization of Branch Predictors through Lightweight Simulation.

Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

Trace-based method for big data memory characteristics research.

Proceedings of the 2017 International Conference on Advances in Computing, 2017

Design Space Exploration of TAGE Branch Predictor with Ultra-Small RAM.

Proceedings of the Great Lakes Symposium on VLSI 2017, 2017

BC-AMAT: Considering Blocked Time in Memory System Measurement.

Proceedings of the Computing Frontiers Conference, 2017

POSTER: DaQueue: A Data-Aware Work-Queue Design for GPGPUs.

Ya-Shuai Lü

Libo Huang

Li Shen

Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016

A Methodology for Performance Verification of Microprocessors.

Yongwen Wang

Libo Huang

Zhong Zheng

Proceedings of the Computer Engineering and Technology - 20th CCF Conference, 2016

2015

Efficient data management on 3D stacked memory for big data applications.

Proceedings of the 10th International Design & Test Symposium, 2015

A Study on Non-volatile 3D Stacked Memory for Big Data Applications.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

Fast FPGA system for microarchitecture optimization on synthesizable modern processor design.

Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015

2014

Integrated Coherence Prediction: Towards Efficient Cache Coherence on NoC-Based Multicore Architectures.

ACM Trans. Design Autom. Electr. Syst., 2014

Holistic Routing Algorithm Design to Support Workload Consolidation in NoCs.

Sheng Ma

Natalie D. Enright Jerger

Zhiying Wang

Ming-che Lai

Libo Huang

IEEE Trans. Computers, 2014

MAC or Non-MAC: not a Problem.

J. Circuits Syst. Comput., 2014

Efficient Utilization of SIMD Engines for General-Purpose Processors.

Comput. J., 2014

Leveraging on-chip networks for efficient prediction on multicore coherence.

Libo Huang

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

2013

Dynamic Streamization Model Execution for SIMD Engines on Multicore Architectures.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2013

Adaptive communication mechanism for accelerating MPI functions in NoC-based multicore processors.

ACM Trans. Archit. Code Optim., 2013

Efficient multimedia coprocessor with enhanced SIMD engines for exploiting ILP and DLP.

Parallel Comput., 2013

VBON: Toward efficient on-chip networks via hierarchical virtual bus.

Libo Huang

Zhiying Wang

Nong Xiao

Microprocess. Microsystems, 2013

DCP: Improving the Throughput of Asynchronous Pipeline by Dual Control Path.

Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

2012

Low-Cost Binary128 Floating-Point FMA Unit Design with SIMD Support.

IEEE Trans. Computers, 2012

An optimized multicore cache coherence design for exploiting communication locality.

Libo Huang

Zhiying Wang

Nong Xiao

Proceedings of the Great Lakes Symposium on VLSI 2012, 2012

Accelerating NoC-Based MPI Primitives via Communication Architecture Customization.

Libo Huang

Zhiying Wang

Nong Xiao

Proceedings of the 23rd IEEE International Conference on Application-Specific Systems, 2012

2011

A specialized low-cost vectorized loop buffer for embedded processors.

Proceedings of the Design, Automation and Test in Europe, 2011

2010

Permutation optimization for SIMD devices.

Libo Huang

Li Shen

Zhiying Wang

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

SV: Enhancing SIMD Architectures via Combined SIMD-Vector Approach.

Libo Huang

Zhiying Wang

Proceedings of the Algorithms and Architectures for Parallel Processing, 2010

SIF: Overcoming the limitations of SIMD devices via implicit permutation.

Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

2009

Optimal subgraph covering for customisable VLIW processors.

IET Comput. Digit. Tech., 2009

Implementation of OpenVG Path and Paint Algorithms on Synchronous Data Triggered Architecture with Optimization.

Proceedings of the International Conference on Networking, Architecture, and Storage, 2009

2008

Hierarchical memory system design for a heterogeneous multi-core processor.

Proceedings of the 2008 ACM Symposium on Applied Computing (SAC), 2008

A New CORDIC Algorithm and Software Implementation Based on Synchronized Data Triggering Architecture.

Proceedings of the 2008 International Conference on Multimedia and Ubiquitous Engineering (MUE 2008), 2008

Customizing computation accelerators for extensible multi-issue processors with effective optimization techniques.

Proceedings of the 45th Design Automation Conference, 2008

Memory System Design for a Multi-core Processor.

Proceedings of the Second International Conference on Complex, 2008

2007

Hardware Support for Arithmetic Units of Processor with Multimedia Extension.

Proceedings of the 2007 International Conference on Multimedia and Ubiquitous Engineering (MUE 2007), 2007

A New Architecture For Multiple-Precision Floating-Point Multiply-Add Fused Unit Design.

Proceedings of the 18th IEEE Symposium on Computer Arithmetic (ARITH-18 2007), 2007

Libo Huang
