Proceedings of the Euro-Par 2023: Parallel Processing - 29th International Conference on Parallel and Distributed Computing, Limassol, Cyprus, August 28, 2023

A High-accurate Multi-objective Exploration Framework for Design Space of CPU.

Duo Wang

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Simple and Efficient Heterogeneous Graph Neural Network.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

General spiking neural network framework for the learning trajectory from a noisy mmWave radar.

Neuromorph. Comput. Eng., June, 2022

Multi-Node Acceleration for Large-Scale GCNs.

IEEE Trans. Computers, 2022

JBNN: A Hardware Design for Binarized Neural Networks Using Single-Flux-Quantum Circuits.

IEEE Trans. Computers, 2022

Accelerating Data Transfer in Dataflow Architectures Through a Look-Ahead Acknowledgment Mechanism.

J. Comput. Sci. Technol., 2022

Sampling Methods for Efficient Training of Graph Convolutional Networks: A Survey.

IEEE CAA J. Autom. Sinica, 2022

Rethinking Efficiency and Redundancy in Training Large-scale Graphs.

CoRR, 2022

A synergistic reinforcement learning-based framework design in driving automation.

Comput. Electr. Eng., 2022

A survey on superconducting computing technology: circuits, architectures and design tools.

CCF Trans. High Perform. Comput., 2022

Accelerating Graph Processing With Lightweight Learning-Based Data Reordering.

IEEE Comput. Archit. Lett., 2022

Characterizing and Understanding HGNNs on GPUs.

IEEE Comput. Archit. Lett., 2022

Characterization and Implementation of Radar System Applications on a Reconfigurable Dataflow Architecture.

IEEE Comput. Archit. Lett., 2022

Characterizing and Understanding Distributed GNN Training on GPUs.

IEEE Comput. Archit. Lett., 2022

GNNSampler: Bridging the Gap Between Sampling Algorithms of GNN and Hardware.

Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2022

A Routing-Aware Mapping Method for Dataflow Architectures.

Proceedings of the Network and Parallel Computing, 2022

Survey on Graph Neural Network Acceleration: An Algorithmic Perspective.

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Heterogeneous Collaborative Refining for Real-Time End-to-End Image-Text Retrieval System.

Proceedings of the ICIAI 2022: The 6th International Conference on Innovation in Artificial Intelligence, Guangzhou China, March 4, 2022

GEM: Execution-Aware Cache Management for Graph Analytics.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2022

MatGraph: An Energy-Efficient and Flexible CGRA Engine for Matrix-Based Graph Analytics.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2022

Parallel-Friendly and Work-Efficient Single Source Shortest Path Algorithm on Single-Node System.

Proceedings of the 24th IEEE Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, 2022

A Loop Optimization Method for Dataflow Architecture.

HetGraph: A High Performance CPU-CGRA Architecture for Matrix-based Graph Analytics.

Proceedings of the GLSVLSI '22: Great Lakes Symposium on VLSI 2022, Irvine CA USA, June 6, 2022

WiLi - Vehicular Wireless Channel Dataset enriched with LiDAR and Radar Data.

Proceedings of the IEEE Global Communications Conference, 2022

LRP: Predictive output activation based on SVD approach for CNNs acceleration.

Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

Alleviating datapath conflicts and design centralization in graph analytics acceleration.

Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

2021

An efficient scheduling algorithm for dataflow architecture using loop-pipelining.

Inf. Sci., 2021

BSR-TC: Adaptively Sampling for Accurate Triangle Counting over Evolving Graph Streams.

Int. J. Softw. Eng. Knowl. Eng., 2021

Tackling Variabilities in Autonomous Driving.

CoRR, 2021

RISC-NN: Use RISC, NOT CISC as Neural Network Hardware Infrastructure.

CoRR, 2021

Scalable and efficient graph traversal on high-throughput cluster.

CCF Trans. High Perform. Comput., 2021

Hardware Acceleration for GCNs via Bidirectional Fusion.

IEEE Comput. Archit. Lett., 2021

Triangle Counting by Adaptively Resampling over Evolving Graph Streams.

Proceedings of the 33rd International Conference on Software Engineering and Knowledge Engineering, 2021

Scalable, resource and locality-aware selection of active scatterers in Geometry-based stochastic channel models.

Proceedings of the 32nd IEEE Annual International Symposium on Personal, 2021

Alleviating Imbalance in Synchronous Distributed Training of Deep Neural Networks.

Proceedings of the 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), New York City, NY, USA, September 30, 2021

Streamline Ring ORAM Accesses through Spatial and Temporal Optimization.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020

3DACN: 3D Augmented convolutional network for time series data.

Inf. Sci., 2020

An efficient dataflow accelerator for scientific applications.

Future Gener. Comput. Syst., 2020

Video Face Recognition System: RetinaFace-mnet-faster and Secondary Search.

CoRR, 2020

Top-Related Meta-Learning Method for Few-Shot Detection.

CoRR, 2020

Pixel-Semantic Revise of Position Learning A One-Stage Object Detector with A Shared Encoder-Decoder.

CoRR, 2020

Characterizing and Understanding GCNs on GPU.

IEEE Comput. Archit. Lett., 2020

A Reliability-Aware Joint Design Method of Application Mapping and Wavelength Assignment for WDM-Based Silicon Photonic Interconnects on Chip.

IEEE Access, 2020

An Efficient Multicast Router using Shared-Buffer with Packet Merging for Dataflow Architecture.

Proceedings of the 14th IEEE/ACM International Symposium on Networks-on-Chip, 2020

Highly Efficient and GPU-Friendly Implementation of BFS on Single-node System.

Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2020

Pixel-Semantic Revising of Position: One-Stage Object Detector with Shared Encoder-Decoder.

Proceedings of the Neural Information Processing - 27th International Conference, 2020

CTA: A Critical Task Aware Scheduling Mechanism for Dataflow Architecture.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2020

HyGCN: A GCN Accelerator with Hybrid Architecture.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Design Automation Methodology from RTL to Gate-level Netlist and Schematic for RSFQ Logic Circuits.

Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020

2019

PIM-WEAVER: A High Energy-efficient, General-purpose Acceleration Architecture for String Operations in Big Data Processing.

Sustain. Comput. Informatics Syst., 2019

Wavelength assignment method based on ACO to reduce crosstalk for ring-based optical Network-on-Chip.

Microprocess. Microsystems, 2019

Applying CNN on a scientific application accelerator based on dataflow architecture.

CCF Trans. High Perform. Comput., 2019

Alleviating Irregularity in Graph Analytics Acceleration: a Hardware/Software Co-Design Approach.

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Instruction Vulnerability Test and Code Optimization Against DVFS Attack.

Proceedings of the IEEE International Test Conference in Asia, 2019

Balancing Memory Accesses for Energy-Efficient Graph Analytics Accelerators.

Proceedings of the 2019 IEEE/ACM International Symposium on Low Power Electronics and Design, 2019

iATPG: Instruction-level Automatic Test Program Generation for Vulnerabilities under DVFS attack.

Proceedings of the 25th IEEE International Symposium on On-Line Testing and Robust System Design, 2019

C-MIDN: Coupled Multiple Instance Detection Network With Segmentation Guidance for Weakly Supervised Object Detection.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Highly Efficient Breadth-First Search on CPU-Based Single-Node System.

Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

A Sharing Path Awareness Scheduling Algorithm for Dataflow Architecture.

C-MAP: Improving the Effectiveness of Mapping Method for CGRA by Reducing NoC Congestion.

Magma: A Monolithic 3D Vertical Heterogeneous ReRAM-based Main Memory Architecture.

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Utilizing the Instability in Weakly Supervised Object Detection.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Crosstalk-aware GA-based wavelength allocation method for ring-based optical network-on-chip.

Proceedings of the ACM Turing Celebration Conference - China, 2019

2018

A Pipelining Loop Optimization Method for Dataflow Architecture.

J. Comput. Sci. Technol., 2018

A Non-Stop Double Buffering Mechanism for Dataflow Architecture.

J. Comput. Sci. Technol., 2018

High-Performance and Energy-Efficient Fault Tolerance Scheduling Algorithm Based on Improved TMR for Heterogeneous System.

Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2018

WEAVER: An Energy Efficient, General-Purpose Acceleration Architecture for String Operations in Big Data Applications.

Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2018

Accelerating CNN Algorithm with Fine-Grained Dataflow Architectures.

Proceedings of the 20th IEEE International Conference on High Performance Computing and Communications; 16th IEEE International Conference on Smart City; 4th IEEE International Conference on Data Science and Systems, 2018

Optimizing the Efficiency of Data Transfer in Dataflow Architectures.

SmarCo: An Efficient Many-Core Processor for High-Throughput Applications in Datacenters.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Optimizing network efficiency of dataflow architectures through dynamic packet merging.

Proceedings of the Ninth International Green and Sustainable Computing Conference, 2018

2017

An Efficient Network-on-Chip Router for Dataflow Architecture.

J. Comput. Sci. Technol., 2017

2016

ACCC: An Acceleration Mechanism for Character Operation Based on Cache Computing in Big Data Applications.

Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

An energy-efficient bandwidth allocation method for single-chip heterogeneous processor.

Proceedings of the Seventh International Green and Sustainable Computing Conference, 2016

A framework for energy-efficient optimization on multi-cores.

Proceedings of the Seventh International Green and Sustainable Computing Conference, 2016

Memory partition for SIMD in streaming dataflow architectures.

Proceedings of the Seventh International Green and Sustainable Computing Conference, 2016

On the properties of data migration based on topology pattern keeping on cache hierarchy.

Proceedings of the Seventh International Green and Sustainable Computing Conference, 2016

A Percolation Data Migration Schema in a hybrid Cache Hierarchy.

Proceedings of the Seventh International Green and Sustainable Computing Conference, 2016

POSTER: An Optimization of Dataflow Architectures for Scientific Applications.

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015

Corrigendum to "Fast and scalable lock methods for video coding on many-core architecture" [J. Visual Communication and Image Representation 25(7) (2014) 1758-1762].

J. Vis. Commun. Image Represent., 2015

A high-density data path implementation fitting for HTC applications.

Proceedings of the Sixth International Green and Sustainable Computing Conference, 2015

Thread ID based power reduction mechanism for multi-thread shared set-associative caches.

Proceedings of the Sixth International Green and Sustainable Computing Conference, 2015

2014

Fast and scalable lock methods for video coding on many-core architecture.

J. Vis. Commun. Image Represent., 2014

Optimizing mapreduce with low memory requirements for shared-memory systems.

Proceedings of the 15th IEEE/ACIS International Conference on Software Engineering, 2014

Efficiently and Completely Verifying Synchronized Consistency Models.

Proceedings of the Automated Technology for Verification and Analysis, 2014

2013

A Path-Adaptive Opto-electronic Hybrid NoC for Chip Multi-processor.

Proceedings of the 12th IEEE International Conference on Trust, 2013

SimICT: A fast and flexible framework for performance and power evaluation of large-scale architecture.

Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Low Execution Efficiency: When General Multi-core Processor Meets Wireless Communication Protocol.

Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

An Efficient Parallel Mechanism for Highly-Debuggable Multicore Simulator.

Proceedings of the Advanced Parallel Processing Technologies, 2013

2012

Godson-T: An Efficient Many-Core Processor Exploring Thread-Level Parallelism.

IEEE Micro, 2012

Auto-Tuning GEMV on Many-Core GPU.

Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

ALWP: A Workload Partition Method for the Efficient Parallel Simulation of Manycores.

Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

PartitionSim: A Parallel Simulator for Many-cores.

Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

CRAW/P: A Workload Partition Method for the Efficient Parallel Simulation of Manycores.

Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011

High-efficient architecture of Godson-T many-core processor.