Ninghui Sun

Yungang Bao

Dongrui Fan

Frontiers Inf. Technol. Electron. Eng., 2018

High-performance genomic analysis framework with in-memory computing.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Accelerating FM-index Search for Genomic Data Processing.

Proceedings of the 47th International Conference on Parallel Processing, 2018

SmarCo: An Efficient Many-Core Processor for High-Throughput Applications in Datacenters.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

DearDRAM: Discard Weak Rows for Reducing DRAM's Refresh Overhead.

Xusheng Zhan

Yungang Bao

Proceedings of the Advanced Computer Architecture - 12th Conference, 2018

2017

An Efficient Network-on-Chip Router for Dataflow Architecture.

J. Comput. Sci. Technol., 2017

HyperFatTree: A Large-Scale Tree-Based Network with Low-Radix Switches.

Int. J. Parallel Program., 2017

HiKV: A Hybrid Index Key-Value Store for DRAM-NVM Memory Systems.

Proceedings of the 2017 USENIX Annual Technical Conference, 2017

Regional Congestion Mitigation in Lossless Datacenter Networks.

Proceedings of the Network and Parallel Computing, 2017

A performance analysis framework for exploiting GPU microarchitectural capability.

Proceedings of the International Conference on Supercomputing, 2017

Quantifying and Mitigating Computational Inefficiency of Genomics Data Analysis.

Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017

The Φ-stack for smart web of things.

Proceedings of the Workshop on Smart Internet of Things, SmartIoT@SEC 2017, 2017

Dadu: Accelerating Inverse Kinematics for High-DOF Robots.

Proceedings of the 54th Annual Design Automation Conference, 2017

2016

Graphine: Programming Graph-Parallel Computation of Large Natural Graphs for Multicore Clusters.

IEEE Trans. Parallel Distributed Syst., 2016

Accelerating Irregular Computation in Massive Short Reads Mapping on FPGA Co-Processor.

IEEE Trans. Parallel Distributed Syst., 2016

NONCODE 2016: an informative and valuable data source of long non-coding RNAs.

Nucleic Acids Res., 2016

DianNao family: energy-efficient hardware accelerators for machine learning.

Commun. ACM, 2016

Modeling Traffic of Big Data Platform for Large Scale Datacenter Networks.

Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

ACCC: An Acceleration Mechanism for Character Operation Based on Cache Computing in Big Data Applications.

Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

Accelerating large-scale genomic analysis with Spark.

Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2016

2015

A Small-Footprint Accelerator for Large-Scale Neural Networks.

ACM Trans. Comput. Syst., 2015

A High-Throughput Neural Network Accelerator.

IEEE Micro, 2015

A Survey of Phase Change Memory Systems.

J. Comput. Sci. Technol., 2015

Adapting Memory Hierarchies for Emerging Datacenter Interconnects.

J. Comput. Sci. Technol., 2015

Detection of soft errors in LU decomposition with partial pivoting using algorithm-based fault tolerance.

Int. J. High Perform. Comput. Appl., 2015

Understanding Big Data Analytic Workloads on Modern Processors.

CoRR, 2015

FAST: A Fast Stencil Autotuning Framework Based On An Optimal-solution Space Model.

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Study on Partitioning Real-World Directed Graphs of Skewed Degree Distribution.

Proceedings of the 44th International Conference on Parallel Processing, 2015

PROP: Using PCIe-Based RDMA to Accelerate Rack-Scale Communications in Data Centers.

Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

Supporting Differentiated Services in Computers via Programmable Architecture for Resourcing-on-Demand (PARD).

Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2014

Exploiting fine-grained parallelism in graph traversal algorithms via lock virtualization on multi-core architecture.

J. Supercomput., 2014

HMTT: A hybrid hardware/software tracing system for bridging the DRAM access trace's semantic gap.

ACM Trans. Archit. Code Optim., 2014

Decentralized NIC-Switching Architecture Using SR-IOV PCI Express Network Device.

IEEE Micro, 2014

DaDianNao: A Machine-Learning Supercomputer.

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Pipelined Compaction for the LSM-Tree.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Understanding the behavior of in-memory computing workloads.

Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

DWC: dynamic write consolidation for phase change memory systems.

Proceedings of the 2014 International Conference on Supercomputing, 2014

Building a large-scale direct network with low-radix routers.

Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

Write-aware random page initialization for non-volatile memory systems.

Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

Reducing Communication in Parallel Breadth-First Search on Distributed Memory Systems.

Proceedings of the 17th IEEE International Conference on Computational Science and Engineering, 2014

Digging deeper into cluster system logs for failure prediction and root cause diagnosis.

Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning.

Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

2013

Optimizing Parallel Sn Sweeps on Unstructured Grids for Multi-Core Clusters.

J. Comput. Sci. Technol., 2013

Understanding parallelism in graph traversal on multi-core clusters.

Comput. Sci. Res. Dev., 2013

GRE: A Graph Runtime Engine for Large-Scale Distributed Graph-Parallel Applications.

CoRR, 2013

SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication.

Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013

cHPP controller: A High Performance Hyper-node Hardware Accelerator.

Proceedings of the International Conference on Parallel and Distributed Computing, 2013

SimICT: A fast and flexible framework for performance and power evaluation of large-scale architecture.

Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Accelerating Allreduce Operation: A Switch-Based Solution.

Proceedings of the 22nd International Conference on Computer Communication and Networks, 2013

Vlock: Lock virtualization mechanism for exploiting fine-grained parallelism in graph traversal algorithms.

Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

2012

Godson-T: An Efficient Many-Core Processor Exploring Thread-Level Parallelism.

IEEE Micro, 2012

CloudRank-D: benchmarking and ranking cloud computing systems for data processing applications.

Frontiers Comput. Sci., 2012

Compression and Sieve: Reducing Communication in Parallel Breadth First Search on Distributed Memory Systems.

CoRR, 2012

High Volume Throughput Computing: Identifying and Characterizing Throughput Oriented Workloads in Data Centers.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

A Case Study of Designing Efficient Algorithm-based Fault Tolerant Application for Exascale Parallelism.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Investigating Memory Optimization of Hash-index for Next Generation Sequencing on Multi-core Architecture.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Micro-architectural characterization of desktop cloud workloads.

Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs.

Proceedings of the International Conference on Supercomputing, 2012

ALWP: A Workload Partition Method for the Efficient Parallel Simulation of Manycores.

Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

PartitionSim: A Parallel Simulator for Many-cores.

Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

A coarse-grained stream architecture for cryo-electron microscopy images 3D reconstruction.

Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays, 2012

Accelerating Millions of Short Reads Mapping on a Heterogeneous Architecture with FPGA Accelerator.

Proceedings of the 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, 2012

CRAW/P: A Workload Partition Method for the Efficient Parallel Simulation of Manycores.

Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011

Dawning Nebulae: A PetaFLOPS Supercomputer with a Heterogeneous Structure.

J. Comput. Sci. Technol., 2011

Fast implementation of DGEMM on Fermi GPU.

Proceedings of the Conference on High Performance Computing Networking, 2011

Optimizing MPI Alltoall Communication of Large Messages in Multicore Clusters.

Qiang Li

Zhigang Huo

Proceedings of the 12th International Conference on Parallel and Distributed Computing, 2011

EthSpeeder: A High-performance Scalable Fault-Tolerant Ethernet Network Architecture for Data Center.

Proceedings of the Sixth International Conference on Networking, Architecture, and Storage, 2011

Characterization of real workloads of web search engines.

Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

Floating-point mixed-radix FFT core generation for FPGA and comparison with GPU and CPU.

Proceedings of the 2011 International Conference on Field-Programmable Technology, 2011

Design of HPC Node with Heterogeneous Processors.

Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

Fast and Compact Regular Expression Matching Using Character Substitution.

Xingkui Liu

Xinchun Liu

Proceedings of the 2011 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), 2011

2010

High-performance Computing in China: Research and Applications.

David K. Kahaner

Debbie Chen

Int. J. High Perform. Comput. Appl., 2010

Design and implementation of communication system of the Dawning 6000 supercomputer.

Frontiers Comput. Sci. China, 2010

HPP controller: a system controller for high performance computing.

Frontiers Comput. Sci. China, 2010

Accelerating 2D FFT with Non-Power-of-Two Problem Size on FPGA.

Proceedings of the ReConFig'10: 2010 International Conference on Reconfigurable Computing and FPGAs, 2010

HPP Controller: A System Controller Dedicated for Message Passing.

Proceedings of the 2010 International Conference on Parallel and Distributed Computing, 2010

Integrating DBMSs as a Read-Only Execution Layer into Hadoop.

Proceedings of the 2010 International Conference on Parallel and Distributed Computing, 2010

P-GAS: Parallelizing a Cycle-Accurate Event-Driven Many-Core Processor Simulator Using Parallel Discrete Event Simulation.

Proceedings of the 24th ACM/IEEE/SCS Workshop on Principles of Advanced and Distributed Simulation, 2010

GenerOS: An asymmetric operating system kernel for multi-core systems.

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Adding an Expressway to Accelerate the Neighborhood Communication.

Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications, 2010

Building a Personal High Performance Computer with Heterogeneous Processors.

Qiang Li

Zhigang Huo

Proceedings of the GCC 2010, 2010

Preliminary Investigation of Accelerating Molecular Dynamics Simulation on Godson-T Many-Core Processor.

Proceedings of the Euro-Par 2010 Parallel Processing Workshops, 2010

2009

Improving Performance of Dynamic Programming via Parallelism and Locality on Multicore Architectures.

Guang R. Gao

IEEE Trans. Parallel Distributed Syst., 2009

SimK: A Large-Scale Parallel Simulation Engine.

J. Comput. Sci. Technol., 2009

Preface.

J. Comput. Sci. Technol., 2009

HPPNetSim: a parallel simulation of large-scale interconnection networks.

Proceedings of the 2009 Spring Simulation Multiconference, SpringSim 2009, 2009

Adaptive and scalable metadata management to support a trillion files.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

A Scalability Analysis of the Symmetric Multiprocessing Architecture in Multi-Core System.

Proceedings of the International Conference on Networking, Architecture, and Storage, 2009

Gemini NI: An Integration of Two Network Interfaces.

Proceedings of the International Conference on Networking, Architecture, and Storage, 2009

Group-by Query Process in Middleware of Large Scale Data Intensive Systems.

Proceedings of the International Conference on Networking, Architecture, and Storage, 2009

A Virtualized Self-Adaptive Parallel Programming Framework for Heterogeneous High Productivity Computers.

Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2009

A Parallel Algorithm for Computing Betweenness Centrality.

Dengbiao Tu

Proceedings of the ICPP 2009, 2009

HPP-Controller: An intra-node controller designed for connecting heterogeneous CPUs.

Qiang Li

Panyong Zhang

Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

2008

HPP Switch: A Novel High Performance Switch for HPC.

Proceedings of the 16th Annual IEEE Symposium on High Performance Interconnects (HOTI 2008), 2008

Memory Based Metadata Server for Cluster File Systems.

Proceedings of the Seventh International Conference on Grid and Cooperative Computing, 2008

Query Prediction in Large Scale Data Intensive Event Stream Analysis Systems.

Proceedings of the Seventh International Conference on Grid and Cooperative Computing, 2008

A HyperTransport-based personal parallel computer.

Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

A novel hint-based I/O mechanism for centralized file server of cluster.

Huan Chen

Jin Xiong

Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

2007

Cache oblivious algorithms for nonserial polyadic programming.

J. Supercomput., 2007

A Reconfigurable Accelerator for Smith-Waterman Algorithm.

IEEE Trans. Circuits Syst. II Express Briefs, 2007

Regular Paper: A Study of Architectural Optimization Methods in Bioinformatics Applications.

Int. J. High Perform. Comput. Appl., 2007

Dawning4000A high performance computer.

Dan Meng

Frontiers Comput. Sci. China, 2007

A parallel dynamic programming algorithm on a multi-core architecture.

Guang R. Gao

SPAA 2007: Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2007

The design methodology of Phoenix cluster system software stack.

Proceedings of the CHINA HPC 2007, 2007

HPP: an architecture for high performance and utility computing.

Proceedings of the CHINA HPC 2007, 2007

Design of NIC Based on I/O Processor for Cluster Interconnect Network.

Xiaojun Yang

Dongdong Wu

Proceedings of the International Conference on Networking, 2007

United-FS: A Logical File System Providing a Single Image of Multiple Physical File Systems on NFS Server.

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

A layered design methodology of cluster system stack.

Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

2006

Improvement of Performance of MegaBlast Algorithm for DNA Sequence Alignment.

J. Comput. Sci. Technol., 2006

Biology - Locality and parallelism optimization for dynamic programming algorithm in bioinformatics.

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

An experimental study of optimizing bioinformatics applications.

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Improving locality of nonserial polyadic dynamic programming.

Dongbo Bu

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Research on Key Technologies of Load Balancing for NFS Server with Multiple Network Paths.

Proceedings of the Grid and Cooperative Computing Workshops, 2006

Load Balancing and Parallel Multiple Sequence Alignment with Tree Accumulation.

Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

PhoenixG: A Unified Management Framework for Industrial Information Grid.

Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

2005

Load Balancing Algorithm in Cluster-based RNA secondary structure Prediction.

Proceedings of the 4th International Symposium on Parallel and Distributed Computing (ISPDC 2005), 2005

Impact of Page Size on Communication Performance.

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Destructive Transaction: Human-Oriented Cluster System Management Mechanism.

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

An Optimized Algorithm of High Spatial-temporal Efficiency for MegaBlast.

Proceedings of the 11th International Conference on Parallel and Distributed Systems, 2005

An Efficient Dynamic Programming Algorithm and Implementation for RNA Secondary Structure Prediction.

Xinchun Liu

Proceedings of the Computational Science, 2005

Exploiting Parallelization for RNA Secondary Structure Prediction in Cluster.

Proceedings of the Computational Science, 2005

Fire Phoenix Cluster Operating System Kernel and its Evaluation.

Jianfeng Zhan

Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

An Efficient Metadata Distribution Policy for Cluster File Systems.

Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

Parallel Optimization Technology for Backbone Network Intrusion Detection System.

Proceedings of the Computational Intelligence and Security, International Conference, 2005

2004

Design of System Area Network Interface Card Based on Intel IOP310.

Proceedings of the Embedded Software and Systems, First International Conference, 2004

2003

Design and Performance of the Dawning Cluster File System.

Proceedings of the 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 2003

2002

NCPN: A Simulation Tool for Coloured Petri Nets.

Xinyu Liu

Wen Gao

Proceedings of the International Conference on Parallel and Distributed Computing Systems, 2002

2001

Cluster and Grid Superservers: The Dawning Experiences in China.

Proceedings of the 2001 IEEE International Conference on Cluster Computing (CLUSTER 2001), 2001

1999

Reference implementation of scalable I/O low-level API on Intel Paragon.
