Ninghui Sun

According to our database, Ninghui Sun authored at least 129 papers between 1997 and 2018.

Bibliography

2018
High-performance genomic analysis framework with in-memory computing.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Accelerating FM-index Search for Genomic Data Processing.
Proceedings of the 47th International Conference on Parallel Processing, 2018

SmarCo: An Efficient Many-Core Processor for High-Throughput Applications in Datacenters.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

DearDRAM: Discard Weak Rows for Reducing DRAM's Refresh Overhead.
Proceedings of the Advanced Computer Architecture - 12th Conference, 2018

2017
An Efficient Network-on-Chip Router for Dataflow Architecture.
J. Comput. Sci. Technol., 2017

HyperFatTree: A Large-Scale Tree-Based Network with Low-Radix Switches.
International Journal of Parallel Programming, 2017

HiKV: A Hybrid Index Key-Value Store for DRAM-NVM Memory Systems.
Proceedings of the 2017 USENIX Annual Technical Conference, 2017

Regional Congestion Mitigation in Lossless Datacenter Networks.
Proceedings of the Network and Parallel Computing, 2017

A performance analysis framework for exploiting GPU microarchitectural capability.
Proceedings of the International Conference on Supercomputing, 2017

Quantifying and Mitigating Computational Inefficiency of Genomics Data Analysis.
Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017

The Φ-stack for smart web of things.
Proceedings of the Workshop on Smart Internet of Things, SmartIoT@SEC 2017, 2017

Dadu: Accelerating Inverse Kinematics for High-DOF Robots.
Proceedings of the 54th Annual Design Automation Conference, 2017

2016
Graphine: Programming Graph-Parallel Computation of Large Natural Graphs for Multicore Clusters.
IEEE Trans. Parallel Distrib. Syst., 2016

Accelerating Irregular Computation in Massive Short Reads Mapping on FPGA Co-Processor.
IEEE Trans. Parallel Distrib. Syst., 2016

NONCODE 2016: an informative and valuable data source of long non-coding RNAs.
Nucleic Acids Research, 2016

DianNao family: energy-efficient hardware accelerators for machine learning.
Commun. ACM, 2016

Modeling Traffic of Big Data Platform for Large Scale Datacenter Networks.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

ACCC: An Acceleration Mechanism for Character Operation Based on Cache Computing in Big Data Applications.
Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

Accelerating large-scale genomic analysis with Spark.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2016

2015
A Small-Footprint Accelerator for Large-Scale Neural Networks.
ACM Trans. Comput. Syst., 2015

A High-Throughput Neural Network Accelerator.
IEEE Micro, 2015

A Survey of Phase Change Memory Systems.
J. Comput. Sci. Technol., 2015

Adapting Memory Hierarchies for Emerging Datacenter Interconnects.
J. Comput. Sci. Technol., 2015

Detection of soft errors in LU decomposition with partial pivoting using algorithm-based fault tolerance.
IJHPCA, 2015

Understanding Big Data Analytic Workloads on Modern Processors.
CoRR, 2015

FAST: A Fast Stencil Autotuning Framework Based On An Optimal-solution Space Model.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Study on Partitioning Real-World Directed Graphs of Skewed Degree Distribution.
Proceedings of the 44th International Conference on Parallel Processing, 2015

PROP: Using PCIe-Based RDMA to Accelerate Rack-Scale Communications in Data Centers.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

Supporting Differentiated Services in Computers via Programmable Architecture for Resourcing-on-Demand (PARD).
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2014
Exploiting fine-grained parallelism in graph traversal algorithms via lock virtualization on multi-core architecture.
The Journal of Supercomputing, 2014

HMTT: A hybrid hardware/software tracing system for bridging the DRAM access trace's semantic gap.
TACO, 2014

Decentralized NIC-Switching Architecture Using SR-IOV PCI Express Network Device.
IEEE Micro, 2014

DaDianNao: A Machine-Learning Supercomputer.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Pipelined Compaction for the LSM-Tree.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Understanding the behavior of in-memory computing workloads.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

DWC: dynamic write consolidation for phase change memory systems.
Proceedings of the 2014 International Conference on Supercomputing, 2014

Building a large-scale direct network with low-radix routers.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

Write-aware random page initialization for non-volatile memory systems.
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

Reducing Communication in Parallel Breadth-First Search on Distributed Memory Systems.
Proceedings of the 17th IEEE International Conference on Computational Science and Engineering, 2014

Digging deeper into cluster system logs for failure prediction and root cause diagnosis.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

2013
Optimizing Parallel Sn Sweeps on Unstructured Grids for Multi-Core Clusters.
J. Comput. Sci. Technol., 2013

Understanding parallelism in graph traversal on multi-core clusters.
Computer Science - R&D, 2013

GRE: A Graph Runtime Engine for Large-Scale Distributed Graph-Parallel Applications.
CoRR, 2013

SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013

cHPP controller: A High Performance Hyper-node Hardware Accelerator.
Proceedings of the International Conference on Parallel and Distributed Computing, 2013

SimICT: A fast and flexible framework for performance and power evaluation of large-scale architecture.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Accelerating Allreduce Operation: A Switch-Based Solution.
Proceedings of the 22nd International Conference on Computer Communication and Networks, 2013

Vlock: Lock virtualization mechanism for exploiting fine-grained parallelism in graph traversal algorithms.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

2012
Godson-T: An Efficient Many-Core Processor Exploring Thread-Level Parallelism.
IEEE Micro, 2012

CloudRank-D: benchmarking and ranking cloud computing systems for data processing applications.
Frontiers Comput. Sci., 2012

Compression and Sieve: Reducing Communication in Parallel Breadth First Search on Distributed Memory Systems.
CoRR, 2012

High Volume Throughput Computing: Identifying and Characterizing Throughput Oriented Workloads in Data Centers.
CoRR, 2012

High Volume Throughput Computing: Identifying and Characterizing Throughput Oriented Workloads in Data Centers.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

A Case Study of Designing Efficient Algorithm-based Fault Tolerant Application for Exascale Parallelism.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Investigating Memory Optimization of Hash-index for Next Generation Sequencing on Multi-core Architecture.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Micro-architectural characterization of desktop cloud workloads.
Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs.
Proceedings of the International Conference on Supercomputing, 2012

ALWP: A Workload Partition Method for the Efficient Parallel Simulation of Manycores.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

PartitionSim: A Parallel Simulator for Many-cores.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

A coarse-grained stream architecture for cryo-electron microscopy images 3D reconstruction.
Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays, 2012

Accelerating Millions of Short Reads Mapping on a Heterogeneous Architecture with FPGA Accelerator.
Proceedings of the 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, 2012

CRAW/P: A Workload Partition Method for the Efficient Parallel Simulation of Manycores.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011
Dawning Nebulae: A PetaFLOPS Supercomputer with a Heterogeneous Structure.
J. Comput. Sci. Technol., 2011

Fast implementation of DGEMM on Fermi GPU.
Proceedings of the Conference on High Performance Computing Networking, 2011

Optimizing MPI Alltoall Communication of Large Messages in Multicore Clusters.
Proceedings of the 12th International Conference on Parallel and Distributed Computing, 2011

EthSpeeder: A High-performance Scalable Fault-Tolerant Ethernet Network Architecture for Data Center.
Proceedings of the Sixth International Conference on Networking, Architecture, and Storage, 2011

Characterization of real workloads of web search engines.
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

Floating-point mixed-radix FFT core generation for FPGA and comparison with GPU and CPU.
Proceedings of the 2011 International Conference on Field-Programmable Technology, 2011

Design of HPC Node with Heterogeneous Processors.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

Fast and Compact Regular Expression Matching Using Character Substitution.
Proceedings of the 2011 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), 2011

2010
High-performance Computing in China: Research and Applications.
IJHPCA, 2010

Design and implementation of communication system of the Dawning 6000 supercomputer.
Frontiers Comput. Sci. China, 2010

HPP controller: a system controller for high performance computing.
Frontiers Comput. Sci. China, 2010

Accelerating 2D FFT with Non-Power-of-Two Problem Size on FPGA.
Proceedings of the ReConFig'10: 2010 International Conference on Reconfigurable Computing and FPGAs, 2010

HPP Controller: A System Controller Dedicated for Message Passing.
Proceedings of the 2010 International Conference on Parallel and Distributed Computing, 2010

Integrating DBMSs as a Read-Only Execution Layer into Hadoop.
Proceedings of the 2010 International Conference on Parallel and Distributed Computing, 2010

P-GAS: Parallelizing a Cycle-Accurate Event-Driven Many-Core Processor Simulator Using Parallel Discrete Event Simulation.
Proceedings of the 24th ACM/IEEE/SCS Workshop on Principles of Advanced and Distributed Simulation, 2010

GenerOS: An asymmetric operating system kernel for multi-core systems.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Adding an Expressway to Accelerate the Neighborhood Communication.
Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications, 2010

Building a Personal High Performance Computer with Heterogeneous Processors.
Proceedings of the GCC 2010, 2010

Preliminary Investigation of Accelerating Molecular Dynamics Simulation on Godson-T Many-Core Processor.
Proceedings of the Euro-Par 2010 Parallel Processing Workshops, 2010

2009
Improving Performance of Dynamic Programming via Parallelism and Locality on Multicore Architectures.
IEEE Trans. Parallel Distrib. Syst., 2009

SimK: A Large-Scale Parallel Simulation Engine.
J. Comput. Sci. Technol., 2009

Preface.
J. Comput. Sci. Technol., 2009

HPPNetSim: a parallel simulation of large-scale interconnection networks.
Proceedings of the 2009 Spring Simulation Multiconference, SpringSim 2009, 2009

Adaptive and scalable metadata management to support a trillion files.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

A Scalability Analysis of the Symmetric Multiprocessing Architecture in Multi-Core System.
Proceedings of the International Conference on Networking, Architecture, and Storage, 2009

Gemini NI: An Integration of Two Network Interfaces.
Proceedings of the International Conference on Networking, Architecture, and Storage, 2009

Group-by Query Process in Middleware of Large Scale Data Intensive Systems.
Proceedings of the International Conference on Networking, Architecture, and Storage, 2009

A Virtualized Self-Adaptive Parallel Programming Framework for Heterogeneous High Productivity Computers.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2009

A Parallel Algorithm for Computing Betweenness Centrality.
Proceedings of the ICPP 2009, 2009

HPP-Controller: An intra-node controller designed for connecting heterogeneous CPUs.
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

2008
HPP Switch: A Novel High Performance Switch for HPC.
Proceedings of the 16th Annual IEEE Symposium on High Performance Interconnects (HOTI 2008), 2008

Memory Based Metadata Server for Cluster File Systems.
Proceedings of the Seventh International Conference on Grid and Cooperative Computing, 2008

Query Prediction in Large Scale Data Intensive Event Stream Analysis Systems.
Proceedings of the Seventh International Conference on Grid and Cooperative Computing, 2008

A HyperTransport-based personal parallel computer.
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

A novel hint-based I/O mechanism for centralized file server of cluster.
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

2007
Cache oblivious algorithms for nonserial polyadic programming.
The Journal of Supercomputing, 2007

A Reconfigurable Accelerator for Smith-Waterman Algorithm.
IEEE Trans. on Circuits and Systems, 2007

Dawning4000A high performance computer.
Frontiers Comput. Sci. China, 2007

A parallel dynamic programming algorithm on a multi-core architecture.
Proceedings of the SPAA 2007: Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2007

The design methodology of Phoenix cluster system software stack.
Proceedings of the CHINA HPC 2007, 2007

HPP: an architecture for high performance and utility computing.
Proceedings of the CHINA HPC 2007, 2007

Design of NIC Based on I/O Processor for Cluster Interconnect Network.
Proceedings of the International Conference on Networking, 2007

United-FS: A Logical File System Providing a Single Image of Multiple Physical File Systems on NFS Server.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

A layered design methodology of cluster system stack.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

2006
Improvement of Performance of MegaBlast Algorithm for DNA Sequence Alignment.
J. Comput. Sci. Technol., 2006

Biology - Locality and parallelism optimization for dynamic programming algorithm in bioinformatics.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

An experimental study of optimizing bioinformatics applications.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Improving locality of nonserial polyadic dynamic programming.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Research on Key Technologies of Load Balancing for NFS Server with Multiple Network Paths.
Proceedings of the Grid and Cooperative Computing Workshops, 2006

Load Balancing and Parallel Multiple Sequence Alignment with Tree Accumulation.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

PhoenixG: A Unified Management Framework for Industrial Information Grid.
Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

2005
Load Balancing Algorithm in Cluster-based RNA secondary structure Prediction.
Proceedings of the 4th International Symposium on Parallel and Distributed Computing (ISPDC 2005), 2005

Impact of Page Size on Communication Performance.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Destructive Transaction: Human-Oriented Cluster System Management Mechanism.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

An Optimized Algorithm of High Spatial-temporal Efficiency for MegaBlast.
Proceedings of the 11th International Conference on Parallel and Distributed Systems, 2005

An Efficient Dynamic Programming Algorithm and Implementation for RNA Secondary Structure Prediction.
Proceedings of the Computational Science, 2005

Exploiting Parallelization for RNA Secondary Structure Prediction in Cluster.
Proceedings of the Computational Science, 2005

Fire Phoenix Cluster Operating System Kernel and its Evaluation.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

An Efficient Metadata Distribution Policy for Cluster File Systems.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

Parallel Optimization Technology for Backbone Network Intrusion Detection System.
Proceedings of the Computational Intelligence and Security, International Conference, 2005

2004
Design of System Area Network Interface Card Based on Intel IOP310.
Proceedings of the Embedded Software and Systems, First International Conference, 2004

2003
Design and Performance of the Dawning Cluster File System.
Proceedings of the 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 2003

2002
NCPN: A Simulation Tool for Coloured Petri Nets.
Proceedings of the International Conference on Parallel and Distributed Computing Systems, 2002

2001
Cluster and Grid Superservers: The Dawning Experiences in China.
Proceedings of the 2001 IEEE International Conference on Cluster Computing (CLUSTER 2001), 2001

1999
Reference implementation of scalable I/O low-level API on Intel Paragon.
J. Comput. Sci. Technol., 1999

1997
Dawning-1000 PROOS distributed operating system.
J. Comput. Sci. Technol., 1997

