Ningyi Xu

Orcid: 0009-0004-6809-7694

According to our database, Ningyi Xu authored at least 54 papers between 2005 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
INDM: Chiplet-Based Interconnect Network and Dataflow Mapping for DNN Accelerators.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., April, 2024

A Precision-Scalable Deep Neural Network Accelerator With Activation Sparsity Exploitation.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., January, 2024

Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction.
CoRR, 2024

BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation.
CoRR, 2024

2023
Low-Complexity Precision-Scalable Multiply-Accumulate Unit Architectures for Deep Neural Network Accelerators.
IEEE Trans. Circuits Syst. II Express Briefs, April, 2023

NVP: A Flexible and Efficient Processor Architecture for Accelerating Diverse Computer Vision Tasks including DNN.
IEEE Trans. Circuits Syst. II Express Briefs, 2023

AFPQ: Asymmetric Floating Point Quantization for LLMs.
CoRR, 2023

Large Trajectory Models are Scalable Motion Predictors and Planners.
CoRR, 2023

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.
CoRR, 2023

History-Detr: Optimize Query Initialization Strategy by Using Historical Information and Kinematics.
Proceedings of the ACM Multimedia Asia 2023, 2023

SpOctA: A 3D Sparse Convolution Accelerator with Octree-Encoding-Based Map Search and Inherent Sparsity-Aware Processing.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

A Point Transformer Accelerator with Fine-Grained Pipelines and Distribution-Aware Dynamic FPS.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

Adam Accumulation to Reduce Memory Footprints of Both Activations and Gradients for Large-Scale DNN Training.
Proceedings of the ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland, 2023

COSA: Co-Operative Systolic Arrays for Multi-head Attention Mechanism in Neural Network using Hybrid Data Reuse and Fusion Methodologies.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

FLNA: An Energy-Efficient Point Cloud Feature Learning Accelerator with Dataflow Decoupling.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022
Efficient Compression Methods for Wire-Spread-Based Stochastic Computing Deep Neural Networks.
IEEE Trans. Circuits Syst. II Express Briefs, 2022

2021
A Low-Latency FPGA Implementation for Real-Time Object Detection.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2021

CCASM: A Computation- and Communication-Aware Scheduling and Mapping Algorithm for NoC-Based DNN Accelerators.
Proceedings of the 14th IEEE International Conference on ASIC, 2021

2020
Crane: Mitigating Accelerator Under-utilization Caused by Sparsity Irregularities in CNNs.
IEEE Trans. Computers, 2020

Enhanced Power Decoupling Strategy for Virtual Synchronous Generator.
IEEE Access, 2020

2019
FlexSaaS: A Reconfigurable Accelerator for Web Search Selection.
ACM Trans. Reconfigurable Technol. Syst., 2019

2017
FxpNet: Training a deep convolutional neural network in fixed-point representation.
Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates.
Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

The Feniks FPGA Operating System for Cloud Computing.
Proceedings of the 8th Asia-Pacific Workshop on Systems, Mumbai, India, September 2, 2017, 2017

Using Data Compression for Optimizing FPGA-Based Convolutional Neural Network Accelerators.
Proceedings of the Advanced Parallel Processing Technologies, 2017

2016
ClickNP: Highly flexible and High-performance Network Processing with Reconfigurable Hardware.
Proceedings of the ACM SIGCOMM 2016 Conference, Florianopolis, Brazil, August 22-26, 2016, 2016

Going Deeper with Embedded FPGA Platform for Convolutional Neural Network.
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

2015
Real-Time High-Quality Stereo Vision System in FPGA.
IEEE Trans. Circuits Syst. Video Technol., 2015

2014
Large scale recurrent neural network on GPU.
Proceedings of the 2014 International Joint Conference on Neural Networks, 2014

Energy efficient neural networks for big data analytics.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

2012
Probabilistic Brain Fiber Tractography on GPUs.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

The Colored Concept Map and Its Application in Learning Assistance Program.
Proceedings of the Hybrid Learning - 5th International Conference, 2012

The Analysis of Research Hotspots and Fronts of Knowledge Visualization Based on CiteSpace II.
Proceedings of the Hybrid Learning - 5th International Conference, 2012

Efficient Query Processing for Web Search Engine with FPGAs.
Proceedings of the 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, 2012

2011
An FPGA-based accelerator for LambdaRank in Web search engines.
ACM Trans. Reconfigurable Technol. Syst., 2011

A heterogeneous accelerator platform for multi-subject voxel-based brain network analysis.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

Gemma in April: A matrix-like parallel programming architecture on OpenCL.
Proceedings of the Design, Automation and Test in Europe, 2011

2010
FPGA and GPU implementation of large scale SpMV.
Proceedings of the IEEE 8th Symposium on Application Specific Processors, 2010

Efficient PageRank and SpMV Computation on AMD GPUs.
Proceedings of the 39th International Conference on Parallel Processing, 2010

Making Human Connectome Faster: GPU Acceleration of Brain Network Analysis.
Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems, 2010

A compression method for inverted index and its FPGA-based decompression solution.
Proceedings of the International Conference on Field-Programmable Technology, 2010

LambdaRank acceleration for relevance ranking in web search engines (abstract only).
Proceedings of the ACM/SIGDA 18th International Symposium on Field Programmable Gate Arrays, 2010

FPMR: MapReduce framework on FPGA.
Proceedings of the ACM/SIGDA 18th International Symposium on Field Programmable Gate Arrays, 2010

2009
FPGA Acceleration of RankBoost in Web Search Engines.
ACM Trans. Reconfigurable Technol. Syst., 2009

Parallel Inference for Latent Dirichlet Allocation on Graphics Processing Units.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

FTL design exploration in reconfigurable high-performance SSD for server applications.
Proceedings of the 23rd international conference on Supercomputing, 2009

RankBoost Acceleration on both NVIDIA CUDA and ATI Stream Platforms.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

FPGA-based acceleration of neural network for ranking in web search engine with a streaming architecture.
Proceedings of the 19th International Conference on Field Programmable Logic and Applications, 2009

An Efficient Lossless Compression Method for Internet Search Data in Hardware Accelerators.
Proceedings of the CSIE 2009, 2009 WRI World Congress on Computer Science and Information Engineering, March 31, 2009

2008
Distributed RankBoost Acceleration Using FPGA and MPI for Web Relevance Ranking.
Proceedings of the 14th International Conference on Parallel and Distributed Systems, 2008

2007
FPGA-based Accelerator Design for RankBoost in Web Search Engines.
Proceedings of the 2007 International Conference on Field-Programmable Technology, 2007

2006
A single receiving chip for DVB data broadcasting system.
IEEE Trans. Consumer Electron., 2006

2005
The design and implementation of a DVB receiving chip with PCI interface.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

