Xueying Wang

ORCID: 0000-0002-7835-113X

Affiliations:
  • Beijing University of Posts and Telecommunications, Beijing, China


According to our database, Xueying Wang authored at least 23 papers between 2018 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
OptiFX: Automatic Optimization for Convolutional Neural Networks with Aggressive Operator Fusion on GPUs.
ACM Trans. Archit. Code Optim., June, 2025

Libra: Synergizing CUDA and Tensor Cores for High-Performance Sparse Matrix Multiplication.
CoRR, June, 2025

SparkAttention: high-performance multi-head attention for large models on Volta GPU architecture.
CCF Trans. High Perform. Comput., February, 2025

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores.
Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025

2024
Fast Convolution Meets Low Precision: Exploring Efficient Quantized Winograd Convolution on Modern CPUs.
ACM Trans. Archit. Code Optim., March, 2024

2023
CoAxNN: Optimizing on-device deep learning with conditional approximate neural networks.
J. Syst. Archit., October, 2023

Facilitating hardware-aware neural architecture search with learning-based predictive models.
J. Syst. Archit., April, 2023

2022
An Application-oblivious Memory Scheduling System for DNN Accelerators.
ACM Trans. Archit. Code Optim., 2022

Optimizing deep neural networks on intelligent edge accelerators via flexible-rate filter pruning.
J. Syst. Archit., 2022

Accelerating deep neural network filter pruning with mask-aware convolutional computations on modern CPUs.
Neurocomputing, 2022

2021
Compiler-assisted Operator Template Library for DNN Accelerators.
Int. J. Parallel Program., 2021

Pinpointing the Memory Behaviors of DNN Training.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Unleashing the Low-Precision Computation Potential of Tensor Cores on GPUs.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020
Fusion-Catalyzed Pruning for Optimizing Deep Learning on Intelligent Edge Devices.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Compiler-Assisted Operator Template Library for DNN Accelerators.
Proceedings of the Network and Parallel Computing, 2020

Characterizing the I/O Pipeline in the Deployment of CNNs on Commercial Accelerators.
Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2020

Lance: efficient low-precision quantized winograd convolution for neural networks based on graphics processing units.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Accelerating Deep Learning Inference with Cross-Layer Data Reuse on GPUs.
Proceedings of the Euro-Par 2020: Parallel Processing, 2020

2019
Exploiting the input sparsity to accelerate deep neural networks: poster.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

XDN: Towards Efficient Inference of Residual Neural Networks on Cambricon Chips.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2019

Acorns: A Framework for Accelerating Deep Neural Networks with Input Sparsity.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018
Background Subtraction on Depth Videos with Convolutional Neural Networks.
Proceedings of the 2018 International Joint Conference on Neural Networks, 2018

Auto-tuning Neural Network Quantization Framework for Collaborative Inference Between the Cloud and Edge.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2018, 2018
