Xueying Wang
Orcid: 0000-0002-7835-113X
Affiliations:
- Beijing University of Posts and Telecommunications, Beijing, China
According to our database, Xueying Wang authored at least 23 papers between 2018 and 2025.
Bibliography
2025
OptiFX: Automatic Optimization for Convolutional Neural Networks with Aggressive Operator Fusion on GPUs.
ACM Trans. Archit. Code Optim., June, 2025
Libra: Synergizing CUDA and Tensor Cores for High-Performance Sparse Matrix Multiplication.
CoRR, June, 2025
SparkAttention: high-performance multi-head attention for large models on Volta GPU architecture.
CCF Trans. High Perform. Comput., February, 2025
FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores.
Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025
2024
Fast Convolution Meets Low Precision: Exploring Efficient Quantized Winograd Convolution on Modern CPUs.
ACM Trans. Archit. Code Optim., March, 2024
2023
CoAxNN: Optimizing on-device deep learning with conditional approximate neural networks.
J. Syst. Archit., October, 2023
Facilitating hardware-aware neural architecture search with learning-based predictive models.
J. Syst. Archit., April, 2023
2022
ACM Trans. Archit. Code Optim., 2022
Optimizing deep neural networks on intelligent edge accelerators via flexible-rate filter pruning.
J. Syst. Archit., 2022
Accelerating deep neural network filter pruning with mask-aware convolutional computations on modern CPUs.
Neurocomputing, 2022
2021
Int. J. Parallel Program., 2021
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021
2020
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020
Proceedings of the Network and Parallel Computing, 2020
Characterizing the I/O Pipeline in the Deployment of CNNs on Commercial Accelerators.
Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2020
Lance: efficient low-precision quantized Winograd convolution for neural networks based on graphics processing units.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the Euro-Par 2020: Parallel Processing, 2020
2019
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019
Proceedings of the Benchmarking, Measuring, and Optimizing, 2019
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019
2018
Proceedings of the 2018 International Joint Conference on Neural Networks, 2018
Auto-tuning Neural Network Quantization Framework for Collaborative Inference Between the Cloud and Edge.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2018, 2018