Yufan Xu

Orcid: 0000-0002-7787-6460

Affiliations:
  • University of Utah, School of Computing, Salt Lake City, UT, USA
  • Ohio State University, Columbus, OH, USA (2017 - 2019)


According to our database, Yufan Xu authored at least 18 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Exploiting Efficient Mapping and Pipelined Execution for Accelerating SpMV on Tensor Cores.
Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2026

ElasGNN: An Elastic Training Framework for Distributed GNN Training.
Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2026

2025
RLHFSpec: Breaking the Efficiency Bottleneck in RLHF Training via Adaptive Drafting.
CoRR, December, 2025

LOw-cOst yet High-Performant Sparse Matrix-Matrix Multiplication on Arm SME Architectures.
CoRR, November, 2025

Towards Efficient LLM Inference via Collective and Adaptive Speculative Decoding.
Proceedings of the International Conference for High Performance Computing, 2025

Zero-Value Code Specialization via Profile-Guided Control Data Flow Analysis.
Proceedings of the International Conference for High Performance Computing, 2025

Accelerating Complex Stencil Computations with Adaptive Fusion Strategy.
Proceedings of the 39th ACM International Conference on Supercomputing, 2025

ESC: Effective Submanifold Convolution using Tensor Cores.
Proceedings of the 54th International Conference on Parallel Processing, 2025

OVERT: Orchestrating Vector-Scalar Execution for Efficient SpMV on Modern CPUs.
Proceedings of the 54th International Conference on Parallel Processing, 2025

2024
CoNST: Code Generator for Sparse Tensor Networks.
ACM Trans. Archit. Code Optim., December, 2024

CoNST: Code Generator for Sparse Tensor Networks.
CoRR, 2024

Accelerated Auto-Tuning of GPU Kernels for Tensor Computations.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

2023
PEAK: Generating High-Performance Schedules in MLIR.
Proceedings of the Languages and Compilers for Parallel Computing, 2023

2022
Training of deep learning pipelines on memory-constrained GPUs via segmented fused-tiled execution.
Proceedings of the CC '22: 31st ACM SIGPLAN International Conference on Compiler Construction, Seoul, South Korea, April 2, 2022

Effective Performance Modeling and Domain-Specific Compiler Optimization of CNNs for GPUs.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
Efficient Distributed Algorithms for Convolutional Neural Networks.
Proceedings of the SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021

Analytical characterization and design space exploration for optimization of CNNs.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

2019
Dependence-aware, unbounded sound predictive race detection.
Proc. ACM Program. Lang., 2019

