Xin You

Orcid: 0000-0002-5163-4607

Affiliations:

Beihang University, Beijing, China

According to our database¹, Xin You authored at least 44 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Exploiting Efficient Mapping and Pipelined Execution for Accelerating SpMV on Tensor Cores.

Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2026

Efficient Temporal Graph Network Training via Unified Redundancy Elimination.

Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

2025

LOw-cOst yet High-Performant Sparse Matrix-Matrix Multiplication on Arm SME Architectures.

Enrique S. Quintana-Orti

CoRR, November, 2025

PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization.

CoRR, November, 2025

Identifying Performance Inefficiencies of Parallel Program With Spatial and Temporal Trace Analysis.

IEEE Trans. Parallel Distributed Syst., July, 2025

SimTrace: Exploiting Spatial and Temporal Sampling for Large-Scale Performance Analysis.

ACM Trans. Archit. Code Optim., June, 2025

Exploiting Dynamic Regular Patterns in Irregular Programs for Efficient Vectorization.

ACM Trans. Archit. Code Optim., June, 2025

Hotspy: identifying performance hotspot with graph neural network based static analysis.

CCF Trans. High Perform. Comput., June, 2025

Towards Efficient LLM Inference via Collective and Adaptive Speculative Decoding.

Proceedings of the International Conference for High Performance Computing, 2025

Zero-Value Code Specialization via Profile-Guided Control Data Flow Analysis.

Proceedings of the International Conference for High Performance Computing, 2025

Exploiting Transformer-Based Static Binary Analysis for Identifying Inefficient Locks.

Proceedings of the Network and Parallel Computing, 2025

INSPIRIT: Adaptive Priority-based Task Scheduling for Heterogeneous Hardware.

Proceedings of the 2025 IEEE International Parallel and Distributed Processing Symposium, 2025

GNNPerf: Towards Effective Performance Profiling and Analysis Across GNN Frameworks.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2025

Towards Efficient Instruction Stream Scheduling for Stencil Computation on ARM Processors.

Proceedings of the 2025 IEEE International Parallel and Distributed Processing Symposium, 2025

Efficient Locality-aware Instruction Stream Scheduling for Stencil Computation on ARM Processors.

Proceedings of the 39th ACM International Conference on Supercomputing, 2025

ESC: Effective Submanifold Convolution using Tensor Cores.

Proceedings of the 54th International Conference on Parallel Processing, 2025

Identifying Potential Anomalous Operations in Graph Neural Network Training.

Proceedings of the Advanced Parallel Processing Technologies, 2025

2024

AtRec: Accelerating Recommendation Model Training on CPUs.

IEEE Trans. Parallel Distributed Syst., June, 2024

Minions: Accelerating Large Language Model Inference with Adaptive and Collective Speculative Decoding.

CoRR, 2024

GVARP: Detecting Performance Variance on Large-Scale Heterogeneous Systems.

Proceedings of the International Conference for High Performance Computing, 2024

PRoof: A Comprehensive Hierarchical Profiling Framework for Deep Neural Networks with Roofline Analysis.

Proceedings of the 53rd International Conference on Parallel Processing, 2024

Retrospection on the Performance Analysis Tools for Large-Scale HPC Programs.

Proceedings of the 31st IEEE International Conference on High Performance Computing, 2024

2023

TrivialSpy: Identifying Software Triviality via Fine-grained and Dataflow-based Value Profiling.

Proceedings of the International Conference for High Performance Computing, 2023

BiRFIA: Selective Binary Rewriting for Function Interception on ARM.

Proceedings of the 37th International Conference on Supercomputing, 2023

Accelerating Big Data Application by Eliminating Redundancy on Hadoop Cluster.

Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023

Efficient Deep Molecular Dynamic Model Training on Heterogeneous System.

Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023

VClinic: A Portable and Efficient Framework for Fine-Grained Value Profilers.

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022

Accelerating the cryo-EM structure determination in RELION on GPU cluster.

Frontiers Comput. Sci., 2022

PowerSpector: Towards Energy Efficiency with Calling-Context-Aware Profiling.

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Vectorizing SpMV by Exploiting Dynamic Regular Patterns.

Proceedings of the 51st International Conference on Parallel Processing, 2022

2021

The Deep Learning Compiler: A Comprehensive Survey.

IEEE Trans. Parallel Distributed Syst., 2021

dgQuEST: Accelerating Large Scale Quantum Circuit Simulation through Hybrid CPU-GPU Memory Hierarchies.

Proceedings of the Network and Parallel Computing, 2021

Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors.

Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

DRStencil: Exploiting Data Reuse within Low-order Stencil on GPU.

Proceedings of the 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, 2021

2020

The Deep Learning Compiler: A Comprehensive Survey.

CoRR, 2020

swGBDT: Efficient Gradient Boosted Decision Tree on Sunway Many-Core Processor.

Proceedings of the Supercomputing Frontiers - 6th Asian Conference, 2020

ZeroSpy: exploring software inefficiency with redundant zeros.

Proceedings of the International Conference for High Performance Computing, 2020

Accelerating De Novo Assembler WTDBG2 on Commodity Servers.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2020

Towards GPU Acceleration of Phonon Computation with ShengBTE.

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2020

2019

Performance Evaluation and Analysis of Linear Algebra Kernels in the Prototype Tianhe-3 Cluster.

Proceedings of the Supercomputing Frontiers - 5th Asian Conference, 2019

Improving the Parallelism of CESM on GPU.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2019

L-DAG: Enabling Loopy Workflow in Scientific Application with Automatic DAG Transformation.

Proceedings of the 2019 IEEE Intl Conf on Dependable, 2019

2018

swCaffe: A Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight.

Proceedings of the IEEE International Conference on Cluster Computing, 2018

Performance Analysis and Optimization of Cryo-EM Structure Determination in RELION-2.

Proceedings of the Advanced Computer Architecture - 12th Conference, 2018
