Xin You
ORCID: 0000-0002-5163-4607
Affiliations:
- Beihang University, Beijing, China
According to our database, Xin You authored at least 44 papers between 2018 and 2026.
Bibliography
2026
Exploiting Efficient Mapping and Pipelined Execution for Accelerating SpMV on Tensor Cores.
Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2026
Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026
2025
LOw-cOst yet High-Performant Sparse Matrix-Matrix Multiplication on Arm SME Architectures.
CoRR, November, 2025
PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization.
CoRR, November, 2025
Identifying Performance Inefficiencies of Parallel Program With Spatial and Temporal Trace Analysis.
IEEE Trans. Parallel Distributed Syst., July, 2025
SimTrace: Exploiting Spatial and Temporal Sampling for Large-Scale Performance Analysis.
ACM Trans. Archit. Code Optim., June, 2025
Exploiting Dynamic Regular Patterns in Irregular Programs for Efficient Vectorization.
ACM Trans. Archit. Code Optim., June, 2025
Hotspy: identifying performance hotspot with graph neural network based static analysis.
CCF Trans. High Perform. Comput., June, 2025
Proceedings of the International Conference for High Performance Computing, 2025
Proceedings of the International Conference for High Performance Computing, 2025
Exploiting Transformer-Based Static Binary Analysis for Identifying Inefficient Locks.
Proceedings of the Network and Parallel Computing, 2025
Proceedings of the 2025 IEEE International Parallel and Distributed Processing Symposium, 2025
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2025
Towards Efficient Instruction Stream Scheduling for Stencil Computation on ARM Processors.
Proceedings of the 2025 IEEE International Parallel and Distributed Processing Symposium, 2025
Efficient Locality-aware Instruction Stream Scheduling for Stencil Computation on ARM Processors.
Proceedings of the 39th ACM International Conference on Supercomputing, 2025
Proceedings of the 54th International Conference on Parallel Processing, 2025
Proceedings of the Advanced Parallel Processing Technologies, 2025
2024
IEEE Trans. Parallel Distributed Syst., June, 2024
Minions: Accelerating Large Language Model Inference with Adaptive and Collective Speculative Decoding.
CoRR, 2024
Proceedings of the International Conference for High Performance Computing, 2024
PRoof: A Comprehensive Hierarchical Profiling Framework for Deep Neural Networks with Roofline Analysis.
Proceedings of the 53rd International Conference on Parallel Processing, 2024
Proceedings of the 31st IEEE International Conference on High Performance Computing, 2024
2023
TrivialSpy: Identifying Software Triviality via Fine-grained and Dataflow-based Value Profiling.
Proceedings of the International Conference for High Performance Computing, 2023
Proceedings of the 37th International Conference on Supercomputing, 2023
Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023
Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023
2022
Frontiers Comput. Sci., 2022
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022
Proceedings of the 51st International Conference on Parallel Processing, 2022
2021
IEEE Trans. Parallel Distributed Syst., 2021
dgQuEST: Accelerating Large Scale Quantum Circuit Simulation through Hybrid CPU-GPU Memory Hierarchies.
Proceedings of the Network and Parallel Computing, 2021
Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021
Proceedings of the 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, 2021
2020
Proceedings of the Supercomputing Frontiers - 6th Asian Conference, 2020
Proceedings of the International Conference for High Performance Computing, 2020
Proceedings of the Algorithms and Architectures for Parallel Processing, 2020
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2020
2019
Performance Evaluation and Analysis of Linear Algebra Kernels in the Prototype Tianhe-3 Cluster.
Proceedings of the Supercomputing Frontiers - 5th Asian Conference, 2019
Proceedings of the Algorithms and Architectures for Parallel Processing, 2019
L-DAG: Enabling Loopy Workflow in Scientific Application with Automatic DAG Transformation.
Proceedings of the 2019 IEEE Intl Conf on Dependable, 2019
2018
swCaffe: A Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight.
Proceedings of the IEEE International Conference on Cluster Computing, 2018
Performance Analysis and Optimization of Cyro-EM Structure Determination in RELION-2.
Proceedings of the Advanced Computer Architecture - 12th Conference, 2018