Hideki Saito

Orcid: 0009-0004-5529-7048

Affiliations:
  • Intel Corporation, Santa Clara, CA, USA


According to our database, Hideki Saito authored at least 33 papers between 1993 and 2023.

Bibliography

2023
Streamline Ahead-of-Time SYCL CPU Device Implementation through Bypassing SPIR-V.
Proceedings of the 2023 International Workshop on OpenCL, 2023

2017
LLVM Compiler Implementation for Explicit Parallelization and SIMD Vectorization.
Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC, 2017

2016
LLVM Framework and IR Extensions for Parallelization, SIMD Vectorization and Offloading.
Proceedings of the Third Workshop on the LLVM Compiler Infrastructure in HPC, 2016

Reducing the Functionality Gap Between Auto-Vectorization and Explicit Vectorization - Compress/Expand and Histogram.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

2015
Effective SIMD Vectorization for Intel Xeon Phi Coprocessors.
Sci. Program., 2015

Can traditional programming bridge the ninja performance gap for parallel computing applications?
Commun. ACM, 2015

2013
Practical SIMD Vectorization Techniques for Intel® Xeon Phi Coprocessors.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Compiler-Based Data Prefetching and Streaming Non-temporal Store Generation for the Intel(R) Xeon Phi(TM) Coprocessor.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

2012
Extending OpenMP* with Vector Constructs for Modern Multicore SIMD Architectures.
Proceedings of the OpenMP in a Heterogeneous World - 8th International Workshop on OpenMP, 2012

Compiling C/C++ SIMD Extensions for Function and Loop Vectorizaion on Multicore-SIMD Processors.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

2010
SPEC MPI2007 - an application benchmark suite for parallel systems using MPI.
Concurr. Comput. Pract. Exp., 2010

On the efficacy of call graph-level thread-level speculation.
Proceedings of the first joint WOSP/SIPEW International Conference on Performance Engineering, 2010

Exploitation of nested thread-level speculative parallelism on multi-core systems.
Proceedings of the 7th Conference on Computing Frontiers, 2010

2009
On the exploitation of loop-level parallelism in embedded applications.
ACM Trans. Embed. Comput. Syst., 2009

2008
Comparative architectural characterization of SPEC CPU2000 and CPU2006 benchmarks on the intel® Core™ 2 Duo processor.
Proceedings of the 2008 International Conference on Embedded Computer Systems: Architectures, 2008

2006
A general approach for partitioning N-dimensional parallel nested loops with conditionals.
Proceedings of the SPAA 2006: Proceedings of the 18th Annual ACM Symposium on Parallelism in Algorithms and Architectures, Cambridge, Massachusetts, USA, July 30, 2006

On the performance potential of different types of speculative thread-level parallelism: The DL version of this paper includes corrections that were not made available in the printed proceedings.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Lightweight lock-free synchronization methods for multithreading.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Probablistic Self-Scheduling.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Challenges in exploitation of loop parallelism in embedded applications.
Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, 2006

2005
Practical Compiler Techniques on Efficient Multithreaded Code Generation for OpenMP Programs.
Comput. J., 2005

Impact of Compiler-based Data-Prefetching Techniques on SPEC OMP Application Performance.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

2003
Large System Performance of SPEC OMP Benchmark Suites.
Int. J. Parallel Program., 2003

2002
Large System Performance of SPEC OMP2001 Benchmarks.
Proceedings of the High Performance Computing, 4th International Symposium, 2002

SPEC HPC2002: The Next High-Performance Computer Benchmark.
Proceedings of the High Performance Computing, 4th International Symposium, 2002

2000
The Design of the PROMIS Compiler-Towards Multi-Level Parallelization.
Int. J. Parallel Program., 2000

1999
Symbolic Analysis in the PROMIS Compiler.
Proceedings of the Languages and Compilers for Parallel Computing, 1999

Multithreading Runtime Support for Loop and Functional Parallelism.
Proceedings of the High Performance Computing, Second International Symposium, 1999

The Design of the PROMIS Compiler.
Proceedings of the Compiler Construction, 8th International Conference, 1999

1996
sigma-SSA and Its Construction Through Symbolic Interpretation.
Proceedings of the Languages and Compilers for Parallel Computing, 1996

1995
The CDP² Partitioning Algorithm a Combined End Program Partitioning Algorithm on the Data Partitioning Graph.
Proceedings of the 1995 International Conference on Parallel Processing, 1995

1994
The Data Partitioning Graph: Extending Data and Control Dependencies for Data Partitioning.
Proceedings of the Languages and Compilers for Parallel Computing, 1994

1993
A distributed shared memory multiprocessor ASURA: memory and cache architecture.
Proceedings of Supercomputing '93, 1993

