Rio Yokota

Orcid: 0000-0001-7573-7873

According to our database, Rio Yokota authored at least 81 papers between 2007 and 2024.

Bibliography

2024
Natural Gradient Primal-Dual Method for Decentralized Learning.
IEEE Trans. Signal Inf. Process. over Networks, 2024

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order.
CoRR, 2024

Variational Learning is Effective for Large Deep Networks.
CoRR, 2024

2023
Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors.
ACM Trans. Math. Softw., September, 2023

The 2023 Society for Industrial and Applied Mathematics Conference on Computational Science and Engineering.
Comput. Sci. Eng., 2023

Computing the k-th Eigenvalue of Symmetric H2-Matrices.
CoRR, 2023

O(N) distributed direct factorization of structured dense matrices using runtime systems.
CoRR, 2023

DGEMM on Integer Matrix Multiplication Unit.
CoRR, 2023

ASDL: A Unified Interface for Gradient Preconditioning in PyTorch.
CoRR, 2023

Quantum Circuit Simulation by SGEMM Emulation on Tensor Cores and Automatic Precision Selection.
Proceedings of the High Performance Computing - 38th International Conference, 2023

Fast Symmetric Eigenvalue Decomposition via WY Representation on Tensor Core.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

Mixed-Precision Random Projection for RandNLA on Tensor Cores.
Proceedings of the Platform for Advanced Scientific Computing Conference, 2023

O(N) distributed direct factorization of structured dense matrices using runtime systems.
Proceedings of the 52nd International Conference on Parallel Processing, 2023

Computing the k-th Eigenvalue of Symmetric H2-Matrices.
Proceedings of the 52nd International Conference on Parallel Processing, 2023

SegRCDB: Semantic Segmentation via Formula-Driven Supervised Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Pre-training Vision Transformers with Very Limited Synthesized Images.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Reducing shared memory footprint to leverage high throughput on Tensor Cores and its flexible API extension library.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2023

Towards real-time formula driven dataset feed for large scale deep learning training.
Proceedings of the High Performance Computing for Imaging 2023, 2023

Visual Atoms: Pre-Training Vision Transformers with Sinusoidal Waves.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Pixel-level Contrastive Learning of Driving Videos with Optical Flow.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Parallel QR Factorization of Block Low-rank Matrices.
ACM Trans. Math. Softw., 2022

Scalable and Practical Natural Gradient for Large-Scale Deep Learning.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Recovering single precision accuracy from Tensor Cores while surpassing the FP32 theoretical peak performance.
Int. J. High Perform. Comput. Appl., 2022

Empirical Study on Optimizer Selection for Out-of-Distribution Generalization.
CoRR, 2022

Scalable Linear Time Dense Direct Solver for 3-D Problems without Trailing Sub-Matrix Dependencies.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

QR Factorization of Block Low-Rank Matrices on Multi-instance GPU.
Proceedings of the Parallel and Distributed Computing, Applications and Technologies, 2022

Informative Sample-Aware Proxy for Deep Metric Learning.
Proceedings of the 4th ACM International Conference on Multimedia in Asia, 2022

OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching.
Proceedings of the 2022 International Conference on Robotics and Automation, 2022

Replacing Labeled Real-image Datasets with Auto-generated Contours.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
ExaFMM: a high-performance fast multipole method library with C++ and Python interfaces.
J. Open Source Softw., 2021

RePOSE: Real-Time Iterative Rendering and Refinement for 6D Object Pose Estimation.
CoRR, 2021

RePOSE: Fast 6D Object Pose Refinement via Deep Texture Rendering.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
Epipolar-Guided Deep Object Matching for Scene Change Detection.
CoRR, 2020

Rich Information is Affordable: A Systematic Performance Analysis of Second-order Optimization Using K-FAC.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

Effect of Mixed Precision Computing on H-Matrix Vector Multiplication in BEM Analysis.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2020

2019
Extreme Scale FMM-Accelerated Boundary Integral Equation Solver for Wave Scattering.
SIAM J. Sci. Comput., 2019

QR Factorization of Block Low-rank Matrices with Weak Admissibility Condition.
J. Inf. Process., 2019

Distributed-memory lattice H-matrix factorization.
Int. J. High Perform. Comput. Appl., 2019

Practical Deep Learning with Bayesian Principles.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Optimization of Numerous Small Dense-Matrix-Vector Multiplications in H-Matrix Arithmetic on GPU.
Proceedings of the 13th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2019

Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method.
Proceedings of the 48th International Conference on Parallel Processing, 2019

Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Exhaustive Study of Hierarchical AllReduce Patterns for Large Messages Between GPUs.
Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019

A Performance Improvement Approach for Second-Order Optimization in Large Mini-batch Training.
Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019

2018
Second-order Optimization Method for Large Mini-batch: Training ResNet-50 on ImageNet in 35 Epochs.
CoRR, 2018

Fast multipole preconditioners for sparse matrices arising from elliptic equations.
Comput. Vis. Sci., 2018

Optimization of Hierarchical Matrix Computation on GPU.
Proceedings of the Supercomputing Frontiers - 4th Asian Conference, 2018

Performance of Hierarchical-matrix BiCGStab Solver on GPU Clusters.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

2017
Communication Reducing Algorithms for Distributed Hierarchical N-Body Problems with Boundary Distributions.
Proceedings of the High Performance Computing - 32nd International Conference, 2017

Accelerating Matrix Multiplication in Deep Learning by Using Low-Rank Approximation.
Proceedings of the 2017 International Conference on High Performance Computing & Simulation, 2017

Evaluating the Compression Efficiency of the Filters in Convolutional Neural Networks.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2017, 2017

Performance Evaluation of Computation and Communication Kernels of the Fast Multipole Method on Intel Manycore Architecture.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

2016
A performance model for the communication in fast multipole methods on high-performance computing platforms.
Int. J. High Perform. Comput. Appl., 2016

Fast Multipole Method as a Matrix-Free Hierarchical Low-Rank Approximation.
CoRR, 2016

A Matrix-free Preconditioner for the Helmholtz Equation based on the Fast Multipole Method.
CoRR, 2016

Scaling FMM with Data-Driven OpenMP Tasks on Multicore Architectures.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Tapas: An Implicitly Parallel Programming Framework for Hierarchical N-Body Algorithms.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

2014
Communication Complexity of the Fast Multipole Method and its Algebraic Variants.
Supercomput. Front. Innov., 2014

Petascale molecular dynamics simulation using the fast multipole method on K computer.
Comput. Phys. Commun., 2014

A Performance Model for the Communication in Fast Multipole Methods on HPC Platforms.
CoRR, 2014

Asynchronous Execution of the Fast Multipole Method Using Charm++.
CoRR, 2014

Data-driven execution of fast multipole methods.
Concurr. Comput. Pract. Exp., 2014

Scalable Fast Multipole Accelerated Vortex Methods.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

2013
Petascale turbulence simulation using a highly parallel fast multipole method on GPUs.
Comput. Phys. Commun., 2013

Fork-Join and Data-Driven Execution Models on Multi-core Architectures: Case Study of the FMM.
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013

2012
A tuned and scalable fast multipole method as a preeminent algorithm for exascale systems.
Int. J. High Perform. Comput. Appl., 2012

Hierarchical N-body Simulations with Autotuning for Heterogeneous Systems.
Comput. Sci. Eng., 2012

An FMM Based on Dual Tree Traversal for Many-core Architectures.
CoRR, 2012

A Task Parallel Implementation of Fast Multipole Methods.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Poster: Scalable Fast Multipole Methods for Vortex Element Methods.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Scalable Fast Multipole Methods for Vortex Element Methods.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Scalable Force Directed Graph Layout Algorithms Using Fast Multipole Methods.
Proceedings of the 11th International Symposium on Parallel and Distributed Computing, 2012

2011
Biomolecular electrostatics using a fast multipole BEM on up to 512 GPUs and a billion unknowns.
Comput. Phys. Commun., 2011

Fast Multipole Method vs. Spectral Method for the Simulation of Isotropic Turbulence on GPUs.
CoRR, 2011

Fast N-body Simulations on GPUs.
CoRR, 2011

Petascale turbulence simulation using a highly parallel fast multipole method.
CoRR, 2011

2010
Biomolecular Electrostatics Simulation by an FMM-based BEM on 512 GPUs.
CoRR, 2010

2009
Fast multipole methods on a cluster of GPUs for the meshless simulation of turbulence.
Comput. Phys. Commun., 2009

PetRBF--A parallel O(N) algorithm for radial basis function interpolation.
CoRR, 2009

42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

2007
Calculation of isotropic turbulence using a pure Lagrangian vortex method.
J. Comput. Phys., 2007

