Kenjiro Taura

Orcid: 0000-0001-5224-382X

Affiliations:
  • University of Tokyo, Japan


According to our database, Kenjiro Taura authored at least 117 papers between 1992 and 2023.

Bibliography

2023
On Data Imbalance in Molecular Property Prediction with Pre-training.
CoRR, 2023

Is Self-Supervised Pretraining Good for Extrapolation in Molecular Property Prediction?
CoRR, 2023

Itoyori: Reconciling Global Address Space and Global Fork-Join Task Parallelism.
Proceedings of the International Conference for High Performance Computing, 2023

Associative Operator Precedence Parsing: A Method To Increase Data Parsing Parallelism.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2023

2022
Improving Cache Utilization of Nested Parallel Programs by Almost Deterministic Work Stealing.
IEEE Trans. Parallel Distributed Syst., 2022

Cost-aware Programming on Page-based Distributed Shared Memory.
J. Inf. Process., 2022

ComposableThreads: Rethinking User-level Threads with Composability and Parametricity in C++.
J. Inf. Process., 2022

mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations.
CoRR, 2022

SimdFSM: An Adaptive Vectorization of Finite State Machines for Speculative Execution.
Proceedings of the Parallel and Distributed Computing, Applications and Technologies, 2022


Distributed Continuation Stealing is More Scalable than You Might Think.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021
An Efficient and Scalable Distributed Hypergraph Processing System.
J. Inf. Process., 2021

Lightweight preemptive user-level threads.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Pitfalls of InfiniBand with On-Demand Paging.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Automatic Graph Partitioning for Very Large-scale Deep Learning.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Plex: Scaling Parallel Lexing with Backtrack-Free Prescanning.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

2020
Analyzing the Performance Trade-Off in Implementing User-Level Threads.
IEEE Trans. Parallel Distributed Syst., 2020

CENTAURUS: A Dynamic Parser Generator for Parallel Ad Hoc Data Extraction.
J. Inf. Process., 2020

Parallelizing and optimizing neural Encoder-Decoder models without padding on multi-core architecture.
Future Gener. Comput. Syst., 2020

MENPS: A Decentralized Distributed Shared Memory Exploiting RDMA.
Proceedings of the Fourth IEEE/ACM Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware, 2020

Reliable Reverse Engineering of Intel DRAM Addressing Using Performance Counters.
Proceedings of the 28th International Symposium on Modeling, 2020

Automatic Identification and Precise Attribution of DRAM Bandwidth Contention.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

On the Correct Measurement of Application Memory Bandwidth and Memory Access Latency.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2020

2019
TP-PARSEC: A Task Parallel PARSEC Benchmark Suite.
J. Inf. Process., 2019

PerfMemPlus: A Tool for Automatic Discovery of Memory Performance Problems.
Proceedings of the High Performance Computing - 34th International Conference, 2019

Almost deterministic work stealing.
Proceedings of the International Conference for High Performance Computing, 2019

ClPy: A NumPy-Compatible Library Accelerated with OpenCL.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

Software combining to mitigate multithreaded MPI contention.
Proceedings of the ACM International Conference on Supercomputing, 2019

An Efficient Inter-Node Communication System with Lightweight-Thread Scheduling.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

BOLT: Optimizing OpenMP Parallel Regions with User-Level Threads.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018
Argobots: A Lightweight Low-Level Threading and Tasking Framework.
IEEE Trans. Parallel Distributed Syst., 2018

Lessons learned from analyzing dynamic promotion for user-level threading.
Proceedings of the International Conference for High Performance Computing, 2018

Effectiveness of Moldable and Malleable Scheduling in Deep Learning Tasks.
Proceedings of the 24th IEEE International Conference on Parallel and Distributed Systems, 2018

Parallelized Software Offloading of Low-Level Communication with User-Level Threads.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2018

2017
SDAC: Porting Scientific Data to Spark RDDs.
Proceedings of the Network and Parallel Computing, 2017

Autonomic Resource Management for Program Orchestration in Large-Scale Data Analysis.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Cache Friendly Parallelization of Neural Encoder-Decoder Models Without Padding on Multi-core Architecture.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Delay Spotter: A Tool for Spotting Scheduler-Caused Delays in Task Parallel Runtime Systems.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016
Scalable Work Stealing of Native Threads on an x86-64 Infiniband Cluster.
J. Inf. Process., 2016

Fragmented BWT: An Extended BWT for Full-Text Indexing.
Proceedings of the String Processing and Information Retrieval, 2016

Autotuning of a Cut-Off for Task Parallel Programs.
Proceedings of the 10th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2016

Scaling FMM with Data-Driven OpenMP Tasks on Multicore Architectures.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Tapas: An Implicitly Parallel Programming Framework for Hierarchical N-Body Algorithms.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

A Quest for Unified, Global View Parallel Programming Models for Our Future.
Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers, 2016

From FLOPS to BYTES: disruptive change in high-performance computing towards the post-moore era.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

Low Latency and Resource-Aware Program Composition for Large-Scale Data Analysis.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

A Static Cut-off for Task Parallel Programs.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
SIMD- and Cache-Friendly Algorithm for Sorting an Array of Structures.
Proc. VLDB Endow., 2015

DAGViz: a DAG visualization tool for analyzing task-parallel program traces.
Proceedings of the 2nd Workshop on Visual Performance Analysis, 2015

Scalable Task-Parallel SGD on Matrix Factorization in Multicore Architectures.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Uni-Address Threads: Scalable Thread Management for RDMA-Based Work Stealing.
Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

2014
Faster Set Intersection with SIMD instructions by Reducing Branch Mispredictions.
Proc. VLDB Endow., 2014

ParaLite: A Parallel Database System for Data-Intensive Workflows.
IEICE Trans. Inf. Syst., 2014

Scalable analysis of multicore data reuse and sharing.
Proceedings of the 2014 International Conference on Supercomputing, 2014

MassiveThreads: A Thread Library for High Productivity Languages.
Proceedings of the Concurrent Objects and Beyond, 2014

2013
Design and implementation of GXP make - A workflow system based on make.
Future Gener. Comput. Syst., 2013

Fork-Join and Data-Driven Execution Models on Multi-core Architectures: Case Study of the FMM.
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013

Analysis of Data Reuse in Task-Parallel Runtimes.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

Design and implementation of a customizable work stealing scheduler.
Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers, 2013

Parallel and memory-efficient Burrows-Wheeler transform.
Proceedings of the 2013 IEEE International Conference on Big Data (IEEE BigData 2013), 2013

A selective checkpointing mechanism for query plans in a parallel database system.
Proceedings of the 2013 IEEE International Conference on Big Data (IEEE BigData 2013), 2013

2012
Parallel Computational Reconfiguration Based on a PGAS Model.
J. Inf. Process., 2012

Half-process: A Process Partially Sharing Its Address Space with Other Processes.
J. Inf. Process., 2012

A Task Parallel Implementation of Fast Multipole Methods.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Acceleration of Data-Intensive Workflow Applications by Using File Access History.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

A Comparative Study of Data Processing Approaches for Text Processing Workflows.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

An Empirical Performance Study of Chapel Programming Language.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

ParaLite: Supporting Collective Queries in Database System to Parallelize User-Defined Executable.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

2010
Easy and instantaneous processing for data-intensive workflows.
Proceedings of the 3rd Workshop on Many-Task Computing on Grids and Supercomputers, 2010

File-access patterns of data-intensive workflow applications and their implications to distributed filesystems.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

A global address space framework for irregular applications.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

ParaTrac: a fine-grained profiler for data-intensive workflows.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

File-Access Characteristics of Data-Intensive Workflow Applications.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

Fine-Grained Profiling for Data-Intensive Workflows.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

2009
Autonomous collaborative environment for project-based learning.
Robotics Auton. Syst., 2009

High performance wide-area overlay using deadlock-free routing.
Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing, 2009

GMount: An Ad Hoc and Locality-Aware Distributed File System by Using SSH and FUSE.
Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009

2008
Collective operations for wide-area message-passing systems using adaptive spanning trees.
Int. J. High Perform. Comput. Netw., 2008

gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments.
Proceedings of the Languages and Compilers for Parallel Computing, 2008

A scalable high-performance communication library for wide-area environments.
Proceedings of the 9th IEEE/ACM International Conference on Grid Computing (Grid 2008), Tsukuba, Japan, September 29, 2008

GMount: Build your grid file system on the fly.
Proceedings of the 9th IEEE/ACM International Conference on Grid Computing (Grid 2008), Tsukuba, Japan, September 29, 2008

A Stable Broadcast Algorithm.
Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

Scalable Data Gathering for Real-Time Monitoring Systems on Distributed Computing.
Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

2007
Locality-aware connection management and rank assignment for wide-area MPI.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

A fast topology inference: a building block for network-aware parallel processing.
Proceedings of the 16th International Symposium on High-Performance Distributed Computing (HPDC-16 2007), 2007

Locality-aware Connection Management and Rank Assignment for Wide-area MPI.
Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

2006
Monte Carlo Go Has a Way to Go.
Proceedings of the Proceedings, 2006

2005
Worldwide computing: Adaptive middleware and programming technology for dynamic Grid environments.
Sci. Program., 2005

An Adaptive File Distribution Algorithm for Wide Area Network.
Scalable Comput. Pract. Exp., 2005

A scalable and efficient self-organizing failure detector for grid applications.
Proceedings of the 6th IEEE/ACM International Conference on Grid Computing (GRID 2005), 2005

Highly latency tolerant Gaussian elimination.
Proceedings of the 6th IEEE/ACM International Conference on Grid Computing (GRID 2005), 2005

2004
Routing and resource discovery in Phoenix Grid-enabled message passing library.
Proceedings of the 4th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2004), 2004

High performance LU factorization for non-dedicated clusters.
Proceedings of the 4th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2004), 2004

2003
Virtual private grid: a command shell for utilizing hundreds of machines efficiently.
Future Gener. Comput. Syst., 2003

Phoenix: a parallel programming model for accommodating dynamically joining/leaving resources.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2003

2002
Reducing pause time of conservative collectors.
Proceedings of The Workshop on Memory Systems Performance (MSP 2002), 2002

AnZenMail: A Secure and Certified E-mail System.
Proceedings of the Software Security -- Theories and Systems, 2002

2001
Fusion of Concurrent Invocations of Exclusive Methods.
Proceedings of the Parallel Computing Technologies, 2001

Predicting Scalability of Parallel Garbage Collectors on Shared Memory Multiprocessors.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

2000
Extending Java virtual machine with integer-reference conversion.
Concurr. Pract. Exp., 2000

The MicroGrid: a Scientific Tool for Modeling Computational Grids.
Proceedings of the Proceedings Supercomputing 2000, 2000

Performance Evaluation of OpenMP Applications with Nested Parallelism.
Proceedings of the Languages, 2000

Online Computation of Critical Paths for Multithreaded Languages.
Proceedings of the Parallel and Distributed Processing, 2000

A Heuristic Algorithm for Mapping Communicating Tasks on Heterogeneous Resources.
Proceedings of the 9th Heterogeneous Computing Workshop, 2000

1999
StackThreads/MP: Integrating Futures into Calling Standards.
Proceedings of the 1999 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP'99), 1999

1998
Comparing Reference Counting and Global Mark-and-Sweep on Parallel Computers.
Proceedings of the Languages, 1998

1997
A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1997

An Effective Garbage Collection Strategy for Parallel Programming Languages on Large Scale Distributed-Memory Machines.
Proceedings of the Sixth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1997

Fine-grain Multithreading with Minimal Compiler Support - A Cost Effective Approach to Implementing Efficient Multithreading Languages.
Proceedings of the ACM SIGPLAN '97 Conference on Programming Language Design and Implementation (PLDI), 1997

An Efficient Compilation Framework for Languages Based on a Concurrent Process Calculus.
Proceedings of the Euro-Par '97 Parallel Processing, 1997

1996
Visualization of RNA secondary structures using highly parallel computers.
Comput. Appl. Biosci., 1996

1995
Schematic: A Concurrent Object-Oriented Extension to Scheme.
Proceedings of the Object-Based Parallel and Distributed Computation, 1995

1994
StackThreads: An Abstract Machine for Scheduling Fine-Grain Threads on Stock CPUs.
Proceedings of the Theory and Practice of Parallel Programming, 1994

ABCL/f: A Future-Based Polymorphic Typed Concurrent Object-Oriented Language- Its Design and Implementation.
Proceedings of the Specification of Parallel Algorithms, 1994

1993
Implementing concurrent object-oriented languages on multicomputers.
IEEE Parallel Distributed Technol. Syst. Appl., 1993

Highly Efficient and Encapsulated Re-use of Synchronization Code in Concurrent Object-Oriented Languages.
Proceedings of the Eighth Annual Conference on Object-Oriented Programming Systems, 1993

1992
An Efficient Implementation Scheme of Concurrent Object-Oriented Languages on Stock Multicomputers.
Proceedings of the Parallel Symbolic Computing: Languages, 1992

