Kenjiro Taura

According to our database, Kenjiro Taura authored at least 94 papers between 1992 and 2019.

Bibliography

2019
TP-PARSEC: A Task Parallel PARSEC Benchmark Suite.
JIP, 2019

PerfMemPlus: A Tool for Automatic Discovery of Memory Performance Problems.
Proceedings of the High Performance Computing - 34th International Conference, 2019

2018
Argobots: A Lightweight Low-Level Threading and Tasking Framework.
IEEE Trans. Parallel Distrib. Syst., 2018

Lessons learned from analyzing dynamic promotion for user-level threading.
Proceedings of the International Conference for High Performance Computing, 2018

Effectiveness of Moldable and Malleable Scheduling in Deep Learning Tasks.
Proceedings of the 24th IEEE International Conference on Parallel and Distributed Systems, 2018

Parallelized Software Offloading of Low-Level Communication with User-Level Threads.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2018

2017
SDAC: Porting Scientific Data to Spark RDDs.
Proceedings of the Network and Parallel Computing, 2017

Autonomic Resource Management for Program Orchestration in Large-Scale Data Analysis.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Cache Friendly Parallelization of Neural Encoder-Decoder Models Without Padding on Multi-core Architecture.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Delay Spotter: A Tool for Spotting Scheduler-Caused Delays in Task Parallel Runtime Systems.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016
Scalable Work Stealing of Native Threads on an x86-64 Infiniband Cluster.
JIP, 2016

Fragmented BWT: An Extended BWT for Full-Text Indexing.
Proceedings of the String Processing and Information Retrieval, 2016

Autotuning of a Cut-Off for Task Parallel Programs.
Proceedings of the 10th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2016

Scaling FMM with Data-Driven OpenMP Tasks on Multicore Architectures.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Tapas: An Implicitly Parallel Programming Framework for Hierarchical N-Body Algorithms.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

A Quest for Unified, Global View Parallel Programming Models for Our Future.
Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers, 2016

From FLOPS to BYTES: disruptive change in high-performance computing towards the post-moore era.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

Low Latency and Resource-Aware Program Composition for Large-Scale Data Analysis.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

A Static Cut-off for Task Parallel Programs.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
SIMD- and Cache-Friendly Algorithm for Sorting an Array of Structures.
PVLDB, 2015

DAGViz: a DAG visualization tool for analyzing task-parallel program traces.
Proceedings of the 2nd Workshop on Visual Performance Analysis, 2015

Scalable Task-Parallel SGD on Matrix Factorization in Multicore Architectures.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Uni-Address Threads: Scalable Thread Management for RDMA-Based Work Stealing.
Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

2014
Faster Set Intersection with SIMD instructions by Reducing Branch Mispredictions.
PVLDB, 2014

ParaLite: A Parallel Database System for Data-Intensive Workflows.
IEICE Transactions, 2014

Scalable analysis of multicore data reuse and sharing.
Proceedings of the 2014 International Conference on Supercomputing, 2014

MassiveThreads: A Thread Library for High Productivity Languages.
Proceedings of the Concurrent Objects and Beyond, 2014

2013
Design and implementation of GXP make - A workflow system based on make.
Future Generation Comp. Syst., 2013

Fork-Join and Data-Driven Execution Models on Multi-core Architectures: Case Study of the FMM.
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013

Analysis of Data Reuse in Task-Parallel Runtimes.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

Design and implementation of a customizable work stealing scheduler.
Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers, 2013

Parallel and memory-efficient Burrows-Wheeler transform.
Proceedings of the 2013 IEEE International Conference on Big Data, 2013

A selective checkpointing mechanism for query plans in a parallel database system.
Proceedings of the 2013 IEEE International Conference on Big Data, 2013

2012
Parallel Computational Reconfiguration Based on a PGAS Model.
JIP, 2012

Half-process: A Process Partially Sharing Its Address Space with Other Processes.
JIP, 2012

A Task Parallel Implementation of Fast Multipole Methods.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Acceleration of Data-Intensive Workflow Applications by Using File Access History.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

A Comparative Study of Data Processing Approaches for Text Processing Workflows.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

An Empirical Performance Study of Chapel Programming Language.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

ParaLite: Supporting Collective Queries in Database System to Parallelize User-Defined Executable.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

2010
Easy and instantaneous processing for data-intensive workflows.
Proceedings of the 3rd Workshop on Many-Task Computing on Grids and Supercomputers, 2010

File-access patterns of data-intensive workflow applications and their implications to distributed filesystems.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

A global address space framework for irregular applications.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

ParaTrac: a fine-grained profiler for data-intensive workflows.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

Design and Implementation of GXP Make - A Workflow System Based on Make.
Proceedings of the Sixth International Conference on e-Science, 2010

File-Access Characteristics of Data-Intensive Workflow Applications.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

Fine-Grained Profiling for Data-Intensive Workflows.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

2009
Autonomous collaborative environment for project-based learning.
Robotics and Autonomous Systems, 2009

High performance wide-area overlay using deadlock-free routing.
Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing, 2009

GMount: An Ad Hoc and Locality-Aware Distributed File System by Using SSH and FUSE.
Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009

2008
Collective operations for wide-area message-passing systems using adaptive spanning trees.
IJHPCN, 2008

gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments.
Proceedings of the Languages and Compilers for Parallel Computing, 2008

A scalable high-performance communication library for wide-area environments.
Proceedings of the 9th IEEE/ACM International Conference on Grid Computing (Grid 2008), Tsukuba, Japan, September 29, 2008

GMount: Build your grid file system on the fly.
Proceedings of the 9th IEEE/ACM International Conference on Grid Computing (Grid 2008), Tsukuba, Japan, September 29, 2008

A Stable Broadcast Algorithm.
Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

Scalable Data Gathering for Real-Time Monitoring Systems on Distributed Computing.
Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

2007
Locality-aware connection management and rank assignment for wide-area MPI.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

A fast topology inference: a building block for network-aware parallel processing.
Proceedings of the 16th International Symposium on High-Performance Distributed Computing (HPDC-16 2007), 2007

Locality-aware Connection Management and Rank Assignment for Wide-area MPI.
Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

2006
Autonomous Collaborative Environment for Project Based Learning.
Proceedings of the Intelligent Autonomous Systems 9, 2006

Monte Carlo Go Has a Way to Go.
Proceedings of the Proceedings, 2006

2005
Worldwide computing: Adaptive middleware and programming technology for dynamic Grid environments.
Scientific Programming, 2005

An Adaptive File Distribution Algorithm for Wide Area Network.
Scalable Computing: Practice and Experience, 2005

Collective operations for wide-area message passing systems using adaptive spanning trees.
Proceedings of the 6th IEEE/ACM International Conference on Grid Computing (GRID 2005), 2005

A scalable and efficient self-organizing failure detector for grid applications.
Proceedings of the 6th IEEE/ACM International Conference on Grid Computing (GRID 2005), 2005

Highly latency tolerant Gaussian elimination.
Proceedings of the 6th IEEE/ACM International Conference on Grid Computing (GRID 2005), 2005

2004
Routing and resource discovery in Phoenix Grid-enabled message passing library.
Proceedings of the 4th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2004), 2004

High performance LU factorization for non-dedicated clusters.
Proceedings of the 4th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2004), 2004

2003
Virtual private grid: a command shell for utilizing hundreds of machines efficiently.
Future Generation Comp. Syst., 2003

Phoenix: a parallel programming model for accommodating dynamically joining/leaving resources.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2003

2002
Reducing pause time of conservative collectors.
Proceedings of The Workshop on Memory Systems Performance (MSP 2002), 2002

AnZenMail: A Secure and Certified E-mail System.
Proceedings of the Software Security -- Theories and Systems, 2002

Virtual Private Grid: A Command Shell for Utilizing Hundreds of Machines Efficiently.
Proceedings of the 2nd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2002), 2002

2001
Fusion of Concurrent Invocations of Exclusive Methods.
Proceedings of the Parallel Computing Technologies, 2001

Predicting Scalability of Parallel Garbage Collectors on Shared Memory Multiprocessors.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

2000
Extending Java virtual machine with integer-reference conversion.
Concurrency - Practice and Experience, 2000

The MicroGrid: a Scientific Tool for Modeling Computational Grids.
Proceedings of the Proceedings Supercomputing 2000, 2000

Performance Evaluation of OpenMP Applications with Nested Parallelism.
Proceedings of the Languages, 2000

Online Computation of Critical Paths for Multithreaded Languages.
Proceedings of the Parallel and Distributed Processing, 2000

A Heuristic Algorithm for Mapping Communicating Tasks on Heterogeneous Resources.
Proceedings of the 9th Heterogeneous Computing Workshop, 2000

1999
StackThreads/MP: Integrating Futures into Calling Standards.
Proceedings of the 1999 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP'99), 1999

1998
Comparing Reference Counting and Global Mark-and-Sweep on Parallel Computers.
Proceedings of the Languages, 1998

1997
A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1997

An Effective Garbage Collection Strategy for Parallel Programming Languages on Large Scale Distributed-Memory Machines.
Proceedings of the Sixth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1997

Fine-grain Multithreading with Minimal Compiler Support - A Cost Effective Approach to Implementing Efficient Multithreading Languages.
Proceedings of the ACM SIGPLAN '97 Conference on Programming Language Design and Implementation (PLDI), 1997

An Efficient Compilation Framework for Languages Based on a Concurrent Process Calculus.
Proceedings of the Euro-Par '97 Parallel Processing, 1997

1996
Visualization of RNA secondary structures using highly parallel computers.
Computer Applications in the Biosciences, 1996

1995
Schematic: A Concurrent Object-Oriented Extension to Scheme.
Proceedings of the Object-Based Parallel and Distributed Computation, 1995

1994
StackThreads: An Abstract Machine for Scheduling Fine-Grain Threads on Stock CPUs.
Proceedings of the Theory and Practice of Parallel Programming, 1994

ABCL/f: A Future-Based Polymorphic Typed Concurrent Object-Oriented Language - Its Design and Implementation.
Proceedings of the Specification of Parallel Algorithms, 1994

1993
Implementing concurrent object-oriented languages on multicomputers.
IEEE P&DT, 1993

An Efficient Implementation Scheme of Concurrent Object-Oriented Languages on Stock Multicomputers.
Proceedings of the Fourth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1993

Highly Efficient and Encapsulated Re-use of Synchronization Code in Concurrent Object-Oriented Languages.
Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), Eighth Annual Conference, Washington, DC, USA, September 26, 1993

1992
An Efficient Implementation Scheme of Concurrent Object-Oriented Languages on Stock Multicomputers.
Proceedings of the Parallel Symbolic Computing: Languages, 1992

