S. Lennart Johnsson

CoRR, 2021

Analysis of Factors Affecting Power Consumption and Energy Efficiency of SGEMM on the Low-Power Myriad-2 VPU.

Suyash Bakshi

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

2020

A Highly Efficient SGEMM Implementation using DMA on the Intel/Movidius Myriad-2.

Suyash Bakshi

Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020

An Adaptive Space-Filling Curve Trajectory for Ordering 3D Datasets to 1D: Application to Brain Magnetic Resonance Imaging Data for Classification.

Proceedings of the Computational Science - ICCS 2020, 2020

Squeeze U-Net: A Memory and Energy Efficient Image Segmentation Network.

Nazanin Beheshti

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Scalable machine learning computing a data summarization matrix with a parallel array DBMS.

Carlos Ordonez

Yiqun Zhang

Distributed Parallel Databases, 2019

2018

A performance spectrum for parallel computational frameworks that solve PDEs.

Concurr. Comput. Pract. Exp., 2018

2017

A Cloud System for Machine Learning Exploiting a Parallel Array DBMS.

Yiqun Zhang

Carlos Ordonez

Proceedings of the 28th International Workshop on Database and Expert Systems Applications, 2017

2016

Lifetime and Deployment Limits for Mobile, 3D-Perceptual Applications.

Proceedings of the Virtual, Augmented and Mixed Reality, 2016

2014

Instrumentation for accurate energy-to-solution measurements of a Texas Instruments TMS320C6678 digital signal processor and its DDR3 memory.

Proceedings of the 2nd International Workshop on Energy Efficient Supercomputing, 2014

Exploiting DMA for Performance and Energy Optimized STREAM on a DSP.

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

2012

Overview of Data Centers Energy Efficiency Evolution.

Handbook of Energy-Aware and Green Computing - Two Volume Set, 2012

2010

The SNIC/KTH PRACE prototype: Achieving high energy efficiency with commodity technology without acceleration.

Daniel Ahlin

John Wang

Proceedings of the International Green Computing Conference 2010, 2010

2008

Automatic Generation of FFT for Translations of Multipole Expansions in Spherical Harmonics.

Jakub Kurzak

B. Montgomery Pettitt

Int. J. High Perform. Comput. Appl., 2008

Scalable Grid-wide capacity allocation with the SweGrid Accounting System (SGAS).

Concurr. Comput. Pract. Exp., 2008

2007

Scheduling FFT computation on SMP and multicore systems.

Ayaz Ali

Jaspal Subhlok

Proceedings of the 21st Annual International Conference on Supercomputing, 2007

Adaptive Computation of Self Sorting In-Place FFTs on Hierarchical Memory Architectures.

Ayaz Ali

Jaspal Subhlok

Proceedings of the High Performance Computing and Communications, 2007

Dynamic, context-aware, least-privilege grid delegation.

Mehran Ahsant

Jim Basney

Proceedings of the 8th IEEE/ACM International Conference on Grid Computing (GRID 2007), 2007

Developing Assays for the Detection of Influenza in Human Samples.

Proceedings of the International Conference on Bioinformatics & Computational Biology, 2007

2006

A Service-Oriented Approach to Enforce Grid Resource Allocations.

Int. J. Cooperative Inf. Syst., 2006

Toward an On-Demand Restricted Delegation Mechanism for Grids.

Proceedings of the 7th IEEE/ACM International Conference on Grid Computing (GRID 2006), 2006

Two Challenges in Genomics That Can Benefit from Petascale Platforms.

Proceedings of the Euro-Par 2006 Workshops: Parallel Processing, 2006

2005

Scheduling strategies for mapping application workflows onto the grid.

Bo Liu

Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing, 2005

2004

Automatic Performance Tuning for Fast Fourier Transforms.

Int. J. High Perform. Comput. Appl., 2004

New Grid Scheduling and Rescheduling Methods in the GrADS Project.

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

An OGSA-based accounting system for allocation enforcement across HPC centers.

Proceedings of the Service-Oriented Computing, 2004

Scheduling workflow applications in GrADS.

Bo Liu

Proceedings of the 4th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2004), 2004

2003

CODELAB: A Developers' Tool for Efficient Code Generation and Optimization.

Proceedings of the Computational Science - ICCS 2003, 2003

2002

Toward a Framework for Preparing and Executing Adaptive Grid Programs.

Ken Kennedy

Mark Mazina

Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

2001

Telescoping Languages: A Strategy for Automatic Generation of Scientific Problem-Solving Systems from Annotated Libraries.

Linda Torczon

J. Parallel Distributed Comput., 2001

The GrADS Project: Software Support for High-Level Grid Application Development.

Daniel A. Reed

Linda Torczon

Richard Wolski

Int. J. High Perform. Comput. Appl., 2001

2000

HPFBench: a high performance Fortran benchmark suite.

ACM Trans. Math. Softw., 2000

An adaptive software library for fast Fourier transforms.

Rishad Mahasoom

Proceedings of the 14th international conference on Supercomputing, 2000

1999

Some Metacomputing Experiences for Scientific Applications.

Olle Larsson

Michael Feig

Parallel Process. Lett., 1999

Large scale distributed data repository: design of a molecular dynamics trajectory database.

Michael Feig

Matin Abdullah

B. Montgomery Pettitt

Future Gener. Comput. Syst., 1999

1997

Hierarchical Load Balancing for Parallel Fast Legendre Transforms.

Nadia Shalaby

Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

A Data-Parallel Implementation of the Geometric Partitioning Algorithm.

Shang-Hua Teng

Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

A Data-Parallel Adaptive N-body Method.

Shang-Hua Teng

Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

On the Accuracy of Anderson's Fast N-body Algorithm.

Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

High Performance FORTRAN for Highly Unstructured Problems.

Shang-Hua Teng

Proceedings of the Sixth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1997

DPF: A Data Parallel Fortran Benchmark Suite.

Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997

1996

Implementing O(N) N-Body Algorithms Efficiently in Data-Parallel Languages.

Sci. Program., 1996

Local Basic Linear Algebra Subroutines (LBLAS) for the CM-5/5E.

David Kramer

Int. J. High Perform. Comput. Appl., 1996

A Data-Parallel Implementation of Hierarchical N-Body Methods.

Int. J. High Perform. Comput. Appl., 1996

A Data-Parallel Implementation of O(N) Hierarchical N-Body Methods.

Proceedings of the 1996 ACM/IEEE Conference on Supercomputing, 1996

1995

On the Conversion Between Binary Code and Binary-Reflected Gray Code on Binary Cubes.

IEEE Trans. Computers, 1995

All-to-All Communication on the Connection Machine CM-200.

Sci. Program., 1995

ROMM Routing on Mesh and Torus Networks.

Ted Nesson

Proceedings of the 7th Annual ACM Symposium on Parallel Algorithms and Architectures, 1995

1994

Index Transformation Algorithms in a Linear Algebra Framework.

Alan Edelman

Steve Heller

IEEE Trans. Parallel Distributed Syst., 1994

POLYSHIFT Communications Software for the Connection Machine System CM-200.

William George

Ralph G. Brickner

Sci. Program., 1994

Multiplication of Matrices of Arbitrary Shape on a Data Parallel Computer.

Parallel Comput., 1994

Binary Cube Emulation of Butterfly Networks Encoded by Gray Code.

J. Parallel Distributed Comput., 1994

An Efficient Algorithm for Gray-to-Binary Permutation on Hypercubes.

M. T. Raghunath

J. Parallel Distributed Comput., 1994

Embedding hyperpyramids into hypercubes.

IBM J. Res. Dev., 1994

Optimal communication channel utilization for matrix transposition and related permutations on binary cubes.

Discret. Appl. Math., 1994

ROMM Routing: A Class of Efficient Minimal Routing Algorithms.

Ted Nesson

Proceedings of the Parallel Computer Routing and Communication, 1994

Scientific Software Libraries for Scalable.

Proceedings of the Parallel Scientific Computing, First International Workshop, 1994

Mesh Decomposition and Communication Procedures for Finite Element Applications on the Connection Machine CM-5 System.

Proceedings of the High-Performance Computing and Networking, 1994

1993

Block-Cyclic Dense Linear Algebra.

Woody Lichtenstein

SIAM J. Sci. Comput., 1993

Minimizing the Communication Time for Matrix Multiplication on Multiprocessors.

Parallel Comput., 1993

The Connection Machine Systems CM-5.

Proceedings of the 5th Annual ACM Symposium on Parallel Algorithms and Architectures, 1993

1992

Cooley-Tukey FFT on the Connection Machine.

Robert L. Krawitz

Parallel Comput., 1992

Generalized Shuffle Permutations on Boolean Cubes.

J. Parallel Distributed Comput., 1992

Local Basic Linear Algebra Subroutines (Lblas) for Distributed Memory Architectures and Languages With Array Syntax.

Luis F. Ortiz

Int. J. High Perform. Comput. Appl., 1992

All-To-All Broadcast and Applications On the Connection Machine.

Jean-Philippe Brunet

Int. J. High Perform. Comput. Appl., 1992

Massively Parallel Computing: Data Distribution and Communication.

Proceedings of the Parallel Architectures and Their Efficient Use, 1992

1991

The Parallel Multipole Method on the Connection Machine.

Feng Zhao

SIAM J. Sci. Comput., 1991

Performance Modeling of Distributed Memory Architectures.

J. Parallel Distributed Comput., 1991

1990

A dataparallel implementation of an explicit method for the three-dimensional compressible Navier-Stokes equations.

Pelle Olsson

Parallel Comput., 1990

Embedding Meshes in Boolean Cubes by Graph Decomposition.

J. Parallel Distributed Comput., 1990

Embedding Three-Dimensional Meshes in Boolean Cubes by Graph Decomposition.

Proceedings of the 1990 International Conference on Parallel Processing, 1990

1989

Optimum Broadcasting and Personalized Communication in Hypercubes.

IEEE Trans. Computers, 1989

The Finite Element Method on a Data Parallel Computing System.

Int. J. High Speed Comput., 1989

Histogram Computation on Distributed Memory Architectures.

Dimitris C. Gerogiannis

Stelios C. Orphanoudakis

Concurr. Pract. Exp., 1989

A study of dissipation operators for the Euler equations and a three-dimensional channel flow.

Pelle Olsson

Proceedings Supercomputing '89, Reno, NV, USA, November 12-17, 1989

Element order and convergence rate of the conjugate gradient method for data parallel stress analysis.

Proceedings Supercomputing '89, Reno, NV, USA, November 12-17, 1989

A radix-2 FFT on connection machine.

Proceedings Supercomputing '89, Reno, NV, USA, November 12-17, 1989

Matrix multiplication on the connection machine.

Tim Harris

Proceedings Supercomputing '89, Reno, NV, USA, November 12-17, 1989

Dilation d embedding of a hyper-pyramid into a hypercube.

Proceedings Supercomputing '89, Reno, NV, USA, November 12-17, 1989

QCD with dynamical fermions on the connection machine.

Proceedings Supercomputing '89, Reno, NV, USA, November 12-17, 1989

Data Parallel Algorithms for the Finite Element Method.

Proceedings of the Fourth SIAM Conference on Parallel Processing for Scientific Computing, 1989

Optimizing Tridiagonal Solvers for Alternating Direction Methods on Boolean Cube Multiprocessors.

Proceedings of the Fourth SIAM Conference on Parallel Processing for Scientific Computing, 1989

1988

Expressing Boolean cube matrix algorithms in shared memory primitives.

Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988

Optimal algorithms for stable dimension permutations on Boolean cubes.

Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988

QED on the connection machine.

Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988

1987

Solving banded systems on a parallel processor.

Jack J. Dongarra

Parallel Comput., 1987

Communication Efficient Basic Linear Algebra Computations on Hypercube Architectures.

J. Parallel Distributed Comput., 1987

The Communication Efficiency of Meshes, Boolean Cubes and Cube Connected Cycles for Wafer Scale Integration.

Abhiram G. Ranade

Proceedings of the International Conference on Parallel Processing, 1987

Algorithms for Matrix Transposition on Boolean n-Cube Configured Ensemble Architectures.

Proceedings of the International Conference on Parallel Processing, 1987

On the Embedding of Arbitrary Meshes in Boolean Cubes With Expansion Two Dilation Two.

Proceedings of the International Conference on Parallel Processing, 1987

1986

Distributed Routing Algorithms for Broadcasting and Personalized Communication in Hypercubes.

Proceedings of the International Conference on Parallel Processing, 1986

1985

Solving Narrow Banded Systems on Ensemble Architectures.

ACM Trans. Math. Softw., 1985

Generation of layouts from MOS circuit schematics: a graph theoretic approach.

Tak-Kwong Ng

Proceedings of the 22nd ACM/IEEE conference on Design automation, 1985

1983

The Tree Machine: An Evaluation of Strategies for Reducing Program Loading Time.

Peyyun Peggy Li

Proceedings of the International Conference on Parallel Processing, 1983

1981

A Mathematical Approach to the Design of VLSI Networks for Real-Time Computation Problems.

Danny Cohen