Peter M. Kogge

According to our database, Peter M. Kogge authored at least 103 papers between 1973 and 2021.

Awards

IEEE Fellow

IEEE Fellow 1990, "For contributions to high-performance computing systems."

Bibliography

2021
Scalability of Streaming Anomaly Detection in an Unbounded Key Space Using Migrating Threads.
Proceedings of the High Performance Computing - 36th International Conference, 2021

Passel: Improved Scalability and Efficiency of Distributed SVM using a Cacheless PGAS Migrating Thread Architecture.
Proceedings of the 12th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2021

Greatly Accelerated Scaling of Streaming Problems with A Migrating Thread Architecture.
Proceedings of the 11th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2021

Deluge: Achieving Superior Efficiency, Throughput, and Scalability with Actor Based Streaming on Migrating Threads.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

Exploring Strategies to Improve Locality Across Many-Core Affinities.
Proceedings of the Euro-Par 2021: Parallel Processing Workshops, 2021

Locality: The 3rd Wall and the Need for Innovation in Parallel Architectures.
Proceedings of the Architecture of Computing Systems - 34th International Conference, 2021

2020
Cache Oblivious Strategies to Exploit Multi-Level Memory on Manycore Systems.
Proceedings of the IEEE/ACM Workshop on Memory Centric High Performance Computing, 2020

Scalability of Sparse Matrix Dense Vector Multiply (SpMV) on a Migrating Thread Architecture.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Machine Learning Algorithm Performance on the Lucata Computer.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

Scalability of Streaming on Migrating Threads.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

Implementing Sparse Linear Algebra Kernels on the Lucata Pathfinder-A Computer.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

GrowHON: A Scalable Algorithm for Growing Higher-order Networks of Sequences.
Proceedings of the Complex Networks & Their Applications IX, 2020

2019
Application Performance of Physical System Simulations.
Proceedings of the Parallel Computing: Technology Trends, 2019

Scalability of Hybrid SpMV on Intel Xeon Phi Knights Landing.
Proceedings of the 17th International Conference on High Performance Computing & Simulation, 2019

Introducing Streaming into Linear Algebra-based Sparse Graph Algorithms.
Proceedings of the 17th International Conference on High Performance Computing & Simulation, 2019

Multi-threading Semantics for Highly Heterogeneous Systems Using Mobile Threads.
Proceedings of the 17th International Conference on High Performance Computing & Simulation, 2019

2018
Scalability of Hybrid Sparse Matrix Dense Vector (SpMV) Multiplication.
Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

Optimizing for KNL Usage Modes When Data Doesn't Fit in MCDRAM.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Implementing the Jaccard Index on the Migratory Memory-Side Processing Emu Architecture.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

2017
A Case for Migrating Execution for Irregular Applications.
Proceedings of the Seventh Workshop on Irregular Applications: Architectures and Algorithms, 2017

Graph Analytics: Complexity, Scalability, and Architectures.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

2016
Highly Scalable Near Memory Processing with Migrating Threads on the Emu System Architecture.
Proceedings of the 6th Workshop on Irregular Applications: Architecture and Algorithms, 2016

ParLearning 2016 Keynote.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Jaccard Coefficients as a Potential Graph Benchmark.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Molecular cellular networks: A non von Neumann architecture for molecular electronics.
Proceedings of the IEEE International Conference on Rebooting Computing, 2016

2015
Updating the Energy Model for Future Exascale Systems.
Proceedings of the High Performance Computing - 30th International Conference, 2015

2014
Reading the Tea-Leaves: How Architecture Has Evolved at the High End.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

2013
Energy-efficient multithreading for a hierarchical heterogeneous multicore through locality-cognizant thread generation.
J. Parallel Distributed Comput., 2013

Exascale Computing Trends: Adjusting to the "New Normal" for Computer Architecture.
Comput. Sci. Eng., 2013

Comparative performance analysis of a Big Data NORA problem on a variety of architectures.
Proceedings of the 2013 International Conference on Collaboration Technologies and Systems, 2013

Big data, deep data, and the effect of system architectures on performance.
Proceedings of the 2013 International Conference on Collaboration Technologies and Systems, 2013

2011
Lightweight Chip Multi-Threading (LCMT): Maximizing Fine-Grained Parallelism On-Chip.
IEEE Trans. Parallel Distributed Syst., 2011

Using the TOP500 to trace and project technology and architecture trends.
Proceedings of the Conference on High Performance Computing Networking, 2011

Recomposing an Irregular Algorithm Using a Novel Low-Level PGAS Model.
Proceedings of the 2011 International Conference on Parallel Processing Workshops, 2011

2010
Introducing mNUMA: an extended PGAS architecture.
Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, 2010

High throughput and low power dissipation in QCA pipelines using Bennett clocking.
Proceedings of the 2010 IEEE/ACM International Symposium on Nanoscale Architectures, 2010

Facing the Exascale Energy Wall.
Proceedings of the International Workshop on Innovative Architecture for Future Generation High Performance Processors and Systems, 2010

Exploring the Possible Past Futures of a Single Part Type Multi-core PIM Chip.
Proceedings of the International Workshop on Innovative Architecture for Future Generation High Performance Processors and Systems, 2010

Modeling bounds on migration overhead for a traveling thread architecture.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Models for generating locality-tuned traveling threads for a hierarchical multi-level heterogeneous multicore.
Proceedings of the 7th Conference on Computing Frontiers, 2010

2009
Analyzing the Inherent Reliability of Moderately Sized Magnetic and Electrostatic QCA Circuits Via Probabilistic Transfer Matrices.
IEEE Trans. Very Large Scale Integr. Syst., 2009

Organizing wires for reliability in magnetic QCA.
ACM J. Emerg. Technol. Comput. Syst., 2009

The Challenges of Petascale Architectures.
Comput. Sci. Eng., 2009

2008
Memory model effects on application performance for a lightweight multithreaded architecture.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

System Reliabilities When Using Triple Modular Redundancy in Quantum-Dot Cellular Automata.
Proceedings of the 23rd IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems (DFT 2008), 2008

Design of a mask-programmable memory/multiplier array using G4-FET technology.
Proceedings of the 45th Design Automation Conference, 2008

2007
On the Memory Access Patterns of Supercomputer Applications: Benchmark Selection and Its Implications.
IEEE Trans. Computers, 2007

Evaluating synchronization techniques for light-weight multithreaded/multicore architectures.
SPAA 2007: Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2007

A Heterogeneous Lightweight Multithreaded Architecture.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Probabilistic Analysis of a Molecular Quantum-Dot Cellular Automata Adder.
Proceedings of the 22nd IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems (DFT 2007), 2007

General floorplan for reversible quantum-dot cellular automata.
Proceedings of the 4th Conference on Computing Frontiers, 2007

2006
Multi-core issues - Multi-Core for HPC: breakthrough or breakdown?
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Poster reception - The structural simulation toolkit: exploring novel architectures.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

M06 - Issues for the future of supercomputing: impact of Moore's law and architecture on application performance.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Facing up to the Inevitable: Intelligent Error Recovery in Massively Parallel Processing in Memory Architectures.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications & Conference on Real-Time Computing Systems and Applications, 2006

Fine-Grained Message Pipelining for Improved MPI Performance.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

2005
Polygonal path simplification with angle constraints.
Comput. Geom., 2005

Generation of permutations for SIMD processors.
Proceedings of the 2005 ACM SIGPLAN/SIGBED Conference on Languages, 2005

The implications of working set analysis on supercomputing memory hierarchy design.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Reversible computation with quantum-dot cellular automata (QCA).
Proceedings of the Second Conference on Computing Frontiers, 2005

2004
A low cost, multithreaded processing-in-memory system.
Proceedings of the 3rd Workshop on Memory Performance Issues, 2004

The "4-Diamond Circuit" - A Minimally Complex Nano-Scale Computational Building Block in QCA.
Proceedings of the 2004 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2004), 2004

Cache implications of aggressively pipelined high performance microprocessors.
Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software, 2004

Characterizing a new class of threads in scientific applications for high end supercomputers.
Proceedings of the 18th Annual International Conference on Supercomputing, 2004

Using Circuits and Systems-Level Research to Drive Nanotechnology.
Proceedings of the 22nd IEEE International Conference on Computer Design: VLSI in Computers & Processors (ICCD 2004), 2004

Quantum-Dot Cellular Automata (QCA) circuit partitioning: problem modeling and solutions.
Proceedings of the 41st Design Automation Conference, 2004

2003
Energy-efficient issue queue design.
IEEE Trans. Very Large Scale Integr. Syst., 2003

From Bits to Chips: A Multidisciplinary Curriculum for Microelectronics System Design Education.
Proceedings of the 2003 International Conference on Microelectronics Systems Education, 2003

Bouncing Threads: Merging a New Execution Model into a Nanotechnology Memory.
Proceedings of the 2003 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2003), 2003

The State of State.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

Implications of a PIM Architectural Model for MPI.
Proceedings of the 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 2003

2002
Teaching students computer architecture for new, nanotechnologies.
Proceedings of the 2002 workshop on Computer architecture education, 2002

2001
Inherently Lower-Power High-Performance Superscalar Architectures.
IEEE Trans. Computers, 2001

Problems in designing with QCAs: Layout = Timing.
Int. J. Circuit Theory Appl., 2001

Polygonal path approximation with angle constraints.
Proceedings of the Twelfth Annual Symposium on Discrete Algorithms, 2001

Petaflop Computing for Protein Folding.
Proceedings of the Tenth SIAM Conference on Parallel Processing for Scientific Computing, 2001

Energy-efficient instruction dispatch buffer design for superscalar processors.
Proceedings of the 2001 International Symposium on Low Power Electronics and Design, 2001

Exploring and exploiting wire-level pipelining in emerging technologies.
Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001

A Microserver View of HTMT.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

2000
Optimization of high-performance superscalar architectures for energy efficiency.
Proceedings of the 2000 International Symposium on Low Power Electronics and Design, 2000

The Characterization of Data Intensive Memory Workloads on Distributed PIM Systems.
Proceedings of the Intelligent Memory Systems, Second International Workshop, 2000

A design of and design tools for a novel quantum dot based microprocessor.
Proceedings of the 37th Conference on Design Automation, 2000

1999
Application of STD to latch-power estimation.
IEEE Trans. Very Large Scale Integr. Syst., 1999

Accelerating object-oriented applications using method lookup caches and register windowing.
J. Syst. Archit., 1999

Mapping Irregular Applications to DIVA, a PIM-based Data-Intensive Architecture.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1999

Microservers: a new memory semantics for massively parallel computing.
Proceedings of the 13th international conference on Supercomputing, 1999

Logic in wire: using quantum dots to implement a microprocessor.
Proceedings of the 6th IEEE International Conference on Electronics, Circuits and Systems, 1999

Prototyping Execution Models for HTMT Petaflop Machine in Java.
Proceedings of the Network-Based Parallel Computing: Communication, 1999

1998
The energy complexity of register files.
Proceedings of the 1998 International Symposium on Low Power Electronics and Design, 1998

Optimizing Data Scheduling on Processor-in-Memory Arrays.
Proceedings of the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP '98), March 30, 1998

1996
A parallel processing chip with embedded DRAM macros.
IEEE J. Solid State Circuits, 1996

Using Method Lookup Caches and Register Windowing to Speed Up Dynamically-Bound Object-Oriented Applications.
Proceedings of the 22nd EUROMICRO Conference '96, 1996

1995
Combined DRAM and logic chip for massively parallel systems.
Proceedings of the 16th Conference on Advanced Research in VLSI (ARVLSI '95), 1995

1994
Preface.
IBM J. Res. Dev., 1994

EXECUBE - A New Architecture for Scalable MPPs.
Proceedings of the 1994 International Conference on Parallel Processing, 1994

1992
Declarative Computing: A Technology Driver.
Proceedings of the Architektur von Rechensystemen, 1992

1988
VLSI and rule-based systems.
SIGARCH Comput. Archit. News, 1988

1985
Function-based computing and parallelism: A review.
Parallel Comput., 1985

1982
An Architectural Trail to Threaded-Code Systems.
Computer, 1982

1977
The Microprogramming of Pipelined Processors.
Proceedings of the 4th Annual Symposium on Computer Architecture, 1977

1974
Parallel Solution of Recurrence Problems.
IBM J. Res. Dev., 1974

1973
A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations.
IEEE Trans. Computers, 1973

Maximal Rate Pipelined Solutions to Recurrence Problems.
Proceedings of the 1st Annual Symposium on Computer Architecture, 1973

