John Kim

ORCID: 0000-0003-3958-3891

According to our database, John Kim authored at least 144 papers between 1999 and 2024.

Bibliography

2024
Scalability Limitations of Processing-in-Memory using Real System Evaluations.
Proc. ACM Meas. Anal. Comput. Syst., 2024

2023
Introduction to the Special Issue on Next-Generation On-Chip and Off-Chip Communication Architectures for Edge, Cloud and HPC.
ACM J. Emerg. Technol. Comput. Syst., October, 2023

Accelerating Finite Field Arithmetic for Homomorphic Encryption on GPUs.
IEEE Micro, 2023

Special Issue on Emerging System Interconnects.
IEEE Micro, 2023

GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption.
CoRR, 2023

HiHGNN: Accelerating HGNNs through Parallelism and Data Reusability Exploitation.
CoRR, 2023

Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations.
CoRR, 2023

Dielectric Sensing using T-matched RAIN RFID Tags.
Proceedings of the IEEE International Conference on RFID, 2023

Session details: Architectures & System Software.
Proceedings of the 2023 International Conference on Research in Adaptive and Convergent Systems, 2023

GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Strix: An End-to-End Streaming Architecture with Two-Level Ciphertext Batching for Fully Homomorphic Encryption with Programmable Bootstrapping.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Decoupled SSD: Rethinking SSD Architecture through Network-based Flash Controllers.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Logical/Physical Topology-Aware Collective Communication in Deep Learning Training.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

VVQ: Virtualizing Virtual Channel for Cost-Efficient Protocol Deadlock Avoidance.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

The Case for Domain-Specific Networks.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2023


2022
Hybrid Memory Buffer Microarchitecture for High-Radix Routers.
IEEE Trans. Computers, 2022

Accelerating Polynomial Multiplication for Homomorphic Encryption on GPUs.
Proceedings of the 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED), 2022

ARK: Fully Homomorphic Encryption Accelerator with Runtime Data Generation and Inter-Operation Key Reuse.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Networked SSD: Flash Memory Interconnection Network for High-Bandwidth SSD.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

BTS: an accelerator for bootstrappable fully homomorphic encryption.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Dynamic global adaptive routing in high-radix networks.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

A software-defined tensor streaming multiprocessor for large-scale machine learning.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

The Groq Software-defined Scale-out Tensor Streaming Multiprocessor: From chips-to-systems architectural overview.
Proceedings of the 2022 IEEE Hot Chips 34 Symposium, 2022

Challenges/Opportunities to Enable Dependable Scale-out System with Groq Deterministic Tensor-Streaming Processors.
Proceedings of the 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2022

Answer Fast: Accelerating BERT on the Tensor Streaming Processor.
Proceedings of the 33rd IEEE International Conference on Application-specific Systems, 2022

NaviSim: A Highly Accurate GPU Simulator for AMD RDNA GPUs.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
Decoupled SSD: Reducing Data Movement on NAND-Based Flash SSD.
IEEE Comput. Archit. Lett., 2021

The Case for Dynamic Bias in Global Adaptive Routing.
IEEE Comput. Archit. Lett., 2021

Network-on-Chip Microarchitecture-based Covert Channel in GPUs.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

GNNMark: A Benchmark Suite to Characterize Graph Neural Network Training on GPUs.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Ghost Routing to Enable Oblivious Computation on Memory-centric Networks.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

BoomGate: Deadlock Avoidance in Non-Minimal Routing for High-Radix Networks.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Trident: A Hybrid Correlation-Collision GPU Cache Timing Attack for AES Key Recovery.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Design for Health System Resilience in Challenging Times: A Framework for Remote Cancer Care Through Community Codesign.
Proceedings of the AMIA 2021, American Medical Informatics Association Annual Symposium, San Diego, CA, USA, October 30, 2021

2020
MGPU-TSM: A Multi-GPU System with Truly Shared Memory.
CoRR, 2020

HALCONE: A Hardware-Level Timestamp-based Cache Coherence Scheme for Multi-GPU systems.
CoRR, 2020

Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems.
CoRR, 2020

Bodeum: Encouraging Working Parents to Provide Emotional Support for Stay-at-Home Parents in Korea.
Proceedings of the PervasiveHealth '20: 14th EAI International Conference on Pervasive Computing Technologies for Healthcare, 2020

Non-invasive prediction of lymph node risk in oral cavity cancer patients using a combination of supervised and unsupervised machine learning algorithms.
Proceedings of the Medical Imaging 2020: Biomedical Applications in Molecular, 2020

Griffin: Hardware-Software Support for Efficient Page Migration in Multi-GPU Systems.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Navigator: Dynamic Multi-kernel Scheduling to Improve GPU Performance.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

Bandwidth Bottleneck in Network-on-Chip for High-Throughput Processors.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

Valkyrie: Leveraging Inter-TLB Locality to Enhance GPU Performance.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
Towards Interpersonal Assistants: Next-Generation Conversational Agents.
IEEE Pervasive Comput., 2019

Practical and efficient incremental adaptive routing for HyperX networks.
Proceedings of the International Conference for High Performance Computing, 2019

Analysis of application installation logs on Android systems.
Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, 2019

Ghost routers: energy-efficient asymmetric multicore processors with symmetric NoCs.
Proceedings of the 13th IEEE/ACM International Symposium on Networks-on-Chip, 2019

MGPUSim: enabling multi-GPU performance modeling and optimization.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

DeepHiR: improving high-radix router throughput with deep hybrid memory buffer microarchitecture.
Proceedings of the ACM International Conference on Supercomputing, 2019

A Case for Software-Based Adaptive Routing in NUMA Systems.
Proceedings of the 37th IEEE International Conference on Computer Design, 2019

A Novel Covert Channel Attack Using Memory Encryption Engine Cache.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

LYTNet: A Convolutional Neural Network for Real-Time Pedestrian Traffic Lights and Zebra Crossing Recognition for the Visually Impaired.
Proceedings of the Computer Analysis of Images and Patterns, 2019

Enforcing Last-Level Cache Partitioning through Memory Virtual Channels.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018
MGSim + MGMark: A Framework for Multi-GPU System Research.
CoRR, 2018

A Cauchy-Davenport Theorem for Linear Maps.
Comb., 2018

A SSLBP-based feature extraction framework to detect bones from knee MRI scans.
Proceedings of the 2018 Conference on Research in Adaptive and Convergent Systems, 2018

Multi-dimensional Parallel Training of Winograd Layer on Memory-Centric Architecture.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

SuperSim: Extensible Flit-Level Simulation of Large-Scale Interconnection Networks.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018

TCEP: Traffic Consolidation for Energy-Proportional High-Radix Networks.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Emotion Recognition from Human Speech Using Temporal Information and Deep Learning.
Proceedings of the Interspeech 2018, 2018

Profiling DNN Workloads on a Volta-based DGX-1 System.
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

BebeCODE: Collaborative Child Development Tracking System.
Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 2018

2017
Evaluation of Performance Unfairness in NUMA System Architecture.
IEEE Comput. Archit. Lett., 2017

Footprint: Regulating Routing Adaptiveness in Networks-on-Chip.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

PlayBetter: A Phone-based Baby Play Support System for Childcare Bystander Parents.
Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 2017

Itchtector: A Wearable-based Mobile System for Managing Itching Conditions.
Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 2017

History-Based Arbitration for Fairness in Processor-Interconnect of NUMA Servers.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016
Design and Analysis of Hybrid Flow Control for Hierarchical Ring Network-on-Chip.
IEEE Trans. Computers, 2016

UMH: A Hardware-Based Unified Memory Hierarchy for Systems with Multiple Discrete GPUs.
ACM Trans. Archit. Code Optim., 2016

A high-order multi-zone cut-stencil method for numerical simulations of high-speed flows over complex geometries.
J. Comput. Phys., 2016

PIkit: A New Kernel-Independent Processor-Interconnect Rootkit.
Proceedings of the 25th USENIX Security Symposium, 2016

Optimized multilayer perceptron using dynamic learning rate based microwave tomography breast cancer screening.
Proceedings of the 31st Annual ACM Symposium on Applied Computing, 2016

Contention-based congestion management in large-scale networks.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Adaptive and flexible key-value stores through soft data partitioning.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

TalkLIME: mobile system intervention to improve parent-child interaction for children with language delay.
Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 2016

iPAWS: Instruction-issue pattern-based adaptive warp scheduling for GPGPUs.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Automatically Exploiting Implicit Pipeline Parallelism from Multiple Dependent Kernels for GPUs.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

Accelerating Linked-list Traversal Through Near-Data Processing.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Delay-constrained scheduling for providing guaranteed QoS in a virtual machine environment.
Proceedings of the 2015 Conference on research in adaptive and convergent systems, 2015

Overcoming far-end congestion in large-scale networks.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Mobile System Design for Scratch Recognition.
Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, 2015

Lexical Representation of Emotions for High Functioning Autism (HFA) via Emotional Story Intervention using Smart Media.
Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, 2015

2014
Mutually Aware Prefetcher and On-Chip Network Designs for Multi-Cores.
IEEE Trans. Computers, 2014

Low-Overhead Network-on-Chip Support for Location-Oblivious Task Placement.
IEEE Trans. Computers, 2014

Innovative practices session 3C: Solving today's test challenges.
Proceedings of the 32nd IEEE VLSI Test Symposium, 2014

Microbank: Architecting Through-Silicon Interposer-Based Main Memory Systems.
Proceedings of the International Conference for High Performance Computing, 2014

The optimized grouping value for precise similarity comparison of dynamic birthmark.
Proceedings of the 2014 Conference on Research in Adaptive and Convergent Systems, 2014

Extending bufferless on-chip networks to high-throughput workloads.
Proceedings of the Eighth IEEE/ACM International Symposium on Networks-on-Chip, 2014

Multi-GPU System Design with Memory Networks.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Galaxy: a high-performance energy-efficient multi-chip architecture using photonic interconnects.
Proceedings of the 2014 International Conference on Supercomputing, 2014

Robot-based augmentative and alternative communication for nonverbal children with communication disorders.
Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 2014

Improving GPGPU resource utilization through alternative thread block scheduling.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Transportation-network-inspired network-on-chip.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Energy-efficient scheduling for memory-intensive GPGPU workloads.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

TalkBetter: family-driven mobile intervention care for children with language delay.
Proceedings of the Computer Supported Cooperative Work, 2014

Security Vulnerability in Processor-Interconnect Router Design.
Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, 2014

2013
Scheduling in Heterogeneous Computing Environments for Proximity Queries.
IEEE Trans. Vis. Comput. Graph., 2013

Designing on-chip networks for throughput accelerators.
ACM Trans. Archit. Code Optim., 2013

Scalable high-radix router microarchitecture using a network switch organization.
ACM Trans. Archit. Code Optim., 2013

Clumsy Flow Control for High-Throughput Bufferless On-Chip Networks.
IEEE Comput. Archit. Lett., 2013

Hidden view game: designing human computation games to update maps and street views.
Proceedings of the 22nd International World Wide Web Conference, 2013

LOX Framework: Designing Human Computation Games to Update Street Views.
Proceedings of the Mobile Computing, Applications, and Services, 2013

A detailed and flexible cycle-accurate Network-on-Chip simulator.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

Memory-centric system interconnect design with Hybrid Memory Cubes.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012
Exploiting New Interconnect Technologies in On-Chip Communication.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2012

Guest Editorial New Interconnect Technologies in On-Chip Communication.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2012

An enterprise-wide approach to GIG internet protocol multicast.
Proceedings of the 31st IEEE Military Communications Conference, 2012

Providing cost-effective on-chip network bandwidth in GPGPUs.
Proceedings of the 30th International IEEE Conference on Computer Design, 2012

Network within a network approach to create a scalable high-radix router microarchitecture.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

Scalable on-chip network in power constrained manycore processors.
Proceedings of the 2012 International Green Computing Conference, 2012

What makes users rate (share, tag, edit...)?: predicting patterns of participation in online communities.
Proceedings of the CSCW '12 Computer Supported Cooperative Work, 2012

2011
High Performance Datacenter Networks: Architectures, Algorithms, and Opportunities
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01730-8, 2011

FeatherWeight: low-cost optical arbitration with QoS support.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Leveraging torus topology with deadlock recovery for cost-efficient on-chip network.
Proceedings of the IEEE 29th International Conference on Computer Design, 2011

FlexiBuffer: reducing leakage power in on-chip network routers.
Proceedings of the 48th Design Automation Conference, 2011

Exploiting Mutual Awareness between Prefetchers and On-chip Networks in Multi-cores.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

An Alternative Memory Access Scheduling in Manycore Accelerators.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
On-Chip Network Evaluation Framework.
Proceedings of the Conference on High Performance Computing Networking, 2010

Probabilistic Distance-Based Arbitration: Providing Equality of Service for Many-Core CMPs.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Throughput-Effective On-Chip Networks for Manycore Accelerators.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

FlexiShare: Channel sharing for an energy-efficient nanophotonic crossbar.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

Approximating age-based arbitration in on-chip networks.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

On-chip network design considerations for compute accelerators.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
Cost-Efficient Dragonfly Topology for Large-Scale Systems.
IEEE Micro, 2009

HPCCD: Hybrid Parallel Continuous Collision Detection using CPUs and GPUs.
Comput. Graph. Forum, 2009

Exploring concentration and channel slicing in on-chip network router.
Proceedings of the Third International Symposium on Networks-on-Chips, 2009

Router microarchitecture and scalability of ring topology in on-chip networks.
Proceedings of the Second International Workshop on Network on Chip Architectures, 2009

Low-cost router microarchitecture for on-chip networks.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Analyzing the impact of on-chip network traffic on program phases for CMPs.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

Firefly: illuminating future network-on-chip with nanophotonics.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Indirect adaptive routing on large scale interconnection networks.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Achieving predictable performance through better memory controller placement in many-core CMPs.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

2008
Technology-Driven, Highly-Scalable Dragonfly Topology.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

2007
Flattened Butterfly Topology for On-Chip Networks.
IEEE Comput. Archit. Lett., 2007

Orthogonal Organized Finite State Machine Application to Sensor Acquired Information.
Proceedings of the Parallel Computing Technologies, 2007

Flattened butterfly: a cost-efficient topology for high-radix networks.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Design of Interconnection Networks.
Proceedings of the 15th Annual IEEE Symposium on High-Performance Interconnects, 2007

2006
Interconnect routing and scheduling - Adaptive routing in high-radix clos network.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

The BlackWidow High-Radix Clos Network.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

A Modeling and Similarity Measure Function for Multiple Trajectories in Moving Databases.
Proceedings of the Computational Science and Its Applications, 2006

2005
Microarchitecture of a High-Radix Router.
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

1999
Performance Metering of Distributed Access Using Java Servlets.
Proceedings of the Advances in Databases and Information Systems, 1999

