Akshay Venkatesh

According to our database, Akshay Venkatesh authored at least 32 papers between 2002 and 2022.

Bibliography

2022
Cross-domain Variational Capsules for Information Extraction.
CoRR, 2022

2019
Derived Hecke Algebra for Weight One Forms.
Exp. Math., 2019

2017
MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling.
Proceedings of the 46th International Conference on Parallel Processing, 2017

2016
CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters.
Parallel Comput., 2016

Efficient Reliability Support for Hardware Multicast-Based Broadcast in GPU-enabled Streaming Applications.
Proceedings of the First International Workshop on Communication Optimizations in HPC, 2016

Designing High Performance Heterogeneous Broadcast for Streaming Applications on GPU Clusters.
Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016

Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning.
Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

Exploiting Maximal Overlap for Non-Contiguous Data Movement Processing on Modern GPU-Enabled Systems.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

CUDA M3: Designing Efficient CUDA Managed Memory-Aware MPI by Exploiting GDR and IPC.
Proceedings of the 23rd IEEE International Conference on High Performance Computing, 2016

CUDA Kernel Based Collective Reduction Operations on Large-scale GPU Clusters.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

2015
Designing Non-blocking Personalized Collectives with Near Perfect Overlap for RDMA-Enabled Clusters.
Proceedings of the High Performance Computing - 30th International Conference, 2015

A case for application-oblivious energy-efficient MPI runtime.
Proceedings of the International Conference for High Performance Computing, 2015

GPU-Aware Design, Implementation, and Evaluation of Non-blocking Collective Benchmarks.
Proceedings of the 22nd European MPI Users' Group Meeting, 2015

Impact of InfiniBand DC Transport Protocol on Energy Consumption of All-to-All Collective Algorithms.
Proceedings of the 23rd IEEE Annual Symposium on High-Performance Interconnects, 2015

Offloaded GPU Collectives Using CORE-Direct and CUDA Capabilities on InfiniBand Clusters.
Proceedings of the 22nd IEEE International Conference on High Performance Computing, 2015

Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Power-Check: An Energy-Efficient Checkpointing Framework for HPC Clusters.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Non-Blocking PMI Extensions for Fast MPI Startup.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

2014
Designing MPI Library with Dynamic Connected Transport (DCT) of InfiniBand: Early Experiences.
Proceedings of the Supercomputing - 29th International Conference, 2014

A Comprehensive Performance Evaluation of OpenSHMEM Libraries on InfiniBand Clusters.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools, 2014

High Performance Alltoall and Allgather Designs for InfiniBand MIC Clusters.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Optimizing Collective Communication in UPC.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

MIC-Check: a distributed check pointing framework for the Intel many integrated cores architecture.
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

A high performance broadcast design with hardware multicast and GPUDirect RDMA for streaming applications on InfiniBand clusters.
Proceedings of the 21st International Conference on High Performance Computing, 2014

2013
MVAPICH-PRISM: a proxy-based communication framework using InfiniBand and SCIF for Intel MIC clusters.
Proceedings of the International Conference for High Performance Computing, 2013

Evaluation of Energy Characteristics of MPI Communication Primitives with RAPL.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

Designing Optimized MPI Broadcast and Allreduce for Many Integrated Core (MIC) InfiniBand Clusters.
Proceedings of the IEEE 21st Annual Symposium on High-Performance Interconnects, 2013

Efficient Intra-node Communication on Intel-MIC Clusters.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012
MPI-based parallel synchronous vector evaluated particle swarm optimization for multi-objective design optimization of composite structures.
Eng. Appl. Artif. Intell., 2012

OMB-GPU: A Micro-Benchmark Suite for Evaluating MPI Libraries on GPU Clusters.
Proceedings of the Recent Advances in the Message Passing Interface, 2012

2002
Finite locally-quasiprimitive graphs.
Discret. Math., 2002

