Manjunath Kudlur

According to our database, Manjunath Kudlur authored at least 22 papers between 2004 and 2018.





Dynamic control flow in large-scale machine learning.
Proceedings of the Thirteenth EuroSys Conference, 2018

Exploring the structure of a real-time, arbitrary neural artistic stylization network.
Proceedings of the British Machine Vision Conference 2017, 2017


CUDA: Compiling and optimizing for a GPU platform.
Proceedings of the International Conference on Computational Science, 2012

Designing a unified programming model for heterogeneous machines.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

MacroSS: macro-SIMDization of streaming applications.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

The theory of deadlock avoidance via discrete control.
Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2009

Bridging the computation gap between programmable processors and hardwired accelerators.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

Flextream: Adaptive Compilation of Streaming Applications for Heterogeneous Architectures.
Proceedings of the PACT 2009, 2009

Orchestrating the execution of stream programs on multicore platforms.
Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, 2008

Gadara: Dynamic Deadlock Avoidance for Multithreaded Programs.
Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation, 2008

Modulo scheduling for highly customized datapaths to increase hardware reusability.
Proceedings of the Sixth International Symposium on Code Generation and Optimization (CGO 2008), 2008

Optimus: efficient realization of streaming applications on FPGAs.
Proceedings of the 2008 International Conference on Compilers, 2008

Hierarchical coarse-grained stream compilation for software defined radio.
Proceedings of the 2007 International Conference on Compilers, 2007

Streamroller: automatic synthesis of prescribed throughput accelerator pipelines.
Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, 2006

Increasing hardware efficiency with multifunction loop accelerators.
Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, 2006

Modulo graph embedding: mapping applications onto coarse-grained reconfigurable architectures.
Proceedings of the 2006 International Conference on Compilers, 2006

Cost Sensitive Modulo Scheduling in a Loop Accelerator Synthesis System.
Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005

Performance analysis of methods that overcome false sharing effects in software DSMs.
J. Parallel Distrib. Comput., 2004

Application-Specific Processing on a General-Purpose Core via Transparent Instruction Set Customization.
Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004

FLASH: Foresighted Latency-Aware Scheduling Heuristic for Processors with Customized Datapaths.
Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004

Automatic Synthesis of Customized Local Memories for Multicluster Application Accelerators.
Proceedings of the 15th IEEE International Conference on Application-Specific Systems, 2004