Srimat T. Chakradhar

According to our database, Srimat T. Chakradhar
  • authored at least 161 papers between 1988 and 2017.
  • has a "Dijkstra number" of four.

Bibliography

2017
Accelerating deep neural network training with inconsistent stochastic gradient descent.
Neural Networks, 2017

2016
Accelerating Deep Neural Network Training with Inconsistent Stochastic Gradient Descent.
CoRR, 2016

Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs.
CoRR, 2016

Optimizing memory efficiency for deep convolutional neural networks on GPUs.
Proceedings of the International Conference for High Performance Computing, 2016

HppCnn: A High-Performance, Portable Deep-Learning Library for GPGPUs.
Proceedings of the 45th International Conference on Parallel Processing, 2016

2015
Automatic and Efficient Data Host-Device Communication for Many-Core Coprocessors.
Proceedings of the Languages and Compilers for Parallel Computing, 2015

Computing approximately, and efficiently.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Approximate computing and the quest for computing efficiency.
Proceedings of the 52nd Annual Design Automation Conference, 2015

2014
Scalable Effort Hardware Design.
IEEE Trans. VLSI Syst., 2014

ShuffleWatcher: Shuffle-aware Scheduling in Multi-tenant MapReduce Clusters.
Proceedings of the 2014 USENIX Annual Technical Conference, 2014

COMP: Compiler Optimizations for Manycore Processors.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

A Coprocessor Sharing-Aware Scheduler for Xeon Phi-Based Compute Clusters.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Automating and optimizing data transfers for many-core coprocessors.
Proceedings of the 2014 International Conference on Supercomputing, 2014

GRapid: A compilation and runtime framework for rapid prototyping of graph applications on many-core processors.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

Snapify: capturing snapshots of offload applications on Xeon Phi manycore processors.
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

Approximate computing for efficient information processing.
Proceedings of the 12th IEEE Symposium on Embedded Systems for Real-time Multimedia, 2014

2013
Managing the Quality vs. Efficiency Trade-off Using Dynamic Effort Scaling.
ACM Trans. Embedded Comput. Syst., 2013

Scheduling concurrent applications on a cluster of CPU-GPU nodes.
Future Generation Comp. Syst., 2013

Semi-automatic restructuring of offloadable tasks for many-core accelerators.
Proceedings of the International Conference for High Performance Computing, 2013

Quality programmable vector processors for approximate computing.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

COSMIC: middleware for high performance and reliable multiprocessing on Xeon Phi coprocessors.
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

Analysis and characterization of inherent application resilience for approximate computing.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

Approximate computing: An integrated hardware approach.
Proceedings of the 2013 Asilomar Conference on Signals, 2013

2012
A Massively Parallel, Energy Efficient Programmable Accelerator for Learning and Classification.
TACO, 2012

ValuePack: value-based scheduling framework for CPU-GPU clusters.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Automatic generation of software pipelines for heterogeneous parallel systems.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Optimizing Data Warehousing Applications for GPUs Using Kernel Fusion/Fission.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Apricot: an optimizing compiler and productivity tool for x86-compatible many-core coprocessors.
Proceedings of the International Conference on Supercomputing, 2012

Interference-driven resource management for GPU-based heterogeneous clusters.
Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, 2012

A virtual memory based runtime to support multi-tenancy in clusters with GPUs.
Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, 2012

PIC: Partitioned Iterative Convergence for Clusters.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Panacea: towards holistic optimization of MapReduce applications.
Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012

Scheduling Concurrent Applications on a Cluster of CPU-GPU Nodes.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

Tarazu: optimizing MapReduce on heterogeneous clusters.
Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

2011
An Energy-Efficient Heterogeneous System for Embedded Learning and Classification.
Embedded Systems Letters, 2011

A parallel accelerator for semantic search.
Proceedings of the IEEE 9th Symposium on Application Specific Processors, 2011

MDR: performance model driven runtime for heterogeneous parallel platforms.
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Energy-Aware Workload Consolidation on GPU.
Proceedings of the 2011 International Conference on Parallel Processing Workshops, 2011

Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework.
Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011

Power management for heterogeneous clusters: An experimental study.
Proceedings of the 2011 International Green Computing Conference and Workshops, 2011

Dynamic effort scaling: managing the quality-efficiency tradeoff.
Proceedings of the 48th Design Automation Conference, 2011

Symphony: A Scheduler for Client-Server Applications on Coprocessor-Based Heterogeneous Clusters.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

2010
High-performance operating system controlled online memory compression.
ACM Trans. Embedded Comput. Syst., 2010

Online memory compression for embedded systems.
ACM Trans. Embedded Comput. Syst., 2010

Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory.
Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2010), 2010

A dynamically configurable coprocessor for convolutional neural networks.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Exploiting the forgiving nature of applications for scalable parallel execution.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Scalable effort hardware design: exploiting algorithmic resilience for energy efficiency.
Proceedings of the 47th Design Automation Conference, 2010

Best-effort computing: re-thinking parallel software and hardware.
Proceedings of the 47th Design Automation Conference, 2010

Best-effort semantic document search on GPUs.
Proceedings of 3rd Workshop on General Purpose Processing on Graphics Processing Units, 2010

A programmable parallel accelerator for learning and classification.
Proceedings of the 19th International Conference on Parallel Architecture and Compilation Techniques, 2010

2009
A framework for efficient and scalable execution of domain-specific templates on GPUs.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Best-effort parallel execution framework for Recognition and mining applications.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

A Massively Parallel FPGA-Based Coprocessor for Support Vector Machines.
Proceedings of the FCCM 2009, 2009

A Massively Parallel Coprocessor for Convolutional Neural Networks.
Proceedings of the 20th IEEE International Conference on Application-Specific Systems, 2009

2008
A Massively Parallel Digital Learning Processor.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

HERMES: A Software Architecture for Visibility and Control in Wireless Sensor Network Deployments.
Proceedings of the 7th International Conference on Information Processing in Sensor Networks, 2008

Efficient Software Architecture for IPSec Acceleration Using a Programmable Security Processor.
Proceedings of the Design, Automation and Test in Europe, 2008

2007
Exploring Software Partitions for Fast Security Processing on a Multiprocessor Mobile SoC.
IEEE Trans. VLSI Syst., 2007

Zero Cost Test Point Insertion Technique for Structured ASICs.
Proceedings of the 20th International Conference on VLSI Design (VLSI Design 2007), 2007

A low cost test data compression technique for high n-detection fault coverage.
Proceedings of the 2007 IEEE International Test Conference, 2007

A hybrid scheme for compacting test responses with unknown values.
Proceedings of the 2007 International Conference on Computer-Aided Design, 2007

Unknown blocking scheme for low control data volume and high observability.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

2006
A design methodology for application-specific networks-on-chip.
ACM Trans. Embedded Comput. Syst., 2006

A scalable scan-path test point insertion technique to enhance delay fault coverage for standard scan designs.
IEEE Trans. on CAD of Integrated Circuits and Systems, 2006

Test-Volume Reduction in Systems-on-a-Chip Using Heterogeneous and Multilevel Compression Techniques.
IEEE Trans. on CAD of Integrated Circuits and Systems, 2006

Using Shiftable Content Addressable Memories to Double Memory Capacity on Embedded Systems.
Proceedings of the 19th International Conference on VLSI Design (VLSI Design 2006), 2006

PIDISC: Pattern Independent Design Independent Seed Compression Technique.
Proceedings of the 19th International Conference on VLSI Design (VLSI Design 2006), 2006

Chisel: A Storage-efficient, Collision-free Hash-based Network Processing Architecture.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

Efficient unknown blocking using LFSR reseeding.
Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Coverage loss by using space compactors in presence of unknown values.
Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Unknown-tolerance analysis and test-quality control for test response compaction using space compactors.
Proceedings of the 43rd Design Automation Conference, 2006

Software architecture exploration for high-performance security processing on a multiprocessor mobile SoC.
Proceedings of the 43rd Design Automation Conference, 2006

2005
A Methodology for Architectural Design of Multimedia Multiprocessor SoCs.
IEEE Design & Test of Computers, 2005

Heterogeneous and Multi-Level Compression Techniques for Test Volume Reduction in Systems-on-Chip.
Proceedings of the 18th International Conference on VLSI Design (VLSI Design 2005), 2005

Distance Restricted Scan Chain Reordering to Enhance Delay Fault Coverage.
Proceedings of the 18th International Conference on VLSI Design (VLSI Design 2005), 2005

A Unified Architecture for Adaptive Compression of Data and Code on Embedded Systems.
Proceedings of the 18th International Conference on VLSI Design (VLSI Design 2005), 2005

Power Monitors: A Framework for System-Level Power Estimation Using Heterogeneous Power Models.
Proceedings of the 18th International Conference on VLSI Design (VLSI Design 2005), 2005

XWRC: externally-loaded weighted random pattern testing for input test data compression.
Proceedings of the 2005 IEEE International Test Conference, 2005

A methodology for design, modeling, and analysis of networks-on-chip.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2005), 2005

H.264 HDTV Decoder Using Application-Specific Networks-On-Chip.
Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, 2005

ChiYun Compact: A Novel Test Compaction Technique for Responses with Unknown Values.
Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

Response shaper: a novel technique to enhance unknown tolerance for output response compaction.
Proceedings of the 2005 International Conference on Computer-Aided Design, 2005

CRAMES: compressed RAM for embedded systems.
Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2005

SECA: security-enhanced communication architecture.
Proceedings of the 2005 International Conference on Compilers, 2005

2004
Cypress: Compression and Encryption of Data and Code for Embedded Multimedia Systems.
IEEE Design & Test of Computers, 2004

Tamper Resistance Mechanisms for Secure, Embedded Systems.
Proceedings of the 17th International Conference on VLSI Design (VLSI Design 2004), 2004

On-chip networks: A scalable, communication-centric embedded system design paradigm.
Proceedings of the 17th International Conference on VLSI Design (VLSI Design 2004), 2004

A Case Study in Networks-on-Chip Design for Embedded Video.
Proceedings of the 2004 Design, 2004

Hybrid Delay Scan: A Low Hardware Overhead Scan-Based Delay Test Technique for High Fault Coverage and Compact Test Sets.
Proceedings of the 2004 Design, 2004

Re-configurable embedded core test protocol.
Proceedings of the 2004 Conference on Asia South Pacific Design Automation: Electronic Design and Solution Fair 2004, 2004

Open architecture test system: not why but when!
Proceedings of the 2004 Conference on Asia South Pacific Design Automation: Electronic Design and Solution Fair 2004, 2004

2003
Efficient RTL Power Estimation for Large Designs.
Proceedings of the 16th International Conference on VLSI Design (VLSI Design 2003), 2003

Embedding Security in Wireless Embedded Systems.
Proceedings of the 16th International Conference on VLSI Design (VLSI Design 2003), 2003

A Scalable Scan-Path Test Point Insertion Technique to Enhance Delay Fault Coverage for Standard Scan Designs.
Proceedings of the 2003 International Test Conference (ITC 2003), Breaking Test Interface Bottlenecks, 28 September, 2003

CoCo: a hardware/software platform for rapid prototyping of code compression technologies.
Proceedings of the 40th Design Automation Conference, 2003

2001
Accurate Power Macro-modeling Techniques for Complex RTL Circuits.
Proceedings of the 14th International Conference on VLSI Design (VLSI Design 2001), 2001

2000
Testable Path Delay Fault Cover for Sequential Circuits.
J. Inf. Sci. Eng., 2000

Test Set and Fault Partitioning Techniques for Static Test Sequence Compaction for Sequential Circuits.
J. Electronic Testing, 2000

Test Set Compaction Using Relaxed Subsequence Removal.
J. Electronic Testing, 2000

A Practical Vector Restoration Technique for Large Sequential Circuits.
J. Electronic Testing, 2000

Resource-Constrained Compaction of Sequential Circuit Test Sets.
Proceedings of the 13th International Conference on VLSI Design (VLSI Design 2000), 2000

1999
Primitive delay faults: identification, testing, and design for testability.
IEEE Trans. on CAD of Integrated Circuits and Systems, 1999

Resynthesis and retiming for optimum partial scan.
IEEE Trans. on CAD of Integrated Circuits and Systems, 1999

Testing High Speed VLSI Devices Using Slower Testers.
Proceedings of the 17th IEEE VLSI Test Symposium (VTS '99), 1999

1998
Peripheral Partitioning and Tree Decomposition for Partial Scan.
Proceedings of the 11th International Conference on VLSI Design (VLSI Design 1998), 1998

Static test sequence compaction based on segment reordering and accelerated vector restoration.
Proceedings of the IEEE International Test Conference 1998, 1998

Static compaction using overlapped restoration and segment pruning.
Proceedings of the 1998 IEEE/ACM International Conference on Computer-Aided Design, 1998

State Relaxation Based Subsequence Removal for Fast Static Compaction in Sequential Circuits.
Proceedings of the 1998 Design, 1998

Partitioning and Reordering Techniques for Static Test Sequence Compaction of Sequential Circuits.
Proceedings of the 7th Asian Test Symposium (ATS '98), 2-4 December 1998, Singapore, 1998

Vector Restoration Using Accelerated Validation and Refinement.
Proceedings of the 7th Asian Test Symposium (ATS '98), 2-4 December 1998, Singapore, 1998

1997
Redundancy removal and test generation for circuits with non-Boolean primitives.
IEEE Trans. on CAD of Integrated Circuits and Systems, 1997

Bottleneck removal algorithm for dynamic compaction in sequential circuits.
IEEE Trans. on CAD of Integrated Circuits and Systems, 1997

Deriving Signal Constraints to Accelerate Sequential Test Generation.
Proceedings of the 10th International Conference on VLSI Design (VLSI Design 1997), 1997

Design for Primitive Delay Fault Testability.
Proceedings of the IEEE International Test Conference 1997, 1997

1996
Synthesis of initializable asynchronous circuits.
IEEE Trans. VLSI Syst., 1996

Initialization issues in asynchronous circuit synthesis.
J. Electronic Testing, 1996

Dynamic test Sequence compaction for Sequential Circuits.
Proceedings of the 9th International Conference on VLSI Design (VLSI Design 1996), 1996

Synchronous Test Generation Model for Asynchronous Circuits.
Proceedings of the 9th International Conference on VLSI Design (VLSI Design 1996), 1996

Retiming with logic duplication transformation: theory and an application to partial scan.
Proceedings of the 9th International Conference on VLSI Design (VLSI Design 1996), 1996

Sequential Circuits with combinational Test Generation Complexity.
Proceedings of the 9th International Conference on VLSI Design (VLSI Design 1996), 1996

Identification and Test Generation for Primitive Faults.
Proceedings of the IEEE International Test Conference 1996, 1996

Testable path delay fault cover for sequential circuits.
Proceedings of the conference on European design automation, 1996

1995
A partition and resynthesis approach to testable design of large circuits.
IEEE Trans. on CAD of Integrated Circuits and Systems, 1995

Test function embedding algorithms with application to interconnected finite state machines.
IEEE Trans. on CAD of Integrated Circuits and Systems, 1995

Energy models for delay testing.
IEEE Trans. on CAD of Integrated Circuits and Systems, 1995

Combinational ATPG theorems for identifying untestable faults in sequential circuits.
IEEE Trans. on CAD of Integrated Circuits and Systems, 1995

Design of testable sequential circuits by repositioning flip-flops.
J. Electronic Testing, 1995

An exact algorithm for selecting partial scan flip-flops.
J. Electronic Testing, 1995

Redundancy Removal and Test Generation for Circuits with Non-Boolean Primitives.
Proceedings of the 13th IEEE VLSI Test Symposium (VTS'95), April 30, 1995

Optimum retiming of large sequential circuits.
Proceedings of the 8th International Conference on VLSI Design (VLSI Design 1995), 1995

Partial scan design for technology mapped circuits.
Proceedings of the 8th International Conference on VLSI Design (VLSI Design 1995), 1995

Acceleration techniques for dynamic vector compaction.
Proceedings of the 1995 IEEE/ACM International Conference on Computer-Aided Design, 1995

Bottleneck removal algorithm for dynamic compaction and test cycles reduction.
Proceedings of EURO-DAC'95, 1995

Software transformations for sequential test generation.
Proceedings of the 4th Asian Test Symposium (ATS '95), 1995

1994
First-order versus second-order single-layer recurrent neural networks.
IEEE Trans. Neural Networks, 1994

Energy minimization and design for testability.
J. Electronic Testing, 1994

Discrete test generation by continuous methods.
Proceedings of the 12th IEEE VLSI Test Symposium (VTS'94), 1994

Retiming sequential circuits to enhance testability.
Proceedings of the 12th IEEE VLSI Test Symposium (VTS'94), 1994

A Test Function Architecture for Interconnected Finite State Machines.
Proceedings of the Seventh International Conference on VLSI Design, 1994

Synthesis of Initializable Asynchronous Circuits.
Proceedings of the Seventh International Conference on VLSI Design, 1994

Initialization Issues in the Synthesis of Asynchronous Circuits.
Proceedings of the 1994 IEEE International Conference on Computer Design: VLSI in Computer & Processors, 1994

Signal Transition Graph Transformations for Initializability.
Proceedings of the EDAC - The European Conference on Design Automation, ETC - European Test Conference, EUROASIC - The European Event in ASIC Design, Proceedings, February 28, 1994

Resynthesis and Retiming for Optimum Partial Scan.
Proceedings of the 31st Conference on Design Automation, 1994

An Exact Algorithm for Selecting Partial Scan Flip-Flops.
Proceedings of the 31st Conference on Design Automation, 1994

1993
A transitive closure algorithm for test generation.
IEEE Trans. on CAD of Integrated Circuits and Systems, 1993

Finite state machine synthesis with fault tolerant test function.
J. Electronic Testing, 1993

A Synthesis Approach to Design for Testability.
Proceedings of the IEEE International Test Conference 1993, Designing, Testing, and Diagnostics, 1993

Test function embedding algorithms with application to interconnected finite state machines.
Proceedings of the European Design Automation Conference 1993, 1993

Sequential Circuit Delay optimization Using Global Path Delays.
Proceedings of the 30th Design Automation Conference. Dallas, 1993

1992
Performance Analysis of Synchronized Iterative Algorithms on Multiprocessor Systems.
IEEE Trans. Parallel Distrib. Syst., 1992

A solvable class of quadratic 0-1 programming.
Discrete Applied Mathematics, 1992

Finite State Machine Synthesis with Fault Tolerant Test Function.
Proceedings of the 29th Design Automation Conference, 1992

1991
A Transitive Closure Based Algorithm for Test Generation.
Proceedings of the 28th Design Automation Conference, 1991

1990
Toward massively parallel automatic test generation.
IEEE Trans. on CAD of Integrated Circuits and Systems, 1990

Neural Net and Boolean Satisfiability Models of Logic Circuits.
IEEE Design & Test of Computers, 1990

Performance estimation in a massively parallel system.
Proceedings of Supercomputing '90, New York, NY, USA, November 12-16, 1990, 1990

Logic Simulation and Parallel Processing.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 1990

Polynomial time solvable fault detection problems.
Proceedings of the 20th International Symposium on Fault-Tolerant Computing, 1990

Automatic Test Generation Using Quadratic 0-1 Programming.
Proceedings of the 27th ACM/IEEE Design Automation Conference. Orlando, 1990

1988
Automatic test generation using neural networks.
Proceedings of the 1988 IEEE International Conference on Computer-Aided Design, 1988
