David Gregg

Orcid: 0000-0003-3782-4612

Affiliations:
  • Trinity College Dublin, Ireland


According to our database, David Gregg authored at least 122 papers between 2000 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
E²CSM: efficient FPGA implementation of elliptic curve scalar multiplication over generic prime field GF(p).
J. Supercomput., January, 2024

2023
On the RTL Implementation of FINN Matrix Vector Unit.
ACM Trans. Embed. Comput. Syst., November, 2023

Maple: A Processing Element for Row-Wise Product Based Sparse Tensor Accelerators.
CoRR, 2023

EC-Crypto: Highly Efficient Area-Delay Optimized Elliptic Curve Cryptography Processor.
IEEE Access, 2023

Dynamic Resource Partitioning for Multi-Tenant Systolic Array Based DNN Accelerator.
Proceedings of the 31st Euromicro International Conference on Parallel, 2023

Using Ensemble Inference to Improve Recall of Clone Detection.
Proceedings of the 17th IEEE International Workshop on Software Clones, 2023

2022
Winograd Convolution for Deep Neural Networks: Efficient Point Selection.
ACM Trans. Embed. Comput. Syst., November, 2022

Guest Editorial: Introduction to the Special Section on Communication-Efficient Distributed Machine Learning.
IEEE Trans. Netw. Sci. Eng., 2022

High-speed parallel reconfigurable Fp multipliers for elliptic curve cryptography applications.
Int. J. Circuit Theory Appl., 2022

On the RTL Implementation of FINN Matrix Vector Compute Unit.
CoRR, 2022

Using a Nearest-Neighbour, BERT-Based Approach for Scalable Clone Detection.
Proceedings of the IEEE International Conference on Software Maintenance and Evolution, 2022

Building SSA in a Compiler for PHP.
Proceedings of the SSA-based Compiler Design, 2022

2021
Low-precision Logarithmic Number Systems: Beyond Base-2.
ACM Trans. Archit. Code Optim., 2021

Taxonomy of Saliency Metrics for Channel Pruning.
IEEE Access, 2021

LOCAL: Low-Complex Mapping Algorithm for Spatial DNN Accelerators.
Proceedings of the IEEE Nordic Circuits and Systems Conference, NorCAS 2021, Oslo, 2021

Domino Saliency Metrics: Improving Existing Channel Saliency Metrics with Structural Information.
Proceedings of the AIxIA 2021 - Advances in Artificial Intelligence, 2021

2020
Error Analysis and Improving the Accuracy of Winograd Convolution for Deep Neural Networks.
ACM Trans. Math. Softw., 2020

Bonseyes AI Pipeline - Bringing AI to You: End-to-end integration of data, algorithms, and deployment tools.
ACM Trans. Internet Things, 2020

HOBFLOPS CNNs: Hardware Optimized Bitsliced Floating-Point Operations Convolutional Neural Networks.
CoRR, 2020

Exploiting Weight Redundancy in CNNs: Beyond Pruning and Quantization.
CoRR, 2020

Composition of Saliency Metrics for Channel Pruning with a Myopic Oracle.
CoRR, 2020

Composition of Saliency Metrics for Pruning with a Myopic Oracle.
Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence, 2020

TASO: Time and Space Optimization for Memory-Constrained DNN Inference.
Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020

High-Performance Low-Memory Lowering: GEMM-based Algorithms for DNN Convolution.
Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020

Beyond Base-2 Logarithmic Number Systems (WiP Paper).
Proceedings of the 21st ACM SIGPLAN/SIGBED International Conference on Languages, 2020

2019
A Taxonomy of Channel Pruning Signals in CNNs.
CoRR, 2019

Winograd Convolution for DNNs: Beyond linear polinomials.
CoRR, 2019

Performance-Oriented Neural Architecture Search.
Proceedings of the 17th International Conference on High Performance Computing & Simulation, 2019

Scalar Arithmetic Multiple Data: Customizable Precision for Deep Neural Networks.
Proceedings of the 26th IEEE Symposium on Computer Arithmetic, 2019

Winograd Convolution for DNNs: Beyond Linear Polynomials.
Proceedings of the AI*IA 2019 - Advances in Artificial Intelligence, 2019

POSTER: Space and Time Optimal DNN Primitive Selection with Integer Linear Programming.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018
Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing.
ACM Trans. Archit. Code Optim., 2018

Scalar Arithmetic Multiple Data: Customizable Precision for Deep Neural Networks.
CoRR, 2018

Improving accuracy of Winograd convolution for DNNs.
CoRR, 2018

Optimal DNN primitive selection with partitioned boolean quadratic programming.
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

2017
Efficient Multibyte Floating Point Data Formats Using Vectorization.
IEEE Trans. Computers, 2017

Low-memory GEMM-based convolution algorithms for deep neural networks.
CoRR, 2017

Mutual Inclusivity of the Critical Path and its Partial Schedule on Heterogeneous Systems.
CoRR, 2017

Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks.
IEEE Comput. Archit. Lett., 2017

Bitslice Vectors: A Software Approach to Customizable Data Precision on Processors with SIMD Extensions.
Proceedings of the 46th International Conference on Parallel Processing, 2017

Parallel Multi Channel convolution using General Matrix Multiplication.
Proceedings of the 28th IEEE International Conference on Application-specific Systems, 2017

2016
Parallel Performance Problems on Shared-Memory Multicore Systems: Taxonomy and Observation.
IEEE Trans. Software Eng., 2016

Automatic Vectorization of Interleaved Data Revisited.
ACM Trans. Archit. Code Optim., 2016

Practical Algorithms for Finding Extremal Sets.
ACM J. Exp. Algorithmics, 2016

Customizable Precision of Floating-Point Arithmetic with Bitslice Vector Types.
CoRR, 2016

Spectral Convolution Networks.
CoRR, 2016

Vectorization of Multibyte Floating Point Data Formats.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Heuristics on Reachability Trees for Bicriteria Scheduling of Stream Graphs on Heterogeneous Multiprocessor Architectures.
ACM Trans. Embed. Comput. Syst., 2015

The Movidius Myriad Architecture's Potential for Scientific Computing.
IEEE Micro, 2015

Itemset Isomorphism: GI-Complete.
CoRR, 2015

Sorting Networks: The Final Countdown.
CoRR, 2015

Towards Optimal Sorting Networks: The Third Level.
CoRR, 2015

Exploiting Hyper-Loop Parallelism in Vectorization to Improve Memory Performance on CUDA GPGPU.
Proceedings of the 2015 IEEE TrustCom/BigDataSE/ISPA, 2015

An Efficient Vectorization Approach to Nested Thread-level Parallelism for CUDA GPUs.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
Semi-automatic Composition of Data Layout Transformations for Loop Vectorization.
Proceedings of the Network and Parallel Computing, 2014

Efficient Exploitation of Hyper Loop Parallelism in Vectorization.
Proceedings of the Languages and Compilers for Parallel Computing, 2014

An improved simulated annealing heuristic for static partitioning of task graphs onto heterogeneous architectures.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

Design considerations for parallel performance tools.
Proceedings of the CHI Conference on Human Factors in Computing Systems, 2014

2013
Orchestrating stream graphs using model checking.
ACM Trans. Archit. Code Optim., 2013

Compiler support for lightweight context switching.
ACM Trans. Archit. Code Optim., 2013

Fast asymmetric thread synchronization.
ACM Trans. Archit. Code Optim., 2013

Minimal Unroll Factor for Code Generation of Software Pipelining.
Int. J. Parallel Program., 2013

Heterogeneous Multiconstraint Application Partitioner (HMAP).
Proceedings of the 12th IEEE International Conference on Trust, 2013

A Parallel Runtime Framework for Communication Intensive Stream Applications.
Proceedings of the 12th IEEE International Conference on Trust, 2013

2012
Compiler techniques to improve dynamic branch prediction for indirect jump and call instructions.
ACM Trans. Archit. Code Optim., 2012

A practical solution for achieving language compatibility in scripting language compilers.
Sci. Comput. Program., 2012

Real-Time Sensor Signal Capture from a Harsh Environment.
Proceedings of the 16th IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications, 2012

2011
Optimizing interpreters by tuning opcode orderings on virtual machines for modern architectures: or: how I learned to stop worrying and love hill climbing.
Proceedings of the 9th International Conference on Principles and Practice of Programming in Java, 2011

2010
GSFAP adaptive filtering using log arithmetic for resource-constrained embedded systems.
ACM Trans. Embed. Comput. Syst., 2010

Comparing integer data structures for 32- and 64-bit keys.
ACM J. Exp. Algorithmics, 2010

An output sensitive algorithm for computing a maximum independent set of a circle graph.
Inf. Process. Lett., 2010

A Program Generator for Intel AES-NI Instructions.
Proceedings of the Progress in Cryptology - INDOCRYPT 2010, 2010

Dynamic interpretation for dynamic scripting languages.
Proceedings of the CGO 2010, 2010

Code generation for hardware accelerated AES.
Proceedings of the 21st IEEE International Conference on Application-specific Systems Architectures and Processors, 2010

2009
A practical solution for scripting language compilers.
Proceedings of the 2009 ACM Symposium on Applied Computing (SAC), 2009

Portable Just-in-Time Specialization of Dynamically Typed Scripting Languages.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

Mapping Streaming Languages to General Purpose Processors through Vectorization.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

Using the Meeting Graph Framework to Minimise Kernel Loop Unrolling for Scheduled Loops.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

Streamlining Offload Computing to High Performance Architectures.
Proceedings of the Computational Science, 2009

2008
A stochastic bitwidth estimation technique for compact and low-power custom processors.
ACM Trans. Embed. Comput. Syst., 2008

Virtual machine showdown: Stack versus registers.
ACM Trans. Archit. Code Optim., 2008

Efficiently implementing maximum independent set algorithms on circle graphs.
ACM J. Exp. Algorithmics, 2008

An experimental study of sorting and branch prediction.
ACM J. Exp. Algorithmics, 2008

Optimization strategies for a java virtual machine interpreter on the cell broadband engine.
Proceedings of the 5th Conference on Computing Frontiers, 2008

2007
Optimizing indirect branch prediction accuracy in virtual machine interpreters.
ACM Trans. Program. Lang. Syst., 2007

FPGA based Sparse Matrix Vector Multiplication using Commodity DRAM Memory.
Proceedings of the FPL 2007, 2007

2006
Analyzing Effects of Trace Cache Configurations on the Prediction of Indirect Branches.
J. Instr. Level Parallelism, 2006

Optimizing code-copying JIT compilers for virtual stack machines.
Concurr. Comput. Pract. Exp., 2006

FPGA Implementation of Adaptive Filters based on GSFAP using Log Arithmetic.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2006

Fast and flexible instruction selection with on-demand tree-parsing automata.
Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, 2006

Low-Cost Microarchitectural Techniques for Enhancing the Prediction of Return Addresses on High-Performance Trace Cache Processors.
Proceedings of the Computer and Information Sciences, 2006

High Performance Scientific Computing Using FPGAs with IEEE Floating Point and Logarithmic Arithmetic for Lattice QCD.
Proceedings of the 2006 International Conference on Field Programmable Logic and Applications (FPL), 2006

GSFAP adaptive filtering using log arithmetic for resource-constrained embedded systems.
Proceedings of the ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays, 2006

Efficient Floating-Point Implementation of High-Order (N)LMS Adaptive Filters in FPGA.
Proceedings of the Reconfigurable Computing: Architectures and Applications, 2006

2005
The case for virtual register machines.
Sci. Comput. Program., 2005

Estimating data bus size for custom processors in embedded systems.
Des. Autom. Embed. Syst., 2005

A method-level comparison of the Java Grande and SPEC JVM98 benchmark suites.
Concurr. Pract. Exp., 2005

Virtual machine showdown: stack versus registers.
Proceedings of the 1st International Conference on Virtual Execution Environments, 2005

Multiple-Valued Caches for Power-Efficient Embedded Systems.
Proceedings of the 35th IEEE International Symposium on Multiple-Valued Logic (ISMVL 2005), 2005

FPGA Implementation of a Lattice Quantum Chromodynamics Algorithm Using Logarithmic Arithmetic.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

B.Sc. Computer Game Development ... Why not?
Proceedings of the Digital Games Research Conference 2005, 2005

Tiger - An Interpreter Generation Tool.
Proceedings of the Compiler Construction, 14th International Conference, 2005

2004
Combining stack caching with dynamic superinstructions.
Proceedings of the 2004 Workshop on Interpreters, Virtual Machines and Emulators, 2004

Fine-Tuning Loop-Level Parallelism for Increasing Performance of DSP Applications on FPGAs.
Proceedings of the 12th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2004), 2004

Automatic Customization of Embedded Applications for Enhanced Performance and Reduced Power Using Optimizing Compiler Techniques.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

Stochastic Bit-Width Approximation Using Extreme Value Theory for Customizable Processors.
Proceedings of the Compiler Construction, 13th International Conference, 2004

Retargeting JIT Compilers by using C-Compiler Generated Executable Code.
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004), 29 September, 2004

2003
The Structure and Performance of Efficient Interpreters.
J. Instr. Level Parallelism, 2003

Platform independent dynamic Java virtual machine analysis: the Java Grande Forum benchmark suite.
Concurr. Comput. Pract. Exp., 2003

Towards Superinstructions for Java Interpreters.
Proceedings of the Software and Compilers for Embedded Systems, 7th International Workshop, 2003

An Optimized Java Interpreter for Connected Devices and Embedded Systems.
Proceedings of the 2003 ACM Symposium on Applied Computing (SAC), 2003

Optimizing indirect branch prediction accuracy in virtual machine interpreters.
Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation 2003, 2003

The case for virtual register machines.
Proceedings of the 2003 Workshop on Interpreters, Virtual Machines and Emulators, 2003

A Language and Tool for Generating Efficient Virtual Machine Interpreters.
Proceedings of the Domain-Specific Program Generation, International Seminar, 2003

2002
Vmgen - a generator of efficient virtual machine interpreters.
Softw. Pract. Exp., 2002

Measuring the impact of object-oriented techniques in grande applications: a method-level analysis.
Proceedings of the 2002 Joint ACM-ISCOPE Conference on Java Grande 2002, 2002

Building an Interpreter with Vmgen.
Proceedings of the Compiler Construction, 11th International Conference, 2002

2001
Identification and Quantification of Hotspots in Java Grande Programs.
Proceedings of the High-Performance Computing and Networking, 9th International Conference, 2001

Implementing an Efficient Java Interpreter.
Proceedings of the High-Performance Computing and Networking, 9th International Conference, 2001

The Behavior of Efficient Virtual Machine Interpreters on Modern Architectures.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001

Comparing Tail Duplication with Compensation Code in Single Path Global Instruction Scheduling.
Proceedings of the Compiler Construction, 10th International Conference, 2001

2000
Global Software Pipelining with Iteration Preselection.
Proceedings of the Compiler Construction, 9th International Conference, 2000

