Doug Burger

Orcid: 0009-0006-6588-6596

Affiliations:
  • University of Texas at Austin, USA


According to our database, Doug Burger authored at least 121 papers between 1995 and 2023.

Awards

IEEE Fellow 2010, "For contributions to processor and memory systems".

Bibliography

2023
Microscaling Data Formats for Deep Learning.
CoRR, 2023

Shared Microexponents: A Little Shifting Goes a Long Way.
CoRR, 2023


2020
Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
Inside Project Brainwave's Cloud-Scale, Real-Time AI Processor.
IEEE Micro, 2019

Mixed-Signal Charge-Domain Acceleration of Deep Neural networks through Interleaved Bit-Partitioned Arithmetic.
CoRR, 2019

2018
Serving DNNs in Real Time at Datacenter Scale with Project Brainwave.
IEEE Micro, 2018


A Configurable Cloud-Scale DNN Processor for Real-Time AI.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

2017
Configurable Clouds.
IEEE Micro, 2017

2016
A cloud-scale acceleration architecture.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016


2015
A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services.
IEEE Micro, 2015

PocketTrend: Timely Identification and Delivery of Trending Search Content to Mobile Users.
Proceedings of the 24th International Conference on World Wide Web, 2015

Priority-based cache allocation in throughput processors.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

A Scalable High-Bandwidth Architecture for Lossless Compression on FPGAs.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

2014
Scaling Power and Performance via Processor Composability.
IEEE Trans. Computers, 2014

What the Future Holds for Solid-State Memory.
Computer, 2014

Dynamic-vector execution on a general purpose EDGE chip multiprocessor.
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

General-purpose code acceleration with limited-precision analog computation.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Author retrospective for a NUCA substrate for flexible CMP cache sharing.
Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

A Scalable Multi-engine Xpress9 Compressor with Asynchronous Data Transfer.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

EVX: Vector execution on low power EDGE cores.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

2013
Neural Acceleration for General-Purpose Approximate Programs.
IEEE Micro, 2013

Multicore Model from Abstract Single Core Inputs.
IEEE Comput. Archit. Lett., 2013

Power challenges may end the multicore era.
Commun. ACM, 2013

Using managed runtime systems to tolerate holes in wearable memories.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013

How to implement effective prediction and forwarding for fusable dynamic multicore architectures.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Reconfigurable computing in the era of post-silicon scaling [panel discussion].
Proceedings of the 21st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2013

2012
Power Limitations and Dark Silicon Challenge the Future of Multicore.
ACM Trans. Comput. Syst., 2012

Dark Silicon and the End of Multicore Scaling.
IEEE Micro, 2012

Charles R. (Chuck) Moore (1961 - 2012).
IEEE Micro, 2012

Architecture support for disciplined approximate programming.
Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

2011
Preventing PCM banks from seizing too much power.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Panel Statement.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Exploiting criticality to reduce bottlenecks in distributed uniprocessors.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Pocket cloudlets.
Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, 2011

The Good Block: Hardware/Software Design for Composable, Block-Atomic Processors.
Proceedings of the 15th Workshop on Interaction between Compilers and Computer Architectures, 2011

2010
Dynamic vectorization in the E2 dynamic multicore architecture.
SIGARCH Comput. Archit. News, 2010

Phase-Change Technology and the Future of Main Memory.
IEEE Micro, 2010

The Future of Architectural Simulation.
IEEE Micro, 2010

Phase change memory architecture and the quest for scalability.
Commun. ACM, 2010

Use ECP, not ECC, for hard failures in resistive memories.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Dynamically replicated memory: building reliable systems from nanoscale resistive memories.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

Evolving Compiler Heuristics to Manage Communication and Contention.
Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010

Using dead blocks as a virtual victim cache.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
Composable Multicore Chips.
Proceedings of the Multicore Processors and Systems, 2009

Mixed-Signal Approximate Computation: A Neural Predictor Case Study.
IEEE Micro, 2009

Better I/O through byte-addressable, persistent memory.
Proceedings of the 22nd ACM Symposium on Operating Systems Principles 2009, 2009

Analysis of the TRIPS prototype block predictor.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

End-to-end validation of architectural power models.
Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009

Architecting phase change memory as a scalable dram alternative.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

An evaluation of the TRIPS computer system.
Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 2009

2008
Multitasking workload scheduling on flexible core chip multiprocessors.
SIGARCH Comput. Archit. News, 2008

High performance dense linear algebra on a spatially distributed processor.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

Strategies for mapping dataflow blocks to distributed hardware.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Low-power, high-performance analog neural branch prediction.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Register Bank Assignment for Spatially Partitioned Processors.
Proceedings of the Languages and Compilers for Parallel Computing, 2008

Counting Dependence Predictors.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Feature selection and policy optimization for distributed instruction placement using reinforcement learning.
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007
A NUCA Substrate for Flexible CMP Cache Sharing.
IEEE Trans. Parallel Distributed Syst., 2007

Convergent Compilation Applied to Loop Unrolling.
Trans. High Perform. Embed. Archit. Compil., 2007

On-Chip Interconnection Networks of the TRIPS Chip.
IEEE Micro, 2007

Implementation and Evaluation of a Dynamically Routed Processor Operand Network.
Proceedings of the First International Symposium on Networks-on-Chips, 2007

Composable Lightweight Processors.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

Late-binding: enabling unordered load-store queues.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

2006
Dataflow Predication.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Distributed Microarchitectural Protocols in the TRIPS Prototype Processor.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Merging Head and Tail Duplication for Convergent Hyperblock Formation.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Critical path analysis of the TRIPS architecture.
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

Design and Implementation of the TRIPS Primary Memory System.
Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006

Implementation and Evaluation of On-Chip Network Architectures.
Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006

Compiling for EDGE Architectures.
Proceedings of the Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 2006

A spatial path scheduling algorithm for EDGE architectures.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

2004
TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP.
ACM Trans. Archit. Code Optim., 2004

Tools for computer architecture research.
SIGMETRICS Perform. Evaluation Rev., 2004

Recent extensions to the SimpleScalar tool suite.
SIGMETRICS Perform. Evaluation Rev., 2004

Scalable Hardware Memory Disambiguation for High-ILP Processors.
IEEE Micro, 2004

Speculative Incoherent Cache Protocols.
IEEE Micro, 2004

Scaling to the End of Silicon with EDGE Architectures.
Computer, 2004

Billion-Transistor Architectures: There and Back Again.
Computer, 2004

Coherence decoupling: making use of incoherence.
Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004

Scalable selective re-execution for EDGE architectures.
Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004

Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures.
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004), 29 September, 2004

2003
Static energy reduction techniques for microprocessor caches.
IEEE Trans. Very Large Scale Integr. Syst., 2003

Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements.
IEEE Trans. Computers, 2003

Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture.
IEEE Micro, 2003

Nonuniform Cache Architectures for Wire-Delay Dominated On-Chip Caches.
IEEE Micro, 2003

Universal Mechanisms for Data-Parallel Architectures.
Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Microprocessor pipeline energy analysis.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Guided Region Prefetching: A Cooperative Hardware/Software Approach.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

Exploiting Microarchitectural Redundancy For Defect Tolerance.
Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

Routed Inter-ALU Networks for ILP Scalability and Performance.
Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

Architectural versus physical solutions for on-chip communication challenges.
Proceedings of the 1st IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2003

Designing Ultra-large Instruction Issue Windows.
Proceedings of the Advances in Computer Systems Architecture, 2003

2002
Errata on "Measuring Experimental Error in Microprocessor Simulation".
SIGARCH Comput. Archit. News, 2002

The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays.
Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic.
Proceedings of the 2002 International Conference on Dependable Systems and Networks (DSN 2002), 2002

An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches.
Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

2001
Designing a Modern Memory Hierarchy with Hardware Prefetching.
IEEE Trans. Computers, 2001

A design space evaluation of grid processor architectures.
Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

Measuring Experimental Error in Microprocessor Simulation.
Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001

Filtering Superfluous Prefetches Using Density Vectors.
Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

Reducing DRAM Latencies with an Integrated Memory Hierarchy Design.
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

Exploring the Design Space of Future CMPs.
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001

2000
Clock rate versus IPC: the end of the road for conventional microarchitectures.
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

1999
DataScalar: A memory-centric approach to computing.
J. Syst. Archit., 1999

1997
The SimpleScalar tool set, version 2.0.
SIGARCH Comput. Archit. News, 1997

Limited bandwidth to affect processor design.
IEEE Micro, 1997

Billion-Transistor Architectures - Guest Editors' Introduction.
Computer, 1997

Changing Interaction of Compiler and Architecture.
Computer, 1997

Efficient Synchronization: Let Them Eat QOLB.
Proceedings of the 24th International Symposium on Computer Architecture, 1997

DataScalar Architectures.
Proceedings of the 24th International Symposium on Computer Architecture, 1997

Memory Systems.
Proceedings of the Computer Science and Engineering Handbook, 1997

1996
Paging tradeoffs in distributed-shared-memory multiprocessors.
J. Supercomput., 1996

Memory Systems.
ACM Comput. Surv., 1996

Memory Bandwidth Limitations of Future Microprocessors.
Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

1995
Accuracy vs. performance in parallel simulation of interconnection networks.
Proceedings of IPPS '95, 1995

Techniques for Reducing Overheads of Shared-Memory Multiprocessing.
Proceedings of the 9th international conference on Supercomputing, 1995

