William J. Dally

Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019

Darwin: A Genomics Co-processor Provides up to 15,000X Acceleration on Long Read Assembly.

[BibT_eX]

[DOI]

Yatish Turakhia

Gill Bejerano

Proceedings of the 2019 USENIX Annual Technical Conference, 2019

CaTDet: Cascaded Tracked Detector for Efficient Object Detection from Video.

[BibT_eX]

[DOI]

Huizi Mao

Taeyoung Kong

Proceedings of Machine Learning and Systems 2019, 2019

Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture.

[BibT_eX]

[DOI]

Yakun Sophia Shao

Jason Clemons

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

A Delay Metric for Video Object Detection: What Average Precision Fails to Tell.

[BibT_eX]

[DOI]

Huizi Mao

Xiaodong Yang

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

MAGNet: A Modular Accelerator Generator for Neural Networks.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Computer-Aided Design, 2019

Darwin-WGA: A Co-processor Provides Increased Sensitivity in Whole Genome Alignments with High Speedup.

[BibT_eX]

[DOI]

Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Accelerator Designed with a High-Productivity VLSI Methodology.

[BibT_eX]

[DOI]

Proceedings of the 2019 IEEE Hot Chips 31 Symposium (HCS), 2019

Analog/Mixed-Signal Hardware Error Modeling for Deep Learning Inference.

[BibT_eX]

[DOI]

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

A 2-to-20 GHz Multi-Phase Clock Generator with Phase Interpolators Using Injection-Locked Oscillation Buffers for High-Speed IOs in 16nm FinFET.

[BibT_eX]

[DOI]

Proceedings of the IEEE Custom Integrated Circuits Conference, 2019

A Fine-Grained GALS SoC with Pausible Adaptive Clocking in 16 nm FinFET.

[BibT_eX]

[DOI]

Matthew Fojtik

Ben Keller

Alicia Klinefelter

Proceedings of the 25th IEEE International Symposium on Asynchronous Circuits and Systems, 2019

2018

Optimal Operation of a Plug-In Hybrid Vehicle.

[BibT_eX]

[DOI]

IEEE Trans. Veh. Technol., 2018

Hardware-Enabled Artificial Intelligence.

[BibT_eX]

[DOI]

Proceedings of the 2018 IEEE Symposium on VLSI Circuits, 2018

A 1.17pJ/b 25Gb/s/pin ground-referenced single-ended serial link for off- and on-package communication in 16nm CMOS using a process- and temperature-adaptive voltage regulator.

[BibT_eX]

[DOI]

Proceedings of the 2018 IEEE International Solid-State Circuits Conference, 2018

Efficient Sparse-Winograd Convolutional Neural Networks.

[BibT_eX]

[DOI]

Proceedings of the 6th International Conference on Learning Representations, 2018

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training.

[BibT_eX]

[DOI]

Proceedings of the 6th International Conference on Learning Representations, 2018

Bandwidth-efficient deep learning.

[BibT_eX]

[DOI]

Song Han

Proceedings of the 55th Annual Design Automation Conference, 2018

Ground-referenced signaling for intra-chip and short-reach chip-to-chip interconnects.

[BibT_eX]

[DOI]

Proceedings of the 2018 IEEE Custom Integrated Circuits Conference, 2018

2017

CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution Near In-Order Energy with Near Out-of-Order Performance.

[BibT_eX]

[DOI]

Milad Mohammadi

ACM Trans. Archit. Code Optim., 2017

FPGAs versus GPUs in Data centers.

[BibT_eX]

[DOI]

IEEE Micro, 2017

HoLiSwap: Reducing Wire Energy in L1 Caches.

[BibT_eX]

[DOI]

CoRR, 2017

Deep Generative Adversarial Networks for Compressed Sensing Automates MRI.

[BibT_eX]

[DOI]

CoRR, 2017

Exploring the Regularity of Sparse Structure in Convolutional Neural Networks.

[BibT_eX]

[DOI]

CoRR, 2017

Fine-grained DRAM: energy-efficient DRAM for extreme bandwidth systems.

[BibT_eX]

[DOI]

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks.

[BibT_eX]

[DOI]

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Efficient methods and hardware for deep learning.

[BibT_eX]

[DOI]

Proceedings of the Workshop on Trends in Machine-Learning (and impact on computer architecture), 2017

Trained Ternary Quantization.

[BibT_eX]

[DOI]

Proceedings of the 5th International Conference on Learning Representations, 2017

Efficient Sparse-Winograd Convolutional Neural Networks.

[BibT_eX]

[DOI]

Proceedings of the 5th International Conference on Learning Representations, 2017

DSD: Dense-Sparse-Dense Training for Deep Neural Networks.

[BibT_eX]

[DOI]

Proceedings of the 5th International Conference on Learning Representations, 2017

Architecting an Energy-Efficient DRAM System for GPUs.

[BibT_eX]

[DOI]

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA.

[BibT_eX]

[DOI]

William (Bill) J. Dally

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Exploring the Granularity of Sparsity in Convolutional Neural Networks.

[BibT_eX]

[DOI]

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017

2016

Reuse Distance-Based Probabilistic Cache Replacement.

[BibT_eX]

[DOI]

Subhasis Das

ACM Trans. Archit. Code Optim., 2016

A 28 nm 2 Mbit 6 T SRAM With Highly Configurable Low-Voltage Write-Ability Assist Implementation and Capacitor-Based Sense-Amplifier Input Offset Compensation.

[BibT_eX]

[DOI]

IEEE J. Solid State Circuits, 2016

CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution.

[BibT_eX]

[DOI]

Milad Mohammadi

CoRR, 2016

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size.

[BibT_eX]

[DOI]

CoRR, 2016

DSD: Regularizing Deep Neural Networks with Dense-Sparse-Dense Training Flow.

[BibT_eX]

[DOI]

CoRR, 2016

Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding.

[BibT_eX]

[DOI]

Song Han

Huizi Mao

Proceedings of the 4th International Conference on Learning Representations, 2016

ESE: Efficient Speech Recognition Engine with Compressed LSTM on FPGA.

[BibT_eX]

[DOI]

CoRR, 2016

A 6.5-to-23.3fJ/b/mm balanced charge-recycling bus in 16nm FinFET CMOS at 1.7-to-2.6Gb/s/wire with clock forwarding and low-crosstalk contraflow wiring.

[BibT_eX]

[DOI]

Proceedings of the 2016 IEEE International Solid-State Circuits Conference, 2016

EIE: Efficient Inference Engine on Compressed Deep Neural Network.

[BibT_eX]

[DOI]

Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Deep compression and EIE: Efficient inference engine on compressed deep neural network.

[BibT_eX]

[DOI]

Proceedings of the 2016 IEEE Hot Chips 28 Symposium (HCS), 2016

2015

On-Chip Active Messages for Speed, Scalability, and Efficiency.

[BibT_eX]

[DOI]

R. Curtis Harting

IEEE Trans. Parallel Distributed Syst., 2015

Learning both Weights and Connections for Efficient Neural Networks.

[BibT_eX]

[DOI]

CoRR, 2015

On-Demand Dynamic Branch Prediction.

[BibT_eX]

[DOI]

IEEE Comput. Archit. Lett., 2015

Network endpoint congestion control for fine-grained communication.

[BibT_eX]

[DOI]

Larry R. Dennison

Proceedings of the International Conference for High Performance Computing, 2015

Learning both Weights and Connections for Efficient Neural Network.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

SLIP: reducing wire energy in the memory hierarchy.

[BibT_eX]

[DOI]

Subhasis Das

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

2014

Scaling the Power Wall: A Path to Exascale.

[BibT_eX]

[DOI]

Proceedings of the International Conference for High Performance Computing, 2014

Author retrospective for design tradeoffs for tiled CMP on-chip networks.

[BibT_eX]

[DOI]

Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

2013

Elastic Buffer Flow Control for On-Chip Networks.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 2013

A 0.54 pJ/b 20 Gb/s Ground-Referenced Single-Ended Short-Reach Serial Link in 28 nm CMOS for Advanced Packaging Applications.

[BibT_eX]

[DOI]

IEEE J. Solid State Circuits, 2013

Channel reservation protocol for over-subscribed channels and destinations.

[BibT_eX]

[DOI]

Daniel Becker

Proceedings of the International Conference for High Performance Computing, 2013

A 0.54pJ/b 20Gb/s ground-referenced single-ended short-haul serial link in 28nm CMOS for advanced packaging applications.

[BibT_eX]

[DOI]

Proceedings of the 2013 IEEE International Solid-State Circuits Conference, 2013

A detailed and flexible cycle-accurate Network-on-Chip simulator.

[BibT_eX]

[DOI]

Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

21st century digital design tools.

[BibT_eX]

[DOI]

Chris Malachowsky

Proceedings of the 50th Annual Design Automation Conference 2013, 2013

2012

A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors.

[BibT_eX]

[DOI]

ACM Trans. Comput. Syst., 2012

Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor.

[BibT_eX]

[DOI]

Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Adaptive Backpressure: Efficient buffer management for on-chip networks.

[BibT_eX]

[DOI]

Proceedings of the 30th International IEEE Conference on Computer Design, 2012

Network congestion avoidance through Speculative Reservation.

[BibT_eX]

[DOI]

Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

2011

Evaluating Elastic Buffer and Wormhole Flow Control.

[BibT_eX]

[DOI]

Daniel Becker

IEEE Trans. Computers, 2011

GPUs and the Future of Parallel Computing.

[BibT_eX]

[DOI]

IEEE Micro, 2011

Packet Chaining: Efficient Single-Cycle Allocation for On-Chip Networks.

[BibT_eX]

[DOI]

Daniel Becker

IEEE Comput. Archit. Lett., 2011

A compile-time managed multi-level register file hierarchy.

[BibT_eX]

[DOI]

Mark Gebhart

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Energy-efficient mechanisms for managing thread context in throughput processors.

[BibT_eX]

[DOI]

Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Panel Statement.

[BibT_eX]

[DOI]

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Power, Programmability, and Granularity: The Challenges of ExaScale Computing.

[BibT_eX]

[DOI]

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

The utility of fast active messages on many-core chips: Efficient supercomputing project.

[BibT_eX]

[DOI]

R. Curtis Harting

Vishal Parikh

Proceedings of the 2011 IEEE Hot Chips 23 Symposium (HCS), 2011

2010

The GPU Computing Era.

[BibT_eX]

[DOI]

John Nickolls

IEEE Micro, 2010

Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures.

[BibT_eX]

[DOI]

JongSoo Park

Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2010), 2010

Evaluating Bufferless Flow Control for On-chip Networks.

[BibT_eX]

[DOI]

Daniel Sánchez

Christos Kozyrakis

Proceedings of the NOCS 2010, 2010

Moving the needle, computer architecture research in academe and industry.

[BibT_eX]

[DOI]

Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Throughput computing.

[BibT_eX]

[DOI]

Proceedings of the 24th International Conference on Supercomputing, 2010

Block-Parallel Programming for Real-Time Embedded Applications.

[BibT_eX]

[DOI]

David Black-Schaffer

Proceedings of the 39th International Conference on Parallel Processing, 2010

Fine-grain dynamic instruction placement for L0 scratch-pad memory.

[BibT_eX]

[DOI]

JongSoo Park

Proceedings of the 2010 International Conference on Compilers, 2010

The Even/Odd Synchronizer: A Fast, All-Digital, Periodic Synchronizer.

[BibT_eX]

[DOI]

Stephen G. Tell

Proceedings of the 16th IEEE International Symposium on Asynchronous Circuits and Systems, 2010

2009

Stream Processors.

[BibT_eX]

[DOI]

Proceedings of the Multicore Processors and Systems, 2009

Cost-Efficient Dragonfly Topology for Large-Scale Systems.

[BibT_eX]

[DOI]

IEEE Micro, 2009

Operand Registers and Explicit Operand Forwarding.

[BibT_eX]

[DOI]

R. Curtis Harting

IEEE Comput. Archit. Lett., 2009

Router designs for elastic buffer on-chip networks.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Allocator implementations for network-on-chip routers.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Indirect adaptive routing on large scale interconnection networks.

[BibT_eX]

[DOI]

Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Elastic-buffer flow control for on-chip networks.

[BibT_eX]

[DOI]

Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

2008

A Programmable 512 GOPS Stream Processor for Signal, Image, and Video Processing.

[BibT_eX]

[DOI]

IEEE J. Solid State Circuits, 2008

Efficient Embedded Computing.

[BibT_eX]

[DOI]

Computer, 2008

Hierarchical Instruction Register Organization.

[BibT_eX]

[DOI]

IEEE Comput. Archit. Lett., 2008

An Energy-Efficient Processor Architecture for Embedded Systems.

[BibT_eX]

[DOI]

IEEE Comput. Archit. Lett., 2008

A portable runtime interface for multi-level memory hierarchies.

[BibT_eX]

[DOI]

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

Technology-Driven, Highly-Scalable Dragonfly Topology.

[BibT_eX]

[DOI]

Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Stream Scheduling: A Framework to Manage Bulk Operations in Memory Hierarchies.

[BibT_eX]

[DOI]

Abhishek Das

Proceedings of the Euro-Par 2008, 2008

A tuning framework for software-managed memory hierarchies.

[BibT_eX]

[DOI]

Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007

Research Challenges for On-Chip Interconnection Networks.

[BibT_eX]

[DOI]

John D. Owens

Doddaballapur Narasimha-Murthy Jayasimha

Ron Ho

IEEE Micro, 2007

A 14-mW 6.25-Gb/s Transceiver in 90-nm CMOS.

[BibT_eX]

[DOI]

IEEE J. Solid State Circuits, 2007

Flattened Butterfly Topology for On-Chip Networks.

[BibT_eX]

[DOI]

IEEE Comput. Archit. Lett., 2007

Compilation for explicitly managed memory hierarchies.

[BibT_eX]

[DOI]

Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Enabling Technology for On-Chip Interconnection Networks.

[BibT_eX]

[DOI]

Proceedings of the First International Symposium on Networks-on-Chips, 2007

A 14mW 6.25Gb/s Transceiver in 90nm CMOS for Serial Chip-to-Chip Communications.

[BibT_eX]

[DOI]

Proceedings of the 2007 IEEE International Solid-State Circuits Conference, 2007

Future of on-chip interconnection architectures.

[BibT_eX]

[DOI]

Shekhar Borkar

Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007

Flattened butterfly: a cost-efficient topology for high-radix networks.

[BibT_eX]

[DOI]

Dennis Abts

Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Executing irregular scientific applications on stream architectures.

[BibT_eX]

[DOI]

Proceedings of the 21st Annual International Conference on Supercomputing, 2007

Tradeoff between data-, instruction-, and thread-level parallelism in stream processors.

[BibT_eX]

[DOI]

Proceedings of the 21st Annual International Conference on Supercomputing, 2007

Interconnect-Centric Computing.

[BibT_eX]

[DOI]

Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

[BibT_eX]

[DOI]

Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

Architectural Support for the Stream Execution Model on General-Purpose Processors.

[BibT_eX]

[DOI]

Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

Stream Scheduling: A Framework to Manage Bulk Operations in a Memory Hierarchy.

[BibT_eX]

[DOI]

Abhishek Das

Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006

Topology optimization of interconnection networks.

[BibT_eX]

[DOI]

Amit K. Gupta

IEEE Comput. Archit. Lett., 2006

Data parallel address architecture.

[BibT_eX]

[DOI]

IEEE Comput. Archit. Lett., 2006

Multi-core issues - Multi-Core for HPC: breakthrough or breakdown?

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Interconnect routing and scheduling - Adaptive routing in high-radix clos network.

[BibT_eX]

[DOI]

Dennis Abts

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Sequoia: programming the memory hierarchy.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Architecture - The design space of data-parallel memory systems.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

The BlackWidow High-Radix Clos Network.

[BibT_eX]

[DOI]

Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

Design tradeoffs for tiled CMP on-chip networks.

[BibT_eX]

[DOI]

Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Computer Architecture in the Many-Core Era.

[BibT_eX]

[DOI]

Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006

Pulsenet - A Parallel Flash Sampler and Digital Processor IC for Optical SETI.

[BibT_eX]

[DOI]

Proceedings of the IEEE 2006 Custom Integrated Circuits Conference, 2006

Compiling for stream processing.

[BibT_eX]

[DOI]

Abhishek Das

Peter R. Mattson

Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

2005

Hot Chips 16: Power, Parallelism, and Memory Performance.

[BibT_eX]

[DOI]

Keith Diefendorff

IEEE Micro, 2005

A 20-Gb/s 0.13-μm CMOS serial link transmitter using an LC-PLL to directly drive the output multiplexer.

[BibT_eX]

[DOI]

IEEE J. Solid State Circuits, 2005

Fault Tolerance Techniques for the Merrimac Streaming Supercomputer.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Microarchitecture of a High-Radix Router.

[BibT_eX]

[DOI]

Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

Scatter-Add in Data Parallel Architectures.

[BibT_eX]

[DOI]

Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

Explaining the gap between ASIC and custom power: a custom perspective.

[BibT_eX]

[DOI]

Andrew Chang

Proceedings of the 42nd Design Automation Conference, 2005

2004

Stream Processors: Programmability and Efficiency.

[BibT_eX]

[DOI]

ACM Queue, 2004

A 33-mW 8-Gb/s CMOS clock multiplier and CDR for highly integrated I/Os.

[BibT_eX]

[DOI]

IEEE J. Solid State Circuits, 2004

Globally Adaptive Load-Balanced Routing on Tori.

[BibT_eX]

[DOI]

IEEE Comput. Archit. Lett., 2004

Buffer and Delay Bounds in High Radix Interconnection Networks.

[BibT_eX]

[DOI]

Arjun Singh

IEEE Comput. Archit. Lett., 2004

The case for broader computer architecture education: keynote address.

[BibT_eX]

[DOI]

Proceedings of the 2004 workshop on Computer architecture education, 2004

Adaptive channel queue routing on k-ary n-cubes.

[BibT_eX]

[DOI]

Proceedings of the Sixteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2004), 2004

Analysis and Performance Results of a Molecular Modeling Application on Merrimac.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

Evaluating the Imagine Stream Architecture.

[BibT_eX]

[DOI]

Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Stream Register Files with Indexed Access.

[BibT_eX]

[DOI]

Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

2003

Guaranteed scheduling for switches with configuration overhead.

[BibT_eX]

[DOI]

IEEE/ACM Trans. Netw., 2003

A second-order semidigital clock recovery circuit based on injection locking.

[BibT_eX]

[DOI]

IEEE J. Solid State Circuits, 2003

Jitter transfer characteristics of delay-locked loops - theories and design techniques.

[BibT_eX]

[DOI]

IEEE J. Solid State Circuits, 2003

Programmable Stream Processors.

[BibT_eX]

[DOI]

Computer, 2003

Throughput-centric routing algorithm design.

[BibT_eX]

[DOI]

Stephen P. Boyd

Proceedings of the Fifteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2003), 2003

Merrimac: Supercomputing with Streams.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

GOAL: A Load-Balanced Adaptive Routing Algorithm for Torus Networks.

[BibT_eX]

[DOI]

Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

CMOS High-Speed I/Os - Present and Future.

[BibT_eX]

[DOI]

Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

Exploring the VLSI Scalability of Stream Processors.

[BibT_eX]

[DOI]

Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

A 33mW 8Gb/s CMOS clock multiplier and CDR for highly integrated I/Os.

[BibT_eX]

[DOI]

Proceedings of the IEEE Custom Integrated Circuits Conference, 2003

2002

A low-power multiplying DLL for low-jitter multigigahertz clock generation in highly integrated digital chips.

[BibT_eX]

[DOI]

IEEE J. Solid State Circuits, 2002

Worst-case Traffic for Oblivious Routing Functions.

[BibT_eX]

[DOI]

IEEE Comput. Archit. Lett., 2002

Migration in Single Chip Multiprocessors.

[BibT_eX]

[DOI]

Kelly A. Shaw

IEEE Comput. Archit. Lett., 2002

Locality-preserving randomized oblivious routing on torus networks.

[BibT_eX]

[DOI]

Proceedings of the Fourteenth Annual ACM Symposium on Parallel Algorithms and Architectures, 2002

A Stream Processor Development Platform.

[BibT_eX]

[DOI]

Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

Media Processing Applications on the Imagine Stream Processor.

[BibT_eX]

[DOI]

Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

VLSI Design and Verification of the Imagine Processor.

[BibT_eX]

[DOI]

Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

The Imagine Stream Processor.

[BibT_eX]

[DOI]

Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

Scalable Opto-Electronic Network (SOENet).

[BibT_eX]

[DOI]

Proceedings of the 10th Annual IEEE Symposium on High Performance Interconnects (HOTIC 2002), August 21, 2002

Comparing Reyes and OpenGL on a Stream Architecture.

[BibT_eX]

[DOI]

Proceedings of the 2002 ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, 2002

2001

A Delay Model for Router Microarchitectures.

[BibT_eX]

[DOI]

IEEE Micro, 2001

Imagine: Media Processing with Streams.

[BibT_eX]

[DOI]

IEEE Micro, 2001

Guest Editors' Introduction: Hot Chips 12.

[BibT_eX]

[DOI]

Marc Tremblay

Allen J. Baum

IEEE Micro, 2001

Monolithic chaotic communications system.

[BibT_eX]

[DOI]

Patrick Chiang

Ming-Ju Edward Lee

Proceedings of the 2001 International Symposium on Circuits and Systems, 2001

A Delay Model and Speculative Architecture for Pipelined Routers.

[BibT_eX]

[DOI]

Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

Route Packets, Not Wires: On-Chip Interconnection Networks.

[BibT_eX]

[DOI]

Proceedings of the 38th Design Automation Conference, 2001

Digital systems engineering.

[BibT_eX]

John W. Poulton

Cambridge University Press, ISBN: 978-0-521-59292-5, 2001

2000

Low-power area-efficient high-speed I/O circuit techniques.

[BibT_eX]

[DOI]

Ming-Ju Edward Lee

Patrick Chiang

IEEE J. Solid State Circuits, 2000

Efficient conditional operations for data-parallel architectures.

[BibT_eX]

[DOI]

Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

Processor Mechanisms for Software Shared Memory.

[BibT_eX]

[DOI]

Proceedings of the High Performance Computing, Third International Symposium, 2000

Memory access scheduling.

[BibT_eX]

[DOI]

Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

Smart Memories: a modular reconfigurable architecture.

[BibT_eX]

[DOI]

Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

[BibT_eX]

[DOI]

Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

Flit-Reservation Flow Control.

[BibT_eX]

[DOI]

Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

Polygon Rendering on a Stream Architecture.

[BibT_eX]

[DOI]

Proceedings of the 2000 ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, 2000

The role of custom design in ASIC Chips.

[BibT_eX]

[DOI]

Andrew Chang

Proceedings of the 37th Conference on Design Automation, 2000

Communication Scheduling.

[BibT_eX]

[DOI]

Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), 2000

1999

Concurrent Event Handling through Multithreading.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 1999

VLSI Architecture: Past, Present, and Future.

[BibT_eX]

[DOI]

Steve Lacy

Proceedings of the 18th Conference on Advanced Research in VLSI (ARVLSI '99), 1999

1998

The bleeding edge.

[BibT_eX]

[DOI]

Randall Rettberg

David E. Culler

IEEE Micro, 1998

A tracking clock recovery receiver for 4-Gbps signaling.

[BibT_eX]

[DOI]

John Poulton

Steve Tell

IEEE Micro, 1998

An Efficient, Protected Message Interface.

[BibT_eX]

[DOI]

Computer, 1998

Point Sample Rendering.

[BibT_eX]

[DOI]

J. P. Grossman

Proceedings of the Rendering Techniques '98, Proceedings of the Eurographics Workshop in Vienna, Austria, June 29, 1998

A Bandwidth-efficient Architecture for Media Processing.

[BibT_eX]

[DOI]

Abelardo López-Lagunas

Peter R. Mattson

John D. Owens

Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

Exploiting Fine-grain Thread Level Parallelism on the MIT Multi-ALU Processor.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998

Retrospective: the J-machine.

[BibT_eX]

[DOI]

Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers), 1998

Architecture of a Message-Driven Processor.

[BibT_eX]

[DOI]

Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers), 1998

The effects of explicitly parallel mechanisms on the multi-ALU processor cluster pipeline.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Computer Design: VLSI in Computers and Processors, 1998

1997

Extended Ephemeral Logging: Log Storage Management for Applications with Long Lived Transactions.

[BibT_eX]

[DOI]

John S. Keen

ACM Trans. Database Syst., 1997

Transmitter equalization for 4-Gbps signaling.

[BibT_eX]

[DOI]

John W. Poulton

IEEE Micro, 1997

The M-machine multicomputer.

[BibT_eX]

[DOI]

Int. J. Parallel Program., 1997

1995

Thread prioritization: A thread scheduling mechanism for multiple-context parallel processors.

[BibT_eX]

[DOI]

Stuart Fiske

Future Gener. Comput. Syst., 1995

Evaluating the Locality Benefits of Active Messages.

[BibT_eX]

[DOI]

Ellen Spertus

Proceedings of the Fifth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1995

The Named-State Register File: Implementation and Performance.

[BibT_eX]

[DOI]

Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture (HPCA 1995), 1995

Low-latency plesiochronous data retiming.

[BibT_eX]

[DOI]

Larry R. Dennison

Thucydides Xanthopoulos

Proceedings of the 16th Conference on Advanced Research in VLSI (ARVLSI '95), 1995

1994

Architectural and implementation issues for multithreading (panel session I).

[BibT_eX]

[DOI]

SIGARCH Comput. Archit. News, 1994

The Reliable Router: A Reliable and High-Performance Communication Substrate for Parallel Computers.

[BibT_eX]

[DOI]

Thucydides Xanthopoulos

Proceedings of the Parallel Computer Routing and Communication, 1994

Architecture and implementation of the reliable router.

[BibT_eX]

[DOI]

Thucydides Xanthopoulos

Proceedings of the Hot Interconnects II, 1994

XEL: Extended Ephemeral Logging for Log Storage Management.

[BibT_eX]

[DOI]

John S. Keen

Proceedings of the Third International Conference on Information and Knowledge Management (CIKM'94), Gaithersburg, Maryland, USA, November 29, 1994

Hardware Support for Fast Capability-based Addressing.

[BibT_eX]

[DOI]

Nicholas P. Carter

Proceedings of ASPLOS-VI, 1994

Named State and Efficient Context Switching.

[BibT_eX]

[DOI]

Proceedings of the Multithreaded Computer Architecture, 1994

Subspace Optimizations.

[BibT_eX]

[DOI]

Kathleen Knobe

Proceedings of the Automatic Parallelization: New Approaches to Code Generation, 1994

Issues in the Design and Implementation of Instruction Processors for Multicomputers (Position Statement).

[BibT_eX]

[DOI]

Proceedings of the Multithreaded Computer Architecture, 1994

1993

Deadlock-Free Adaptive Routing in Multicomputer Networks Using Virtual Channels.

[BibT_eX]

[DOI]

Hiromichi Aoki

IEEE Trans. Parallel Distributed Syst., 1993

A Universal Parallel Computer Architecture.

[BibT_eX]

[DOI]

New Gener. Comput., 1993

Performance Evaluation of Ephemeral Logging.

[BibT_eX]

[DOI]

John S. Keen

Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, 1993

Evaluation of Mechanisms for Fine-Grained Parallel Programs in the J-Machine and the CM-5.

[BibT_eX]

[DOI]

Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993

The J-Machine Multicomputer: An Architectural Evaluation.

[BibT_eX]

[DOI]

Michael D. Noakes

Deborah A. Wallach

Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993

1992

Virtual-Channel Flow Control.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 1992

A Fast Translation Method for Paging on top of Segmentation.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 1992

The message-driven processor: a multicomputer processing node with efficient mechanisms.

[BibT_eX]

[DOI]

IEEE Micro, 1992

Processor Coupling: Integrating Compile Time and Runtime Scheduling for Parallelism.

[BibT_eX]

[DOI]

Proceedings of the 19th Annual International Symposium on Computer Architecture, Gold Coast, 1992

The J-Machine Network.

[BibT_eX]

[DOI]

Proceedings of the 1992 IEEE International Conference on Computer Design: VLSI in Computer & Processors, 1992

MDP Design Tools and Methods.

[BibT_eX]

[DOI]

Richard A. Lethin

Proceedings of the 1992 IEEE International Conference on Computer Design: VLSI in Computer & Processors, 1992

The Message Driven Processor: An Integrated Multicomputer Processing Element.

[BibT_eX]

[DOI]

Proceedings of the 1992 IEEE International Conference on Computer Design: VLSI in Computer & Processors, 1992

1991

Express Cubes: Improving the Performance of k-Ary n-Cube Interconnection Networks.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 1991

Experiences Implementing Dataflow on a General-Purpose Parallel Computer.

[BibT_eX]

Ellen Spertus

Proceedings of the International Conference on Parallel Processing, 1991

A Mechanism for Efficient Context Switching.

[BibT_eX]

[DOI]

Proceedings of the 1991 IEEE International Conference on Computer Design: VLSI in Computer & Processors, 1991

1990

A hardware logic simulation system.

[BibT_eX]

[DOI]

Prathima Agrawal

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1990

Performance Analysis of k-Ary n-Cube Interconnection Networks.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 1990

Concurrent Aggregates (CA).

[BibT_eX]

[DOI]

Andrew A. Chien

Proceedings of the Second ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1990

Simultaneous bidirectional signalling for IC systems.

[BibT_eX]

[DOI]

Kevin Lam

Larry R. Dennison

Proceedings of the 1990 IEEE International Conference on Computer Design: VLSI in Computers and Processors, 1990

1989

Experience with CST: Programming and Implementation.

[BibT_eX]

[DOI]

Waldemar Horwat

Andrew A. Chien

Proceedings of the ACM SIGPLAN'89 Conference on Programming Language Design and Implementation (PLDI), 1989

Universal Mechanisms for Concurrency.

[BibT_eX]

[DOI]

D. Scott Wills

Proceedings of the PARLE '89: Parallel Architectures and Languages Europe, 1989

The J-Machine: A Fine-Grain Concurrent Computer.

[BibT_eX]

Information Processing 89: Proceedings of the IFIP 11th World Computer Congress, San Francisco, USA, August 28, 1989

Algorithms for Accuracy Enhancement in a Hardware Logic Simulator.

[BibT_eX]

[DOI]

Prathima Agrawal

Raffi Tutundjian

Proceedings of the 26th ACM/IEEE Design Automation Conference, 1989

Micro-Optimization of Floating Point Operations.

[BibT_eX]

[DOI]

Proceedings of ASPLOS-III, 1989

1988

Object-oriented concurrent programming in CST.

[BibT_eX]

[DOI]

Andrew A. Chien

Proceedings of the 1988 ACM SIGPLAN Workshop on Object-based Concurrent Programming, 1988

The Reconfigurable Arithmetic Processor.

[BibT_eX]

[DOI]

Stuart Fiske

Proceedings of the 15th Annual International Symposium on Computer Architecture, 1988

Mechanisms for Concurrent Computing.

[BibT_eX]

Proceedings of the International Conference on Fifth Generation Computer Systems, 1988

Fine-grain message passing concurrent computers.

[BibT_eX]

[DOI]

Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988

1987

Deadlock-Free Message Routing in Multiprocessor Interconnection Networks.

[BibT_eX]

[DOI]

Charles L. Seitz

IEEE Trans. Computers, 1987

MARS: A Multiprocessor-Based Programmable Accelerator.

[BibT_eX]

[DOI]

IEEE Des. Test, 1987

Architecture and Design of the MARS Hardware Accelerator.

[BibT_eX]

[DOI]

Proceedings of the 24th ACM/IEEE Design Automation Conference, Miami Beach, FL, USA, June 28, 1987

1986

The Torus Routing Chip.

[BibT_eX]

[DOI]

Charles L. Seitz

Distributed Comput., 1986

1985

A Hardware Architecture for Switch-Level Simulation.

[BibT_eX]

[DOI]

Randal E. Bryant

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1985

An Object Oriented Architecture.

[BibT_eX]

[DOI]