Andreas Moshovos

Orcid: 0000-0001-7768-367X

Affiliations:

University of Toronto, Canada

According to our database¹, Andreas Moshovos authored at least 134 papers between 1997 and 2024.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Awards

ACM Fellow

ACM Fellow 2017, "For contributions to high-performance architecture including memory dependence prediction and snooping coherence".

Links

Online presence:

on orcid.org
on eecg.utoronto.ca

Bibliography

2024

Schrodinger's FP: Training Neural Networks with Dynamic Floating-Point Containers.

,

Enrique Torres-Sánchez

,

,

,

Mostafa Mahmoud

,

Ameer Abdelhadi

,

,

Andreas Moshovos

Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization.

,

Ghouthi Boukli Hacene

,

,

Alberto Delmas Lascorz

,

Matthieu Courbariaux

,

Omar Mohamed Awad

,

Isak Edo Vivancos

,

,

,

Andreas Moshovos

Proceedings of the IEEE International Symposium on Circuits and Systems, 2024

Marple: Scalable Spike Sorting for Untethered Brain-Machine Interfacing.

,

,

,

Mostafa Mahmoud

,

Christina Giannoula

,

Ameer Abdelhadi

,

Andreas Moshovos

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Atalanta: A Bit is Worth a "Thousand" Tensor Values.

Alberto Delmas Lascorz

,

Mostafa Mahmoud

,

,

,

,

Christina Giannoula

,

Ameer Abdelhadi

,

Andreas Moshovos

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023

39 000-Subexposures/s Dual-ADC CMOS Image Sensor With Dual-Tap Coded-Exposure Pixels for Single-Shot HDR and 3-D Computational Imaging.

,

Navid Sarhangnejad

,

,

,

,

,

,

,

,

,

Esther Y. H. Lin

,

,

,

,

Ameer M. S. Abdelhadi

,

Andreas Moshovos

,

Kiriakos N. Kutulakos

,

IEEE J. Solid State Circuits, November, 2023

cuSCNN: an Efficient CUDA Implementation of Sparse CNNs.

Mohamed A. Elgammal

,

Omar Mohamed Awad

,

Isak Edo Vivancos

,

Andreas Moshovos

,

Proceedings of the 13th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, 2023

2022

Schrödinger's FP: Dynamic Adaptation of Floating-Point Containers for Deep Learning Training.

,

Enrique Torres-Sánchez

,

,

,

Mostafa Mahmoud

,

Ameer Abdelhadi

,

Andreas Moshovos

CoRR, 2022

APack: Off-Chip, Lossless Data Compression for Efficient Deep Learning Inference.

Alberto Delmas Lascorz

,

Mostafa Mahmoud

,

Andreas Moshovos

CoRR, 2022

A 39,000 Subexposures/s CMOS Image Sensor with Dual-tap Coded-exposure Data-memory Pixel for Adaptive Single-shot Computational Imaging.

,

Navid Sarhangnejad

,

,

,

,

,

,

,

,

,

Esther Y. H. Lin

,

,

,

,

Ameer Abdelhadi

,

Andreas Moshovos

,

Kiriakos N. Kutulakos

,

Proceedings of the IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits 2022), 2022

Mokey: enabling narrow fixed-point inference for out-of-the-box floating-point transformer models.

,

Mostafa Mahmoud

,

Ameer Abdelhadi

,

Andreas Moshovos

Proceedings of ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

A Massive-Scale Brain Activity Decoding Chip.

Ameer Abdelhadi

,

,

Andreas Moshovos

Proceedings of the 2022 IEEE Hot Chips 34 Symposium, 2022

2021

Boveda: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick.

Isak Edo Vivancos

,

,

,

Ameer Abdelhadi

,

,

,

Mostafa Mahmoud

,

Alberto Delmas Lascorz

,

Gennady Pekhimenko

,

Andreas Moshovos

Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

FPRaker: A Processing Element For Accelerating Neural Network Training.

Omar Mohamed Awad

,

Mostafa Mahmoud

,

,

,

,

Anand Jayarajan

,

Gennady Pekhimenko

,

Andreas Moshovos

Proceedings of MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Noema: Hardware-Efficient Template Matching for Neural Population Pattern Detection.

Ameer M. S. Abdelhadi

,

,

,

Hendrik Steenland

,

Andreas Moshovos

Proceedings of MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

2020

TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference.

Mostafa Mahmoud

,

Isak Edo Vivancos

,

,

Omar Mohamed Awad

,

Gennady Pekhimenko

,

Jorge Albericio

,

Andreas Moshovos

CoRR, 2020

GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference.

,

Andreas Moshovos

CoRR, 2020

BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization.

,

Ghouthi Boukli Hacene

,

,

Alberto Delmas Lascorz

,

Matthieu Courbariaux

,

,

,

Andreas Moshovos

CoRR, 2020

GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference.

,

,

Omar Mohamed Awad

,

Andreas Moshovos

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training.

Mostafa Mahmoud

,

,

,

Omar Mohamed Awad

,

Gennady Pekhimenko

,

Jorge Albericio

,

Andreas Moshovos

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Late Breaking Results: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick.

Isak Edo Vivancos

,

,

,

,

Mostafa Mahmoud

,

Alberto Delmas Lascorz

,

Andreas Moshovos

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

2019

Accelerating Image-Sensor-Based Deep Learning Applications.

Mostafa Mahmoud

,

Dylan Malone Stuart

,

,

Alberto Delmas Lascorz

,

,

,

,

,

Isak Edo Vivancos

,

Jorge Albericio

,

Andreas Moshovos

IEEE Micro, 2019

Training CNNs faster with Dynamic Input and Kernel Downsampling.

,

,

Andreas Moshovos

CoRR, 2019

ShapeShifter: Enabling Fine-Grain Data Width Adaptation in Deep Learning.

Alberto Delmas Lascorz

,

,

Isak Edo Vivancos

,

Dylan Malone Stuart

,

Omar Mohamed Awad

,

,

Mostafa Mahmoud

,

,

,

,

Andreas Moshovos

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Characterizing Sources of Ineffectual Computations in Deep Learning Networks.

,

Mostafa Mahmoud

,

Andreas Moshovos

,

,

Robert D. Mullins

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

Laconic deep learning inference acceleration.

,

Alberto Delmas Lascorz

,

Mostafa Mahmoud

,

,

,

Dylan Malone Stuart

,

,

Andreas Moshovos

Proceedings of the 46th International Symposium on Computer Architecture, 2019

Deep Learning Language Modeling Workloads: Where Time Goes on Graphics Processors.

,

,

Andreas Moshovos

Proceedings of the IEEE International Symposium on Workload Characterization, 2019

SW+: On Accelerating Smith-Waterman Execution of GATK HaplotypeCaller.

,

Andreas Moshovos

Proceedings of the Computational Intelligence Methods for Bioinformatics and Biostatistics, 2019

MemAlign: A Memory Structure to Accelerate Gene Sequencing.

,

Andreas Moshovos

Proceedings of the 19th IEEE International Conference on Bioinformatics and Bioengineering, 2019

BWA-MEM Performance: Suffix Array Storage Size.

,

,

Andreas Moshovos

Proceedings of the 2019 IEEE EMBS International Conference on Biomedical & Health Informatics, 2019

Bit-Tactical: A Software/Hardware Approach to Exploiting Value and Bit Sparsity in Neural Networks.

Alberto Delmas Lascorz

,

,

Dylan Malone Stuart

,

,

Mostafa Mahmoud

,

,

,

,

Andreas Moshovos

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018

Proteus: Exploiting precision variability in deep neural networks.

,

Jorge Albericio

,

Tayler H. Hetherington

,

,

Natalie D. Enright Jerger

,

,

Andreas Moshovos

Parallel Comput., 2018

Value-Based Deep-Learning Acceleration.

Andreas Moshovos

,

Jorge Albericio

,

,

Alberto Delmas Lascorz

,

,

Tayler H. Hetherington

,

,

Natalie D. Enright Jerger

IEEE Micro, 2018

Laconic Deep Learning Computing.

,

Mostafa Mahmoud

,

Alberto Delmas Lascorz

,

,

Andreas Moshovos

CoRR, 2018

DPRed: Making Typical Activation Values Matter In Deep Learning Computing.

,

,

,

,

Andreas Moshovos

CoRR, 2018

Bit-Tactical: Exploiting Ineffectual Computations in Convolutional Neural Networks: Which, Why, and How.

,

,

Dylan Malone Stuart

,

,

Mostafa Mahmoud

,

,

,

Andreas Moshovos

CoRR, 2018

Exploiting Typical Values to Accelerate Deep Learning.

Andreas Moshovos

,

Jorge Albericio

,

,

Alberto Delmas Lascorz

,

,

,

Tayler H. Hetherington

,

,

Natalie D. Enright Jerger

Computer, 2018

Identifying and Exploiting Ineffectual Computations to Enable Hardware Acceleration of Deep Learning.

Andreas Moshovos

,

Jorge Albericio

,

,

,

,

Mostafa Mahmoud

,

Tayler H. Hetherington

,

,

Dylan Malone Stuart

,

,

,

,

Natalie D. Enright Jerger

Proceedings of the 16th IEEE International New Circuits and Systems Conference, 2018

Value-Based Deep Learning Hardware Acceleration.

Andreas Moshovos

Proceedings of the 11th International Workshop on Network on Chip Architectures, 2018

Diffy: a Déjà vu-Free Differential Deep Neural Network Accelerator.

Mostafa Mahmoud

,

,

Andreas Moshovos

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Memory Requirements for Convolutional Neural Network Hardware Accelerators.

,

Dylan Malone Stuart

,

Mostafa Mahmoud

,

Andreas Moshovos

Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

Gene Sequencing: Where Time Goes.

,

Andreas Moshovos

Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

Characterizing Sources of Ineffectual Computations in Deep Learning Networks.

,

Mostafa Mahmoud

,

Andreas Moshovos

Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

Loom: exploiting weight and activation precisions to accelerate convolutional neural networks.

,

Alberto Delmas Lascorz

,

,

,

Andreas Moshovos

Proceedings of the 55th Annual Design Automation Conference, 2018

2017

Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks.

,

Alberto Delmas Lascorz

,

,

Andreas Moshovos

CoRR, 2017

Cnvlutin2: Ineffectual-Activation-and-Weight-Free Deep Neural Network Computing.

,

Alberto Delmas Lascorz

,

,

Andreas Moshovos

CoRR, 2017

Tartan: Accelerating Fully-Connected and Convolutional Layers in Deep Learning Networks by Exploiting Numerical Precision Variability.

,

,

,

Andreas Moshovos

CoRR, 2017

Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks.

,

,

,

Andreas Moshovos

CoRR, 2017

Stripes: Bit-Serial Deep Neural Network Computing.

,

Jorge Albericio

,

Andreas Moshovos

IEEE Comput. Archit. Lett., 2017

IDEAL: image denoising accelerator.

Mostafa Mahmoud

,

,

Alberto Delmas Lascorz

,

,

Jonathan Assouline

,

,

,

Andreas Moshovos

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Bit-pragmatic deep neural network computing.

Jorge Albericio

,

,

,

,

,

,

Andreas Moshovos

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Bit-Pragmatic Deep Neural Network Computing.

Jorge Albericio

,

,

,

,

Andreas Moshovos

Proceedings of the 5th International Conference on Learning Representations, 2017

2016

Stripes: Bit-serial deep neural network computing.

,

Jorge Albericio

,

Tayler H. Hetherington

,

,

Andreas Moshovos

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Message from the program chair.

Andreas Moshovos

Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing.

Jorge Albericio

,

,

Tayler H. Hetherington

,

,

Natalie D. Enright Jerger

,

Andreas Moshovos

Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Memory controller design under cloud workloads.

Mostafa Mahmoud

,

Andreas Moshovos

Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016

Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks.

,

Jorge Albericio

,

Tayler H. Hetherington

,

,

Natalie D. Enright Jerger

,

Andreas Moshovos

Proceedings of the 2016 International Conference on Supercomputing, 2016

2015

Reduced-Precision Strategies for Bounded Memory in Deep Neural Nets.

,

Jorge Albericio

,

Tayler H. Hetherington

,

,

Natalie D. Enright Jerger

,

,

Andreas Moshovos

CoRR, 2015

Doppelgänger: a cache for approximate computing.

Joshua San Miguel

,

Jorge Albericio

,

Andreas Moshovos

,

Natalie D. Enright Jerger

Proceedings of the 48th International Symposium on Microarchitecture, 2015

Self-contained, accurate precomputation prefetching.

,

,

Vijayalakshmi Srinivasan

,

,

Andreas Moshovos

Proceedings of the 48th International Symposium on Microarchitecture, 2015

QTrace: a framework for customizable full system instrumentation.

,

Andreas Moshovos

Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

Prediction-based superpage-friendly TLB designs.

Misel-Myrto Papadopoulou

,

,

,

Andreas Moshovos

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

2014

Optimizing Memory Translation Emulation in Full System Emulators.

,

,

Motohiro Kawahito

,

Andreas Moshovos

ACM Trans. Archit. Code Optim., 2014

ADDICT: Advanced Instruction Chasing for Transactions.

,

,

Anastasia Ailamaki

,

Andreas Moshovos

Proc. VLDB Endow., 2014

Evaluating the memory system behavior of smartphone workloads.

,

,

,

,

Michel Elnacouzi

,

,

Jorge Albericio

,

Natalie D. Enright Jerger

,

Andreas Moshovos

,

Kyros Kutulakos

,

Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

Advanced branch predictors for soft processors.

,

Andreas Moshovos

Proceedings of the 2014 International Conference on ReConFigurable Computing and FPGAs, 2014

What limits the operating frequency of a soft processor design.

,

Andreas Moshovos

Proceedings of the 2014 International Conference on ReConFigurable Computing and FPGAs, 2014

Wormhole: Wisely Predicting Multidimensional Branches.

Jorge Albericio

,

Joshua San Miguel

,

Natalie D. Enright Jerger

,

Andreas Moshovos

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

BarTLB: Barren page resistant TLB for managed runtime languages.

,

Andreas Moshovos

Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

Image Signal Processors on FPGAs.

,

Andreas Moshovos

Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

An Architectural Approach to Characterizing and Eliminating Sources of Inefficiency in a Soft Processor Design.

,

Andreas Moshovos

Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

2013

Multi-grain coherence directories.

,

,

Andreas Moshovos

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

QTrace: An interface for customizable full system instrumentation.

,

,

Andreas Moshovos

Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

STREX: boosting instruction cache reuse in OLTP workloads through stratified transaction execution.

,

,

,

Anastasia Ailamaki

,

Andreas Moshovos

Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

RECAP: A region-based cure for the common cold (cache).

,

,

,

Vijayalakshmi Srinivasan

,

Andreas Moshovos

Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Low-cost, high-performance branch predictors for soft processors.

,

,

Andreas Moshovos

Proceedings of the 23rd International Conference on Field programmable Logic and Applications, 2013

Characterizing the performance benefits of fused CPU/GPU systems using FusionSim.

Vitaly Zakharenko

,

,

Andreas Moshovos

Proceedings of the Design, Automation and Test in Europe, 2013

A dual grain hit-miss detector for large die-stacked DRAM caches.

Michel El-Nacouzi

,

,

Myrto Papadopoulou

,

,

Natalie D. Enright Jerger

,

Andreas Moshovos

Proceedings of the Design, Automation and Test in Europe, 2013

2012

NCOR: An FPGA-Friendly Nonblocking Data Cache for Soft Processors with Runahead Execution.

,

Andreas Moshovos

Int. J. Reconfigurable Comput., 2012

SPREX: A soft processor with Runahead execution.

,

Andreas Moshovos

Proceedings of the 2012 International Conference on Reconfigurable Computing and FPGAs, 2012

SLICC: Self-Assembly of Instruction Cache Collectives for OLTP Workloads.

,

,

Anastasia Ailamaki

,

Andreas Moshovos

Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Toward virtualizing branch direction prediction.

Maryam Sadooghi-Alvandi

,

,

Andreas Moshovos

Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Reducing OLTP instruction misses with thread migration.

,

,

Anastasia Ailamaki

,

Andreas Moshovos

Proceedings of the Eighth International Workshop on Data Management on New Hardware, 2012

ReCaP: a region-based cure for the common cold cache.

,

,

Vijayalakshmi Srinivasan

,

Andreas Moshovos

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Pointy: a hybrid pointer prefetcher for managed runtime systems.

,

,

Andreas Moshovos

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

Two-Stage, Pipelined Register Renaming.

,

Andreas Moshovos

,

Andreas G. Veneris

IEEE Trans. Very Large Scale Integr. Syst., 2011

2010

On the Latency and Energy of Checkpointed Superscalar Register Alias Tables.

,

Andreas Moshovos

,

Andreas G. Veneris

IEEE Trans. Very Large Scale Integr. Syst., 2010

Making Address-Correlated Prefetching Practical.

Thomas F. Wenisch

,

Michael Ferdman

,

Anastasia Ailamaki

,

,

Andreas Moshovos

IEEE Micro, 2010

An Efficient Non-blocking Data Cache for Soft Processors.

,

Andreas Moshovos

Proceedings of ReConFig'10: 2010 International Conference on Reconfigurable Computing and FPGAs, 2010

Demystifying GPU microarchitecture through microbenchmarking.

,

Misel-Myrto Papadopoulou

,

Maryam Sadooghi-Alvandi

,

Andreas Moshovos

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010

Design space exploration of instruction schedulers for out-of-order soft processors.

,

Andreas Moshovos

Proceedings of the International Conference on Field-Programmable Technology, 2010

2009

A physical-level study of the compacted matrix instruction scheduler for dynamically-scheduled superscalar processors.

,

Andreas Moshovos

,

Andreas G. Veneris

Proceedings of the 2009 International Conference on Embedded Computer Systems: Architectures, 2009

A tagless coherence directory.

,

Vijayalakshmi Srinivasan

,

Moinuddin K. Qureshi

,

Andreas Moshovos

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Practical off-chip meta-data for temporal memory streaming.

Thomas F. Wenisch

,

Michael Ferdman

,

Anastasia Ailamaki

,

,

Andreas Moshovos

Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

Towards a viable out-of-order soft core: Copy-Free, checkpointed register renaming.

,

Andreas Moshovos

Proceedings of the 19th International Conference on Field Programmable Logic and Applications, 2009

Phantom-BTB: a virtualized branch target buffer design.

,

Andreas Moshovos

Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 2009

2008

L-CBF: A Low-Power, Fast Counting Bloom Filter Architecture.

,

Andreas Moshovos

,

Andreas G. Veneris

IEEE Trans. Very Large Scale Integr. Syst., 2008

Temporal instruction fetch streaming.

Michael Ferdman

,

Thomas F. Wenisch

,

Anastasia Ailamaki

,

,

Andreas Moshovos

Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

A physical level study and optimization of CAM-based checkpointed register alias table.

,

Andreas Moshovos

,

Andreas G. Veneris

Proceedings of the 2008 International Symposium on Low Power Electronics and Design, 2008

Temporal streams in commercial server applications.

Thomas F. Wenisch

,

Michael Ferdman

,

Anastasia Ailamaki

,

,

Andreas Moshovos

Proceedings of the 4th International Symposium on Workload Characterization (IISWC 2008), 2008

Turbo-ROB: A Low Cost Checkpoint/Restore Accelerator.

,

Andreas Moshovos

Proceedings of the High Performance Embedded Architectures and Compilers, 2008

Predictor virtualization.

,

Stephen Somogyi

,

Andreas Moshovos

,

Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, 2008

2007

A Building Block for Coarse-Grain Optimizations in the On-Chip Memory Hierarchy.

,

Andreas Moshovos

IEEE Comput. Archit. Lett., 2007

A Framework for Coarse-Grain Optimizations in the On-Chip Memory Hierarchy.

,

,

Andreas Moshovos

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

On the latency, energy and area of checkpointed, superscalar register alias tables.

,

,

Andreas Moshovos

,

Andreas G. Veneris

,

Aggeliki Arapoyanni

Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007

Mechanisms for store-wait-free multiprocessors.

Thomas F. Wenisch

,

Anastassia Ailamaki

,

,

Andreas Moshovos

Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

2006

Coarse-Grain Coherence Tracking: RegionScout and Region Coherence Arrays.

Jason F. Cantin

,

,

Mikko H. Lipasti

,

Andreas Moshovos

,

IEEE Micro, 2006

Spatial Memory Streaming.

Stephen Somogyi

,

Thomas F. Wenisch

,

Anastassia Ailamaki

,

,

Andreas Moshovos

Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

BranchTap: improving performance with very few checkpoints through adaptive speculation control.

,

Andreas Moshovos

Proceedings of the 20th Annual International Conference on Supercomputing, 2006

2005

A Case for Asymmetric-Cell Cache Memories.

Andreas Moshovos

,

,

,

IEEE Trans. Very Large Scale Integr. Syst., 2005

RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence.

Andreas Moshovos

Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

RECAST: Boosting Tag Line Buffer Coverage in Low-Power High-Level Caches "for Free".

,

Andreas Moshovos

,

Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

Memory State Compressors for Giga-Scale Checkpoint/Restore.

Andreas Moshovos

,

Alexandros Kostopoulos

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

2004

SEPAS: a highly accurate energy-efficient branch predictor.

Amirali Baniasadi

,

Andreas Moshovos

Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004

Accurate and Complexity-Effective Spatial Pattern Prediction.

,

,

,

Andreas Moshovos

Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

2003

Low-leakage asymmetric-cell SRAM.

,

,

Andreas Moshovos

IEEE Trans. Very Large Scale Integr. Syst., 2003

Behavior and Performance of Interactive Multi-Player Game Servers.

Ahmed Abdelkhalek

,

,

Andreas Moshovos

Clust. Comput., 2003

Checkpointing alternatives for high performance, power-aware processors.

Andreas Moshovos

Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

2002

Reducing Memory Latency via Read-after-Read Memory Dependence Prediction.

Andreas Moshovos

,

Gurindar S. Sohi

IEEE Trans. Computers, 2002

Asymmetric-frequency clustering: a power-aware back-end for high-performance processors.

Amirali Baniasadi

,

Andreas Moshovos

Proceedings of the 2002 International Symposium on Low Power Electronics and Design, 2002

Branch Predictor Prediction: A Power-Aware Branch Predictor for High-Performance Processors.

Amirali Baniasadi

,

Andreas Moshovos

Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

2001

Microarchitectural innovations: boosting microprocessor performance beyond semiconductor technology scaling.

Andreas Moshovos

,

Gurindar S. Sohi

Proc. IEEE, 2001

Instruction flow-based front-end throttling for power-aware high-performance processors.

Amirali Baniasadi

,

Andreas Moshovos

Proceedings of the 2001 International Symposium on Low Power Electronics and Design, 2001

Slice-processors: an implementation of operation-based prediction.

Andreas Moshovos

,

Dionisios N. Pnevmatikatos

,

Amirali Baniasadi

Proceedings of the 15th international conference on Supercomputing, 2001

JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers.

Andreas Moshovos

,

,

,

Alok N. Choudhary

Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

2000

Memory Dependence Prediction in Multimedia Applications.

Andreas Moshovos

,

Gurindar S. Sohi

J. Instr. Level Parallelism, 2000

Instruction distribution heuristics for quad-cluster, dynamically-scheduled, superscalar processors.

Amirali Baniasadi

,

Andreas Moshovos

Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit.

,

Andreas Moshovos

,

,

Prithviraj Banerjee

Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

Memory Dependence Speculation Tradeoffs in Centralized, Continuous-Window Superscalar Processors.

Andreas Moshovos

,

Gurindar S. Sohi

Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

1999

Speculative Memory Cloaking and Bypassing.

Andreas Moshovos

,

Gurindar S. Sohi

Int. J. Parallel Program., 1999

Read-After-Read Memory Dependence Prediction.

Andreas Moshovos

,

Gurindar S. Sohi

Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999

Improving virtual function call target prediction via dependence-based pre-computation.

,

Andreas Moshovos

,

Gurindar S. Sohi

Proceedings of the 13th international conference on Supercomputing, 1999

1998

Dependence Based Prefetching for Linked Data Structures.

,

Andreas Moshovos

,

Gurindar S. Sohi

Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), 1998

1997

Streamlining Inter-Operation Memory Communication via Data Dependence Prediction.

Andreas Moshovos

,

Gurindar S. Sohi

Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997

Dynamic Speculation and Synchronization of Data Dependences.

Andreas Moshovos

,

Scott E. Breach

,

T. N. Vijaykumar

,

Gurindar S. Sohi

Proceedings of the 24th International Symposium on Computer Architecture, 1997
