Olivier Temam

According to our database, Olivier Temam authored at least 107 papers between 1992 and 2020.

Bibliography

2020
ParaML: A Polyvalent Multicore Accelerator for Machine Learning.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

2017
An Accelerator for High Efficient Vision Processing.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017

DaDianNao: A Neural Network Supercomputer.
IEEE Trans. Computers, 2017

Introduction to the workshop on trends in machine learning.
Proceedings of the Workshop on Trends in Machine-Learning (and impact on computer architecture), 2017

2016
DianNao family: energy-efficient hardware accelerators for machine learning.
Commun. ACM, 2016

Enabling future progress in machine-learning.
Proceedings of the 2016 IEEE Symposium on VLSI Circuits, 2016

2015
Robust Design Space Modeling.
ACM Trans. Design Autom. Electr. Syst., 2015

A Small-Footprint Accelerator for Large-Scale Neural Networks.
ACM Trans. Comput. Syst., 2015

Leveraging the Error Resilience of Neural Networks for Designing Highly Energy Efficient Accelerators.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015

Statistical Performance Comparisons of Computers.
IEEE Trans. Computers, 2015

Practical Iterative Optimization for the Data Center.
ACM Trans. Archit. Code Optim., 2015

Alternative Computing Designs and Technologies.
IEEE Micro, 2015

A High-Throughput Neural Network Accelerator.
IEEE Micro, 2015

Cluster Cache Monitor: Leveraging the Proximity Data in CMP.
Int. J. Parallel Program., 2015

Neuromorphic accelerators: a comparison between neuroscience and machine-learning approaches.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

ShiDianNao: shifting vision processing closer to the sensor.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Hardware Neural Networks: From Inflated Expectations to Plateau of Productivity.
Proceedings of the Federated Computing Research Conference, 2015

Retraining-based timing error mitigation for hardware neural networks.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

PuDianNao: A Polyvalent Machine Learning Accelerator.
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2014
Performance Portability Across Heterogeneous SoCs Using a Generalized Library-Based Approach.
ACM Trans. Archit. Code Optim., 2014

DaDianNao: A Machine-Learning Supercomputer.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

ArchRanker: A ranking approach to design space exploration.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

A low-cost memory interface for high-throughput accelerators.
Proceedings of the 2014 International Conference on Compilers, 2014

The improbable but highly appropriate marriage of 3D stacking and neuromorphic accelerators.
Proceedings of the 2014 International Conference on Compilers, 2014

DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators.
Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014

Advanced technologies for brain-inspired computing.
Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014

2013
Cluster Cache Monitor.
Proceedings of the 25th International Symposium on Computer Architecture and High Performance Computing, 2013

Continuous real-world inputs can open up alternative accelerator designs.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Elastic CGRAs.
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

Hardware neural network accelerators.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2013

2012
Deconstructing iterative optimization.
ACM Trans. Archit. Code Optim., 2012

SWAP: Parallelization through Algorithm Substitution.
IEEE Micro, 2012

Configurable conduction delay circuits for high spiking rates.
Proceedings of the 2012 IEEE International Symposium on Circuits and Systems, 2012

A defect-tolerant accelerator for emerging high-performance applications.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Hardware spiking neurons design: Analog or digital?
Proceedings of the 2012 International Joint Conference on Neural Networks (IJCNN), 2012

BenchNN: On the broad potential application scope of hardware neural network accelerators.
Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

Statistical performance comparisons of computers.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

Capacitance of TSVs in 3-D stacked chips a problem?: not for neuromorphic systems!
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

Iterative optimization for the data center.
Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

2011
Milepost GCC: Machine Learning Enabled Self-tuning Compiler.
Int. J. Parallel Program., 2011

How sensitive is processor customization to the workload's input datasets?
Proceedings of the IEEE 9th Symposium on Application Specific Processors, 2011

Automatic abstraction and fault tolerance in cortical microarchitectures.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

A Very Fast Simulator for Exploring the Many-Core Future.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Implementation of signal processing tasks on neuromorphic hardware.
Proceedings of the 2011 International Joint Conference on Neural Networks, 2011

2010
Collective optimization: A practical collaborative approach.
ACM Trans. Archit. Code Optim., 2010

ArchExplorer for Automatic Design Space Exploration.
IEEE Micro, 2010

CMA: Chip multi-accelerator.
Proceedings of the IEEE 8th Symposium on Application Specific Processors, 2010

Transparent sampling.
Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

Evaluating iterative optimization across 1000 datasets.
Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2010

ArchExplorer.org: A methodology for facilitating a fair Comparison of research ideas.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010

The rebirth of neural networks.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

A memory interface for multi-purpose multi-stream accelerators.
Proceedings of the 2010 International Conference on Compilers, 2010

Scalable hardware support for conditional parallelization.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
Reconciling specialization and flexibility through compound circuits.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

Collective Optimization.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

2008
A Practical Approach for Reconciling High and Predictable Performance in Non-Regular Parallel Programs.
Proceedings of the Design, Automation and Test in Europe, 2008

2007
Quick and Practical Run-Time Evaluation of Multiple Program Optimizations.
Trans. High Perform. Embed. Archit. Compil., 2007

High-Performance Embedded Architecture and Compilation Roadmap.
Trans. High Perform. Embed. Archit. Compil., 2007

Modeling self-developing biological neural networks.
Neurocomputing, 2007

UNISIM: An Open Simulation Environment and Library for Complex Architecture Design and Collaborative Development.
IEEE Comput. Archit. Lett., 2007

MiDataSets: Creating the Conditions for a More Realistic Evaluation of Iterative Optimization.
Proceedings of the High Performance Embedded Architectures and Compilers, 2007

Rapidly Selecting Good Compiler Optimizations using Performance Counters.
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

Fast compiler optimisation evaluation using code-feature based performance prediction.
Proceedings of the 4th Conference on Computing Frontiers, 2007

2006
A Sampling Method Focusing on Practicality.
IEEE Micro, 2006

Load squared: Adding logic close to memory to reduce the latency of indirect loads in embedded and general systems.
J. Embed. Comput., 2006

Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies.
Int. J. Parallel Program., 2006

CAPSULE: Hardware-Assisted Parallel Execution of Component-Based Programs.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Automatic performance model construction for the fast software exploration of new hardware designs.
Proceedings of the 2006 International Conference on Compilers, 2006

2005
Load squared: adding logic close to memory to reduce the latency of indirect loads with high miss ratios.
SIGARCH Comput. Archit. News, 2005

Chaos in computer performance.
CoRR, 2005

Symbiotic Processing: Toward a Better Balance Between Architecture, Compiler and User Efforts.
Proceedings of the 1st International Workshop on Reconfigurable Communication-centric Systems-on-Chip, 2005

Characterizing Self-developing Biological Neural Networks: A First Step Towards Their Application to Computing Systems.
Proceedings of the Computational Intelligence and Bioinspired Systems, 2005

Facilitating the search for compositions of program transformations.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

A Practical Method for Quickly Evaluating Program Optimizations.
Proceedings of the High Performance Embedded Architectures and Compilers, 2005

2004
A fast and accurate method for determining a lower bound on execution time.
Concurr. Comput. Pract. Exp., 2004

Towards a Systematic, Pragmatic and Architecture-Aware Program Optimization Process for Complex Processors.
Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

MicroLib: A Case for the Quantitative Comparison of Micro-Architecture Mechanisms.
Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004

From Sequences of Dependent Instructions to Functions: An Approach for Improving Performance without ILP or Speculation.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

A Polyhedral Approach to Ease the Composition of Program Transformations.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

A New Optimized Implementation of the SystemC Engine Using Acyclic Scheduling.
Proceedings of the 2004 Design, 2004

VHC: Quickly Building an Optimizer for Complex Embedded Architectures.
Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004

BLOB computing.
Proceedings of the First Conference on Computing Frontiers, 2004

2003
DiST: a simple, reliable and scalable method to significantly reduce processor architecture simulation time.
Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2003

Putting Polyhedral Loop Transformations to Work.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

2002
Increasing hardware data prefetching performance using the second-level cache.
J. Syst. Archit., 2002

Digital LC-2: from bits & gates to a little computer.
Proceedings of the 2002 workshop on Computer architecture education, 2002

On increasing architecture awareness in program optimizations to bridge the gap between peak and sustained processor performance: matrix-multiply revisited.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

2000
Load Scheduling with Profile Information.
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

1999
Quantifying loop nest locality using SPEC'95 and the perfect benchmarks.
ACM Trans. Comput. Syst., 1999

An Algorithm for Optimally Exploiting Spatial and Temporal Locality in Upper Memory Levels.
IEEE Trans. Computers, 1999

1998
Dataflow Analysis of Branch Mispredictions and Its Application to Early Resolution of Branch Outcomes.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

Investigating Optimal Local Memory Performance.
Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), 1998

1997
A Cache Visualization Tool.
Computer, 1997

Data Caches for Superscalar Processors.
Proceedings of the 11th international conference on Supercomputing, 1997

1996
Improving Single-Process Performance with Multithreaded Processors.
Proceedings of the 10th international conference on Supercomputing, 1996

Streaming Prefetch.
Proceedings of the Euro-Par '96 Parallel Processing, 1996

A Quantitative Analysis of Loop Nest Locality.
Proceedings of ASPLOS-VII, 1996

1995
Influence of Cross-Interferences on Blocked Loops: A Case Study with Matrix-Vector Multiply.
ACM Trans. Program. Lang. Syst., 1995

Software assistance for data caches.
Future Gener. Comput. Syst., 1995

1994
Cache Interference Phenomena.
Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems, 1994

Using virtual lines to enhance locality exploitation.
Proceedings of the 8th international conference on Supercomputing, 1994

1993
To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts.
Proceedings of Supercomputing '93, 1993

Speculative Prefetching.
Proceedings of the 7th international conference on Supercomputing, 1993

Evaluating the Impact of Cache Interferences on Numerical Codes.
Proceedings of the 1993 International Conference on Parallel Processing, 1993

Fast Enumeration of Solutions for Data Dependence Analysis and Data Locality Optimization.
Proceedings of the 1993 International Conference on Parallel Processing, 1993

1992
Characterizing the Behavior of Sparse Algorithms on Caches.
Proceedings of Supercomputing '92, 1992

