Alexander V. Veidenbaum

According to our database, Alexander V. Veidenbaum authored at least 136 papers between 1986 and 2020.

Bibliography

2020
NumbaSummarizer: A Python Library for Simplified Vectorization Reports.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

2019
MCompiler: A Synergistic Compilation Framework.
CoRR, 2019

Teaching Parallel Computing and Dependence Analysis with Python.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

Combining Prefetch Control and Cache Partitioning to Improve Multicore Performance.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

AFFIX: Automatic Acceleration Framework for FPGA Implementation of OpenVX Vision Algorithms.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

2018
An empirical study of the effect of source-level loop transformations on compiler stability.
Proc. ACM Program. Lang., 2018

OpenCV.js: computer vision processing for the open web platform.
Proceedings of the 9th ACM Multimedia Systems Conference, 2018

Towards an Achievable Performance for the Loop Nests.
Proceedings of the Languages and Compilers for Parallel Computing, 2018

New Opportunities for Compilers in Computer Security.
Proceedings of the Languages and Compilers for Parallel Computing, 2018

Acceleration Framework for FPGA Implementation of OpenVX Graph Pipelines.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

2017
Special issue on energy efficient multi-core and many-core systems, Part II.
J. Parallel Distributed Comput., 2017

CAMFAS: A Compiler Approach to Mitigate Fault Attacks via Enhanced SIMDization.
IACR Cryptol. ePrint Arch., 2017

Using Hardware Counters to Predict Vectorization.
Proceedings of the Languages and Compilers for Parallel Computing, 2017

LORE: A loop repository for the evaluation of compilers.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

2016
Special issue on energy efficient multi-core and many-core systems, Part I.
J. Parallel Distributed Comput., 2016

Data-rate-aware FPGA-based acceleration framework for streaming applications.
Proceedings of the International Conference on ReConFigurable Computing and FPGAs, 2016

Polygonal Iteration Space Partitioning.
Proceedings of the Languages and Compilers for Parallel Computing, 2016

SIMD-based soft error detection.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

2015
Software fault tolerance for FPUs via vectorization.
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015

WebRTCbench: a benchmark for performance assessment of webRTC implementations.
Proceedings of the 13th IEEE Symposium on Embedded Systems For Real-time Multimedia, 2015

2014
Preface.
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

Dynamic-vector execution on a general purpose EDGE chip multiprocessor.
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

A Compilation and Run-Time Framework for Maximizing Performance of Self-scheduling Algorithms.
Proceedings of the Network and Parallel Computing, 2014

Author retrospective for compiler-directed data prefetching in multiprocessors with memory hierarchies.
Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

Multiple stream tracker: a new hardware stride prefetcher.
Proceedings of the Computing Frontiers Conference, CF'14, 2014

2013
Compiler-Assisted, Selective Out-Of-Order Commit.
IEEE Comput. Archit. Lett., 2013

Temperature aware thread migration in 3D architecture with stacked DRAM.
Proceedings of the International Symposium on Quality Electronic Design, 2013

Effective Evaluation of Multi-core Based Systems.
Proceedings of the IEEE 12th International Symposium on Parallel and Distributed Computing, 2013

On the Determination of Inlining Vectors for Program Optimization.
Proceedings of the Compiler Construction - 22nd International Conference, 2013

Optimizing Program Performance via Similarity, Using a Feature-Agnostic Approach.
Proceedings of the Advanced Parallel Processing Technologies, 2013

2012
Improving Cache Management Policies Using Dynamic Reuse Distances.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Just in Time Load Balancing.
Proceedings of the Languages and Compilers for Parallel Computing, 2012

A fault tolerant self-scheduling scheme for parallel loops on shared memory systems.
Proceedings of the 19th International Conference on High Performance Computing, 2012

Selective search of inlining vectors for program optimization.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

Revisiting level-0 caches in embedded processors.
Proceedings of the 15th International Conference on Compilers, 2012

2011
MZZ-HVS: Multiple Sleep Modes Zig-Zag Horizontal and Vertical Sleep Transistor Sharing to Reduce Leakage Power in On-Chip SRAM Peripheral Circuits.
IEEE Trans. Very Large Scale Integr. Syst., 2011

Reducing Power in All Major CAM and SRAM-Based Processor Units via Centralized, Dynamic Resource Size Management.
IEEE Trans. Very Large Scale Integr. Syst., 2011

On leakage power optimization in clock tree networks for ASICs and general-purpose processors.
Sustain. Comput. Informatics Syst., 2011

Pruning hardware evaluation space via correlation-driven application similarity analysis.
Proceedings of the 8th Conference on Computing Frontiers, 2011

2010
On the efficacy of call graph-level thread-level speculation.
Proceedings of the first joint WOSP/SIPEW International Conference on Performance Engineering, 2010

Post-synthesis sleep transistor insertion for leakage power optimization in clock tree networks.
Proceedings of the 11th International Symposium on Quality of Electronic Design (ISQED 2010), 2010

Exploiting power budgeting in thermal-aware dynamic placement for reconfigurable systems.
Proceedings of the 2010 International Symposium on Low Power Electronics and Design, 2010

RELOCATE: Register File Local Access Pattern Redistribution Mechanism for Power and Thermal Management in Out-of-Order Embedded Processor.
Proceedings of the High Performance Embedded Architectures and Compilers, 2010

Exploitation of nested thread-level speculative parallelism on multi-core systems.
Proceedings of the 7th Conference on Computing Frontiers, 2010

Multiple sleep modes leakage control in peripheral circuits of all major SRAM-based processor units.
Proceedings of the 7th Conference on Computing Frontiers, 2010

2009
On the exploitation of loop-level parallelism in embedded applications.
ACM Trans. Embed. Comput. Syst., 2009

A configurable simulation environment for the efficient simulation of large-scale spiking neural networks on graphics processors.
Neural Networks, 2009

Brain Derived Vision Algorithm on High Performance Architectures.
Int. J. Parallel Program., 2009

Cache-aware partitioning of multi-dimensional iteration spaces.
Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference 2009, 2009

Performance Characterization of Itanium® 2-Based Montecito Processor.
Proceedings of the Computer Performance Evaluation and Benchmarking, 2009

Power-aware load balancing of large scale MPI applications.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Efficient simulation of large-scale Spiking Neural Networks using CUDA graphics processors.
Proceedings of the International Joint Conference on Neural Networks, 2009

Synchronization optimizations for efficient execution on multi-cores.
Proceedings of the 23rd international conference on Supercomputing, 2009

Efficient Scheduling of Nested Parallel Loops on Multi-Core Systems.
Proceedings of the ICPP 2009, 2009

2008
Improving SDRAM access energy efficiency for low-power embedded systems.
ACM Trans. Embed. Comput. Syst., 2008

Optimizing CAM-based instruction cache designs for low-power embedded systems.
J. Syst. Archit., 2008

A hardware mechanism to reduce the energy consumption of the register file of in-order architectures.
Int. J. Embed. Syst., 2008

Comparative architectural characterization of SPEC CPU2000 and CPU2006 benchmarks on the Intel® Core™ 2 Duo processor.
Proceedings of the 2008 International Conference on Embedded Computer Systems: Architectures, 2008

A centralized cache miss driven technique to improve processor power dissipation.
Proceedings of the 2008 International Conference on Embedded Computer Systems: Architectures, 2008

Cache-aware iteration space partitioning.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

A distributed processor state management architecture for large-window processors.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Improving performance and reducing energy-delay with adaptive resource resizing for out-of-order embedded processors.
Proceedings of the 2008 ACM SIGPLAN/SIGBED Conference on Languages, 2008

Impact of JVM superoperators on energy consumption in resource-constrained embedded systems.
Proceedings of the 2008 ACM SIGPLAN/SIGBED Conference on Languages, 2008

A Two-Level Load/Store Queue Based on Execution Locality.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Adaptive techniques for leakage power management in L2 cache peripheral circuits.
Proceedings of the 26th International Conference on Computer Design, 2008

ZZ-HVS: Zig-zag horizontal and vertical sleep transistor sharing to reduce leakage power in on-chip SRAM peripheral circuits.
Proceedings of the 26th International Conference on Computer Design, 2008

Dynamic register file resizing and frequency scaling to improve embedded processor performance and energy-delay efficiency.
Proceedings of the 45th Design Automation Conference, 2008

Multiple sleep mode leakage control for cache peripheral circuits in embedded processors.
Proceedings of the 2008 International Conference on Compilers, 2008

2007
A predictive decode filter cache for reducing power consumption in embedded processors.
ACM Trans. Design Autom. Electr. Syst., 2007

Comparative characterization of SPEC CPU2000 and CPU2006 on Itanium architecture.
Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2007

Tight analysis of the performance potential of thread speculation using spec CPU 2006.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Novel Brain-Derived Algorithms Scale Linearly with Number of Processing Elements.
Proceedings of the Parallel Computing: Architectures, 2007

Reducing leakage power in peripheral circuits of L2 caches.
Proceedings of the 25th International Conference on Computer Design, 2007

A simplified java bytecode compilation system for resource-constrained embedded processors.
Proceedings of the 2007 International Conference on Compilers, 2007

2006
On the performance potential of different types of speculative thread-level parallelism: The DL version of this paper includes corrections that were not made available in the printed proceedings.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Fast Speculative Address Generation and Way Caching for Reducing L1 Data Cache Energy.
Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006

Probabilistic Self-Scheduling.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Challenges in exploitation of loop parallelism in embedded applications.
Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, 2006

2005
Line Size Adaptivity Analysis of Parameterized Loop Nests for Direct Mapped Data Cache.
IEEE Trans. Computers, 2005

Decoupled State-Execute Architecture.
Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Using a Way Cache to Improve Performance of Set-Associative Caches.
Proceedings of the High-Performance Computing - 6th International Symposium, 2005

An asymmetric clustered processor based on value content.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

A New Pointer-based Instruction Queue Design and Its Power-Performance Evaluation.
Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

High performance annotation-aware JVM for Java cards.
Proceedings of the EMSOFT 2005, 2005

Energy-Effective Instruction Fetch Unit for Wide Issue Processors.
Proceedings of the Advances in Computer Systems Architecture, 10th Asia-Pacific Conference, 2005

2004
Guest Editor's Introduction: Application-Specific Processors.
IEEE Micro, 2004

A partitioned instruction queue to reduce instruction wakeup energy.
Int. J. High Perform. Comput. Netw., 2004

An Optimized Front-End Physical Register File with Banking and Writeback Filtering.
Proceedings of the Power-Aware Computer Systems, 4th International Workshop, 2004

Caching Values in the Load Store Queue.
Proceedings of the 12th International Workshop on Modeling, 2004

A Content Aware Integer Register File Organization.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Low Energy, Highly-Associative Cache Design for Embedded Processors.
Proceedings of the 22nd IEEE International Conference on Computer Design: VLSI in Computers & Processors (ICCD 2004), 2004

Energy-Efficient Design for Highly Associative Instruction Caches in Next-Generation Embedded Processors.
Proceedings of the 2004 Design, 2004

2003
Power-Aware Compilation for Register File Energy Reduction.
Int. J. Parallel Program., 2003

Guest Editors' Introduction: Application-Specific Microprocessors.
IEEE Des. Test Comput., 2003

A Data Cache with Dynamic Mapping.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

Reducing data cache energy consumption via cached load/store queue.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

A Simple Low-Energy Instruction Wakeup Mechanism.
Proceedings of the High Performance Computing, 5th International Symposium, 2003

Improving Branch Prediction Accuracy in Embedded Processors in the Presence of Context Switches.
Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

Reducing Power Consumption for High-Associativity Data Caches in Embedded Processors.
Proceedings of the 2003 Design, 2003

Energy Aware Register File Implementation through Instruction Predecode.
Proceedings of the 14th IEEE International Conference on Application-Specific Systems, 2003

Low Energy Associative Data Caches for Embedded Systems.
Proceedings of the Embedded Software for SoC, 2003

2002
Integrated I-cache Way Predictor and Branch Target Buffer to Reduce Energy Consumption.
Proceedings of the High Performance Computing, 4th International Symposium, 2002

Profile-Based Dynamic Voltage Scheduling Using Program Checkpoints.
Proceedings of the 2002 Design, 2002

2001
Guest Editor's Introduction.
Int. J. Parallel Program., 2001

2000
On Interaction between Interconnection Network Design and Latency Hiding Techniques in Multiprocessors.
J. Supercomput., 2000

Compiler-Directed Cache Assist Adaptivity.
Proceedings of the High Performance Computing, Third International Symposium, 2000

Compiler-Directed Cache Line Size Adaptivity.
Proceedings of the Intelligent Memory Systems, Second International Workshop, 2000

1999
Interconnection network organization and its impact on performance and cost in shared memory multiprocessors.
Parallel Comput., 1999

An Integrated Hardware/Software Data Prefetching Scheme for Shared-Memory Multiprocessors.
Int. J. Parallel Program., 1999

Non-Sequential Instruction Cache Prefetching for Multiple-Issue Processors.
Int. J. High Speed Comput., 1999

Adapting cache line size to application behavior.
Proceedings of the 13th international conference on Supercomputing, 1999

1998
Retrospective: The Cedar System.
Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers)., 1998

1997
Instruction Cache Prefetching Using Multilevel Branch Prediction.
Proceedings of the High Performance Computing, International Symposium, 1997

Stride-directed Prefetching for Secondary Caches.
Proceedings of the 1997 International Conference on Parallel Processing (ICPP '97), 1997

The Effect of Limited Network Bandwidth and its Utilization by Latency Hiding Techniques in Large-Scale Shared Memory Systems.
Proceedings of the 1997 Conference on Parallel Architectures and Compilation Techniques (PACT '97), 1997

1995
Combining flow and dependence analyses to expose redundant array accesses.
Int. J. Parallel Program., 1995

On Shortest Path Routing in Single Stage Shuffle-Exchange Networks.
Proceedings of the 7th Annual ACM Symposium on Parallel Algorithms and Architectures, 1995

1994
Scalability of the Cedar system.
Proceedings of Supercomputing '94, 1994

1993
Performance Evaluation of Memory Caches in Multiprocessors.
Proceedings of the 1993 International Conference on Parallel Processing, 1993

1992
An Effective Write Policy for Software Coherence Schemes.
Proceedings of Supercomputing '92, 1992

1991
Detecting redundant accesses to array data.
Proceedings of Supercomputing '91, 1991

Comparison and analysis of software and directory coherence schemes.
Proceedings of Supercomputing '91, 1991

Chief: A Parallel Simulation Environment for Parallel Systems.
Proceedings of the Fifth International Parallel Processing Symposium, Proceedings, Anaheim, California, USA, April 30, 1991

A software coherence scheme with the assistance of directories.
Proceedings of the 5th international conference on Supercomputing, 1991

An Integrated Hardware/Software Solution for Effective Management of Local Storage in High-Performance Systems.
Proceedings of the International Conference on Parallel Processing, 1991

Preliminary Performance Analysis of the Cedar Multiprocessor Memory System.
Proceedings of the International Conference on Parallel Processing, 1991

1990
Compiler-Directed Cache Management in Multiprocessors.
Computer, 1990

Compiler-directed data prefetching in multiprocessors with memory hierarchies.
Proceedings of the 4th international conference on Supercomputing, 1990

1989
A version control approach to Cache coherence.
Proceedings of the 3rd international conference on Supercomputing, 1989

1988
A Cache Coherence Scheme With Fast Selective Invalidation.
Proceedings of the 15th Annual International Symposium on Computer Architecture, 1988

Performance of a shared memory system for vector multiprocessors.
Proceedings of the 2nd international conference on Supercomputing, 1988

Stale Data Detection and Coherence Enforcement Using Flow Analysis.
Proceedings of the International Conference on Parallel Processing, 1988

1987
The Performance of Software-managed Multiprocessor Caches on Parallel Numerical Programs.
Proceedings of the Supercomputing, 1987

1986
A Compiler-Assisted Cache Coherence Solution for Multiprocessors.
Proceedings of the International Conference on Parallel Processing, 1986

