Michael J. Schulte

ORCID: 0000-0003-1305-406X

Affiliations:
  • AMD, Sunnyvale, CA, USA
  • University of Wisconsin-Madison, WI, USA (former)


According to our database, Michael J. Schulte authored at least 141 papers between 1993 and 2023.

Bibliography

2023
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies.
CoRR, 2023

AMD Instinct™ MI250X Accelerator enabled by Elevated Fanout Bridge Advanced Packaging Architecture.
Proceedings of the 2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2023

2021
What Made Us Stronger: An Inside Look Back at the History of AMD Microprocessor Development.
IEEE Micro, 2021

2020
Approximate Computing: From Circuits to Applications [Scanning the Issue].
Proc. IEEE, 2020

2017
Accelerating Matrix Processing with GPUs.
Proceedings of the 24th IEEE Symposium on Computer Arithmetic, 2017

2015
Achieving Exascale Capabilities through Heterogeneous Computing.
IEEE Micro, 2015

2014
Low-Cost Per-Core Voltage Domain Support for Power-Constrained High-Performance Processors.
IEEE Trans. Very Large Scale Integr. Syst., 2014

Energy-Efficient Pixel-Arithmetic.
IEEE Trans. Computers, 2014

Process variation-aware workload partitioning algorithms for GPUs supporting spatial-multitasking.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

Memory scheduling towards high-throughput cooperative heterogeneous computing.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
High-Energy Physics.
Proceedings of the Handbook of Signal Processing Systems, 2013

Instruction Set Extensions for Matrix Decompositions on Software Defined Radio Architectures.
J. Signal Process. Syst., 2013

Binary Integer Decimal-Based Floating-Point Multiplication.
IEEE Trans. Computers, 2013

Modular Design of High-Throughput, Low-Latency Sorting Units.
IEEE Trans. Computers, 2013

Automating Stressmark Generation for Testing Processor Voltage Fluctuations.
IEEE Micro, 2013

Exploiting GPU peak-power and performance tradeoffs through reduced effective pipeline latency.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

REEL: Reducing effective execution latency of floating point operations.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Performance boosting under reliability and power constraints.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2013

Dynamic bandwidth scaling for embedded DSPs with 3D-stacked DRAM and wide I/Os.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2013

Reevaluating the latency claims of 3D stacked memories.
Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

2012
A study of decimal left shifters for binary numbers.
Inf. Comput., 2012

AUDIT: Stress Testing the Automatic Way.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Something old and something new: P-states can borrow microarchitecture techniques too.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

The case for GPGPU spatial multitasking.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

Cost-effective power delivery to support per-core voltage domains for power-constrained processors.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

A Linear Algebra Core Design for Efficient Level-3 BLAS.
Proceedings of the 23rd IEEE International Conference on Application-Specific Systems, 2012

Virtual Floating-Point Units for Low-Power Embedded Processors.
Proceedings of the 23rd IEEE International Conference on Application-Specific Systems, 2012

Session MP6a: Computer arithmetic (invited).
Proceedings of the Conference Record of the Forty Sixth Asilomar Conference on Signals, 2012

Workload and power budget partitioning for single-chip heterogeneous processors.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Power-efficient computing for compute-intensive GPGPU applications.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Hardware Designs for Binary Integer Decimal-Based Rounding.
IEEE Trans. Computers, 2011

Modular high-throughput and low-latency sorting units for FPGAs in the Large Hadron Collider.
Proceedings of the IEEE 9th Symposium on Application Specific Processors, 2011

Analyzing the performance and energy impact of 3D memory integration on embedded DSPs.
Proceedings of the 2011 International Conference on Embedded Computer Systems: Architectures, 2011

Scratchpad memory optimizations for digital signal processing applications.
Proceedings of the Design, Automation and Test in Europe, 2011

Energy-efficient floating-point arithmetic for software-defined radio architectures.
Proceedings of the 22nd IEEE International Conference on Application-specific Systems, 2011

A decimal floating-point fused multiply-add unit with a novel decimal leading-zero anticipator.
Proceedings of the 22nd IEEE International Conference on Application-specific Systems, 2011

Truncated-matrix multipliers with coefficient shifting.
Proceedings of the Conference Record of the Forty Fifth Asilomar Conference on Signals, 2011

Session MP7b: Model-based design optimization.
Proceedings of the Conference Record of the Forty Fifth Asilomar Conference on Signals, 2011

Session MP8a4: DSP algorithms and architectures.
Proceedings of the Conference Record of the Forty Fifth Asilomar Conference on Signals, 2011

Energy-efficient floating-point arithmetic for digital signal processors.
Proceedings of the Conference Record of the Forty Fifth Asilomar Conference on Signals, 2011

Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage/Frequency and Core Scaling.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Instruction set extensions for the advanced encryption standard on a multithreaded software defined radio platform.
Int. J. High Perform. Syst. Archit., 2010

A survey of hardware designs for decimal arithmetic.
IBM J. Res. Dev., 2010

CORDIC-based LMMSE equalizer for Software Defined Radio.
Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

ARAL-CR: An adaptive reasoning and learning cognitive radio platform.
Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

ERCBench: An Open-Source Benchmark Suite for Embedded and Reconfigurable Computing.
Proceedings of the International Conference on Field Programmable Logic and Applications, 2010

Galois field hardware architectures for network coding.
Proceedings of the 2010 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2010

High-Energy Physics.
Proceedings of the Handbook of Signal Processing Systems, 2010

2009
Hardware Designs for Decimal Floating-Point Addition and Related Operations.
IEEE Trans. Computers, 2009

Low-Power Multiple-Precision Iterative Floating-Point Multiplier with SIMD Support.
IEEE Trans. Computers, 2009

Decimal Floating-Point Multiplication.
IEEE Trans. Computers, 2009

Instruction set extensions for software defined radio.
Microprocess. Microsystems, 2009

The Emerging Landscape of Computer Performance Evaluation.
Adv. Comput., 2009

Performance analysis of decimal floating-point libraries and its impact on decimal hardware and software solutions.
Proceedings of the 27th International Conference on Computer Design, 2009

FPGA Design Analysis of the Clustering Algorithm for the CERN Large Hadron Collider.
Proceedings of the FCCM 2009, 2009

A Combined Decimal and Binary Floating-Point Multiplier.
Proceedings of the 20th IEEE International Conference on Application-Specific Systems, 2009

A Decimal Floating-Point Adder with Decoded Operands and a Decimal Leading-Zero Anticipator.
Proceedings of the 19th IEEE Symposium on Computer Arithmetic, 2009

2008
Improved combined binary/decimal fixed-point multipliers.
Proceedings of the 26th International Conference on Computer Design, 2008

Implementing communications systems on an SDR SoC.
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
A Decimal Floating-Point Divider Using Newton-Raphson Iteration.
J. VLSI Signal Process., 2007

The Sandbridge SB3011 Platform.
EURASIP J. Embed. Syst., 2007

A New Era of Performance Evaluation.
Computer, 2007

Software Solutions for Converting a MIMO-OFDM Channel into Multiple SISO-OFDM Channels.
Proceedings of the Third IEEE International Conference on Wireless and Mobile Computing, 2007

Trends in Low Power Handset Software Defined Radio.
Proceedings of the Embedded Computer Systems: Architectures, 2007

Benchmarks and performance analysis of decimal floating-point applications.
Proceedings of the 25th International Conference on Computer Design, 2007

Hardware design of a Binary Integer Decimal-based floating-point adder.
Proceedings of the 25th International Conference on Computer Design, 2007

Floating-point division algorithms for an x86 microprocessor with a rectangular multiplier.
Proceedings of the 25th International Conference on Computer Design, 2007

A parallel IEEE P754 decimal floating-point multiplier.
Proceedings of the 25th International Conference on Computer Design, 2007

Hardware Design of a Binary Integer Decimal-based IEEE P754 Rounding Unit.
Proceedings of the IEEE International Conference on Application-Specific Systems, 2007

Architecture Support for Reconfigurable Multithreaded Processors in Programmable Communication Systems.
Proceedings of the IEEE International Conference on Application-Specific Systems, 2007

Decimal Floating-Point Adder and Multifunction Unit with Injection-Based Rounding.
Proceedings of the 18th IEEE Symposium on Computer Arithmetic (ARITH-18 2007), 2007

Decimal Floating-Point Multiplication Via Carry-Save Addition.
Proceedings of the 18th IEEE Symposium on Computer Arithmetic (ARITH-18 2007), 2007

2006
Reciprocal and Reciprocal Square Root Units with Operand Modification and Multiplication.
J. VLSI Signal Process., 2006

A Low-Power Multithreaded Processor for Software Defined Radio.
J. VLSI Signal Process., 2006

Generation and visualization of four-dimensional MR angiography data using an undersampled 3-D projection trajectory.
IEEE Trans. Medical Imaging, 2006

Integer Multipliers with Overflow Detection.
IEEE Trans. Computers, 2006

Dual-mode floating-point multiplier architectures with parallel operations.
J. Syst. Archit., 2006

An Overview of Reconfigurable Hardware in Embedded Systems.
EURASIP J. Embed. Syst., 2006

2005
Guest Editorial.
J. VLSI Signal Process., 2005

High-Speed Multioperand Decimal Adders.
IEEE Trans. Computers, 2005

Sandbridge Software Tools.
Proceedings of the Embedded Computer Systems: Architectures, 2005

A combined two's complement and floating-point comparator.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2005), 2005

Future wireless convergence platforms.
Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2005

Instruction set extensions for software defined radio on a multithreaded processor.
Proceedings of the 2005 International Conference on Compilers, 2005

Decimal Floating-Point Square Root Using Newton-Raphson Iteration.
Proceedings of the 16th IEEE International Conference on Application-Specific Systems, 2005

Instruction Set Extensions for Reed-Solomon Encoding and Decoding.
Proceedings of the 16th IEEE International Conference on Application-Specific Systems, 2005

Efficient Function Approximation Using Truncated Multipliers and Squarers.
Proceedings of the 17th IEEE Symposium on Computer Arithmetic (ARITH-17 2005), 2005

Decimal Multiplication with Efficient Partial Product Generation.
Proceedings of the 17th IEEE Symposium on Computer Arithmetic (ARITH-17 2005), 2005

2004
Intrinsic Compiler Support for Interval Arithmetic.
Numer. Algorithms, 2004

A Low-Power Multithreaded Processor for Baseband Communication Systems.
Proceedings of the Computer Systems: Architectures, 2004

The 4D Cluster Visualization project.
Proceedings of the Medical Imaging 2004: Visualization, 2004

A 64-bit Decimal Floating-Point Adder.
Proceedings of the 2004 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2004), 2004

A Subword-Parallel Multiplication and Sum-of-Squares Unit.
Proceedings of the 2004 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2004), 2004

Multioperand Decimal Addition.
Proceedings of the 2004 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2004), 2004

A High-Frequency Decimal Multiplier.
Proceedings of the 22nd IEEE International Conference on Computer Design: VLSI in Computers & Processors (ICCD 2004), 2004

A Static Low-Power, High-Performance 32-bit Carry Skip Adder.
Proceedings of the 2004 Euromicro Symposium on Digital Systems Design (DSD 2004), Architectures, Methods and Tools, 31 August, 2004

Sandblaster low power DSP [parallel DSP arithmetic microarchitecture].
Proceedings of the IEEE 2004 Custom Integrated Circuits Conference, 2004

Decimal Floating-Point Division Using Newton-Raphson Iteration.
Proceedings of the 15th IEEE International Conference on Application-Specific Systems, 2004

A Low-Power Carry Skip Adder with Fast Saturation.
Proceedings of the 15th IEEE International Conference on Application-Specific Systems, 2004

2003
A Quadruple Precision and Dual Double Precision Floating-Point Multiplier.
Proceedings of the 2003 Euromicro Symposium on Digital Systems Design (DSD 2003), 2003

Combined Multiplication and Sum-of-Squares Units.
Proceedings of the 14th IEEE International Conference on Application-Specific Systems, 2003

Decimal Multiplication Via Carry-Save Addition.
Proceedings of the 14th IEEE International Conference on Application-Specific Systems, 2003

The Interval Logarithmic Number System.
Proceedings of the 16th IEEE Symposium on Computer Arithmetic (Arith-16 2003), 2003

2002
Guest Editorial.
J. VLSI Signal Process., 2002

A Java-Enabled DSP.
Proceedings of the Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation, 2002

2001
Combined IEEE Compliant and Truncated Floating Point Multipliers for Reduced Power Dissipation.
Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

Design Alternatives for Parallel Saturating Multioperand Adders.
Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

FPGA Resource Reduction Through Truncated Multiplication.
Proceedings of the Field-Programmable Logic and Applications, 2001

Analysis of Column Compression Multipliers.
Proceedings of the 15th IEEE Symposium on Computer Arithmetic (Arith-15 2001), 2001

2000
A Family of Variable-Precision Interval Arithmetic Processors.
IEEE Trans. Computers, 2000

Integer Multiplication with Overflow Detection or Saturation.
IEEE Trans. Computers, 2000

A New Approach to DSP Intrinsic Functions.
Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS-33), 2000

Parallel saturating multioperand adders.
Proceedings of the 2000 International Conference on Compilers, 2000

A Hardware Algorithm for Variable-Precision Logarithm.
Proceedings of the 12th IEEE International Conference on Application-Specific Systems, 2000

1999
The Symmetric Table Addition Method for Accurate Function Approximation.
J. VLSI Signal Process., 1999

Approximating Elementary Functions with Symmetric Bipartite Tables.
IEEE Trans. Computers, 1999

The Interval-Enhanced GNU Fortran Compiler.
Reliab. Comput., 1999

Parallel Saturating Fractional Arithmetic Units.
Proceedings of the 9th Great Lakes Symposium on VLSI (GLS-VLSI '99), 1999

High-Speed Inverse Square Roots.
Proceedings of the 14th IEEE Symposium on Computer Arithmetic (Arith-14 '99), 1999

1998
Single-Number Interval I/O.
Proceedings of the Developments in Reliable Computing, 1998

A Combined Interval and Floating Point Multiplier.
Proceedings of the 8th Great Lakes Symposium on VLSI (GLS-VLSI '98), 1998

1997
Accurate Function Approximations by Symmetric Table Lookup and Addition.
Proceedings of the 1997 International Conference on Application-Specific Systems, 1997

Symmetric Bipartite Tables for Accurate Function Approximation.
Proceedings of the 13th Symposium on Computer Arithmetic (ARITH-13 '97), 1997

1996
Hardware interval multipliers.
RITA, 1996

Variable-precision, interval arithmetic coprocessors.
Reliab. Comput., 1996

Software for high radix on-line arithmetic.
Reliab. Comput., 1996

1995
Parallel reduced area multipliers.
J. VLSI Signal Process., 1995

A software interface and hardware design for variable-precision interval arithmetic.
Reliab. Comput., 1995

A High Radix On-Line Arithmetic for Credible and Accurate Computing.
J. Univers. Comput. Sci., 1995

A coprocessor for accurate and reliable numerical computations.
Proceedings of the 1995 International Conference on Computer Design (ICCD '95), 1995

A Processor for Staggered Interval Arithmetic.
Proceedings of the International Conference on Application Specific Array Processors (ASAP'95), 1995

Hardware Design and Arithmetic Algorithms for a Variable-Precision, Interval Arithmetic Coprocessor.
Proceedings of the 12th Symposium on Computer Arithmetic (ARITH-12 '95), 1995

The K5 transcendental functions.
Proceedings of the 12th Symposium on Computer Arithmetic (ARITH-12 '95), 1995

1994
Hardware Designs for Exactly Rounded Elementary Functions.
IEEE Trans. Computers, 1994

Optimal initial approximations for the Newton-Raphson division algorithm.
Computing, 1994

A variable-precision interval arithmetic processor.
Proceedings of the International Conference on Application Specific Array Processors, 1994

1993
Reduced area multipliers.
Proceedings of the International Conference on Application-Specific Array Processors, 1993

Exact rounding of certain elementary functions.
Proceedings of the 11th Symposium on Computer Arithmetic, 29 June, 1993