Pekka Jääskeläinen

Orcid: 0000-0001-5707-8544

According to our database, Pekka Jääskeläinen authored at least 118 papers between 2006 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Energy-Efficient Exposed Datapath Architecture With a RISC-V Instruction Set Mode.
IEEE Trans. Computers, February, 2024

Hierarchical Bitmask Implicit Grids for Efficient Point-in-Volume Queries on the GPU.
Proceedings of the 19th International Joint Conference on Computer Vision, 2024

Towards Efficient OpenCL Pipe Specification for Hardware Accelerators.
Proceedings of the 12th International Workshop on OpenCL and SYCL, 2024

2023
AEx: Automated High-Level Synthesis of Compiler Programmable Co-Processors.
J. Signal Process. Syst., September, 2023

Efficient OpenCL system integration of non-blocking FPGA accelerators.
Microprocess. Microsystems, March, 2023

PoCL-R: An Open Standard Based Offloading Layer for Heterogeneous Multi-Access Edge Computing with Server Side Scalability.
CoRR, 2023

TauBench 1.1: A Dynamic Benchmark for Graphics Rendering.
CoRR, 2023

AFOCL: Portable OpenCL Programming of FPGAs via Automated Built-in Kernel Management.
Proceedings of the IEEE Nordic Circuits and Systems Conference, 2023

Open Standard Software Stack for Low Latency Offloading from Lightweight Devices to Remote Heterogeneous Platforms.
Proceedings of the 2023 International Workshop on OpenCL, 2023

BrainTTA: A 28.6 TOPS/W Compiler Programmable Transport-Triggered NN SoC.
Proceedings of the 41st IEEE International Conference on Computer Design, 2023

On the OpenCL Support for Streaming Fixed-Function Accelerators on Embedded SoC FPGAs.
Proceedings of the Applied Reconfigurable Computing. Architectures, Tools, and Applications, 2023

2022
Image and Video Coding Techniques for Ultra-low Latency.
ACM Comput. Surv., January, 2022

Energy-Efficient Instruction Delivery in Embedded Systems With Domain Wall Memory.
IEEE Trans. Computers, 2022

Cross-vendor programming abstraction for diverse heterogeneous platforms.
Frontiers Comput. Sci., 2022

BrainTTA: A 35 fJ/op Compiler Programmable Mixed-Precision Transport-Triggered NN SoC.
CoRR, 2022

Real-Time Rendering of Point Clouds With Photorealistic Effects: A Survey.
IEEE Access, 2022

Tauray: A Scalable Real-Time Open-Source Path Tracer for Stereo and Light Field Displays.
Proceedings of the SIGGRAPH Asia 2022 Technical Communications, 2022

Pruned Lightweight Encoders for Computer Vision.
Proceedings of the 24th IEEE International Workshop on Multimedia Signal Processing, 2022

TauBench: Dynamic Benchmark for Graphics Rendering.
Proceedings of the 17th International Joint Conference on Computer Vision, 2022

Prebypass: Software Register File Bypassing for Reduced Interconnection Architectures.
Proceedings of the 25th Euromicro Conference on Digital System Design, 2022

Real-Time Light Field Path Tracing.
Proceedings of the Advances in Computer Graphics, 2022

OpenASIP 2.0: Co-Design Toolset for RISC-V Application-Specific Instruction-Set Processors.
Proceedings of the 33rd IEEE International Conference on Application-specific Systems, 2022

Dual-IS: Instruction Set Modality for Efficient Instruction Level Parallelism.
Proceedings of the Architecture of Computing Systems - 35th International Conference, 2022

2021
Design and management of image processing pipelines within CPS: Acquired experience towards the end of the FitOptiVis ECSEL Project.
Microprocess. Microsystems, November, 2021

Evaluation of Different Processor Architecture Organizations for On-Site Electronics in Harsh Environments.
Int. J. Parallel Program., 2021

PoCL-R: A Scalable Low Latency Distributed OpenCL Runtime.
Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 2021

Unified OpenCL Integration Methodology for FPGA Designs.
Proceedings of the IEEE Nordic Circuits and Systems Conference, NorCAS 2021, Oslo, 2021

Energy Efficient Multistandard Decompressor ASIP.
Proceedings of the ICCDE 2021: 7th International Conference on Computing and Data Engineering, Phuket, Thailand, January 15, 2021

Performance of Texture Compression Algorithms in Low-Latency Computer Vision Tasks.
Proceedings of the 9th European Workshop on Visual Information Processing, 2021

DDISH-GI: Dynamic Distributed Spherical Harmonics Global Illumination.
Proceedings of the Advances in Computer Graphics, 2021

2020
Energy Efficient Low Latency Multi-issue Cores for Intelligent Always-On IoT Applications.
J. Signal Process. Syst., 2020

Systematic Evaluation of the Quality Benefits of Spatiotemporal Sample Reprojection in Real-Time Stereoscopic Path Tracing.
IEEE Access, 2020

System Simulation of Memristor Based Computation in Memory Platforms.
Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 2020

POCL-R: Distributed OpenCL Runtime for Low Latency Remote Offloading.
Proceedings of the IWOCL '20: International Workshop on OpenCL, 2020

HIPCL: Tool for Porting CUDA Applications to Advanced OpenCL Platforms Through HIP.
Proceedings of the IWOCL '20: International Workshop on OpenCL, 2020

CPSoSaware: Cross-Layer Cognitive Optimization Tools & Methods for the Lifecycle Support of Dependable CPSoS.
Proceedings of the 2020 IEEE Computer Society Annual Symposium on VLSI, 2020

Programmable Dictionary Code Compression for Instruction Stream Energy Efficiency.
Proceedings of the 38th IEEE International Conference on Computer Design, 2020

Machine Learning is the Solution Also for Foveated Path Tracing Reconstruction.
Proceedings of the 15th International Joint Conference on Computer Vision, 2020

TTA-SIMD Soft Core Processors.
Proceedings of the 30th International Conference on Field-Programmable Logic and Applications, 2020

Design and management of image processing pipelines within CPS: 2 years of experience from the FitOptiVis ECSEL Project.
Proceedings of the 23rd Euromicro Conference on Digital System Design, 2020

2019
Exploiting Task Parallelism with OpenCL: A Case Study.
J. Signal Process. Syst., 2019

ALMARVI System Solution for Image and Video Processing in Healthcare, Surveillance and Mobile Applications.
J. Signal Process. Syst., 2019

LordCore: Energy-Efficient OpenCL-Programmable Software-Defined Radio Coprocessor.
IEEE Trans. Very Large Scale Integr. Syst., 2019

Blockwise Multi-Order Feature Regression for Real-Time Path-Tracing Reconstruction.
ACM Trans. Graph., 2019

Programmable and Scalable Architecture for Graphics Processing Units.
Trans. High Perform. Embed. Archit. Compil., 2019

Towards Efficient Code Generation for Exposed Datapath Architectures.
Proceedings of the 22nd International Workshop on Software and Compilers for Embedded Systems, 2019

Evaluation of Different Processor Architecture Organizations for On-site Electronics in Harsh Environments.
Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 2019

Foveated Real-Time Path Tracing in Visual-Polar Space.
Proceedings of the 30th Eurographics Symposium on Rendering, 2019

SHRIMP: Efficient Instruction Delivery with Domain Wall Memory.
Proceedings of the 2019 IEEE/ACM International Symposium on Low Power Electronics and Design, 2019

Reducing Computational Complexity of Real-Time Stereoscopic Ray Tracing with Spatiotemporal Sample Reprojection.
Proceedings of the 14th International Joint Conference on Computer Vision, 2019

AEx: Automated Customization of Exposed Datapath Soft-Cores.
Proceedings of the 22nd Euromicro Conference on Digital System Design, 2019

The FitOptiVis ECSEL project: highly efficient distributed embedded image/video processing in cyber-physical systems.
Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019

2018
Instruction Fetch Energy Reduction with Biased SRAMs.
J. Signal Process. Syst., 2018

Software Defined Radio Implementation of a Digital Self-interference Cancellation Method for Inband Full-Duplex Radio Using Mobile Processors.
J. Signal Process. Syst., 2018

PLOCTree: A Fast, High-Quality Hardware BVH Builder.
Proc. ACM Comput. Graph. Interact. Tech., 2018

Variable Length Instruction Compression on Transport Triggered Architectures.
Int. J. Parallel Program., 2018

Instantaneous foveated preview for progressive Monte Carlo rendering.
Comput. Vis. Media, 2018

Offloading C++17 Parallel STL on System Shared Virtual Memory Platforms.
Proceedings of the High Performance Computing, 2018

LoTTA: Energy-Efficient Processor for Always-On Applications.
Proceedings of the 2018 IEEE International Workshop on Signal Processing Systems, 2018

AivoTTA: an energy efficient programmable accelerator for CNN-based object recognition.
Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, 2018

Energy-Delay Trade-Offs in Instruction Register File Design.
Proceedings of the 2018 IEEE Nordic Circuits and Systems Conference, 2018

Transport Triggered Polar Decoders.
Proceedings of the 10th IEEE International Symposium on Turbo Codes & Iterative Information Processing, 2018

Transport-Triggered Soft Cores.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Sparse Sampling for Real-time Ray Tracing.
Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2018), 2018

2017
Codesign Case Study on Transport-Triggered Architectures.
Handbook of Hardware/Software Codesign, 2017

MergeTree: A Fast Hardware HLBVH Constructor for Animated Ray Tracing.
ACM Trans. Graph., 2017

Fast Hardware Construction and Refitting of Quantized Bounding Volume Hierarchies.
Comput. Graph. Forum, 2017

Foveated instant preview for progressive rendering.
Proceedings of the SIGGRAPH Asia 2017 Technical Briefs, Bangkok, Thailand, November 27, 2017

Exposed datapath optimizations for loop scheduling.
Proceedings of the 2017 International Conference on Embedded Computer Systems: Architectures, 2017

2016
Improving Code Density with Variable Length Encoding Aware Instruction Scheduling.
J. Signal Process. Syst., 2016

Integer Linear Programming-Based Scheduling for Transport Triggered Architectures.
ACM Trans. Archit. Code Optim., 2016

Xor-Masking: A Novel Statistical Method for Instruction Read Energy Reduction in Contemporary SRAM Technologies.
Proceedings of the 2016 IEEE International Workshop on Signal Processing Systems, 2016

Multi bounding volume hierarchies for ray tracing pipelines.
Proceedings of the SIGGRAPH ASIA 2016, Macao, December 5-8, 2016 - Technical Briefs, 2016

Aggressively bypassing list scheduler for transport triggered architectures.
Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016

OpenCL programmable exposed datapath high performance low-power image signal processor.
Proceedings of the IEEE Nordic Circuits and Systems Conference, 2016

Foveated Path Tracing - A Literature Review and a Performance Gain Analysis.
Proceedings of the Advances in Visual Computing - 12th International Symposium, 2016

Customized high performance low power processor for binaural speaker localization.
Proceedings of the 2016 IEEE International Conference on Electronics, Circuits and Systems, 2016

Half-precision Floating-point Ray Traversal.
Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2016), 2016

Software defined radio implementation of adaptive nonlinear digital self-interference cancellation for mobile inband full-duplex radio.
Proceedings of the 2016 IEEE Global Conference on Signal and Information Processing, 2016

2015
Embedded Multi-Core Systems Dedicated to Dynamic Dataflow Programs.
J. Signal Process. Syst., 2015

Code Density and Energy Efficiency of Exposed Datapath Architectures.
J. Signal Process. Syst., 2015

Data Intensive Computing: From Modeling to Implementation.
J. Signal Process. Syst., 2015

pocl: A Performance-Portable OpenCL Implementation.
Int. J. Parallel Program., 2015

MergeTree: a HLBVH constructor for mobile systems.
Proceedings of the SIGGRAPH Asia 2015 Technical Briefs, Kobe, Japan, November 2-6, 2015

Power optimizations for transport triggered SIMD processors.
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015

Rapid customization of image processors using Halide.
Proceedings of the 2015 IEEE Global Conference on Signal and Information Processing, 2015

Parallel processing intensive digital front-end for IEEE 802.11ac receiver.
Proceedings of the 49th Asilomar Conference on Signals, Systems and Computers, 2015

2014
Compiler optimizations for code density of variable length instructions.
Proceedings of the 2014 IEEE Workshop on Signal Processing Systems, 2014

Programmable in-loop deblock filter processor for video decoders.
Proceedings of the 2014 IEEE Workshop on Signal Processing Systems, 2014

Grover: Looking for Performance Improvement by Disabling Local Memory Usage in OpenCL Kernels.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

Efficient software synthesis of dynamic dataflow programs.
Proceedings of the IEEE International Conference on Acoustics, 2014

A high throughput LDPC decoder using a mid-range GPU.
Proceedings of the IEEE International Conference on Acoustics, 2014

Parallel programming of a symmetric transport-triggered architecture with applications in flexible LDPC encoding.
Proceedings of the IEEE International Conference on Acoustics, 2014

Heuristics for greedy transport triggered architecture interconnect exploration.
Proceedings of the 2014 International Conference on Compilers, 2014

2013
Inexpensive correctly rounded floating-point division and square root with input scaling.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2013

Low-power application-specific FFT processor for LTE applications.
Proceedings of the 2013 International Conference on Embedded Computer Systems: Architectures, 2013

Special session on "Exposed data path architectures: Recent advances and applications".
Proceedings of the 2013 International Conference on Embedded Computer Systems: Architectures, 2013

Turbo decoding on tailored OpenCL processor.
Proceedings of the 2013 9th International Wireless Communications and Mobile Computing Conference, 2013

A 122Mb/s Turbo decoder using a mid-range GPU.
Proceedings of the 2013 9th International Wireless Communications and Mobile Computing Conference, 2013

Towards run-time actor mapping of dynamic dataflow programs onto multi-core platforms.
Proceedings of the 8th International Symposium on Image and Signal Processing and Analysis, 2013

Simplified floating-point division and square root.
Proceedings of the IEEE International Conference on Acoustics, 2013

2012
From Parallel Programs to Customized Parallel Processors.
PhD thesis, 2012

2011
Design Methodology for Offloading Software Executions to FPGA.
J. Signal Process. Syst., 2011

TCEMC: A co-design flow for application-specific multicores.
Proceedings of the 2011 International Conference on Embedded Computer Systems: Architectures, 2011

Customizable Datapath Integrated Lock Unit.
Proceedings of the 2011 International Symposium on System on Chip, 2011

Operation set customization in retargetable compilers.
Proceedings of the Conference Record of the Forty Fifth Asilomar Conference on Signals, 2011

2010
OpenCL-based design methodology for application-specific processors.
Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

Customized Exposed Datapath Soft-Core Design Flow with Compiler Support.
Proceedings of the International Conference on Field Programmable Logic and Applications, 2010

2009
Reconfigurable video decoder with transform acceleration.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2009

Programmable Accelerators for Reconfigurable Video Decoder.
Proceedings of the Embedded Computer Systems: Architectures, 2009

Programmable and Scalable Architecture for Graphics Processing Units.
Proceedings of the Embedded Computer Systems: Architectures, 2009

2008
Resource conflict detection in simulation of function unit pipelines.
J. Syst. Archit., 2008

Impact of Software Bypassing on Instruction Level Parallelism and Register File Traffic.
Proceedings of the Embedded Computer Systems: Architectures, 2008

Reducing Context Switch Overhead with Compiler-Assisted Threading.
Proceedings of the 2008 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC 2008), 2008

2007
Resource Conflict Detection in Simulation of Function Unit Pipelines.
Proceedings of the Embedded Computer Systems: Architectures, 2007

Run-Time Scheduled Hardware Acceleration of MPEG-4 Video Decoding.
Proceedings of the International Symposium on System-on-Chip, 2007

2006
Software Pipelining Support for Transport Triggered Architecture Processors.
Proceedings of the Embedded Computer Systems: Architectures, 2006

Loop Scheduling for Transport Triggered Architecture Processors.
Proceedings of the International Symposium on System-on-Chip, 2006

