Weng-Fai Wong

According to our database, Weng-Fai Wong
  • authored at least 115 papers between 1989 and 2018.
  • has a "Dijkstra number" of four.

Bibliography

2018
Gloss: Seamless Live Reconfiguration and Reoptimization of Stream Programs.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017
Parallelizing Skip Lists for In-Memory Multi-Core Database Systems.
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017

Exploiting half precision arithmetic in Nvidia GPUs.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

Automated Property Synthesis of ODEs Based Bio-pathways Models.
Proceedings of the Computational Methods in Systems Biology, 2017

Efficient floating point precision tuning for approximate computing.
Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017

2016
Exploiting Single-Threaded Model in Multi-Core In-Memory Systems.
IEEE Trans. Knowl. Data Eng., 2016

TreeFTL: An Efficient Workload-Adaptive Algorithm for RAM Buffer Management of NAND Flash-Based Devices.
IEEE Trans. Computers, 2016

PI: a Parallel in-memory skip list based Index.
CoRR, 2016

2015
A Family of Bit-Representation-Optimized Formats for Fast Sparse Matrix-Vector Multiplication on the GPU.
IEEE Trans. Parallel Distrib. Syst., 2015

A Code Generation Framework for Targeting Optimized Library Calls for Multiple Platforms.
IEEE Trans. Parallel Distrib. Syst., 2015

In-memory Databases: Challenges and Opportunities From Software and Hardware Perspectives.
SIGMOD Record, 2015

3DFTL: a three-level demand-based translation strategy for flash device.
IEICE Electronics Express, 2015

DGCC: A New Dependency Graph based Concurrency Control Protocol for Multicore Database Systems.
CoRR, 2015

"Anti-Caching"-based elastic memory management for Big Data.
Proceedings of the 31st IEEE International Conference on Data Engineering, 2015

Parallelized Parameter Estimation of Biological Pathway Models.
Proceedings of the Hybrid Systems Biology - Fourth International Workshop, 2015

PAC: Program Analysis for Approximation-aware Compilation.
Proceedings of the 2015 International Conference on Compilers, 2015

2014
STT-RAM Cache Hierarchy With Multiretention MTJ Designs.
IEEE Trans. VLSI Syst., 2014

Mapping Streaming Applications onto GPU Systems.
IEEE Trans. Parallel Distrib. Syst., 2014

StreamJIT: a commensal compiler for high-performance stream programming.
Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, 2014

ASAC: automatic sensitivity analysis for approximate computing.
Proceedings of the SIGPLAN/SIGBED Conference on Languages, 2014

Optimizing MLC-based STT-RAM caches by dynamic block size reconfiguration.
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

EnVM: Virtual memory design for new memory architectures.
Proceedings of the 2014 International Conference on Compilers, 2014

A coherent hybrid SRAM and STT-RAM L1 cache architecture for shared memory multicores.
Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014

2013
GPU code generation for ODE-based applications with phased shared-data access patterns.
TACO, 2013

On-chip caches built on multilevel spin-transfer torque RAM cells and its optimizations.
JETC, 2013

Accelerating sparse matrix-vector multiplication on GPUs using bit-representation-optimized schemes.
Proceedings of the International Conference for High Performance Computing, 2013

A practical low-power memristor-based analog neural branch predictor.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Optimizing and Auto-Tuning Iterative Stencil Loops for GPUs with the In-Plane Method.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

TreeFTL: efficient RAM management for high performance of NAND flash-based storage systems.
Proceedings of the Design, Automation and Test in Europe, 2013

SAW: system-assisted wear leveling on the write endurance of NAND flash devices.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

2012
Approximate probabilistic analysis of biopathway dynamics.
Bioinformatics, 2012

Poster: Automated Mapping Streaming Applications onto GPUs.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Mapping Streaming Applications onto GPU Systems.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Scalable framework for mapping streaming applications onto multi-GPU systems.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

ADAPT: Efficient workload-sensitive flash management based on adaptation, prediction and aggregation.
Proceedings of the IEEE 28th Symposium on Mass Storage Systems and Technologies, 2012

Automatic Refactoring of Legacy Fortran Code to the Array Slicing Notation.
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

Guppy: A GPU-like soft-core processor.
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

Tulipse: A Visualization Framework for User-Guided Parallelization.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

Extending the lifetime of NAND flash memory by salvaging bad blocks.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Observational wear leveling: an efficient algorithm for flash memory management.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

2011
Guest Editorial - BSN2010 Special Issue.
IEEE Trans. Biomed. Circuits and Systems, 2011

Internet-based hardware/software co-design framework for embedded 3D graphics applications.
EURASIP J. Adv. Sig. Proc., 2011

Dynamic cache contention detection in multi-threaded applications.
Proceedings of the 7th International Conference on Virtual Execution Environments, 2011

Multi retention level STT-RAM cache designs with a dynamic refresh scheme.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Processor caches with multi-level spin-transfer torque ram cells.
Proceedings of the 2011 International Symposium on Low Power Electronics and Design, 2011

Automated Architecture-Aware Mapping of Streaming Applications Onto GPUs.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Co-synthesis of FPGA-based application-specific floating point simd accelerators.
Proceedings of the ACM/SIGDA 19th International Symposium on Field Programmable Gate Arrays, 2011

A UML 2-based hardware-software co-design framework for body sensor network applications.
Proceedings of the Design, Automation and Test in Europe, 2011

2010
PiPA: Pipelined profiling and analysis on multicore systems.
TACO, 2010

Interprocedural Placement-Aware Configuration Prefetching for FPGA-Based Systems.
Proceedings of the 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2010

2009
Tolerating process variations in large, set-associative caches: The buddy cache.
TACO, 2009

Automatically patching errors in deployed software.
Proceedings of the 22nd ACM Symposium on Operating Systems Principles 2009, 2009

The salvage cache: A fault-tolerant cache architecture for next-generation memory technologies.
Proceedings of the 27th International Conference on Computer Design, 2009

Optimal Placement-aware Trace-Based Scheduling of Hardware Reconfigurations for FPGA Accelerators.
Proceedings of the FCCM 2009, 2009

A computing origami: folding streams in FPGAs.
Proceedings of the 46th Design Automation Conference, 2009

A DVS-based pipelined reconfigurable instruction memory.
Proceedings of the 46th Design Automation Conference, 2009

BSN Simulator: Optimizing Application Using System Level Simulation.
Proceedings of the Sixth International Workshop on Wearable and Implantable Body Sensor Networks, 2009

A UML-based approach for heterogeneous IP integration.
Proceedings of the 14th Asia South Pacific Design Automation Conference, 2009

2008
Fast, frequency-based, integrated register allocation and instruction scheduling.
Softw., Pract. Exper., 2008

Defining neighborhood relations for fast spatial-temporal partitioning of applications on reconfigurable architectures.
Proceedings of the 2008 International Conference on Field-Programmable Technology, 2008

Pipa: pipelined profiling and analysis on multi-core systems.
Proceedings of the Sixth International Symposium on Code Generation and Optimization (CGO 2008), 2008

How to Do a Million Watchpoints: Efficient Debugging Using Dynamic Instrumentation.
Proceedings of the Compiler Construction, 17th International Conference, 2008

2007
Editorial for the Special Issue on Field Programmable Technology.
VLSI Signal Processing, 2007

A UML-Based Design Framework for Time-Triggered Applications.
Proceedings of the 28th IEEE Real-Time Systems Symposium (RTSS 2007), 2007

VOSCH: Voltage scaled cache hierarchies.
Proceedings of the 25th International Conference on Computer Design, 2007

DRIM: a low power dynamically reconfigurable instruction memory hierarchy for embedded systems.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

Ubiquitous Memory Introspection.
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

An Inter-Core Communication Enabled Multi-Core Simulator Based on SimpleScalar.
Proceedings of the 21st International Conference on Advanced Information Networking and Applications (AINA 2007), 2007

2006
Generating hardware from OpenMP programs.
Proceedings of the 2006 IEEE International Conference on Field Programmable Technology, 2006

Co-optimization of Performance and Power in a Superscalar Processor Design.
Proceedings of the Emerging Directions in Embedded and Ubiquitous Computing, 2006

DEP: detailed execution profile.
Proceedings of the 15th International Conference on Parallel Architecture and Compilation Techniques (PACT 2006), 2006

2005
Dynamic memory optimization using pool allocation and prefetching.
SIGARCH Computer Architecture News, 2005

Using UML 2.0 for System Level Design of Real Time SoC Platforms for Stream Processing.
Proceedings of the 11th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2005), 2005

Sensor Grid: Integration of Wireless Sensor Networks and the Grid.
Proceedings of the 30th Annual IEEE Conference on Local Computer Networks (LCN 2005), 2005

Cooperative Instruction Scheduling with Linear Scan Register Allocation.
Proceedings of the High Performance Computing, 2005

A Reconfigurable Instruction Memory Hierarchy for Embedded Systems.
Proceedings of the 2005 International Conference on Field Programmable Logic and Applications (FPL), 2005

An integrated performance and power model for superscalar processor designs.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Design of clocked circuits using UML.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Targeted Data Prefetching.
Proceedings of the Advances in Computer Systems Architecture, 10th Asia-Pacific Conference, 2005

A Performance and Power Co-optimization Approach for Modern Processors.
Proceedings of the Fifth International Conference on Computer and Information Technology (CIT 2005), 2005

2004
Data Integrity Framework and Language Support for Active Web Intermediaries.
Proceedings of the Web Content Caching and Distribution: 9th International Workshop, 2004

Model-Driven SoC Design via Executable UML to SystemC.
Proceedings of the 25th IEEE Real-Time Systems Symposium (RTSS 2004), 2004

Adaptive Compiler Directed Prefetching for EPIC Processors.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2004

Configuration bitstream compression for dynamically reconfigurable FPGAs.
Proceedings of the 2004 International Conference on Computer-Aided Design, 2004

Windows CE for a reconfigurable system-on-a-chip processor.
Proceedings of the 2004 IEEE International Conference on Field-Programmable Technology, 2004

Tuning SoC platforms for multimedia processing: identifying limits and tradeoffs.
Proceedings of the 2nd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2004

Static Identification of Delinquent Loads.
Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004

Compiler orchestrated prefetching via speculation and predication.
Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004

2003
SilkRoad II: mixed paradigm cluster computing with RC_dag consistency.
Parallel Computing, 2003

Compiling to FPGAs via an EPIC compiler's intermediate representation.
Proceedings of the 2003 IEEE International Conference on Field-Programmable Technology, 2003

A Model for Hardware Realization of Kernel Loops.
Proceedings of the Field Programmable Logic and Application, 13th International Conference, 2003

The Performance Model of SilkRoad - A Multithreaded DSM System for Clusters.
Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003

2002
A Framework for Data Prefetching Using Off-Line Training of Markovian Predictors.
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

PD-XML: extensible markup language for processor description.
Proceedings of the 2002 IEEE International Conference on Field-Programmable Technology, 2002

A co-simulation study of adaptive EPIC computing.
Proceedings of the 2002 IEEE International Conference on Field-Programmable Technology, 2002

Shell over a Cluster (SHOC): Towards Achieving Single System Image via the Shell.
Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002

SilkRoad II: A Multi-Paradigm Runtime System for Cluster Computing.
Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002

2001
Compiler Optimizations for Adaptive EPIC Processors.
Proceedings of the Embedded Software, First International Workshop, 2001

The emerging power crisis in embedded processors: what can a poor compiler do?
Proceedings of the 2001 International Conference on Compilers, 2001

2000
SilkRoad: A Multithreaded Runtime System with Software Distributed Shared Memory for SMP Clusters.
Proceedings of the 2000 IEEE International Conference on Cluster Computing (CLUSTER 2000), November 28th, 2000

1999
tmPVM - Task Migratable PVM.
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

1996
BaLinda Lisp: Design and Implementation.
Comput. Lang., 1996

1995
Fast Evaluation of the Elementary Functions in Single Precision.
IEEE Trans. Computers, 1995

Evaluation of the Hitachi S-3800 Supercomputer Using Six Benchmarks.
IJHPCA, 1995

Compiling Parallel Lisp for a Shared Memory Multiprocessor.
Proceedings of the Seventh IASTED/ISMM International Conference on Parallel and Distributed Computing and Systems, 1995

Highly Efficient Parallel Lisp Implementation on Distributed Systems.
Proceedings of the Parallel Computing: State-of-the-Art and Perspectives, 1995

Design and Implementation of Abstract Machine for Parallel Lisp Compilation.
Proceedings of the 1995 International Conference on Parallel Processing, 1995

1994
Fast Hardware-Based Algorithms for Elementary Function Computations Using Rectangular Multipliers.
IEEE Trans. Computers, 1994

A Simulation Study on the Interactions between Multithreaded Architectures and the Cache.
International Journal of High Speed Computing, 1994

Fast Evaluation of the Elementary Functions in Double Precision.
Proceedings of the 27th Annual Hawaii International Conference on System Sciences (HICSS-27), 1994

1992
A Model of Speculative Parallelism.
Parallel Processing Letters, 1992

Evaluation of the continuation bit in the Cyclic Pipeline Computer.
Parallel Computing, 1992

1991
Effects of Multiple Instruction Stream Execution on Cache Performance.
International Journal of High Speed Computing, 1991

1990
A self interpreter for BaLinda Lisp.
SIGPLAN Notices, 1990

1989
BIDDLE: a bidirectional data driven Lisp engine.
Proceedings of the IEEE International Workshop on Tools for Artificial Intelligence: Architectures, 1989

