Jan-Jan Wu

According to our database, Jan-Jan Wu authored at least 143 papers between 1987 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Exploiting Fine-Grained Structured Pruning for Efficient Inference on CNN Model.
Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023

Function Clustering to Optimize Resource Utilization on Container Platform.
Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023

Accelerate Inference of CNN Models on CPU via Column Combining Based on Simulated Annealing.
Proceedings of the Eleventh International Symposium on Computing and Networking, CANDAR 2023, Matsue, Japan, November 28, 2023

2022
Accelerating Video Captioning on Heterogeneous System Architectures.
ACM Trans. Archit. Code Optim., 2022

CNN Models Acceleration Using Filter Pruning and Sparse Tensor Core.
Int. J. Netw. Comput., 2022

Accelerating Convolutional Neural Networks via Inter-operator Scheduling.
Proceedings of the 28th IEEE International Conference on Parallel and Distributed Systems, 2022

A Cloud-Native Online Judge System.
Proceedings of the 46th IEEE Annual Computers, Software, and Applications Conference, 2022

Efficient Dual Batch Size Deep Learning for Distributed Parameter Server Systems.
Proceedings of the 46th IEEE Annual Computers, Software, and Applications Conference, 2022

Efficient Inference on Convolutional Neural Networks by Image Difficulty Prediction.
Proceedings of the IEEE International Conference on Big Data, 2022

2021
Parallel Asynchronous Stochastic Dual Coordinate Descent Algorithms for High Efficiency and Stable Convergence.
Proceedings of the 29th Euromicro International Conference on Parallel, 2021

Efficient Video Captioning on Heterogeneous System Architectures.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Accelerate CNN Models via Filter Pruning and Sparse Tensor Core.
Proceedings of the Ninth International Symposium on Computing and Networking, 2021

Optimal Branch Location for Cost-effective Inference on Branchynet.
Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), 2021

2020
Convolution Filter Pruning for Transfer Learning on Small Dataset.
Proceedings of the International Computer Symposium, 2020

Exploiting Data Entropy for Neural Network Compression.
Proceedings of the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), 2020

An Adaptive Layer Expansion Algorithm for Efficient Training of Deep Neural Networks.
Proceedings of the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), 2020

2019
Exploiting SIMD Asymmetry in ARM-to-x86 Dynamic Binary Translation.
ACM Trans. Archit. Code Optim., 2019

Processor-Tracing Guided Region Formation in Dynamic Binary Translation.
ACM Trans. Archit. Code Optim., 2019

Optimizing data permutations in structured loads/stores translation and SIMD register mapping for a cross-ISA dynamic binary translator.
J. Syst. Archit., 2019

A collaborative CPU-GPU approach for deep learning on mobile devices.
Concurr. Comput. Pract. Exp., 2019

Exploiting Vector Processing in Dynamic Binary Translation.
Proceedings of the 48th International Conference on Parallel Processing, 2019

Task Scheduling Techniques for Deep Learning in Heterogeneous Environment.
Proceedings of the Seventh International Symposium on Computing and Networking Workshops, 2019

A Bicameralism Voting Framework for Combining Knowledge from Clients into Better Prediction.
Proceedings of the 2019 IEEE International Conference on Big Data (IEEE BigData), 2019

2018
Improving SIMD Parallelism via Dynamic Binary Translation.
ACM Trans. Embed. Comput. Syst., 2018

Efficient and retargetable SIMD translation in a dynamic binary translator.
Softw. Pract. Exp., 2018

A collaborative CPU-GPU approach for principal component analysis on mobile heterogeneous platforms.
J. Parallel Distributed Comput., 2018

Workload prediction and balance for distributed reachability processing for large-scale attribute graphs.
Concurr. Comput. Pract. Exp., 2018

Dynamic tuning of applications using restricted transactional memory.
Proceedings of the 2018 Conference on Research in Adaptive and Convergent Systems, 2018

Low Precision Deep Learning Training on Mobile Heterogeneous Platform.
Proceedings of the 26th Euromicro International Conference on Parallel, 2018

Communication Scheduling Optimization for Distributed Deep Learning Systems.
Proceedings of the 24th IEEE International Conference on Parallel and Distributed Systems, 2018

Energy-Efficient Core Allocation and Deployment for Container-Based Virtualization.
Proceedings of the 24th IEEE International Conference on Parallel and Distributed Systems, 2018

Communication Usage Optimization of Gradient Sparsification with Aggregation in Deep Learning.
Proceedings of the VII International Conference on Network, Communication and Computing, 2018

An Efficient Dynamic Load-Balancing Large Scale Graph-Processing System.
Proceedings of the VII International Conference on Network, Communication and Computing, 2018

Data Pinning and Back Propagation Memory Optimization for Deep Learning on GPU.
Proceedings of the Sixth International Symposium on Computing and Networking, 2018

Adaptive Communication for Distributed Deep Learning on Commodity GPU Cluster.
Proceedings of the 18th IEEE/ACM International Symposium on Cluster, 2018

Exploiting SIMD capability in an ARMv7-to-ARMv8 dynamic binary translator.
Proceedings of the International Conference on Compilers, 2018

Versatile Communication Optimization for Deep Learning by Modularized Parameter Server.
Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

2017
Data partition optimisation for column-family NoSQL databases.
Int. J. Big Data Intell., 2017

Dynamic translation of structured Loads/Stores and register mapping for architectures with SIMD extensions.
Proceedings of the 18th ACM SIGPLAN/SIGBED Conference on Languages, 2017

CPU/GPU Collaboration Techniques for Transfer Learning on Mobile Devices.
Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017

High Resource Utilization Auto-Scaling Algorithms for Heterogeneous Container Configurations.
Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017

Efficient Cache Update for In-Memory Cluster Computing with Spark.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

Exploiting Asymmetric SIMD Register Configurations in ARM-to-x86 Dynamic Binary Translation.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
Optimizing Control Transfer and Memory Virtualization in Full System Emulators.
ACM Trans. Archit. Code Optim., 2016

An Energy-Efficient Scheduler for Throughput Guaranteed Jobs on Asymmetric Multi-Core Platforms.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

Exploiting Longer SIMD Lanes in Dynamic Binary Translation.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

2015
A dynamic binary translation system in a client/server environment.
J. Syst. Archit., 2015

Energy-efficient task scheduling for multi-core platforms with per-core DVFS.
J. Parallel Distributed Comput., 2015

A Partial Workload Offloading Framework in a Mobile Cloud Computing Context.
Proceedings of the 8th IEEE International Conference on Service-Oriented Computing and Applications, 2015

Data Partition Optimization for Column-Family NoSQL Databases.
Proceedings of the 2015 IEEE International Conference on Smart City/SocialCom/SustainCom/DataCom/SC2 2015, 2015

SIMD Code Translation in an Enhanced HQEMU.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

Resource Provision for Batch and Interactive Workloads in Data Centers.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

Improving SIMD code generation in QEMU.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Job Dispatching and Scheduling for Heterogeneous Clusters - A Case Study on the Billing Subsystem of CHT Telecommunication.
Proceedings of the 39th IEEE Annual Computer Software and Applications Conference, 2015

Efficient distributed maximum matching for solving the container exchange problem in the maritime industry.
Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015

2014
Efficient and Retargetable Dynamic Binary Translation on Multicores.
IEEE Trans. Parallel Distributed Syst., 2014

DBILL: an efficient and retargetable dynamic binary instrumentation framework using llvm backend.
Proceedings of the 10th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2014

Efficient memory virtualization for Cross-ISA system mode emulation.
Proceedings of the 10th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2014

Adaptive OpenCL Computation Offloading Framework on Mobile Device.
Proceedings of the Intelligent Systems and Applications, 2014

An Energy-Efficient Task Scheduler for Multi-core Platforms with Per-core DVFS Based on Task Characteristics.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

An energy-efficient hypervisor scheduler for asymmetric multi-core.
Proceedings of the IEEE 3rd Global Conference on Consumer Electronics, 2014

2013
Improving dynamic binary optimization through early-exit guided code region formation.
Proceedings of the ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (co-located with ASPLOS 2013), 2013

Sampling-Based Phase Classification and Prediction for Multi-threaded Program Execution on Multi-core Architectures.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

GPU Virtualization Support in Cloud System.
Proceedings of the Grid and Pervasive Computing - 8th International Conference, 2013

Automatic Resource Scaling for Web Applications in the Cloud.
Proceedings of the Grid and Pervasive Computing - 8th International Conference, 2013

Kylin: An efficient and scalable graph data processing system.
Proceedings of the 2013 IEEE International Conference on Big Data (IEEE BigData 2013), 2013

Data Replication for Distributed Graph Processing.
Proceedings of the 2013 IEEE Sixth International Conference on Cloud Computing, Santa Clara, CA, USA, June 28, 2013

2012
Scheduling of variable-time jobs for distributed systems with heterogeneous processor cardinality.
Int. J. Ad Hoc Ubiquitous Comput., 2012

Probability-Based Cloud Storage Providers Selection Algorithms with Maximum Availability.
Proceedings of the 41st International Conference on Parallel Processing, 2012

Workload characteristics-aware virtual machine consolidation algorithms.
Proceedings of the 4th IEEE International Conference on Cloud Computing Technology and Science Proceedings, 2012

HQEMU: a multi-threaded and retargetable dynamic binary translator on multicores.
Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012

Automatic Resource Scaling Based on Application Service Requirements.
Proceedings of the 2012 IEEE Fifth International Conference on Cloud Computing, 2012

Distributed Graph Database for Large-Scale Social Computing.
Proceedings of the 2012 IEEE Fifth International Conference on Cloud Computing, 2012

HSQL: A Highly Scalable Cloud Database for Multi-user Query Processing.
Proceedings of the 2012 IEEE Fifth International Conference on Cloud Computing, 2012

2011
Optimizing server placement in distributed systems in the presence of competition.
J. Parallel Distributed Comput., 2011

QoS-aware replica placement for grid computing.
Concurr. Comput. Pract. Exp., 2011

Energy-efficient Virtual Machine Provision Algorithms for Cloud Systems.
Proceedings of the IEEE 4th International Conference on Utility and Cloud Computing, 2011

Server Consolidation Algorithms with Bounded Migration Cost and Performance Guarantees in Cloud Computing.
Proceedings of the IEEE 4th International Conference on Utility and Cloud Computing, 2011

A Novel Approach for Finding Optimization Opportunities in Multicore Architectures.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2011

An Empirical Study on Memory Sharing of Virtual Machines for Server Consolidation.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2011

LnQ: Building High Performance Dynamic Binary Translators with Existing Compiler Backends.
Proceedings of the International Conference on Parallel Processing, 2011

SQLMR : A Scalable Database Management System for Cloud Computing.
Proceedings of the International Conference on Parallel Processing, 2011

Roystonea: A Cloud Computing System with Pluggable Component Architecture.
Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011

Job sequence scheduling for cloud computing.
Proceedings of the 2011 International Conference on Cloud and Service Computing, 2011

Energy-Aware Virtual Machine Dynamic Provision and Scheduling for Cloud Computing.
Proceedings of the IEEE International Conference on Cloud Computing, 2011

Optimal Algorithms for Cross-Rack Communication Optimization in MapReduce Framework.
Proceedings of the IEEE International Conference on Cloud Computing, 2011

2010
A High-Performance Multi-user Service System for Financial Analytics Based on Web Service and GPU Computation.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2010

Replica-Aware Job Scheduling in Distributed Systems.
Proceedings of the Advances in Grid and Pervasive Computing, 5th International Conference, 2010

Metadata Partitioning for Large-Scale Distributed Storage Systems.
Proceedings of the IEEE International Conference on Cloud Computing, 2010

2009
Exploiting Spectral Reuse in Routing, Resource Allocation, and Scheduling for IEEE 802.16 Mesh Networks.
IEEE Trans. Veh. Technol., 2009

QoS-aware, access-efficient, and storage-efficient replica placement in grid environments.
J. Supercomput., 2009

Computation and communication schedule optimization for data-sharing tasks on uniprocessor.
J. Syst. Archit., 2009

Optimizing server placement for parallel I/O in switch-based clusters.
J. Parallel Distributed Comput., 2009

Route Throughput Analysis with Spectral Reuse for Multi-Rate Mobile Ad Hoc Networks.
J. Inf. Sci. Eng., 2009

Job Scheduling Techniques for Distributed Systems with Heterogeneous Processor Cardinality.
Proceedings of the 10th International Symposium on Pervasive Systems, 2009

Data-bandwidth-aware Job Scheduling in Grid and Cluster Environments.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

GFS: A Distributed File System with Multi-source Data Access and Replication for Grid Computing.
Proceedings of the Advances in Grid and Pervasive Computing, 4th International Conference, 2009

2008
Optimal replica placement in hierarchical Data Grids with locality assurance.
J. Parallel Distributed Comput., 2008

A List-Based Strategy for Optimal Replica Placement in Data Grid Systems.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

The Development of a Drug Discovery Virtual Screening Application on Taiwan Unigrid.
Proceedings of the Advances in Grid and Pervasive Computing, 2008

2007
Optimizing server placement in hierarchical grid environments.
J. Supercomput., 2007

An optimal scheduling algorithm for an agent-based multicast strategy on irregular networks.
J. Supercomput., 2007

Exploiting Spectral Reuse in Resource Allocation, Scheduling, and Routing for IEEE 802.16 Mesh Networks.
Proceedings of the 66th IEEE Vehicular Technology Conference, 2007

Block-Based Allocation Algorithms for FLASH Memory in Embedded Systems.
Proceedings of the Parallel Computing Technologies, 2007

Computation and communication schedule optimization for jobs with shared data.
Proceedings of the 13th International Conference on Parallel and Distributed Systems, 2007

Optimizing Server Placement for QoS Requirements in Hierarchical Grid Environments.
Proceedings of the Advances in Grid and Pervasive Computing, 2007

A High-Performance Virtual Storage System for Taiwan UniGrid.
Proceedings of the Advances in Grid and Pervasive Computing, 2007

Server Placement in the Presence of Competition.
Proceedings of the Advances in Grid and Pervasive Computing, 2007

2006
Optimizing I/O server placement for parallel I/O on switch-based irregular networks.
J. Supercomput., 2006

Parallel divide-and-conquer scheme for 2D Delaunay triangulation.
Concurr. Comput. Pract. Exp., 2006

Generalized Edge Coloring for Channel Assignment in Wireless Networks.
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

Optimal Placement of Replicas in Data Grid Environments with Locality Assurance.
Proceedings of the 12th International Conference on Parallel and Distributed Systems, 2006

A QoS-Aware Heuristic Algorithm for Replica Placement.
Proceedings of the 7th IEEE/ACM International Conference on Grid Computing (GRID 2006), 2006

Efficient Multi-Source Data Transfer in Data Grids.
Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

Optimal Replica Placement Strategy for Hierarchical Data Grid Systems.
Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

2005
Distributed Scheduling of Parallel I/O in the Presence of Data Replication.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Efficient Distributed Algorithms for Parallel I/O Scheduling.
Proceedings of the 11th International Conference on Parallel and Distributed Systems, 2005

I/O Processor Allocation for Mesh Cluster Computers.
Proceedings of the 11th International Conference on Parallel and Distributed Systems, 2005

Towards a Service-based Collaborative Framework for Data-intensive Grid Applications.
Proceedings of the 11th International Conference on Parallel and Distributed Systems, 2005

2004
Efficient Multiple Multicast on Heterogeneous Network of Workstations.
J. Supercomput., 2004

Efficient parallel implementations of near Delaunay triangulation with High Performance Fortran.
Concurr. Pract. Exp., 2004

2003
A Stop-or-Move Mobility model for PCS networks and its location-tracking strategies.
Comput. Commun., 2003

Placement of I/O servers to improve parallel I/O performance on switch-based clusters.
Proceedings of the 17th Annual International Conference on Supercomputing, 2003

Efficient Parallel I/O Scheduling in the Presence of Data Duplication.
Proceedings of the 32nd International Conference on Parallel Processing (ICPP 2003), 2003

2002
Locality-Preserving Dynamic Load Balancing for Data-Parallel Applications on Distributed-Memory Multiprocessors.
J. Inf. Sci. Eng., 2002

Partitioning Unstructured Meshes for Homogeneous and Heterogeneous Parallel Computing Environments.
Proceedings of the 31st International Conference on Parallel Processing (ICPP 2002), 2002

An Incremental Network Topology for Contention-free and Deadlock-free Routing.
Proceedings of the 9th International Conference on Parallel and Distributed Systems, 2002

A Parallel Divide-and-Conquer Scheme for Delaunay Triangulation.
Proceedings of the 9th International Conference on Parallel and Distributed Systems, 2002

2001
Efficient Parallel Implementations of 2D Delaunay Triangulation with High Performance Fortran.
Proceedings of the Tenth SIAM Conference on Parallel Processing for Scientific Computing, 2001

A Simple Incremental Network Topology for Wormhole Switch-Based Networks.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

Parallel 2D Delaunay Triangulations in HPF and MPI.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

2000
An Interleaving Transformation for Parallelizing Reductions for Distributed-Memory Parallel Machines.
J. Supercomput., 2000

Scheduling Multiple Multicast for Heterogeneous Network of Workstations with Non-Blocking Message-Passing.
Proceedings of the 2000 International Workshop on Parallel Processing, 2000

1999
CRAFT: a framework for F90/HPF compiler optimizations.
Concurr. Pract. Exp., 1999

Experience in Parallelizing Mesh Generation Code with High Performance Fortran.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

A Hybrid Multithreading/Message-Passing Approach for Solving Irregular Problems on SMP Clusters.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1999

1998
Supporting Efficient Tree Structures for Distributed Scientific Computation.
J. Inf. Sci. Eng., 1998

Distributed Data Structure Design for Scientific Computation.
Proceedings of the 12th international conference on Supercomputing, 1998

Toward Supporting Data Parallel Programming on Clusters of Symmetric Multiprocessors.
Proceedings of the International Conference on Parallel and Distributed Systems, 1998

1997
An Algebraic Machinery for Optimizing Data Motion for HPF.
Sci. Program., 1997

VGDS: An Object-Oriented Framework for Distributed Scientific Computing.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1997

A Framework for Parallel Tree-Based Scientific Simulations.
Proceedings of the 1997 International Conference on Parallel Processing (ICPP '97), 1997

1987
A Distributed Approach for Inferring Production Systems.
Proceedings of the 10th International Joint Conference on Artificial Intelligence. Milan, 1987

