Hailong Yang

Orcid: 0000-0003-1101-7927

According to our database, Hailong Yang authored at least 159 papers between 2006 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
LogSay: An Efficient Comprehension System for Log Numerical Reasoning.
IEEE Trans. Computers, July, 2024

AtRec: Accelerating Recommendation Model Training on CPUs.
IEEE Trans. Parallel Distributed Syst., June, 2024

Biocompatible Electrical and Optical Interfaces for Implantable Sensors and Devices.
Sensors, June, 2024

Knowledge structures construction and learning paths recommendation based on formal contexts.
Int. J. Mach. Learn. Cybern., April, 2024

Towards optimized tensor code generation for deep learning on sunway many-core processor.
Frontiers Comput. Sci., April, 2024

QAAS: quick accurate auto-scaling for streaming processing.
Frontiers Comput. Sci., February, 2024

Adaptive Auto-Tuning Framework for Global Exploration of Stencil Optimization on GPUs.
IEEE Trans. Parallel Distributed Syst., January, 2024

Defect Detection Scheme of Pins for Aviation Connectors Based on Image Segmentation and Improved RESNET-50.
Int. J. Image Graph., January, 2024

FDLoRA: Personalized Federated Learning of Large Language Model via Dual LoRA Tuning.
CoRR, 2024

INSPIRIT: Optimizing Heterogeneous Task Scheduling through Adaptive Priority in Task-based Runtime Systems.
CoRR, 2024

Minions: Accelerating Large Language Model Inference with Adaptive and Collective Speculative Decoding.
CoRR, 2024

Building a domain-specific compiler for emerging processors with a reusable approach.
Sci. China Inf. Sci., 2024

Tetris: Accelerating Sparse Convolution by Exploiting Memory Reuse on GPU.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

2023
A method of smoothing laser spot deformation.
Vis. Comput., December, 2023

HAOTuner: A Hardware Adaptive Operator Auto-Tuner for Dynamic Shape Tensor Compilers.
IEEE Trans. Computers, November, 2023

Ensemble learning-based nonlinear time series prediction and dynamic multi-objective optimization of organic rankine cycle (ORC) under actual driving cycle.
Eng. Appl. Artif. Intell., November, 2023

Improving Log-Based Anomaly Detection by Pre-Training Hierarchical Transformers.
IEEE Trans. Computers, September, 2023

Adapting combined tiling to stencil optimizations on sunway processor.
CCF Trans. High Perform. Comput., September, 2023

Input-Aware Sparse Tensor Storage Format Selection for Optimizing MTTKRP.
Computer, August, 2023

LogEncoder: Log-Based Contrastive Representation Learning for Anomaly Detection.
IEEE Trans. Netw. Serv. Manag., June, 2023

swSpAMM: optimizing large-scale sparse approximate matrix multiplication on Sunway Taihulight.
Frontiers Comput. Sci., 2023

LogGPT: Exploring ChatGPT for Log-Based Anomaly Detection.
CoRR, 2023

LogQA: Question Answering in Unstructured Logs.
CoRR, 2023

TrivialSpy: Identifying Software Triviality via Fine-grained and Dataflow-based Value Profiling.
Proceedings of the International Conference for High Performance Computing, 2023

EasyScale: Elastic Training with Consistent Accuracy and Improved Utilization on GPUs.
Proceedings of the International Conference for High Performance Computing, 2023

Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

BiRFIA: Selective Binary Rewriting for Function Interception on ARM.
Proceedings of the 37th International Conference on Supercomputing, 2023

Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs.
Proceedings of the 52nd International Conference on Parallel Processing, 2023

Accelerating Big Data Application by Eliminating Redundancy on Hadoop Cluster.
Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023

Efficient Deep Molecular Dynamic Model Training on Heterogeneous System.
Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023

gGMED: Towards GPU Accelerated Geometric Modeling Evaluation and Derivative Processes.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2023

LogGPT: Exploring ChatGPT for Log-Based Anomaly Detection.
Proceedings of the IEEE International Conference on High Performance Computing & Communications, 2023

Towards Optimized Hydrological Forecast Prediction of WRF-Hydro on GPU.
Proceedings of the IEEE International Conference on High Performance Computing & Communications, 2023

VClinic: A Portable and Efficient Framework for Fine-Grained Value Profilers.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
REVAL: Recommend Which Variables to Log With Pretrained Model and Graph Neural Network.
IEEE Trans. Netw. Serv. Manag., December, 2022

Efficient detection of silent data corruption in HPC applications with synchronization-free message verification.
J. Supercomput., 2022

Magas: matrix-based asynchronous graph analytics on shared memory systems.
J. Supercomput., 2022

Accelerating approximate matrix multiplication for near-sparse matrices on GPUs.
J. Supercomput., 2022

Input-Aware Sparse Tensor Storage Format Selection for Optimizing MTTKRP.
IEEE Trans. Computers, 2022

Analyzing the Spatiotemporal Vegetation Dynamics and Their Responses to Climate Change along the Ya'an-Linzhi Section of the Sichuan-Tibet Railway.
Remote. Sens., 2022

QoS-aware dynamic resource allocation with improved utilization and energy efficiency on GPU.
Parallel Comput., 2022

A configurational analysis of cross-channel integration.
Ind. Manag. Data Syst., 2022

Accelerating the cryo-EM structure determination in RELION on GPU cluster.
Frontiers Comput. Sci., 2022

Mimose: An Input-Aware Checkpointing Planner for Efficient Training on GPU.
CoRR, 2022

EasyScale: Accuracy-consistent Elastic Training for Deep Learning.
CoRR, 2022

FamilySeer: Towards Optimized Tensor Codes by Exploiting Computation Subgraph Similarity.
CoRR, 2022

CoGNN: Efficient Scheduling for Concurrent GNN Training on GPUs.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Adanomaly: Adaptive Anomaly Detection for System Logs with Adversarial Learning.
Proceedings of the 2022 IEEE/IFIP Network Operations and Management Symposium, 2022

PowerSpector: Towards Energy Efficiency with Calling-Context-Aware Profiling.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

StencilMART: Predicting Optimization Selection for Stencil Computations across GPUs.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Toward accelerated stencil computation by adapting tensor core unit on GPU.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Vectorizing SpMV by Exploiting Dynamic Regular Patterns.
Proceedings of the 51st International Conference on Parallel Processing, 2022

NNLQP: A Multi-Platform Neural Network Latency Query and Prediction System with An Evolving Database.
Proceedings of the 51st International Conference on Parallel Processing, 2022

Towards Optimized Streaming Tensor Completion on multiple GPUs.
Proceedings of the 24th IEEE Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, 2022

Black-box Attacks to Log-based Anomaly Detection.
Proceedings of the 18th International Conference on Network and Service Management, 2022

2021
The Deep Learning Compiler: A Comprehensive Survey.
IEEE Trans. Parallel Distributed Syst., 2021

Towards efficient tile low-rank GEMM computation on sunway many-core processors.
J. Supercomput., 2021

swMR: A Framework for Accelerating MapReduce Applications on Sunway Taihulight.
IEEE Trans. Emerg. Top. Comput., 2021

Towards efficient canonical polyadic decomposition on sunway many-core processor.
Inf. Sci., 2021

User-level failure detection and auto-recovery of parallel programs in HPC systems.
Frontiers Comput. Sci., 2021

Adaptive watermark generation mechanism based on time series prediction for stream processing.
Frontiers Comput. Sci., 2021

Accelerating Sparse Approximate Matrix Multiplication on GPUs.
CoRR, 2021

dgQuEST: Accelerating Large Scale Quantum Circuit Simulation through Hybrid CPU-GPU Memory Hierarchies.
Proceedings of the Network and Parallel Computing, 2021

An optimized tensor completion library for multiple GPUs.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

DRStencil: Exploiting Data Reuse within Low-order Stencil on GPU.
Proceedings of the 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, 2021

csTuner: Scalable Auto-tuning Framework for Complex Stencil Computation on GPUs.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

PriPro: Towards Effective Privacy Protection on Edge-Cloud System running DNN Inference.
Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021

2020
Accelerating Sparse Cholesky Factorization on Sunway Manycore Architecture.
IEEE Trans. Parallel Distributed Syst., 2020

Massively Scaling Seismic Processing on Sunway TaihuLight Supercomputer.
IEEE Trans. Parallel Distributed Syst., 2020

HitAnomaly: Hierarchical Transformers for Anomaly Detection in System Log.
IEEE Trans. Netw. Serv. Manag., 2020

Temperature-Aware DRAM Cache Management - Relaxing Thermal Constraints in 3-D Systems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

The Deep Learning Compiler: A Comprehensive Survey.
CoRR, 2020

Privacy for Rescue: A New Testimony Why Privacy is Vulnerable In Deep Models.
CoRR, 2020

An Optimal Recovery Approach for Liberation Codes in Distributed Storage Systems.
IEEE Access, 2020

swGBDT: Efficient Gradient Boosted Decision Tree on Sunway Many-Core Processor.
Proceedings of the Supercomputing Frontiers - 6th Asian Conference, 2020

ZeroSpy: exploring software inefficiency with redundant zeros.
Proceedings of the International Conference for High Performance Computing, 2020

SpTFS: sparse tensor format selection for MTTKRP via deep learning.
Proceedings of the International Conference for High Performance Computing, 2020

SympleGraph: distributed graph processing with precise loop-carried dependency guarantee.
Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2020

ODCP: Optimizing Data Caching and Placement in Distributed File System Using Erasure Coding.
Proceedings of the Network and Parallel Computing, 2020

Paddy: An Event Log Parsing Approach using Dynamic Dictionary.
Proceedings of the NOMS 2020, 2020

Extremely Low-bit Convolution Optimization for Quantized Neural Network on Modern Computer Architectures.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

A Gated Few-shot Learning Model For Anomaly Detection.
Proceedings of the 2020 International Conference on Information Networking, 2020

Accelerating De Novo Assembler WTDBG2 on Commodity Servers.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2020

Towards GPU Acceleration of Phonon Computation with ShengBTE.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2020

Transfer Log-based Anomaly Detection with Pseudo Labels.
Proceedings of the 16th International Conference on Network and Service Management, 2020

swRodinia: A Benchmark Suite for Exploiting Architecture Properties of Sunway Processor.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2020

2019
Distributed Graph Processing System and Processing-in-memory Architecture with Precise Loop-carried Dependency Guarantee.
ACM Trans. Comput. Syst., 2019

Improving Thread-level Parallelism in GPUs Through Expanding Register File to Scratchpad Memory.
ACM Trans. Archit. Code Optim., 2019

Accelerating in-memory transaction processing using general purpose graphics processing units.
Future Gener. Comput. Syst., 2019

A novel index system describing program runtime characteristics for workload consolidation.
Frontiers Comput. Sci., 2019

Intelligent-Unrolling: Exploiting Regular Patterns in Irregular Applications.
CoRR, 2019

Massively Scaling Seismic Processing on Sunway TaihuLight Supercomputer.
CoRR, 2019

swTVM: Exploring the Automated Compilation for Deep Learning on Sunway Architecture.
CoRR, 2019

swTensor: accelerating tensor decomposition on Sunway architecture.
CCF Trans. High Perform. Comput., 2019

FPowerTool: A Function-Level Power Profiling Tool.
IEEE Access, 2019

ADSM: Adaptive Data Scheduling Method for Hybrid Memories in Distributed System.
IEEE Access, 2019

Performance Evaluation and Analysis of Linear Algebra Kernels in the Prototype Tianhe-3 Cluster.
Proceedings of the Supercomputing Frontiers - 5th Asian Conference, 2019

Modeling Power Consumption of The Code Execution Using Performance Counters Statistics.
Proceedings of the 20th International Conference on Parallel and Distributed Computing, 2019

ASTracer: An Efficient Tracing Tool for HDFS with Adaptive Sampling.
Proceedings of the Network and Parallel Computing, 2019

Redundant loads: a software inefficiency indicator.
Proceedings of the 41st International Conference on Software Engineering, 2019

HSPP: Load-Balanced and Low-Latency File Partition and Placement Strategy on Distributed Heterogeneous Storage with Erasure Coding.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2019

Improving the Parallelism of CESM on GPU.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2019

Structure Characteristic-Aware Pruning Strategy for Convolutional Neural Networks.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

Towards a General and Efficient Linked-List Hash Table on GPUs.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

swCPD: Optimizing Canonical Polyadic Decomposition on Sunway Manycore Architecture.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

L-DAG: Enabling Loopy Workflow in Scientific Application with Automatic DAG Transformation.
Proceedings of the 2019 IEEE Intl Conf on Dependable, 2019

Generative Model for Probabilistic Inference.
Proceedings of the 2019 IEEE Intl Conf on Dependable, 2019

SMQoS: Improving Utilization and Energy Efficiency with QoS Awareness on GPUs.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

Accelerating tile low-rank GEMM on sunway architecture: POSTER.
Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019

2018
LWPTool: A Lightweight Profiler to Guide Data Layout Optimization.
IEEE Trans. Parallel Distributed Syst., 2018

SMGuard: A Flexible and Fine-Grained Resource Management Framework for GPUs.
IEEE Trans. Parallel Distributed Syst., 2018

SRAM- and STT-RAM-based hybrid, shared last-level cache for on-chip CPU-GPU heterogeneous architectures.
J. Supercomput., 2018

A compact filtering UWB antenna with band-notched function.
IEICE Electron. Express, 2018

Co-designed defected ground structure filter with UWB slot antenna.
IEICE Electron. Express, 2018

T1000: Mitigating the memory footprint of convolution neural networks with decomposition and re-fusion.
Future Gener. Comput. Syst., 2018

Generative Model for Heterogeneous Inference.
CoRR, 2018

BigRoots: An Effective Approach for Root-Cause Analysis of Stragglers in Big Data System.
IEEE Access, 2018

A Lightweight and Flexible Tool for Distinguishing Between Hardware Malfunctions and Program Bugs in Debugging Large-Scale Programs.
IEEE Access, 2018

Sparsing Deep Neural Network Using Semi-Discrete Matrix Decomposition.
IEEE Access, 2018

A Fine-Grained Performance Bottleneck Analysis Method for HDFS.
Proceedings of the Network and Parallel Computing, 2018

Towards Efficient SpMV on Sunway Manycore Architectures.
Proceedings of the 32nd International Conference on Supercomputing, 2018

SparkOT: Diagnosing Operation Level Inefficiency in Spark.
Proceedings of the 20th IEEE International Conference on High Performance Computing and Communications; 16th IEEE International Conference on Smart City; 4th IEEE International Conference on Data Science and Systems, 2018

Multi-role SpTRSV on Sunway Many-Core Architecture.
Proceedings of the 20th IEEE International Conference on High Performance Computing and Communications; 16th IEEE International Conference on Smart City; 4th IEEE International Conference on Data Science and Systems, 2018

Research on Asynchronous Inter-VM Communication Mechanism Based on Embedded Hypervisor.
Proceedings of the 2018 IEEE 42nd Annual Computer Software and Applications Conference, 2018

Performance Analysis and Optimization of Cyro-EM Structure Determination in RELION-2.
Proceedings of the Advanced Computer Architecture - 12th Conference, 2018

EffectFace: A Fast and Efficient Deep Neural Network Model for Face Recognition.
Proceedings of the Advanced Computer Architecture - 12th Conference, 2018

2017
iDPL: A scalable and flexible inter-continental testbed for data placement research and experiment.
Proceedings of the 2017 IEEE Symposium on Computers and Communications, 2017

PowerChief: Intelligent Power Allocation for Multi-Stage Applications to Improve Responsiveness on Power Constrained CMP.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Efficient Asynchronous Communication between Virtual Machines in Embedded Systems.
Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017

Data Mining Based Root-Cause Analysis of Performance Bottleneck for Big Data Workload.
Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017

Prophet: Precise QoS Prediction on Non-Preemptive Accelerators to Improve Utilization in Warehouse-Scale Computers.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016
Designing Future Warehouse-Scale Computers for Sirius, an End-to-End Voice and Vision Personal Assistant.
ACM Trans. Comput. Syst., 2016

Novel ultra-wideband (UWB) bandpass filter using multiple-mode resonator.
IEICE Electron. Express, 2016

A Spherical Self-Adaptive Gripper with shrinking of an elastic membrane.
Proceedings of the 2016 International Conference on Advanced Robotics and Mechatronics, 2016

VinaSC: Scalable Autodock Vina with fine-grained scheduling on heterogeneous platform.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2016

Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

2015
POSTER: An Online Prefix-Preserving IP Address Anonymization Algorithm for Passive Measurement Systems.
Proceedings of the Security and Privacy in Communication Networks, 2015

Methods and Practices of Three-Way Decisions for Complex Problem Solving.
Proceedings of the Rough Sets and Knowledge Technology - 10th International Conference, 2015

Request Squeezer: Mitigating Tail Latency through Pruned Request Replication.
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

Data Analysis and Synchronization on Inter-Continent Data Placement Laboratory.
Proceedings of the International Conference on Cloud Computing and Big Data, 2015

2014
iMeter: An integrated VM power model based on performance profiling.
Future Gener. Comput. Syst., 2014

Performance-Aware Based Correlated Datasets Replication Strategy.
Proceedings of the Trustworthy Computing and Services - International Conference, 2014

2013
Energy Efficiency Evaluation of Workload Execution on Intel Xeon Phi Coprocessor.
Proceedings of the Trustworthy Computing and Services, 2013

Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

POIGEM: A Programming-Oriented Instruction Level GPU Energy Model for CUDA Program.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2013

2012
MapReduce Workload Modeling with Statistical Approach.
J. Grid Comput., 2012

Efficient Statistical Computing on Multicore and MultiGPU Systems.
Proceedings of the 15th International Conference on Network-Based Information Systems, 2012

Statistics-based Workload Modeling for MapReduce.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

UVMPM: A Unitary Approach for VM Power Metering Based on Performance Profiling.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

CPOP: Component Design and Parallelization towards POP Ocean Model Based on ESMF.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

2011
Energy Prediction for MapReduce Workloads.
Proceedings of the IEEE Ninth International Conference on Dependable, 2011

CDebugger: A scalable parallel debugger with dynamic communication topology configuration.
Proceedings of the 2011 International Conference on Cloud and Service Computing, 2011

2010
Accelerating Dock6's Amber Scoring with Graphic Processing Unit.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2010

2009
Extending network lifetime for ALLIANCES.
Comput. Commun., 2009

2008
A Novel Location Relay Selection Scheme for ALLIANCES.
IEEE Trans. Veh. Technol., 2008

2007
Alliances with Optimal Relay Selection.
Proceedings of the IEEE International Conference on Acoustics, 2007

2006
Ber Analysis of a Cooperative Random Access Protocol in Rayleigh Fading Channels.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Enhanced Collision Resolution via Cooperative Retransmissions.
Proceedings of the 40th Annual Conference on Information Sciences and Systems, 2006

