Kai Lu
ORCID: 0000-0003-2284-7897
Affiliations:
- National University of Defense Technology, College of Computer Science, National Key Laboratory of Parallel and Distributed Processing, Changsha, China
- National University of Defense Technology, Changsha, China (PhD 1999)
According to our database, Kai Lu authored at least 172 papers between 2003 and 2025.
Bibliography
2025
Oases: Efficient Large-Scale Model Training on Commodity Servers via Overlapped and Automated Tensor Model Parallelism.
IEEE Trans. Parallel Distributed Syst., September, 2025
IEEE Trans. Computers, June, 2025
ACM Trans. Archit. Code Optim., June, 2025
IEEE J. Solid State Circuits, May, 2025
AutoPipe-H: A Heterogeneity-Aware Data-Paralleled Pipeline Approach on Commodity GPU Servers.
IEEE Trans. Computers, April, 2025
IEEE Trans. Inf. Forensics Secur., 2025
SIAM J. Sci. Comput., 2025
GraphCSR: A Space and Time-Efficient Sparse Matrix Representation for Web-scale Graph Processing.
Proceedings of the ACM on Web Conference 2025, 2025
Proceedings of the ACM on Web Conference 2025, 2025
DPGA-TextSyn: Differentially Private Genetic Algorithm for Synthetic Text Generation.
Proceedings of the Findings of the Association for Computational Linguistics, 2025
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Can Large Language Models Derive High-Level Cognition from Low-Level and Fragmented Foundational Information?
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
DELTA: Memory-Efficient Training via Dynamic Fine-Grained Recomputation and Swapping.
ACM Trans. Archit. Code Optim., December, 2024
MST: Topology-Aware Message Aggregation for Exascale Graph Processing of Traversal-Centric Algorithms.
ACM Trans. Archit. Code Optim., December, 2024
IEEE Trans. Parallel Distributed Syst., August, 2024
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., July, 2024
CCF Trans. High Perform. Comput., June, 2024
J. Supercomput., May, 2024
Softw. Test. Verification Reliab., March, 2024
Frontiers Inf. Technol. Electron. Eng., March, 2024
IEEE Trans. Parallel Distributed Syst., February, 2024
IEEE Trans. Inf. Forensics Secur., 2024
Proceedings of the International Conference for High Performance Computing, 2024
Proceedings of the 27th International Symposium on Research in Attacks, 2024
Proceedings of the 31st Annual Network and Distributed System Security Symposium, 2024
Fully Decentralized Data Distribution for Exascale-HPC: End of the Provider-Demander Matching Puzzle.
Proceedings of the IEEE International Conference on Cluster Computing, 2024
A Fully Integrated 400MHz Band Transceiver with a 96Mbps 16QAM Transmitter and a Phase Tracking Receiver in 40-nm CMOS.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2024
2023
Accelerating GNN Training by Adapting Large Graphs to Distributed Heterogeneous Architectures.
IEEE Trans. Computers, December, 2023
IEEE Trans. Software Eng., April, 2023
Compressed Collective Sparse-Sketch for Distributed Data-Parallel Training of Deep Learning Models.
IEEE J. Sel. Areas Commun., April, 2023
Inspecting End-to-End Encrypted Communication Differentially for the Efficient Identification of Harmful Media.
IEEE Trans. Inf. Forensics Secur., 2023
Programming bare-metal accelerators with heterogeneous threading models: a case study of Matrix-3000.
Frontiers Inf. Technol. Electron. Eng., 2023
Free energy perturbation-based large-scale virtual screening for effective drug discovery against COVID-19.
Int. J. High Perform. Comput. Appl., 2023
Automated Tensor Model Parallelism with Overlapped Communication for Efficient Foundation Model Training.
CoRR, 2023
Leveraging Free Labels to Power up Heterophilic Graph Learning in Weakly-Supervised Settings: An Empirical Study.
Proceedings of the Machine Learning and Knowledge Discovery in Databases: Research Track, 2023
VulHawk: Cross-architecture Vulnerability Detection with Entropy-based Binary Code Search.
Proceedings of the 30th Annual Network and Distributed System Security Symposium, 2023
Proceedings of the 37th International Conference on Supercomputing, 2023
ReForker: Patching x86_64 Binaries with the Fork Server to Improve Hardware-Assisted Fuzzing through Trampoline-Based Binary Rewriting.
Proceedings of the 2nd International Conference on Networks, 2023
2022
IEEE Trans. Parallel Distributed Syst., 2022
IEEE Trans. Computers, 2022
Frontiers Inf. Technol. Electron. Eng., 2022
TEES: topology-aware execution environment service for fast and agile application deployment in HPC.
Frontiers Inf. Technol. Electron. Eng., 2022
J. Comput. Sci. Technol., 2022
Towards Defense Against Adversarial Attacks on Graph Neural Networks via Calibrated Co-Training.
J. Comput. Sci. Technol., 2022
CCF Trans. High Perform. Comput., 2022
BioNet: a large-scale and heterogeneous biological network model for interaction prediction with graph convolution.
Briefings Bioinform., 2022
Appl. Intell., 2022
Game of Hide-and-Seek: Exposing Hidden Interfaces in Embedded Web Applications of IoT Devices.
Proceedings of the WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25, 2022
vGraph: Memory-Efficient Multicore Graph Processing for Traversal-Centric Algorithms.
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Proceedings of the 29th Annual Network and Distributed System Security Symposium, 2022
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022
XTree: Traversal-Based Partitioning for Extreme-Scale Graph Processing on Supercomputers.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022
Proceedings of the Algorithms and Architectures for Parallel Processing, 2022
Full-credit Flow Control: A Novel Technique to Implement Deadlock-free Adaptive Routing.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022
2021
IEEE Trans. Computers, 2021
J. Comput. Sci. Technol., 2021
CoRR, 2021
QSIM: A novel approach to node proximity estimation based on Discrete-time quantum walk.
Appl. Intell., 2021
Proceedings of the High Performance Computing - 36th International Conference, 2021
Sparse Matrix-Vector Multiplication Cache Performance Evaluation and Design Exploration.
Proceedings of the 29th International Symposium on Modeling, 2021
2020
ACM Trans. Web, 2020
High-Scalable Collaborated Parallel Framework for Large-Scale Molecular Dynamic Simulation on Tianhe-2 Supercomputer.
IEEE ACM Trans. Comput. Biol. Bioinform., 2020
J. Inf. Secur. Appl., 2020
Int. J. Intell. Syst., 2020
CoRR, 2020
CCF Trans. High Perform. Comput., 2020
EcoFuzz: Adaptive Energy-Saving Greybox Fuzzing as a Variant of the Adversarial Multi-Armed Bandit.
Proceedings of the 29th USENIX Security Symposium, 2020
Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2020
Proceedings of 2020 International Conference on Medical Imaging and Computer-Aided Diagnosis, 2020
Representation Learning with Multiple Lipschitz-Constrained Alignments on Partially-Labeled Cross-Domain Data.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
2019
A Cost-Efficient Router Architecture for HPC Inter-Connection Networks: Design and Implementation.
IEEE Trans. Parallel Distributed Syst., 2019
IEEE Trans. Knowl. Data Eng., 2019
Frontiers Comput. Sci., 2019
MapEff: An Effective Graph Isomorphism Agorithm Based on the Discrete-Time Quantum Walk.
Entropy, 2019
The Vulnerabilities of Graph Convolutional Networks: Stronger Attacks and Defensive Techniques.
CoRR, 2019
Concurr. Comput. Pract. Exp., 2019
Ad Hoc Sens. Wirel. Networks, 2019
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019
Evolutionarily Learning Multi-Aspect Interactions and Influences from Network Structure and Node Content.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019
2018
A Differentiated Caching Mechanism to Enable Primary Storage Deduplication in Clouds.
IEEE Trans. Parallel Distributed Syst., 2018
IEEE Trans. Knowl. Data Eng., 2018
Versionized process based on non-volatile random-access memory for fine-grained fault tolerance.
Frontiers Inf. Technol. Electron. Eng., 2018
Frontiers Inf. Technol. Electron. Eng., 2018
J. Comput. Sci. Technol., 2018
Marking Vertices to Find Graph Isomorphism Mapping Based on Continuous-Time Quantum Walk.
Entropy, 2018
Constructing a database for the relations between CNV and human genetic diseases via systematic text mining.
BMC Bioinform., 2018
Proceedings of the 2018 World Wide Web Conference on World Wide Web, 2018
Proceedings of the Wireless Algorithms, Systems, and Applications, 2018
Proceedings of the 5th International Conference on Systems and Informatics, 2018
One Size Does Not Fit All: The Case for Chunking Configuration in Backup Deduplication.
Proceedings of the 18th IEEE/ACM International Symposium on Cluster, 2018
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018
2017
Frontiers Inf. Technol. Electron. Eng., 2017
HPDedup: A Hybrid Prioritized Data Deduplication Mechanism for Primary Storage in the Cloud.
CoRR, 2017
Sci. China Inf. Sci., 2017
Community detection in attributed networks based on heterogeneous vertex interactions.
Appl. Intell., 2017
Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2017
How Double-Fetch Situations turn into Double-Fetch Vulnerabilities: A Study of Double Fetches in the Linux Kernel.
Proceedings of the 26th USENIX Security Symposium, 2017
Embedding-based Representation of Categorical Data by Hierarchical Value Coupling Learning.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017
Proceedings of the Intelligent Data Engineering and Automated Learning - IDEAL 2017 - 18th International Conference, Guilin, China, October 30, 2017
Proceedings of the 2017 IEEE International Conference on Web Services, 2017
Proceedings of the Neural Information Processing - 24th International Conference, 2017
Proceedings of the 37th IEEE International Conference on Distributed Computing Systems Workshops, 2017
Proceedings of the Computing Frontiers Conference, 2017
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017
2016
StageFS: A Parallel File System Optimizing Metadata Performance for SSD Based Clusters.
Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, 2016
Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, 2016
Application-Based Coarse-Grained Incremental Checkpointing Based on Non-volatile Memory.
Proceedings of the Network and Parallel Computing, 2016
Alleviating network congestion for HPC clusters with fat-tree interconnection leveraging software-defined networking.
Proceedings of the 3rd International Conference on Systems and Informatics, 2016
mAMBER: A CPU/MIC collaborated parallel framework for AMBER on Tianhe-2 supercomputer.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2016
Proceedings of the Advanced Data Mining and Applications - 12th International Conference, 2016
2015
Sci. Program., 2015
J. Comput. Sci. Technol., 2015
Int. J. Parallel Program., 2015
IEICE Trans. Inf. Syst., 2015
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015
Identifying Repeated Interleavings to Improve the Efficiency of Concurrency Bug Detection.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015
2014
IEEE Trans. Parallel Distributed Syst., 2014
Frontiers Comput. Sci., 2014
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014
Approximate Maximum Common Sub-graph Isomorphism Based on Discrete-Time Quantum Walk.
Proceedings of the 22nd International Conference on Pattern Recognition, 2014
Online Taint Propagation Analysis with Precise Pointer-to Analysis for Detecting Bugs in Binaries.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014
Proceedings of the IEEE 12th International Conference on Dependable, 2014
2013
Quantum Inf. Process., 2013
Int. J. Next Gener. Comput., 2013
IEICE Trans. Inf. Syst., 2013
IEICE Trans. Inf. Syst., 2013
Comput. Networks, 2013
Proceedings of the 2013 13th International Conference on Quality Software, 2013
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013
Proceedings of the Fifth International Conference on Digital Image Processing, 2013
Proceedings of the Advanced Parallel Processing Technologies, 2013
2012
J. Parallel Distributed Comput., 2012
Proceedings of the Network and Parallel Computing, 9th IFIP International Conference, 2012
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012
Self-adaptive management of the sleep depths of idle nodes in large scale systems to balance between energy consumption and response times.
Proceedings of the 4th IEEE International Conference on Cloud Computing Technology and Science Proceedings, 2012
Proceedings of the Asia-Pacific Workshop on Systems, 2012
2011
J. Comput. Sci. Technol., 2011
2010
Proceedings of the 29th Annual ACM Symposium on Principles of Distributed Computing, 2010
Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010
2009
Mixing Concrete and Symbolic Execution to Improve the Performance of Dynamic Test Generation.
Proceedings of the NTMS 2009, 2009
Proceedings of the International Conference on Networked Computing and Advanced Information Management, 2009
Proceedings of the Information and Communications Security, 11th International Conference, 2009
Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing, 2009
Decoupling Dynamic Test Generation from Specific Operating System Details Based on Whole System Virtual Machine.
Proceedings of the Fourth International Conference on Frontier of Computer Science and Technology, 2009
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009
Proceedings of the Advanced Parallel Processing Technologies, 8th International Symposium, 2009
2003
Proceedings of the 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 2003
Proceedings of the Advanced Parallel Programming Technologies, 5th International Workshop, 2003