Dehao Chen

ORCID: 0000-0001-5849-7492

According to our database, Dehao Chen authored at least 24 papers between 2006 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Quaternion Extreme Learning Machine Based on Real Augmented Representation.
IEEE Signal Process. Lett., 2023

Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
LaMDA: Language Models for Dialog Applications.
CoRR, 2022

News Recommendation Model Based on Long-Term and Short-Term Interests.
Proceedings of the 5th International Conference on Big Data Technologies, 2022

2021
GSPMD: General and Scalable Parallelization for ML Computation Graphs.
CoRR, 2021

Exploring the Limits of Concurrency in ML Training on Google TPUs.
Proceedings of Machine Learning and Systems 2021, 2021

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding.
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
Exploring the Limits of Concurrency in ML Training on Google TPUs.
CoRR, 2020

Automatic Cross-Replica Sharding of Weight Update in Data-Parallel Training.
CoRR, 2020


2019
MLPerf Training Benchmark.
CoRR, 2019

Scale MLPerf-0.6 models on Google TPU-v3 Pods.
CoRR, 2019

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling.
CoRR, 2019

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2018
Image Classification at Supercomputer Scale.
CoRR, 2018

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism.
CoRR, 2018

2016
AutoFDO: automatic feedback-directed optimization for warehouse-scale applications.
Proceedings of the 2016 International Symposium on Code Generation and Optimization, 2016

2014
Hardware Counted Profile-Guided Optimization.
CoRR, 2014

2013
Taming Hardware Event Samples for Precise and Versatile Feedback Directed Optimizations.
IEEE Trans. Computers, 2013

2012
Providing Source Code Level Portability Between CPU and GPU with MapCG.
J. Comput. Sci. Technol., 2012

CUDA-Zero: a framework for porting shared memory GPU applications to multi-GPUs.
Sci. China Inf. Sci., 2012

2010
Taming hardware event samples for FDO compilation.
Proceedings of the CGO 2010, 2010

MapCG: writing parallel program portable between CPU and GPU.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2006
Tree partition based parallel frequent pattern mining on shared memory systems.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006
