Tianshi Chen

ORCID: 0000-0002-7601-0753

Affiliations:

Cambricon Technologies, Beijing, China
Chinese Academy of Sciences, Institute of Computing Technology, Beijing, China
University of Science and Technology of China, Department of Computer Science and Technology, Hefei, China (PhD 2010)

According to our database, Tianshi Chen authored at least 109 papers between 2007 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

FlashAttention-T: Towards Fully Tensorized Attention by Exploiting Tensor-Vector Parallelism.

Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2026

Cambricon-GS: An Accelerator for 3D Gaussian Splatting Training With Gaussian-Pixel Hybrid Parallelism.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2026

Cambricon-CIM: Enabling Energy-Efficient and Error-Resilient Analog CIM Acceleration via Reformation of Coding Bases.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2026

Hardwired-Neuron Language Processing Units as General-Purpose Cognitive Substrates.

Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

2025

VariPar: Variation-Aware Workload Partitioning in Chiplet-Based DNN Accelerators.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., December, 2025

SaaP: Rearchitect SoC-as-a-Processor to Orchestrate Hardware Heterogeneity.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., October, 2025

Hardwired-Neurons Language Processing Units as General-Purpose Cognitive Substrates.

CoRR, August, 2025

QiMeng-Xpiler: Transcompiling Tensor Programs for Deep Learning Systems with a Neural-Symbolic Approach.

CoRR, May, 2025

Efficient and Fast High-Performance Library Generation for Deep Learning Accelerators.

IEEE Trans. Computers, January, 2025

QiMeng-Xpiler: Transcompiling Tensor Programs for Deep Learning Systems with a Neural-Symbolic Approach.

Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025

Cambricon-SR: An Accelerator for Neural Scene Representation with Sparse Encoding Table.

Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

Cambricon-DG: An Accelerator for Redundant-Free Dynamic Graph Neural Networks Based on Nonlinear Isolation.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

2024

Real-Time Robust Video Object Detection System Against Physical-World Adversarial Attacks.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., January, 2024

Cambricon-LLM: A Chiplet-Based Hybrid Architecture for On-Device Inference of 70B LLM.

Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Cambricon-M: A Fibonacci-Coded Charge-Domain SRAM-Based CIM Accelerator for DNN Inference.

Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Cambricon-C: Efficient 4-Bit Matrix Unit via Primitivization.

Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Cambricon-D: Full-Network Differential Acceleration for Diffusion Models.

Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Automated CPU Design by Learning from Input-Output Examples.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Ex3: Automatic Novel Writing by Extracting, Excelsior and Expanding.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Pushing the Limits of Machine Design: Automated CPU Design with AI.

CoRR, 2023

Cambricon-R: A Fully Fused Accelerator for Real-Time Learning of Neural Scene Representation.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Cambricon-U: A Systolic Random Increment Memory Architecture for Unary Computing.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Heron: Automatically Constrained High-Performance Library Generation for Deep Learning Accelerators.

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022

Rethinking the Importance of Quantization Bias, Toward Full Low-Bit Training.

IEEE Trans. Image Process., 2022

Enabling One-Size-Fits-All Compilation Optimization for Inference Across Machine Learning Computers.

IEEE Trans. Computers, 2022

Real-Time Robust Video Object Detection System Against Physical-World Adversarial Attacks.

CoRR, 2022

Cambricon-P: A Bitflow Architecture for Arbitrary Precision Computing.

Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

2021

Distilling Object Detectors with Feature Richness.

Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Cambricon-Q: A Hybrid Architecture for Efficient Training.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

2020

ParaML: A Polyvalent Multicore Accelerator for Machine Learning.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Machine Learning Computers With Fractal von Neumann Architecture.

IEEE Trans. Computers, 2020

Addressing Irregularity in Sparse Neural Networks Through a Cooperative Software/Hardware Approach.

IEEE Trans. Computers, 2020

DWM: A Decomposable Winograd Method for Convolution Acceleration.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Addressing Sparsity in Deep Neural Networks.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

Cambricon-F: machine learning computers with fractal von Neumann architecture.

Proceedings of the 46th International Symposium on Computer Architecture, 2019

2018

An Instruction Set Architecture for Machine Learning.

ACM Trans. Comput. Syst., 2018

BenchIP: Benchmarking Intelligence Processors.

J. Comput. Sci. Technol., 2018

Cambricon-S: Addressing Irregularity in Sparse Neural Networks through A Cooperative Software/Hardware Approach.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

2017

Secure Outsourcing of Virtual Appliance.

IEEE Trans. Cloud Comput., 2017

An Accelerator for High Efficient Vision Processing.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017

DaDianNao: A Neural Network Supercomputer.

IEEE Trans. Computers, 2017

A survey of neural network accelerators.

Frontiers Comput. Sci., 2017

BENCHIP: Benchmarking Intelligence Processors.

CoRR, 2017

Stealth-ACK: stealth transmissions of NoC acknowledgements.

Sci. China Inf. Sci., 2017

TuNao: A High-Performance and Energy-Efficient Reconfigurable Accelerator for Graph Processing.

Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

2016

IMR: High-Performance Low-Cost Multi-Ring NoCs.

IEEE Trans. Parallel Distributed Syst., 2016

Accelerating Architectural Simulation Via Statistical Techniques: A Survey.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

Iterative Point Matching via multi-direction geometric serialization and reliable correspondence selection.

Neurocomputing, 2016

Geodesic-like features for point matching.

Deheng Qian, Tianshi Chen, Hong Qiao

Neurocomputing, 2016

DianNao family: energy-efficient hardware accelerators for machine learning.

Commun. ACM, 2016

Cambricon-X: An accelerator for sparse neural networks.

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Cambricon: An Instruction Set Architecture for Neural Networks.

Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

2015

FreeRider: Non-Local Adaptive Network-on-Chip Routing with Packet-Carried Propagation of Congestion Information.

IEEE Trans. Parallel Distributed Syst., 2015

Robust Design Space Modeling.

ACM Trans. Design Autom. Electr. Syst., 2015

A Small-Footprint Accelerator for Large-Scale Neural Networks.

ACM Trans. Comput. Syst., 2015

On the Easiest and Hardest Fitness Functions.

Jun He, Tianshi Chen, Xin Yao

IEEE Trans. Evol. Comput., 2015

Statistical Performance Comparisons of Computers.

IEEE Trans. Computers, 2015

A High-Throughput Neural Network Accelerator.

IEEE Micro, 2015

Deterministic Replay: A Survey.

ACM Comput. Surv., 2015

Neuromorphic accelerators: a comparison between neuroscience and machine-learning approaches.

Zidong Du, Daniel D. Ben-Dayan Rubin

Proceedings of the 48th International Symposium on Microarchitecture, 2015

ShiDianNao: shifting vision processing closer to the sensor.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

HERMES: a fast cross-ISA binary translator with post-optimization.

Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015

PuDianNao: A Polyvalent Machine Learning Accelerator.

Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2014

Pre-Silicon Bug Forecast.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2014

Prevention from Soft Errors via Architecture Elasticity.

J. Comput. Sci. Technol., 2014

An Elastic Architecture Adaptable to Various Application Scenarios.

J. Comput. Sci. Technol., 2014

DaDianNao: A Machine-Learning Supercomputer.

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

ArchRanker: A ranking approach to design space exploration.

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning.

Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

2013

Effective and efficient microprocessor design space exploration using unlabeled design configurations.

ACM Trans. Intell. Syst. Technol., 2013

Motion Estimation Without Integer-Pel Search.

IEEE Trans. Image Process., 2013

Scaling Up Estimation of Distribution Algorithms for Continuous Optimization.

IEEE Trans. Evol. Comput., 2013

LDet: Determinizing Asynchronous Transfer for Postsilicon Debugging.

IEEE Trans. Computers, 2013

Deterministic Replay Using Global Clock.

ACM Trans. Archit. Code Optim., 2013

Microarchitectural design space exploration made fast.

Microprocess. Microsystems, 2013

2012

Program Regularization in Memory Consistency Verification.

IEEE Trans. Parallel Distributed Syst., 2012

A large population size can be unhelpful in evolutionary algorithms.

Theor. Comput. Sci., 2012

Linear Time Memory Consistency Verification.

IEEE Trans. Computers, 2012

Pose Estimation for 3D Workpiece Grasping in Industrial Environment Based on Evolutionary Algorithm.

J. Intell. Robotic Syst., 2012

Global Adaptive Routing Algorithm Without Additional Congestion Propagation Network

CoRR, 2012

RepTFD: Replay Based Transient Fault Detection

CoRR, 2012

A General Analysis of Evolutionary Algorithms for Hard and Easy Fitness Functions

Jun He, Tianshi Chen

CoRR, 2012

An Elastic Architecture Adaptable to Millions of Application Scenarios.

Proceedings of the Network and Parallel Computing, 9th IFIP International Conference, 2012

BenchNN: On the broad potential application scope of hardware neural network accelerators.

Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

Performance Prediction for Reconfigurable Processor.

Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

Statistical performance comparisons of computers.

Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

2011

The Godson Processors: Its Research, Development, and Contributions.

J. Comput. Sci. Technol., 2011

Population Scalability Analysis of Abstract Population-based Random Search: Spectral Radius

Jun He, Tianshi Chen

CoRR, 2011

Efficient Deterministic Replay Using Complete Race Detection

CoRR, 2011

The Impact of Mutation Rate on the Computation Time of Evolutionary Dynamic Optimization

CoRR, 2011

Brief announcement: program regularization in verifying memory consistency.

Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2011), 2011

Effective and Efficient Microprocessor Design Space Exploration Using Unlabeled Design Configurations.

Proceedings of the IJCAI 2011, 2011

Video Encoding without Integer-Pel Motion Estimation.

Proceedings of the 2011 Data Compression Conference (DCC 2011), 2011

Empirical design bugs prediction for verification.

Proceedings of the Design, Automation and Test in Europe, 2011

Towards Maximizing the Area Under the ROC Curve for Multi-Class Classification Problems.

Ke Tang, Rui Wang, Tianshi Chen

Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011

2010

Analysis of Computational Time of Simple Estimation of Distribution Algorithms.

IEEE Trans. Evol. Comput., 2010

Choosing selection pressure for wide-gap problems.

Theor. Comput. Sci., 2010

LReplay: a pending period based deterministic replay scheme.

Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Estimating design quality of digital systems via machine learning.

Proceedings of the 17th IEEE International Conference on Electronics, 2010

On-the-Fly Reduction of Stimuli for Functional Verification.

Proceedings of the 19th IEEE Asian Test Symposium, 2010

2009

A New Approach for Analyzing Average Time Complexity of Population-Based Evolutionary Algorithms on Unimodal Problems.

IEEE Trans. Syst. Man Cybern. Part B, 2009

Empirical analysis of evolutionary algorithms with immigrants schemes for dynamic optimization.

Memetic Comput., 2009

Global Clock, Physical Time Order and Pending Period Analysis in Multiprocessor Systems

Yunji Chen, Tianshi Chen, Weiwu Hu

CoRR, 2009

Fast complete memory consistency verification.

Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

A multi-objective approach to Redundancy Allocation Problem in parallel-series systems.

Proceedings of the IEEE Congress on Evolutionary Computation, 2009

Rigorous time complexity analysis of Univariate Marginal Distribution Algorithm with margins.

Proceedings of the IEEE Congress on Evolutionary Computation, 2009

A stochastic method for controlling the scaling parameters of Cauchy mutation in fast evolutionary programming.

Yunji Chen, Ke Tang, Tianshi Chen

Proceedings of the IEEE Congress on Evolutionary Computation, 2009

When is an estimation of distribution algorithm better than an evolutionary algorithm?

Proceedings of the IEEE Congress on Evolutionary Computation, 2009

2007

On the analysis of average time complexity of estimation of distribution algorithms.

Proceedings of the IEEE Congress on Evolutionary Computation, 2007
