Ting Cao

ORCID: 0000-0002-9107-013X

Affiliations:
  • Microsoft Research, Beijing, China


According to our database, Ting Cao authored at least 74 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four (the shortest co-authorship path to Paul Erdős; see the sketch below).
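
A collaborative distance is the length of the shortest path between two authors in the co-authorship graph, where an edge joins any pair of authors who share a paper. Below is a minimal sketch of computing such a distance with breadth-first search, assuming the graph is given as a flat list of co-author pairs; all names and edges in the example are hypothetical.

  from collections import deque

  def collaborative_distance(coauthor_edges, source, target):
      """Shortest co-authorship path length via breadth-first search.

      coauthor_edges: iterable of (author_a, author_b) pairs, one per
      jointly authored paper. Returns None if the authors are not
      connected in the co-authorship graph.
      """
      # Build an undirected adjacency map from the edge list.
      graph = {}
      for a, b in coauthor_edges:
          graph.setdefault(a, set()).add(b)
          graph.setdefault(b, set()).add(a)

      seen = {source}
      queue = deque([(source, 0)])
      while queue:
          author, dist = queue.popleft()
          if author == target:
              return dist
          for coauthor in graph.get(author, ()):
              if coauthor not in seen:
                  seen.add(coauthor)
                  queue.append((coauthor, dist + 1))
      return None

  # Hypothetical toy graph: "Cao" has an Erdős number of 2 here.
  edges = [("Erdős", "A"), ("A", "Cao"), ("Cao", "B")]
  print(collaborative_distance(edges, "Cao", "Erdős"))  # -> 2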

Bibliography

2025
Video-in-the-Loop: Span-Grounded Long Video QA with Interleaved Reasoning.
CoRR, October, 2025

Efficient and Adaptive Diffusion Model Inference Through Lookup Table on Mobile Devices.
IEEE Trans. Mob. Comput., September, 2025

AdaNav: Adaptive Reasoning with Uncertainty for Vision-Language Navigation.
CoRR, September, 2025

Scaling LLM Test-Time Compute with Mobile NPU on Smartphones.
CoRR, September, 2025

ProRe: A Proactive Reward System for GUI Agents via Reasoner-Actor Collaboration.
CoRR, September, 2025

SparStencil: Retargeting Sparse Tensor Cores to Scientific Stencil Computations via Structured Sparsity Transformation.
CoRR, June, 2025

SeerAttention-R: Sparse Attention Adaptation for Long Reasoning.
CoRR, June, 2025

SwarmThinkers: Learning Physically Consistent Atomic KMC Transitions at Scale.
CoRR, May, 2025

Zoomer: Adaptive Image Focus Optimization for Black-box MLLM.
CoRR, May, 2025

Empowering Agentic Video Analytics Systems with Video Language Models.
CoRR, May, 2025

Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash.
CoRR, April, 2025

MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration.
CoRR, March, 2025

BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache.
CoRR, March, 2025

Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment.
CoRR, March, 2025

StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition.
CoRR, March, 2025

Anatomizing Deep Learning Inference in Web Browsers.
ACM Trans. Softw. Eng. Methodol., February, 2025

LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator.
CoRR, January, 2025

LeMo: Enabling LEss Token Involvement for MOre Context Fine-tuning.
CoRR, January, 2025

Dissecting Bit-Level Scaling Laws in Quantizing Vision Generative Models.
CoRR, January, 2025

PUDTune: Multi-Level Charging for High-Precision Calibration in Processing-Using-DRAM.
IEEE Comput. Archit. Lett., 2025

JENGA: Enhancing LLM Long-Context Fine-tuning with Contextual Token Sparsity.
Proceedings of the 2025 USENIX Annual Technical Conference, 2025

Babel: A Scalable Pre-trained Model for Multi-Modal Sensing via Expandable Modality Alignment.
Proceedings of the 23rd ACM Conference on Embedded Networked Sensor Systems, 2025

Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers.
Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025

FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units.
Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025

LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge.
Proceedings of the Twentieth European Conference on Computer Systems, 2025

2024
HiMoDepth: Efficient Training-Free High-Resolution On-Device Depth Perception.
IEEE Trans. Mob. Comput., May, 2024

Matryoshka: Optimization of Dynamic Diverse Quantum Chemistry Systems via Elastic Parallelism Transformation.
CoRR, 2024

Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management.
CoRR, 2024

Making Every Frame Matter: Continuous Video Understanding for Large Models via Adaptive State Modeling.
CoRR, 2024

SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs.
CoRR, 2024

LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration.
CoRR, 2024

Advancing Multi-Modal Sensing Through Expandable Modality Alignment.
CoRR, 2024

Exploring the Impact of In-Browser Deep Learning Inference on Quality of User Experience and Performance.
CoRR, 2024

LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores.
Proceedings of the International Conference for High Performance Computing, 2024

Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity.
Proceedings of the International Conference for High Performance Computing, 2024

ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

LitePred: Transferable and Scalable Latency Prediction for Hardware-Aware Neural Architecture Search.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

Poster: Design of Elastic Deep Neural Network Candidate Spaces for Inference on Diverse Devices.
Proceedings of the 22nd Annual International Conference on Mobile Systems, 2024

Empowering In-Browser Deep Learning Inference on Edge Through Just-In-Time Kernel Optimization.
Proceedings of the 22nd Annual International Conference on Mobile Systems, 2024

FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices.
Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 2024

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Hybrid SLM and LLM for Edge-Cloud Collaborative Inference.
Proceedings of the Workshop on Edge and Mobile Foundation Models, 2024

PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

AFPQ: Asymmetric Floating Point Quantization for LLMs.
Findings of the Association for Computational Linguistics, 2024

BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Accelerating In-Browser Deep Learning Inference on Diverse Edge Clients through Just-in-Time Kernel Optimizations.
CoRR, 2023

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.
CoRR, 2023

Gamify Stencil Dwarf on Cloud for Democratizing Scientific Computing.
CoRR, 2023

LUT-NN: Towards Unified Neural Network Inference by Table Lookup.
CoRR, 2023

Boosting DNN Cold Inference on Edge Devices.
Proceedings of the 21st Annual International Conference on Mobile Systems, 2023

NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors.
Proceedings of the 21st Annual International Conference on Mobile Systems, 2023

LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup.
Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, 2023

Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Adam Accumulation to Reduce Memory Footprints of Both Activations and Gradients for Large-Scale DNN Training.
Proceedings of the 26th European Conference on Artificial Intelligence, 2023

2022
Understanding and Optimizing Deep Learning Cold-Start Latency on Edge Devices.
CoRR, 2022

Hyperion: A Generic and Distributed Mobile Offloading Framework on OpenCL.
Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems, 2022

Turbo: Opportunistic Enhancement for Edge Video Analytics.
Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems, 2022

CoDL: efficient CPU-GPU co-execution for deep learning inference on mobile devices.
Proceedings of the 20th Annual International Conference on Mobile Systems, 2022

MobiDepth: real-time depth estimation using on-device dual cameras.
Proceedings of the 28th Annual International Conference on Mobile Computing and Networking, 2022

Romou: rapidly generate high-performance tensor kernels for mobile GPUs.
Proceedings of the 28th Annual International Conference on Mobile Computing and Networking, 2022

SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad Relevance.
Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022

2021
nn-Meter: Towards Accurate Latency Prediction of DNN Inference on Diverse Edge Devices.
GetMobile Mob. Comput. Commun., 2021

nn-Meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices.
Proceedings of the 19th Annual International Conference on Mobile Systems, 2021

AsyMo: scalable and efficient deep-learning inference on asymmetric mobile CPUs.
Proceedings of the 27th Annual International Conference on Mobile Computing and Networking, 2021

To Bridge Neural Network Design and Real-World Performance: A Behaviour Study for Neural Networks.
Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

2020
Profiling and optimizing deep learning inference on mobile GPUs.
Proceedings of the APSys '20: 11th ACM SIGOPS Asia-Pacific Workshop on Systems, 2020

