Ting Cao

ORCID: 0000-0002-9107-013X

Affiliations:
  • Microsoft Research, Beijing, China


According to our database, Ting Cao authored at least 74 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four (the shortest co-authorship path to Paul Erdős; see the sketch below).
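
A collaborative distance is the length of the shortest path between two authors in the co-authorship graph, where an edge joins any pair of authors who share a paper. Below is a minimal sketch of computing such a distance with breadth-first search, assuming the graph is given as a flat list of co-author pairs; all names and edges in the example are hypothetical.

  from collections import deque

  def collaborative_distance(coauthor_edges, source, target):
      """Shortest co-authorship path length via breadth-first search.

      coauthor_edges: iterable of (author_a, author_b) pairs, one per
      jointly authored paper. Returns None if the authors are not
      connected in the co-authorship graph.
      """
      # Build an undirected adjacency map from the edge list.
      graph = {}
      for a, b in coauthor_edges:
          graph.setdefault(a, set()).add(b)
          graph.setdefault(b, set()).add(a)

      seen = {source}
      queue = deque([(source, 0)])
      while queue:
          author, dist = queue.popleft()
          if author == target:
              return dist
          for coauthor in graph.get(author, ()):
              if coauthor not in seen:
                  seen.add(coauthor)
                  queue.append((coauthor, dist + 1))
      return None

  # Hypothetical toy graph: "Cao" has an Erdős number of 2 here.
  edges = [("Erdős", "A"), ("A", "Cao"), ("Cao", "B")]
  print(collaborative_distance(edges, "Cao", "Erdős"))  # -> 2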

Bibliography

2025
Video-in-the-Loop: Span-Grounded Long Video QA with Interleaved Reasoning.
CoRR, October, 2025

Efficient and Adaptive Diffusion Model Inference Through Lookup Table on Mobile Devices.
IEEE Trans. Mob. Comput., September, 2025

AdaNav: Adaptive Reasoning with Uncertainty for Vision-Language Navigation.
CoRR, September, 2025

Scaling LLM Test-Time Compute with Mobile NPU on Smartphones.
CoRR, September, 2025

ProRe: A Proactive Reward System for GUI Agents via Reasoner-Actor Collaboration.
CoRR, September, 2025

SparStencil: Retargeting Sparse Tensor Cores to Scientific Stencil Computations via Structured Sparsity Transformation.
CoRR, June, 2025

SeerAttention-R: Sparse Attention Adaptation for Long Reasoning.
CoRR, June, 2025

SwarmThinkers: Learning Physically Consistent Atomic KMC Transitions at Scale.
CoRR, May, 2025

Zoomer: Adaptive Image Focus Optimization for Black-box MLLM.
CoRR, May, 2025

Empowering Agentic Video Analytics Systems with Video Language Models.
CoRR, May, 2025

Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash.
CoRR, April, 2025

MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration.
CoRR, March, 2025

BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache.
CoRR, March, 2025

Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment.
CoRR, March, 2025

StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition.
CoRR, March, 2025

Anatomizing Deep Learning Inference in Web Browsers.
ACM Trans. Softw. Eng. Methodol., February, 2025

LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator.
CoRR, January, 2025

LeMo: Enabling LEss Token Involvement for MOre Context Fine-tuning.
CoRR, January, 2025

Dissecting Bit-Level Scaling Laws in Quantizing Vision Generative Models.
CoRR, January, 2025

PUDTune: Multi-Level Charging for High-Precision Calibration in Processing-Using-DRAM.
IEEE Comput. Archit. Lett., 2025

JENGA: Enhancing LLM Long-Context Fine-tuning with Contextual Token Sparsity.
Proceedings of the 2025 USENIX Annual Technical Conference, 2025

Babel: A Scalable Pre-trained Model for Multi-Modal Sensing via Expandable Modality Alignment.
Proceedings of the 23rd ACM Conference on Embedded Networked Sensor Systems, 2025

Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers.
Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025

FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units.
Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025

LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge.
Proceedings of the Twentieth European Conference on Computer Systems, 2025

2024
HiMoDepth: Efficient Training-Free High-Resolution On-Device Depth Perception.
IEEE Trans. Mob. Comput., May, 2024

Matryoshka: Optimization of Dynamic Diverse Quantum Chemistry Systems via Elastic Parallelism Transformation.
CoRR, 2024

Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management.
CoRR, 2024

Making Every Frame Matter: Continuous Video Understanding for Large Models via Adaptive State Modeling.
CoRR, 2024

SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs.
CoRR, 2024

LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration.
CoRR, 2024

Advancing Multi-Modal Sensing Through Expandable Modality Alignment.
CoRR, 2024

Exploring the Impact of In-Browser Deep Learning Inference on Quality of User Experience and Performance.
CoRR, 2024

LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores.
Proceedings of the International Conference for High Performance Computing, 2024

Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity.
Proceedings of the International Conference for High Performance Computing, 2024

ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

LitePred: Transferable and Scalable Latency Prediction for Hardware-Aware Neural Architecture Search.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

Poster: Design of Elastic Deep Neural Network Candidate Spaces for Inference on Diverse Devices.
Proceedings of the 22nd Annual International Conference on Mobile Systems, 2024

Empowering In-Browser Deep Learning Inference on Edge Through Just-In-Time Kernel Optimization.
Proceedings of the 22nd Annual International Conference on Mobile Systems, 2024

FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices.
Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 2024

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Hybrid SLM and LLM for Edge-Cloud Collaborative Inference.
Proceedings of the Workshop on Edge and Mobile Foundation Models, 2024

PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

AFPQ: Asymmetric Floating Point Quantization for LLMs.
Findings of the Association for Computational Linguistics, 2024

BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Accelerating In-Browser Deep Learning Inference on Diverse Edge Clients through Just-in-Time Kernel Optimizations.
CoRR, 2023

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.
CoRR, 2023

Gamify Stencil Dwarf on Cloud for Democratizing Scientific Computing.
CoRR, 2023

LUT-NN: Towards Unified Neural Network Inference by Table Lookup.
CoRR, 2023

Boosting DNN Cold Inference on Edge Devices.
Proceedings of the 21st Annual International Conference on Mobile Systems, 2023

NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors.
Proceedings of the 21st Annual International Conference on Mobile Systems, 2023

LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup.
Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, 2023

Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Adam Accumulation to Reduce Memory Footprints of Both Activations and Gradients for Large-Scale DNN Training.
Proceedings of the 26th European Conference on Artificial Intelligence, 2023

2022
Understanding and Optimizing Deep Learning Cold-Start Latency on Edge Devices.
CoRR, 2022

Hyperion: A Generic and Distributed Mobile Offloading Framework on OpenCL.
Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems, 2022

Turbo: Opportunistic Enhancement for Edge Video Analytics.
Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems, 2022

CoDL: efficient CPU-GPU co-execution for deep learning inference on mobile devices.
Proceedings of the 20th Annual International Conference on Mobile Systems, 2022

MobiDepth: real-time depth estimation using on-device dual cameras.
Proceedings of the 28th Annual International Conference on Mobile Computing and Networking, 2022

Romou: rapidly generate high-performance tensor kernels for mobile GPUs.
Proceedings of the 28th Annual International Conference on Mobile Computing and Networking, 2022

SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad Relevance.
Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022

2021
nn-Meter: Towards Accurate Latency Prediction of DNN Inference on Diverse Edge Devices.
GetMobile Mob. Comput. Commun., 2021

nn-Meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices.
Proceedings of the 19th Annual International Conference on Mobile Systems, 2021

AsyMo: scalable and efficient deep-learning inference on asymmetric mobile CPUs.
Proceedings of the 27th Annual International Conference on Mobile Computing and Networking, 2021

To Bridge Neural Network Design and Real-World Performance: A Behaviour Study for Neural Networks.
Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

2020
Profiling and optimizing deep learning inference on mobile GPUs.
Proceedings of the APSys '20: 11th ACM SIGOPS Asia-Pacific Workshop on Systems, 2020

