Ting Cao
ORCID: 0000-0002-9107-013X
Affiliations:
- Microsoft Research, Beijing, China
According to our database, Ting Cao authored at least 74 papers between 2020 and 2025.
Online presence:
- on orcid.org
Bibliography
2025
CoRR, October, 2025
Efficient and Adaptive Diffusion Model Inference Through Lookup Table on Mobile Devices.
IEEE Trans. Mob. Comput., September, 2025
CoRR, September, 2025
CoRR, September, 2025
SparStencil: Retargeting Sparse Tensor Cores to Scientific Stencil Computations via Structured Sparsity Transformation.
CoRR, June, 2025
CoRR, May, 2025
CoRR, May, 2025
CoRR, April, 2025
CoRR, March, 2025
BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache.
CoRR, March, 2025
CoRR, March, 2025
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition.
CoRR, March, 2025
ACM Trans. Softw. Eng. Methodol., February, 2025
CoRR, January, 2025
CoRR, January, 2025
CoRR, January, 2025
PUDTune: Multi-Level Charging for High-Precision Calibration in Processing-Using-DRAM.
IEEE Comput. Archit. Lett., 2025
Proceedings of the 2025 USENIX Annual Technical Conference, 2025
Babel: A Scalable Pre-trained Model for Multi-Modal Sensing via Expandable Modality Alignment.
Proceedings of the 23rd ACM Conference on Embedded Networked Sensor Systems, 2025
Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers.
Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025
FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units.
Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025
Proceedings of the Twentieth European Conference on Computer Systems, 2025
2024
IEEE Trans. Mob. Comput., May, 2024
Matryoshka: Optimization of Dynamic Diverse Quantum Chemistry Systems via Elastic Parallelism Transformation.
CoRR, 2024
Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management.
CoRR, 2024
Making Every Frame Matter: Continuous Video Understanding for Large Models via Adaptive State Modeling.
CoRR, 2024
CoRR, 2024
Exploring the Impact of In-Browser Deep Learning Inference on Quality of User Experience and Performance.
CoRR, 2024
Proceedings of the International Conference for High Performance Computing, 2024
Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity.
Proceedings of the International Conference for High Performance Computing, 2024
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024
Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024
LitePred: Transferable and Scalable Latency Prediction for Hardware-Aware Neural Architecture Search.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024
Poster: Design of Elastic Deep Neural Network Candidate Spaces for Inference on Diverse Devices.
Proceedings of the 22nd Annual International Conference on Mobile Systems, 2024
Empowering In-Browser Deep Learning Inference on Edge Through Just-In-Time Kernel Optimization.
Proceedings of the 22nd Annual International Conference on Mobile Systems, 2024
Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 2024
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Proceedings of the Workshop on Edge and Mobile Foundation Models, 2024
PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
Proceedings of the Findings of the Association for Computational Linguistics, 2024
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
2023
Accelerating In-Browser Deep Learning Inference on Diverse Edge Clients through Just-in-Time Kernel Optimizations.
CoRR, 2023
Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.
CoRR, 2023
Proceedings of the 21st Annual International Conference on Mobile Systems, 2023
NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors.
Proceedings of the 21st Annual International Conference on Mobile Systems, 2023
LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup.
Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, 2023
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023
Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Adam Accumulation to Reduce Memory Footprints of Both Activations and Gradients for Large-Scale DNN Training.
Proceedings of the ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland, 2023
2022
CoRR, 2022
Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems, 2022
Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems, 2022
Proceedings of the MobiSys '22: The 20th Annual International Conference on Mobile Systems, Applications and Services, Portland, Oregon, 27 June 2022, 2022
Proceedings of the ACM MobiCom '22: The 28th Annual International Conference on Mobile Computing and Networking, Sydney, NSW, Australia, October 17, 2022
Proceedings of the ACM MobiCom '22: The 28th Annual International Conference on Mobile Computing and Networking, Sydney, NSW, Australia, October 17, 2022
Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022
2021
nn-METER: Towards Accurate Latency Prediction of DNN Inference on Diverse Edge Devices.
GetMobile Mob. Comput. Commun., 2021
nn-Meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices.
Proceedings of the MobiSys '21: The 19th Annual International Conference on Mobile Systems, Applications, and Services, Virtual Event, Wisconsin, USA, 24 June, 2021
Proceedings of the ACM MobiCom '21: The 27th Annual International Conference on Mobile Computing and Networking, 2021
To Bridge Neural Network Design and Real-World Performance: A Behaviour Study for Neural Networks.
Proceedings of the Fourth Conference on Machine Learning and Systems, 2021
2020
Proceedings of the APSys '20: 11th ACM SIGOPS Asia-Pacific Workshop on Systems, 2020