Posts by Collection
preprints
Hybrid SLM and LLM for Edge-Cloud Collaborative Inference
Recommended citation: Zixu Hao, Huiqiang Jiang, Shiqi Jiang, Ju Ren, Ting Cao. (2024). "Hybrid SLM and LLM for Edge-Cloud Collaborative Inference." EdgeFM’24 Workshop (co-located with MobiCom’24).
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
Recommended citation: Yizhao Gao, Zhichen Zeng, Dayou Du, Shijie Cao, Peiyuan Zhou, Jiaxing Qi, Junjie Lai, Hayden Kwok-Hay So, Ting Cao, Fan Yang, Mao Yang. (2024). "SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs." arXiv.
Bulk Bitwise Accumulation in Commercial DRAM
Recommended citation: Tatsuya Kubo, Masayuki Usui, Tomoya Nagatani, Daichi Tokuda, Lei Qu, Ting Cao, Shinya Takamaeda-Yamazaki. (2024). "Bulk Bitwise Accumulation in Commercial DRAM." NeurIPS 2024 Workshop on Machine Learning with New Compute Paradigms.
PUDTune: Multi-Level Charging for High-Precision Calibration in Processing-Using-DRAM
Recommended citation: Tatsuya Kubo, Daichi Tokuda, Lei Qu, Ting Cao, Shinya Takamaeda-Yamazaki. (2025). "PUDTune: Multi-Level Charging for High-Precision Calibration in Processing-Using-DRAM." IEEE Computer Architecture Letters.
BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache
Recommended citation: Dayou Du, Shijie Cao, Jianyi Cheng, Ting Cao, Mao Yang. (2025). "BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache." arXiv.
Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment
Recommended citation: Gaole Dai, Shiqi Jiang, Ting Cao, Yuanchun Li, Yuqing Yang, Rui Tan, Mo Li, Lili Qiu. (2025). "Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment." arXiv.
MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration
Recommended citation: Tatsuya Kubo, Daichi Tokuda, Tomoya Nagatani, Masayuki Usui, Lei Qu, Ting Cao, Shinya Takamaeda-Yamazaki. (2025). "MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration." arXiv.
Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash
Recommended citation: Fucheng Jia, Zewen Wu, Shiqi Jiang, Huiqiang Jiang, Qianxi Zhang, Yuqing Yang, Yunxin Liu, Ju Ren, Deyu Zhang, Ting Cao. (2025). "Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash." arXiv.
SwarmThinkers: Learning Physically Consistent Atomic KMC Transitions at Scale
Recommended citation: Qi Li, Kun Li, Haozhi Han, Honghui Shang, Xinfu He, Yunquan Zhang, Hong An, Ting Cao, Mao Yang. (2025). "SwarmThinkers: Learning Physically Consistent Atomic KMC Transitions at Scale." arXiv.
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
Recommended citation: Yizhao Gao, Shuming Guo, Shijie Cao, Yuqing Xia, Yu Cheng, Lei Wang, Lingxiao Ma, Yutao Sun, Tianzhu Ye, Li Dong, Hayden Kwok-Hay So, Yu Hua, Ting Cao, Fan Yang, Mao Yang. (2025). "SeerAttention-R: Sparse Attention Adaptation for Long Reasoning." arXiv.
publications
Panthera: Holistic Memory Management for Big Data Processing over Hybrid Memories
Published in ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2019
Recommended citation: C. Wang, H. Cui, T. Cao, J. Zigman, H. Volos, O. Mutlu, F. Lv, X. Feng, and H. Xu. (2019). "Panthera: Holistic Memory Management for Big Data Processing over Hybrid Memories." ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI).
Profiling and optimizing deep learning inference on mobile GPUs
Published in Proceedings of the 11th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys), 2020
Recommended citation: S. Jiang, L. Ran, T. Cao, Y. Xu, Y. Liu. (2020). "Profiling and optimizing deep learning inference on mobile GPUs." Proceedings of the 11th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys).
To Bridge Neural Network Design and Real-World Performance: A Behaviour Study for Neural Networks
Published in Proceedings of Machine Learning and Systems (MLSys), 2021
Recommended citation: X. Tang, S. Han, L. Zhang, T. Cao, Y. Liu. (2021). "To Bridge Neural Network Design and Real-World Performance: A Behaviour Study for Neural Networks." Conference on Machine Learning and Systems (MLSys).
nn-Meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices
Published in 19th International Conference on Mobile Systems, Applications, and Services (MobiSys), 2021
MobiSys 2021 Best Paper Award
Recommended citation: L. Zhang, S. Han, J. Wei, N. Zheng, T. Cao, Y. Yang, Y. Liu. (2021). "nn-Meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices." 19th International Conference on Mobile Systems, Applications, and Services (MobiSys).
AsyMo: Scalable and Efficient Deep-Learning Inference on Asymmetric Mobile CPUs
Published in Proceedings of the 27th Annual International Conference on Mobile Computing and Networking (MobiCom), 2021
Recommended citation: Manni Wang, Shaohua Ding, Ting Cao, Yunxin Liu, Fengyuan Xu. (2021). "AsyMo: Scalable and Efficient Deep-Learning Inference on Asymmetric Mobile CPUs." Proceedings of the 27th Annual International Conference on Mobile Computing and Networking (MobiCom).
Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories
Published in ACM Transactions on Computer Systems (TOCS), 2021
Recommended citation: Lei Chen, Jiacheng Zhao, Chenxi Wang, Ting Cao, John Zigman, Haris Volos, Onur Mutlu, Fang Lv, Xiaobing Feng, Guoqing Harry Xu, Huimin Cui. (2021). "Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories." ACM Transactions on Computer Systems (TOCS), Vol 39(1-4): pp. 1-38.
nn-Meter: towards accurate latency prediction of DNN inference on diverse edge devices
Published in GetMobile: Mobile Computing and Communications, Research Highlights, 2021
ACM SIGMOBILE Research Highlight
Recommended citation: L. Zhang, S. Han, J. Wei, N. Zheng, T. Cao, Y. Yang, Y. Liu. (2021). "nn-Meter: towards accurate latency prediction of DNN inference on diverse edge devices." GetMobile: Mobile Computing and Communications, Research Highlights, 25(4): pp. 19-23.
CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices
Published in 20th International Conference on Mobile Systems, Applications, and Services (MobiSys), 2022
Recommended citation: Fucheng Jia, Deyu Zhang, Ting Cao, Shiqi Jiang, Yunxin Liu, Ju Ren, Yaoxue Zhang. (2022). "CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices." 20th International Conference on Mobile Systems, Applications, and Services (MobiSys).
SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad Relevance
Published in ACM International Conference on Information and Knowledge Management (CIKM), 2022
Recommended citation: Li Lyna Zhang, Youkow Homma, Yujing Wang, Min Wu, Mao Yang, Ruofei Zhang, Ting Cao, Wei Shen. (2022). "SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad Relevance." ACM International Conference on Information and Knowledge Management (CIKM).
MobiDepth: Real-Time Depth Estimation Using On-Device Dual Cameras
Published in Proceedings of the 28th Annual International Conference on Mobile Computing and Networking (MobiCom), 2022
Recommended citation: Jinrui Zhang, Huan Yang, Ju Ren, Deyu Zhang, Bangwen He, Yuanchun Li, Ting Cao, Yaoxue Zhang, Yunxin Liu. (2022). "MobiDepth: Real-Time Depth Estimation Using On-Device Dual Cameras." Proceedings of the 28th Annual International Conference on Mobile Computing and Networking (MobiCom).
Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs
Published in Proceedings of the 28th Annual International Conference on Mobile Computing and Networking (MobiCom), 2022
ArchProbe: 2022-2023 Top 100 Open Source Achievements Award
Recommended citation: Rendong Liang, Ting Cao, Jicheng Wen, Manni Wang, Yang Wang, Jianhua Zou, Yunxin Liu. (2022). "Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs." Proceedings of the 28th Annual International Conference on Mobile Computing and Networking (MobiCom).
Hyperion: A Generic and Distributed Mobile Offloading Framework on OpenCL
Published in The 20th ACM Conference on Embedded Networked Sensor Systems (SenSys), 2022
Recommended citation: Ziyan Fu, Ju Ren, Yunxin Liu, Ting Cao, Deyu Zhang, Yuezhi Zhou, Yaoxue Zhang. (2022). "Hyperion: A Generic and Distributed Mobile Offloading Framework on OpenCL." The 20th ACM Conference on Embedded Networked Sensor Systems (SenSys).
Turbo: Opportunistic Enhancement for Edge Video Analytics
Published in The 20th ACM Conference on Embedded Networked Sensor Systems (SenSys), 2022
Recommended citation: Yan Lu, Shiqi Jiang, Ting Cao, Yuanchao Shu. (2022). "Turbo: Opportunistic Enhancement for Edge Video Analytics." The 20th ACM Conference on Embedded Networked Sensor Systems (SenSys).
Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning
Published in Sixth Conference on Machine Learning and Systems (MLSys), 2023
Recommended citation: Bin Lin, Ningxin Zheng, Lei Wang, Shijie Cao, Lingxiao Ma, Quanlu Zhang, Yi Zhu, Ting Cao, Jilong Xue, Yuqing Yang, Fan Yang. (2023). "Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning." Sixth Conference on Machine Learning and Systems (MLSys).
Boosting DNN Cold Inference on Devices
Published in The 21st Annual International Conference on Mobile Systems, Applications and Services (MobiSys), 2023
Recommended citation: Rongjie Yi, Ting Cao, Ao Zhou, Xiao Ma, Shangguang Wang, Mengwei Xu. (2023). "Boosting DNN Cold Inference on Devices." The 21st Annual International Conference on Mobile Systems, Applications and Services (MobiSys).
NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors
Published in The 21st International Conference on Mobile Systems, Applications, and Services (MobiSys), 2023
Recommended citation: Jianyu Wei, Ting Cao, Shijie Cao, Shiqi Jiang, Shaowei Fu, Mao Yang, Yanyong Zhang, Yunxin Liu. (2023). "NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors." The 21st International Conference on Mobile Systems, Applications, and Services (MobiSys).
VSPIM: SRAM Processing-in-Memory DNN Acceleration via Vector-Scalar Operations
Published in IEEE Transactions on Computers (TC), 2023
Recommended citation: Chen Nie, Chenyu Tang, Jie Lin, Huan Hu, Chenyang Lv, Ting Cao, Weifeng Zhang, Li Jiang, Xiaoyao Liang, Weikang Qian, Yanan Sun, Zhezhi He. (2023). "VSPIM: SRAM Processing-in-Memory DNN Acceleration via Vector-Scalar Operations." IEEE Transactions on Computers (TC).
HiMoDepth: Efficient Training-Free High-Resolution On-Device Depth Perception
Published in IEEE Transactions on Mobile Computing (TMC), 2024
Recommended citation: Jinrui Zhang, Huan Yang, Ju Ren, Deyu Zhang, Bangwen He, Youngki Lee, Ting Cao, Yuanchun Li, Yaoxue Zhang, Yunxin Liu. (2024). "HiMoDepth: Efficient Training-Free High-Resolution On-Device Depth Perception." IEEE Transactions on Mobile Computing (TMC), 23(5), 2024.
Accurate and Structured Pruning for Efficient Automatic Speech Recognition
Published in Conference of the International Speech Communication Association (INTERSPEECH), 2023
Recommended citation: Huiqiang Jiang, Li Lyna Zhang, Yuang Li, Yu Wu, Shijie Cao, Ting Cao, Yuqing Yang, Jinyu Li, Mao Yang, Lili Qiu. (2023). "Accurate and Structured Pruning for Efficient Automatic Speech Recognition." Conference of the International Speech Communication Association (INTERSPEECH).
Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training
Published in 26th European Conference on Artificial Intelligence (ECAI), 2023
Recommended citation: Yijia Zhang, Yibo Han, Shijie Cao, Guohao Dai, Youshan Miao, Ting Cao, Fan Yang, Ningyi Xu. (2023). "Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training." ECAI.
ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices
Published in International Conference on Computer Vision (ICCV), 2023
Recommended citation: Chen Tang, Li Lyna Zhang, Huiqiang Jiang, Jiahang Xu, Ting Cao, Quanlu Zhang, Yuqing Yang, Zhi Wang, Mao Yang. (2023). "ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices." International Conference on Computer Vision (ICCV).
SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference
Published in International Conference on Computer Vision (ICCV), 2023
Recommended citation: Xudong Wang, Li Lyna Zhang, Jiahang Xu, Quanlu Zhang, Yujing Wang, Yuqing Yang, Ningxin Zheng, Ting Cao, Mao Yang. (2023). "SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference." International Conference on Computer Vision (ICCV).
Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference
Published in ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2023
Recommended citation: Junyan Li, Li Lyna Zhang, Jiahang Xu, Yujing Wang, Shaoguang Yan, Yunqing Xia, Yuqing Yang, Ting Cao, Hao Sun, Weiwei Deng, Qi Zhang, Mao Yang. (2023). "Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference." ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).
LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup
Published in The 29th Annual International Conference On Mobile Computing And Networking (MobiCom), 2023
Recommended citation: Xiaohu Tang, Yang Wang, Ting Cao, Li Lyna Zhang, Qi Chen, Deng Cai, Yunxin Liu, Mao Yang. (2023). "LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup." MobiCom.
ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores
Published in ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), 2024
PPoPP 2024 Best Paper Award
Recommended citation: Yuetao Chen, Kun Li, Yuhao Wang, Donglin Bai, Lei Wang, Lingxiao Ma, Liang Yuan, Yunquan Zhang, Ting Cao, Mao Yang. (2024). "ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores." PPoPP.
LitePred: Transferable and Scalable Latency Prediction for Hardware-Aware Neural Architecture Search
Published in USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2024
Recommended citation: Chengquan Feng, Li Lyna Zhang, Yuanchi Liu, Jiahang Xu, Chengruidong Zhang, Zhiyuan Wang, Ting Cao, Mao Yang, Haisheng Tan. (2024). "LitePred: Transferable and Scalable Latency Prediction for Hardware-Aware Neural Architecture Search." NSDI.
PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization
Published in ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024
Recommended citation: Cong Li, Zhe Zhou, Yang Wang, Fan Yang, Ting Cao, Mao Yang, Yun Liang, Guangyu Sun. (2024). "PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization." ASPLOS.
FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices
Published in The 30th Annual International Conference On Mobile Computing And Networking (MobiCom), 2024
Recommended citation: Xiangyu Li, Yuanchun Li, Yuanzhe Li, Ting Cao, Yunxin Liu. (2024). "FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices." MobiCom.
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
Published in The 51st Annual International Symposium on Computer Architecture (ISCA’24), 2024
Recommended citation: R. Hwang, J. Wei, S. Cao, C. Hwang, X. Tang, Ting Cao, M. Yang. (2024). "Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference." ISCA’24.
Empowering In-Browser Deep Learning Inference on Edge Through Just-In-Time Kernel Optimization
Published in The 22nd Annual International Conference on Mobile Systems, Applications and Services (MobiSys), 2024
Recommended citation: F. Jia, S. Jiang, Ting Cao, W. Cui, T. Xia, X. Cao, Y. Li, Q. Wang, D. Zhang, J. Ren, Y. Liu, L. Qiu, M. Yang. (2024). "Empowering In-Browser Deep Learning Inference on Edge Through Just-In-Time Kernel Optimization." MobiSys.
Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models
Published in IEEE International Conference on Multimedia and Expo (ICME’24), 2024
Recommended citation: Yijia Zhang, Lingran Zhao, Shijie Cao, Wenqiang Wang, Ting Cao, Fan Yang, Mao Yang, Shanghang Zhang, Ningyi Xu. (2024). "Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models." ICME’24.
Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation
Published in The 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2024
Recommended citation: L. Wang, L. Ma, S. Cao, Q. Zhang, J. Xue, Y. Shi, N. Zheng, Z. Miao, F. Yang, Ting Cao, Y. Yang, M. Yang. (2024). "Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation." OSDI.
AFPQ: Asymmetric Floating Point Quantization for LLMs
Published in 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024 Findings, short paper), 2024
Recommended citation: Yijia Zhang, Sicheng Zhang, Shijie Cao, DaYou Du, Jianyu Wei, Ting Cao, Ningyi Xu. (2024). "AFPQ: Asymmetric Floating Point Quantization for LLMs." ACL.
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation
Published in 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024 Main Conference, long paper), 2024
Recommended citation: DaYou Du, Yijia Zhang, Shijie Cao, Jiaqi Guo, Ting Cao, Xiaowen Chu, Ningyi Xu. (2024). "BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation." ACL.
PruneAug: Bridging DNN Pruning and Inference Latency on Diverse Sparse Platforms Using Automatic Layerwise Block Pruning
Published in IEEE Transactions on Computers (TC), 2024
Recommended citation: Hanfei Geng, Yifei Liu, Yujie Zheng, Li Lyna Zhang, Jingwei Sun, Yujing Wang, Yang Wang, Guangzhong Sun, Mao Yang, Ting Cao, Yunxin Liu. (2024). "PruneAug: Bridging DNN Pruning and Inference Latency on Diverse Sparse Platforms Using Automatic Layerwise Block Pruning." TC.
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
Published in The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Recommended citation: Yifei Liu, Jicheng Wen, Yang Wang, Shengyu Ye, Li Lyna Zhang, Ting Cao, Cheng Li, Mao Yang. (2024). "VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models." EMNLP.
Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity
Published in International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’24), 2024
Recommended citation: Tuowei Wang, Kun Li, Zixu Hao, Donglin Bai, Ju Ren, Yaoxue Zhang, Ting Cao, Mao Yang. (2024). "Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity." SC’24.
LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores
Published in International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’24), 2024
Recommended citation: Yiwei Zhang, Kun Li, Liang Yuan, Jiawen Cheng, Yunquan Zhang, Ting Cao, Mao Yang. (2024). "LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores." SC’24.
Anatomizing Deep Learning Inference in Web Browsers
Published in ACM Transactions on Software Engineering and Methodology (TOSEM), 2025
Recommended citation: Qipeng Wang, Shiqi Jiang, Zhenpeng Chen, Xu Cao, Yuanchun Li, Aoyu Li, Yun Ma, Ting Cao, Xuanzhe Liu. (2024). "Anatomizing Deep Learning Inference in Web Browsers." TOSEM.
LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator
Published in 31st IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2025
Recommended citation: Guoyu Li, Chunyun Chen, Shengyu Ye, Yang Wang, Fan Yang, Ting Cao, Mohamed M. Sabry Aly, Cheng Liu, Mao Yang. (2025). "LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator." HPCA.
FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units
Published in 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), 2025
Recommended citation: Haozhi Han, Kun Li, Wei Cui, Donglin Bai, Yifeng Chen, Ting Cao, Mao Yang. (2025). "FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units." PPoPP.
Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers
Published in 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), 2025
Recommended citation: Yiwei Zhang, Kun Li, Liang Yuan, Yunquan Zhang, Ting Cao, Mao Yang. (2025). "Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers." PPoPP.
Efficient and Adaptive Diffusion Model Inference Through Lookup Table on Mobile Devices
Published in IEEE Transactions on Mobile Computing (TMC), 2025
Recommended citation: Qipeng Wang, Shiqi Jiang, Yifan Yang, Ruiqi Liu, Yuanchun Li, Ting Cao, Xuanzhe Liu. (2025). "Efficient and Adaptive Diffusion Model Inference Through Lookup Table on Mobile Devices." IEEE Transactions on Mobile Computing (TMC).
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge
Published in The 2025 ACM European Conference on Computer Systems (EuroSys), 2025
Recommended citation: Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, Mao Yang. (2025). "T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge." EuroSys.
Babel: A Scalable Pre-trained Model for Multi-Modal Sensing via Expandable Modality Alignment
Published in The 23rd ACM Conference on Embedded Networked Sensor Systems (SenSys), 2025
Recommended citation: Shenghong Dai, Shiqi Jiang, Yifan Yang, Ting Cao, Mo Li, S. Banerjee, Lili Qiu. (2025). "Babel: A Scalable Pre-trained Model for Multi-Modal Sensing via Expandable Modality Alignment." SenSys.
LUTensor: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference
Published in The 52nd Annual International Symposium on Computer Architecture (ISCA), 2025
Recommended citation: Zhiwen Mo, Lei Wang, Jianyu Wei, Zhiwen Zeng, Shijie Cao, Lingxiao Ma, Naifeng Jing, Ting Cao, Jilong Xue, Fan Yang, Mao Yang. (2025). "LUTensor: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference." ISCA.
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
Published in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Recommended citation: Jinheng Wang, Hansong Zhou, Ting Song, Shijie Cao, Yan Xia, Ting Cao, Jianyu Wei, Shuming Ma, Hongyu Wang, and Furu Wei. (2025). "Bitnet.cpp: Efficient Edge Inference for Ternary LLMs." Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL).
Jenga: Enhancing Long-Context Fine-tuning of LLMs with Contextual Token Sparsity
Published in USENIX Annual Technical Conference (ATC'25), 2025
Recommended citation: Tuowei Wang, Xingyu Chen, Kun Li, Ting Cao, Ju Ren, Yaoxue Zhang. (2025). "Jenga: Enhancing Long-Context Fine-tuning of LLMs with Contextual Token Sparsity." ATC.
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Published in International Conference on Computer Vision (ICCV'25), 2025
Recommended citation: Xin Ding, Hao Wu, Yifan Yang, Shiqi Jiang, Qianxi Zhang, Donglin Bai, Zhibo Chen, Ting Cao. (2025). "StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition." ICCV.
Neuralink: Fast on-Device LLM Inference with Neuron Co-Activation Linking
Published in ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2026
Recommended citation: Tuowei Wang, Ruwen Fan, Minxing Huang, Zixu Hao, Kun Li, Ting Cao, Youyou Lu, Yaoxue Zhang, Ju Ren. (2026). "Neuralink: Fast on-Device LLM Inference with Neuron Co-Activation Linking." ASPLOS.
AVA: Towards Agentic Video Analytics Systems with Video Language Models
Published in USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2026
Recommended citation: Yuxuan Yan, Shiqi Jiang, Ting Cao, Yifan Yang, Qianqian Yang, Yuanchao Shu, Qing Yang, Lili Qiu. (2026). "AVA: Towards Agentic Video Analytics Systems with Video Language Models." NSDI.