Posts by Collection

portfolio

preprints

Bulk Bitwise Accumulation in Commercial DRAM

Published in NeurIPS 2024 Workshop Machine Learning with new Compute Paradigms, 2024

Recommended citation: Tatsuya Kubo, Masayuki Usui, Tomoya Nagatani, Daichi Tokuda, Lei Qu, Ting Cao, Shinya Takamaeda-Yamazaki. (2024). "Bulk Bitwise Accumulation in Commercial DRAM." NeurIPS 2024 Workshop Machine Learning with new Compute Paradigms.
Download Paper

publications

nn-Meter: Towards Accurate Latency Prediction of Deep-Learning Model Inference on Diverse Edge Devices

Published in 19th International Conference on Mobile Systems, Applications, and Services (MobiSys), 2021

MobiSys 2021 Best Paper Award

Recommended citation: L. Zhang, S. Han, J. Wei, N. Zheng, T. Cao, Y. Yang, Y. Liu. (2021). "nn-Meter: Towards Accurate Latency Prediction of Deep-Learning Model Inference on Diverse Edge Devices." 19th International Conference on Mobile Systems, Applications, and Services (MobiSys).
Download Paper

AsyMo: Scalable and Efficient Deep-Learning Inference on Asymmetric Mobile CPUs

Published in Proceedings of the 27th Annual International Conference on Mobile Computing and Networking (MobiCom), 2021

Recommended citation: Manni Wang, Shaohua Ding, Ting Cao, Yunxin Liu, Fengyuan Xu. (2021). "AsyMo: Scalable and Efficient Deep-Learning Inference on Asymmetric Mobile CPUs." Proceedings of the 27th Annual International Conference on Mobile Computing and Networking (MobiCom).
Download Paper

Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories

Published in ACM Transactions on Computer Systems (TOCS), 2021

Recommended citation: Lei Chen, Jiacheng Zhao, Chenxi Wang, Ting Cao, John Zigman, Haris Volos, Onur Mutlu, Fang Lv, Xiaobing Feng, Guoqing Harry Xu, Huimin Cui. (2021). "Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories." ACM Transactions on Computer Systems (TOCS), Vol 39(1-4): pp. 1-38.
Download Paper

CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices

Published in 20th International Conference on Mobile Systems, Applications, and Services (MobiSys), 2022

Recommended citation: Fucheng Jia, Deyu Zhang, Ting Cao, Shiqi Jiang, Yunxin Liu, Ju Ren, Yaoxue Zhang. (2022). "CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices." 20th International Conference on Mobile Systems, Applications, and Services (MobiSys).
Download Paper

SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad Relevance

Published in ACM International Conference on Information and Knowledge Management (CIKM), 2022

Recommended citation: Li Lyna Zhang, Youkow Homma, Yujing Wang, Min Wu, Mao Yang, Ruofei Zhang, Ting Cao, Wei Shen. (2022). "SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad Relevance." ACM International Conference on Information and Knowledge Management (CIKM).
Download Paper

MobiDepth: Real-Time Depth Estimation Using On-Device Dual Cameras

Published in Proceedings of the 28th Annual International Conference on Mobile Computing and Networking (MobiCom), 2022

Recommended citation: Jinrui Zhang, Huan Yang, Ju Ren, Deyu Zhang, Bangwen He, Yuanchun Li, Ting Cao, Yaoxue Zhang, Yunxin Liu. (2022). "MobiDepth: Real-Time Depth Estimation Using On-Device Dual Cameras." Proceedings of the 28th Annual International Conference on Mobile Computing and Networking (MobiCom).
Download Paper

Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs

Published in Proceedings of the 28th Annual International Conference on Mobile Computing and Networking (MobiCom), 2022

ArchProbe: 2022-2023 Top 100 Open Source Achievements Award

Recommended citation: Rendong Liang, Ting Cao, Jicheng Wen, Manni Wang, Yang Wang, Jianhua Zou, Yunxin Liu. (2022). "Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs." Proceedings of the 28th Annual International Conference on Mobile Computing and Networking (MobiCom).
Download Paper

Turbo: Opportunistic Enhancement for Edge Video Analytics

Published in The 20th ACM Conference on Embedded Networked Sensor Systems (SenSys), 2022

Recommended citation: Yan Lu, Shiqi Jiang, Ting Cao, Yuanchao Shu. (2022). "Turbo: Opportunistic Enhancement for Edge Video Analytics." The 20th ACM Conference on Embedded Networked Sensor Systems (SenSys).
Download Paper

Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning

Published in Sixth Conference on Machine Learning and Systems (MLSys), 2023

Recommended citation: Bin Lin, Ningxin Zheng, Lei Wang, Shijie Cao, Lingxiao Ma, Quanlu Zhang, Yi Zhu, Ting Cao, Jilong Xue, Yuqing Yang, Fan Yang. (2023). "Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning." Sixth Conference on Machine Learning and Systems (MLSys).
Download Paper

Boosting DNN Cold Inference on Devices

Published in The 21st Annual International Conference on Mobile Systems, Applications and Services (MobiSys), 2023

Recommended citation: Rongjie Yi, Ting Cao, Ao Zhou, Xiao Ma, Shangguang Wang, Mengwei Xu. (2023). "Boosting DNN Cold Inference on Devices." The 21st Annual International Conference on Mobile Systems, Applications and Services (MobiSys).
Download Paper

NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors

Published in The 21st International Conference on Mobile Systems, Applications, and Services (MobiSys), 2023

Recommended citation: Jianyu Wei, Ting Cao, Shijie Cao, Shiqi Jiang, Shaowei Fu, Mao Yang, Yanyong Zhang, Yunxin Liu. (2023). "NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors." The 21st International Conference on Mobile Systems, Applications, and Services (MobiSys).
Download Paper

Accurate and Structured Pruning for Efficient Automatic Speech Recognition

Published in Conference of the International Speech Communication Association (INTERSPEECH), 2023

Recommended citation: Huiqiang Jiang, Li Lyna Zhang, Yuang Li, Yu Wu, Shijie Cao, Ting Cao, Yuqing Yang, Jinyu Li, Mao Yang, Lili Qiu. (2023). "Accurate and Structured Pruning for Efficient Automatic Speech Recognition." Conference of the International Speech Communication Association (INTERSPEECH).
Download Paper

Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference

Published in ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2023

Recommended citation: Junyan Li, Li Lyna Zhang, Jiahang Xu, Yujing Wang, Shaoguang Yan, Yunqing Xia, Yuqing Yang, Ting Cao, Hao Sun, Weiwei Deng, Qi Zhang, Mao Yang. (2023). "Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference." ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).
Download Paper

AFPQ: Asymmetric Floating Point Quantization for LLMs

Published in Findings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL Findings, short paper), 2024

Recommended citation: Yijia Zhang, Sicheng Zhang, Shijie Cao, DaYou Du, Jianyu Wei, Ting Cao, Ningyi Xu. (2024). "AFPQ: Asymmetric Floating Point Quantization for LLMs." Findings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL Findings).
Download Paper

Anatomizing Deep Learning Inference in Web Browsers

Published in ACM Transactions on Software Engineering and Methodology (TOSEM), 2025

Recommended citation: Qipeng Wang, Shiqi Jiang, Zhenpeng Chen, Xu Cao, Yuanchun Li, Aoyu Li, Yun Ma, Ting Cao, Xuanzhe Liu. (2024). "Anatomizing Deep Learning Inference in Web Browsers." ACM Transactions on Software Engineering and Methodology (TOSEM).
Download Paper

Bitnet.cpp: Efficient Edge Inference for Ternary LLMs

Published in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Recommended citation: Jinheng Wang, Hansong Zhou, Ting Song, Shijie Cao, Yan Xia, Ting Cao, Jianyu Wei, Shuming Ma, Hongyu Wang, Furu Wei. (2025). "Bitnet.cpp: Efficient Edge Inference for Ternary LLMs." Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL).
Download Paper

talks

teaching