Posts by Collection
preprints
Hybrid SLM and LLM for Edge-Cloud Collaborative Inference
Recommended citation: Zixu Hao, Huiqiang Jiang, Shiqi Jiang, Ju Ren, Ting Cao. (2024). "Hybrid SLM and LLM for Edge-Cloud Collaborative Inference." EdgeFM’24 Workshop (co-located with MobiCom’24).
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
Recommended citation: Yizhao Gao, Zhichen Zeng, Dayou Du, Shijie Cao, Peiyuan Zhou, Jiaxing Qi, Junjie Lai, Hayden Kwok-Hay So, Ting Cao, Fan Yang, Mao Yang. (2024). "SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs." arXiv.
Bulk Bitwise Accumulation in Commercial DRAM
Recommended citation: Tatsuya Kubo, Masayuki Usui, Tomoya Nagatani, Daichi Tokuda, Lei Qu, Ting Cao, Shinya Takamaeda-Yamazaki. (2024). "Bulk Bitwise Accumulation in Commercial DRAM." NeurIPS 2024 Workshop on Machine Learning with New Compute Paradigms.
PUDTune: Multi-Level Charging for High-Precision Calibration in Processing-Using-DRAM
Recommended citation: Tatsuya Kubo, Daichi Tokuda, Lei Qu, Ting Cao, Shinya Takamaeda-Yamazaki. (2025). "PUDTune: Multi-Level Charging for High-Precision Calibration in Processing-Using-DRAM." IEEE Computer Architecture Letters.
BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache
Recommended citation: Dayou Du, Shijie Cao, Jianyi Cheng, Ting Cao, Mao Yang. (2025). "BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache." arXiv.
Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment
Recommended citation: Gaole Dai, Shiqi Jiang, Ting Cao, Yuanchun Li, Yuqing Yang, Rui Tan, Mo Li, Lili Qiu. (2025). "Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment." arXiv.
MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration
Recommended citation: Tatsuya Kubo, Daichi Tokuda, Tomoya Nagatani, Masayuki Usui, Lei Qu, Ting Cao, Shinya Takamaeda-Yamazaki. (2025). "MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration." arXiv.
Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash
Recommended citation: Fucheng Jia, Zewen Wu, Shiqi Jiang, Huiqiang Jiang, Qianxi Zhang, Yuqing Yang, Yunxin Liu, Ju Ren, Deyu Zhang, Ting Cao. (2025). "Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash." arXiv.
SwarmThinkers: Learning Physically Consistent Atomic KMC Transitions at Scale
Recommended citation: Qi Li, Kun Li, Haozhi Han, Honghui Shang, Xinfu He, Yunquan Zhang, Hong An, Ting Cao, Mao Yang. (2025). "SwarmThinkers: Learning Physically Consistent Atomic KMC Transitions at Scale." arXiv.
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
Recommended citation: Yizhao Gao, Shuming Guo, Shijie Cao, Yuqing Xia, Yu Cheng, Lei Wang, Lingxiao Ma, Yutao Sun, Tianzhu Ye, Li Dong, Hayden Kwok-Hay So, Yu Hua, Ting Cao, Fan Yang, Mao Yang. (2025). "SeerAttention-R: Sparse Attention Adaptation for Long Reasoning." arXiv.
publications
Panthera: Holistic Memory Management for Big Data Processing over Hybrid Memories
Published in ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2019
Recommended citation: C. Wang, H. Cui, T. Cao, J. Zigman, H. Volos, O. Mutlu, F. Lv, X. Feng, and H. Xu. (2019). "Panthera: Holistic Memory Management for Big Data Processing over Hybrid Memories." ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI).
Profiling and optimizing deep learning inference on mobile GPUs
Published in Proceedings of the 11th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys), 2020
Recommended citation: S. Jiang, L. Ran, T. Cao, Y. Xu, Y. Liu. (2020). "Profiling and optimizing deep learning inference on mobile GPUs." Proceedings of the 11th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys).
To Bridge Neural Network Design and Real-World Performance: A Behaviour Study for Neural Networks
Published in Proceedings of Machine Learning and Systems (MLSys), 2021
Recommended citation: X. Tang, S. Han, L. Zhang, T. Cao, Y. Liu. (2021). "To Bridge Neural Network Design and Real-World Performance: A Behaviour Study for Neural Networks." Conference on Machine Learning and Systems (MLSys).
nn-Meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices
Published in 19th International Conference on Mobile Systems, Applications, and Services (MobiSys), 2021
MobiSys 2021 Best Paper Award
Recommended citation: L. Zhang, S. Han, J. Wei, N. Zheng, T. Cao, Y. Yang, Y. Liu. (2021). "nn-Meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices." 19th International Conference on Mobile Systems, Applications, and Services (MobiSys).
AsyMo: Scalable and Efficient Deep-Learning Inference on Asymmetric Mobile CPUs
Published in Proceedings of the 27th Annual International Conference on Mobile Computing and Networking (MobiCom), 2021
Recommended citation: Manni Wang, Shaohua Ding, Ting Cao, Yunxin Liu, Fengyuan Xu. (2021). "AsyMo: Scalable and Efficient Deep-Learning Inference on Asymmetric Mobile CPUs." Proceedings of the 27th Annual International Conference on Mobile Computing and Networking (MobiCom).
Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories
Published in ACM Transactions on Computer Systems (TOCS), 2021
Recommended citation: Lei Chen, Jiacheng Zhao, Chenxi Wang, Ting Cao, John Zigman, Haris Volos, Onur Mutlu, Fang Lv, Xiaobing Feng, Guoqing Harry Xu, Huimin Cui. (2021). "Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories." ACM Transactions on Computer Systems (TOCS), Vol 39(1-4): pp. 1-38.
nn-Meter: towards accurate latency prediction of DNN inference on diverse edge devices
Published in GetMobile: Mobile Computing and Communications, Research Highlights, 2021
ACM SIGMOBILE Research Highlight
Recommended citation: L. Zhang, S. Han, J. Wei, N. Zheng, T. Cao, Y. Yang, Y. Liu. (2021). "nn-Meter: towards accurate latency prediction of DNN inference on diverse edge devices." GetMobile: Mobile Computing and Communications, Research Highlights, 25(4): pp. 19-23.
CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices
Published in 20th International Conference on Mobile Systems, Applications, and Services (MobiSys), 2022
Recommended citation: Fucheng Jia, Deyu Zhang, Ting Cao, Shiqi Jiang, Yunxin Liu, Ju Ren, Yaoxue Zhang. (2022). "CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices." 20th International Conference on Mobile Systems, Applications, and Services (MobiSys).
SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad Relevance
Published in ACM International Conference on Information and Knowledge Management (CIKM), 2022
Recommended citation: Li Lyna Zhang, Youkow Homma, Yujing Wang, Min Wu, Mao Yang, Ruofei Zhang, Ting Cao, Wei Shen. (2022). "SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad Relevance." ACM International Conference on Information and Knowledge Management (CIKM).
MobiDepth: Real-Time Depth Estimation Using On-Device Dual Cameras
Published in Proceedings of the 28th Annual International Conference on Mobile Computing and Networking (MobiCom), 2022
Recommended citation: Jinrui Zhang, Huan Yang, Ju Ren, Deyu Zhang, Bangwen He, Yuanchun Li, Ting Cao, Yaoxue Zhang, Yunxin Liu. (2022). "MobiDepth: Real-Time Depth Estimation Using On-Device Dual Cameras." Proceedings of the 28th Annual International Conference on Mobile Computing and Networking (MobiCom).
Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs
Published in Proceedings of the 28th Annual International Conference on Mobile Computing and Networking (MobiCom), 2022
ArchProbe: 2022-2023 Top 100 Open Source Achievements Award
Recommended citation: Rendong Liang, Ting Cao, Jicheng Wen, Manni Wang, Yang Wang, Jianhua Zou, Yunxin Liu. (2022). "Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs." Proceedings of the 28th Annual International Conference on Mobile Computing and Networking (MobiCom).
Hyperion: A Generic and Distributed Mobile Offloading Framework on OpenCL
Published in The 20th ACM Conference on Embedded Networked Sensor Systems (SenSys), 2022
Recommended citation: Ziyan Fu, Ju Ren, Yunxin Liu, Ting Cao, Deyu Zhang, Yuezhi Zhou, Yaoxue Zhang. (2022). "Hyperion: A Generic and Distributed Mobile Offloading Framework on OpenCL." The 20th ACM Conference on Embedded Networked Sensor Systems (SenSys).
Turbo: Opportunistic Enhancement for Edge Video Analytics
Published in The 20th ACM Conference on Embedded Networked Sensor Systems (SenSys), 2022
Recommended citation: Yan Lu, Shiqi Jiang, Ting Cao, Yuanchao Shu. (2022). "Turbo: Opportunistic Enhancement for Edge Video Analytics." The 20th ACM Conference on Embedded Networked Sensor Systems (SenSys).
Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning
Published in Sixth Conference on Machine Learning and Systems (MLSys), 2023
Recommended citation: Bin Lin, Ningxin Zheng, Lei Wang, Shijie Cao, Lingxiao Ma, Quanlu Zhang, Yi Zhu, Ting Cao, Jilong Xue, Yuqing Yang, Fan Yang. (2023). "Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning." Sixth Conference on Machine Learning and Systems (MLSys).
Boosting DNN Cold Inference on Devices
Published in The 21st Annual International Conference on Mobile Systems, Applications and Services (MobiSys), 2023
Recommended citation: Rongjie Yi, Ting Cao, Ao Zhou, Xiao Ma, Shangguang Wang, Mengwei Xu. (2023). "Boosting DNN Cold Inference on Devices." The 21st Annual International Conference on Mobile Systems, Applications and Services (MobiSys).
NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors
Published in The 21st International Conference on Mobile Systems, Applications, and Services (MobiSys), 2023
Recommended citation: Jianyu Wei, Ting Cao, Shijie Cao, Shiqi Jiang, Shaowei Fu, Mao Yang, Yanyong Zhang, Yunxin Liu. (2023). "NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors." The 21st International Conference on Mobile Systems, Applications, and Services (MobiSys).
VSPIM: SRAM Processing-in-Memory DNN Acceleration via Vector-Scalar Operations
Published in IEEE Transactions on Computers (TC), 2023
Recommended citation: Chen Nie, Chenyu Tang, Jie Lin, Huan Hu, Chenyang Lv, Ting Cao, Weifeng Zhang, Li Jiang, Xiaoyao Liang, Weikang Qian, Yanan Sun, Zhezhi He. (2023). "VSPIM: SRAM Processing-in-Memory DNN Acceleration via Vector-Scalar Operations." IEEE Transactions on Computers (TC).
HiMoDepth: Efficient Training-Free High-Resolution On-Device Depth Perception
Published in IEEE Transactions on Mobile Computing (TMC), 2024
Recommended citation: Jinrui Zhang, Huan Yang, Ju Ren, Deyu Zhang, Bangwen He, Youngki Lee, Ting Cao, Yuanchun Li, Yaoxue Zhang, Yunxin Liu. (2024). "HiMoDepth: Efficient Training-Free High-Resolution On-Device Depth Perception." IEEE Transactions on Mobile Computing (TMC), 23(5), 2024.
Accurate and Structured Pruning for Efficient Automatic Speech Recognition
Published in Conference of the International Speech Communication Association (INTERSPEECH), 2023
Recommended citation: Huiqiang Jiang, Li Lyna Zhang, Yuang Li, Yu Wu, Shijie Cao, Ting Cao, Yuqing Yang, Jinyu Li, Mao Yang, Lili Qiu. (2023). "Accurate and Structured Pruning for Efficient Automatic Speech Recognition." Conference of the International Speech Communication Association (INTERSPEECH).
Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training
Published in 26th European Conference on Artificial Intelligence (ECAI), 2023
Recommended citation: Yijia Zhang, Yibo Han, Shijie Cao, Guohao Dai, Youshan Miao, Ting Cao, Fan Yang, Ningyi Xu. (2023). "Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training." ECAI.
ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices
Published in International Conference on Computer Vision (ICCV), 2023
Recommended citation: Chen Tang, Li Lyna Zhang, Huiqiang Jiang, Jiahang Xu, Ting Cao, Quanlu Zhang, Yuqing Yang, Zhi Wang, Mao Yang. (2023). "ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices." International Conference on Computer Vision (ICCV).
SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference
Published in International Conference on Computer Vision (ICCV), 2023
Recommended citation: Xudong Wang, Li Lyna Zhang, Jiahang Xu, Quanlu Zhang, Yujing Wang, Yuqing Yang, Ningxin Zheng, Ting Cao, Mao Yang. (2023). "SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference." International Conference on Computer Vision (ICCV).
Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference
Published in ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2023
Recommended citation: Junyan Li, Li Lyna Zhang, Jiahang Xu, Yujing Wang, Shaoguang Yan, Yunqing Xia, Yuqing Yang, Ting Cao, Hao Sun, Weiwei Deng, Qi Zhang, Mao Yang. (2023). "Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference." ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).
LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup
Published in The 29th Annual International Conference On Mobile Computing And Networking (MobiCom), 2023
Recommended citation: Xiaohu Tang, Yang Wang, Ting Cao, Li Lyna Zhang, Qi Chen, Deng Cai, Yunxin Liu, Mao Yang. (2023). "LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup." MobiCom.
ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores
Published in ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), 2024
PPoPP 2024 Best Paper Award
Recommended citation: Yuetao Chen, Kun Li, Yuhao Wang, Donglin Bai, Lei Wang, Lingxiao Ma, Liang Yuan, Yunquan Zhang, Ting Cao, Mao Yang. (2024). "ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores." PPoPP.
LitePred: Transferable and Scalable Latency Prediction for Hardware-Aware Neural Architecture Search
Published in USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2024
Recommended citation: Chengquan Feng, Li Lyna Zhang, Yuanchi Liu, Jiahang Xu, Chengruidong Zhang, Zhiyuan Wang, Ting Cao, Mao Yang, Haisheng Tan. (2024). "LitePred: Transferable and Scalable Latency Prediction for Hardware-Aware Neural Architecture Search." NSDI.
PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization
Published in ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024
Recommended citation: Cong Li, Zhe Zhou, Yang Wang, Fan Yang, Ting Cao, Mao Yang, Yun Liang, Guangyu Sun. (2024). "PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization." ASPLOS.
FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices
Published in The 30th Annual International Conference On Mobile Computing And Networking (MobiCom), 2024
Recommended citation: Xiangyu Li, Yuanchun Li, Yuanzhe Li, Ting Cao, Yunxin Liu. (2024). "FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices." MobiCom.
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
Published in The 51st Annual International Symposium on Computer Architecture (ISCA’24), 2024
Recommended citation: R. Hwang, J. Wei, S. Cao, C. Hwang, X. Tang, Ting Cao, M. Yang. (2024). "Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference." ISCA’24.
Empowering In-Browser Deep Learning Inference on Edge Through Just-In-Time Kernel Optimization
Published in The 22nd Annual International Conference on Mobile Systems, Applications and Services (MobiSys), 2024
Recommended citation: F. Jia, S. Jiang, Ting Cao, W. Cui, T. Xia, X. Cao, Y. Li, Q. Wang, D. Zhang, J. Ren, Y. Liu, L. Qiu, M. Yang. (2024). "Empowering In-Browser Deep Learning Inference on Edge Through Just-In-Time Kernel Optimization." MobiSys.
Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models
Published in IEEE International Conference on Multimedia and Expo (ICME’24), 2024
Recommended citation: Yijia Zhang, Lingran Zhao, Shijie Cao, Wenqiang Wang, Ting Cao, Fan Yang, Mao Yang, Shanghang Zhang, Ningyi Xu. (2024). "Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models." ICME’24.
Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation
Published in The 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2024
Recommended citation: L. Wang, L. Ma, S. Cao, Q. Zhang, J. Xue, Y. Shi, N. Zheng, Z. Miao, F. Yang, Ting Cao, Y. Yang, M. Yang. (2024). "Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation." OSDI.
AFPQ: Asymmetric Floating Point Quantization for LLMs
Published in 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024 Findings, short paper), 2024
Recommended citation: Yijia Zhang, Sicheng Zhang, Shijie Cao, DaYou Du, Jianyu Wei, Ting Cao, Ningyi Xu. (2024). "AFPQ: Asymmetric Floating Point Quantization for LLMs." ACL.
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation
Published in 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024 Main Conference, long paper), 2024
Recommended citation: DaYou Du, Yijia Zhang, Shijie Cao, Jiaqi Guo, Ting Cao, Xiaowen Chu, Ningyi Xu. (2024). "BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation." ACL.
PruneAug: Bridging DNN Pruning and Inference Latency on Diverse Sparse Platforms Using Automatic Layerwise Block Pruning
Published in IEEE Transactions on Computers (TC), 2024
Recommended citation: Hanfei Geng, Yifei Liu, Yujie Zheng, Li Lyna Zhang, Jingwei Sun, Yujing Wang, Yang Wang, Guangzhong Sun, Mao Yang, Ting Cao, Yunxin Liu. (2024). "PruneAug: Bridging DNN Pruning and Inference Latency on Diverse Sparse Platforms Using Automatic Layerwise Block Pruning." TC.
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
Published in The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Recommended citation: Yifei Liu, Jicheng Wen, Yang Wang, Shengyu Ye, Li Lyna Zhang, Ting Cao, Cheng Li, Mao Yang. (2024). "VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models." EMNLP.
Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity
Published in International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’24), 2024
Recommended citation: Tuowei Wang, Kun Li, Zixu Hao, Donglin Bai, Ju Ren, Yaoxue Zhang, Ting Cao, Mao Yang. (2024). "Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity." SC’24.
LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores
Published in International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’24), 2024
Recommended citation: Yiwei Zhang, Kun Li, Liang Yuan, Jiawen Cheng, Yunquan Zhang, Ting Cao, Mao Yang. (2024). "LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores." SC’24.
Anatomizing Deep Learning Inference in Web Browsers
Published in ACM Transactions on Software Engineering and Methodology (TOSEM), 2025
Recommended citation: Qipeng Wang, Shiqi Jiang, Zhenpeng Chen, Xu Cao, Yuanchun Li, Aoyu Li, Yun Ma, Ting Cao, Xuanzhe Liu. (2024). "Anatomizing Deep Learning Inference in Web Browsers." TOSEM.
LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator
Published in 31st IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2025
Recommended citation: Guoyu Li, Chunyun Chen, Shengyu Ye, Yang Wang, Fan Yang, Ting Cao, Mohamed M. Sabry Aly, Cheng Liu, Mao Yang. (2025). "LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator." HPCA.
FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units
Published in 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), 2025
Recommended citation: Haozhi Han, Kun Li, Wei Cui, Donglin Bai, Yifeng Chen, Ting Cao, Mao Yang. (2025). "FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units." PPoPP.
Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers
Published in 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), 2025
Recommended citation: Yiwei Zhang, Kun Li, Liang Yuan, Yunquan Zhang, Ting Cao, Mao Yang. (2025). "Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers." PPoPP.
Efficient and Adaptive Diffusion Model Inference Through Lookup Table on Mobile Devices
Published in IEEE Transactions on Mobile Computing (TMC), 2025
Recommended citation: Qipeng Wang, Shiqi Jiang, Yifan Yang, Ruiqi Liu, Yuanchun Li, Ting Cao, Xuanzhe Liu. (2025). "Efficient and Adaptive Diffusion Model Inference Through Lookup Table on Mobile Devices." IEEE Transactions on Mobile Computing (TMC).
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge
Published in The 2025 ACM European Conference on Computer Systems (EuroSys), 2025
Recommended citation: Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, Mao Yang. (2025). "T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge." EuroSys.
Babel: A Scalable Pre-trained Model for Multi-Modal Sensing via Expandable Modality Alignment
Published in The 23rd ACM Conference on Embedded Networked Sensor Systems (SenSys), 2025
Recommended citation: Shenghong Dai, Shiqi Jiang, Yifan Yang, Ting Cao, Mo Li, S. Banerjee, Lili Qiu. (2025). "Babel: A Scalable Pre-trained Model for Multi-Modal Sensing via Expandable Modality Alignment." SenSys.
LUTensor: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference
Published in The 52nd Annual International Symposium on Computer Architecture (ISCA), 2025
Recommended citation: Zhiwen Mo, Lei Wang, Jianyu Wei, Zhiwen Zeng, Shijie Cao, Lingxiao Ma, Naifeng Jing, Ting Cao, Jilong Xue, Fan Yang, Mao Yang. (2025). "LUTensor: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference." ISCA.
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
Published in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Recommended citation: Jinheng Wang, Hansong Zhou, Ting Song, Shijie Cao, Yan Xia, Ting Cao, Jianyu Wei, Shuming Ma, Hongyu Wang, and Furu Wei. (2025). "Bitnet.cpp: Efficient Edge Inference for Ternary LLMs." Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL).
Jenga: Enhancing Long-Context Fine-tuning of LLMs with Contextual Token Sparsity
Published in USENIX Annual Technical Conference (ATC'25), 2025
Recommended citation: Tuowei Wang, Xingyu Chen, Kun Li, Ting Cao, Ju Ren, Yaoxue Zhang. (2025). "Jenga: Enhancing Long-Context Fine-tuning of LLMs with Contextual Token Sparsity." ATC.
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Published in International Conference on Computer Vision (ICCV'25), 2025
Recommended citation: Xin Ding, Hao Wu, Yifan Yang, Shiqi Jiang, Qianxi Zhang, Donglin Bai, Zhibo Chen, Ting Cao. (2025). "StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition." ICCV.
Neuralink: Fast on-Device LLM Inference with Neuron Co-Activation Linking
Published in ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2026
Recommended citation: Tuowei Wang, Ruwen Fan, Minxing Huang, Zixu Hao, Kun Li, Ting Cao, Youyou Lu, Yaoxue Zhang, Ju Ren. (2026). "Neuralink: Fast on-Device LLM Inference with Neuron Co-Activation Linking." ASPLOS.
AVA: Towards Agentic Video Analytics Systems with Video Language Models
Published in USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2026
Recommended citation: Yuxuan Yan, Shiqi Jiang, Ting Cao, Yifan Yang, Qianqian Yang, Yuanchao Shu, Qing Yang, Lili Qiu. (2026). "AVA: Towards Agentic Video Analytics Systems with Video Language Models." NSDI.