Zixu Hao, Jianyu Wei, Tuowei Wang, Minxing Huang, Huiqiang Jiang, Shiqi Jiang, Ting Cao, Ju Ren. (2026). "Scaling LLM Test-Time Compute with Mobile NPU on Smartphones." EuroSys.
Yuxuan Yan, Shiqi Jiang, Ting Cao, Yifan Yang, Qianqian Yang, Yuanchao Shu, Qing Yang, Lili Qiu. (2026). "AVA: Towards Agentic Video Analytics Systems with Video Language Models." NSDI.
Tuowei Wang, Ruwen Fan, Minxing Huang, Zixu Hao, Kun Li, Ting Cao, Youyou Lu, Yaoxue Zhang, Ju Ren. (2026). "Neuralink: Fast on-Device LLM Inference with Neuron Co-Activation Linking." ASPLOS.
Dayou Du, Shijie Cao, Jianyi Cheng, Ting Cao, Mao Yang. (2026). "BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache." HPCA.
Qi Li, Kun Li, Liang Yuan, Junshi Chen, Hong An, Yunquan Zhang, Ting Cao, Mao Yang. (2025). "SparStencil: Retargeting Sparse Tensor Cores to Scientific Stencil Computations via Structured Sparsity Transformation." SC.
Haozhi Han, Kun Li, Fusong Ju, Yifeng Chen, Yunquan Zhang, Ting Cao, Mao Yang. (2025). "Matrix Is All You Need: Rearchitecting Quantum Chemistry to Scale on AI Accelerators." SC.
Yizhao Gao, Zhichen Zeng, DaYou Du, Shijie Cao, Peiyuan Zhou, Jiaxing Qi, Junjie Lai, Hayden So, Ting Cao, Fan Yang, Mao Yang. (2025). "SeerAttention: Self-distilled Attention Gating for Efficient Long-context Prefilling." NeurIPS.
Xin Ding, Hao Wu, Yifan Yang, Shiqi Jiang, Qianxi Zhang, Donglin Bai, Zhibo Chen, Ting Cao. (2025). "StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition." ICCV.
Tuowei Wang, Xingyu Chen, Kun Li, Ting Cao, Ju Ren, Yaoxue Zhang. (2025). "Jenga: Enhancing Long-Context Fine-tuning of LLMs with Contextual Token Sparsity." ATC.
Jinheng Wang, Hansong Zhou, Ting Song, Shijie Cao, Yan Xia, Ting Cao, Jianyu Wei, Shuming Ma, Hongyu Wang, Furu Wei. (2025). "Bitnet.cpp: Efficient Edge Inference for Ternary LLMs." ACL.
Zhiwen Mo, Lei Wang, Jianyu Wei, Zhiwen Zeng, Shijie Cao, Lingxiao Ma, Naifeng Jing, Ting Cao, Jilong Xue, Fan Yang, Mao Yang. (2025). "LUTensor: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference." ISCA.
Shenghong Dai, Shiqi Jiang, Yifan Yang, Ting Cao, Mo Li, S. Banerjee, Lili Qiu. (2025). "Babel: A Scalable Pre-trained Model for Multi-Modal Sensing via Expandable Modality Alignment." SenSys.
Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, Mao Yang. (2025). "T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge." EuroSys.
Qipeng Wang, Shiqi Jiang, Yifan Yang, Ruiqi Liu, Yuanchun Li, Ting Cao, Xuanzhe Liu. (2025). "Efficient and Adaptive Diffusion Model Inference Through Lookup Table on Mobile Devices." IEEE Transactions on Mobile Computing (TMC).
Yiwei Zhang, Kun Li, Liang Yuan, Yunquan Zhang, Ting Cao, Mao Yang. (2025). "Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers." PPoPP.
Haozhi Han, Kun Li, Wei Cui, Donglin Bai, Yifeng Chen, Ting Cao, Mao Yang. (2025). "FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units." PPoPP.
Guoyu Li, Chunyun Chen, Shengyu Ye, Yang Wang, Fan Yang, Ting Cao, Mohamed M. Sabry Aly, Cheng Liu, Mao Yang. (2025). "LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator." HPCA.
Qipeng Wang, Shiqi Jiang, Zhenpeng Chen, Xu Cao, Yuanchun Li, Aoyu Li, Yun Ma, Ting Cao, Xuanzhe Liu. (2025). "Anatomizing Deep Learning Inference in Web Browsers." TOSEM.
Yiwei Zhang, Kun Li, Liang Yuan, Jiawen Cheng, Yunquan Zhang, Ting Cao, Mao Yang. (2024). "LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores." SC.
Tuowei Wang, Kun Li, Zixu Hao, Donglin Bai, Ju Ren, Yaoxue Zhang, Ting Cao, Mao Yang. (2024). "Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity." SC.
Yifei Liu, Jicheng Wen, Yang Wang, Shengyu Ye, Li Lyna Zhang, Ting Cao, Cheng Li, Mao Yang. (2024). "VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models." EMNLP.
Hanfei Geng, Yifei Liu, Yujie Zheng, Li Lyna Zhang, Jingwei Sun, Yujing Wang, Yang Wang, Guangzhong Sun, Mao Yang, Ting Cao, Yunxin Liu. (2024). "PruneAug: Bridging DNN Pruning and Inference Latency on Diverse Sparse Platforms Using Automatic Layerwise Block Pruning." IEEE Transactions on Computers (TC).
DaYou Du, Yijia Zhang, Shijie Cao, Jiaqi Guo, Ting Cao, Xiaowen Chu, Ningyi Xu. (2024). "BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation." ACL.
Yijia Zhang, Sicheng Zhang, Shijie Cao, DaYou Du, Jianyu Wei, Ting Cao, Ningyi Xu. (2024). "AFPQ: Asymmetric Floating Point Quantization for LLMs." ACL.
L. Wang, L. Ma, S. Cao, Q. Zhang, J. Xue, Y. Shi, N. Zheng, Z. Miao, F. Yang, Ting Cao, Y. Yang, M. Yang. (2024). "Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation." OSDI.
Yijia Zhang, Lingran Zhao, Shijie Cao, Wenqiang Wang, Ting Cao, Fan Yang, Mao Yang, Shanghang Zhang, Ningyi Xu. (2024). "Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models." ICME.
F. Jia, S. Jiang, Ting Cao, W. Cui, T. Xia, X. Cao, Y. Li, Q. Wang, D. Zhang, J. Ren, Y. Liu, L. Qiu, M. Yang. (2024). "Empowering In-Browser Deep Learning Inference on Edge Through Just-In-Time Kernel Optimization." MobiSys.
R. Hwang, J. Wei, S. Cao, C. Hwang, X. Tang, Ting Cao, M. Yang. (2024). "Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference." ISCA.
Xiangyu Li, Yuanchun Li, Yuanzhe Li, Ting Cao, Yunxin Liu. (2024). "FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices." MobiCom.
Cong Li, Zhe Zhou, Yang Wang, Fan Yang, Ting Cao, Mao Yang, Yun Liang, Guangyu Sun. (2024). "PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization." ASPLOS.
Chengquan Feng, Li Lyna Zhang, Yuanchi Liu, Jiahang Xu, Chengruidong Zhang, Zhiyuan Wang, Ting Cao, Mao Yang, Haisheng Tan. (2024). "LitePred: Transferable and Scalable Latency Prediction for Hardware-Aware Neural Architecture Search." NSDI.
Yuetao Chen, Kun Li, Yuhao Wang, Donglin Bai, Lei Wang, Lingxiao Ma, Liang Yuan, Yunquan Zhang, Ting Cao, Mao Yang. (2024). "ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores." PPoPP.
Xiaohu Tang, Yang Wang, Ting Cao, Li Lyna Zhang, Qi Chen, Deng Cai, Yunxin Liu, Mao Yang. (2023). "LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup." MobiCom.
Junyan Li, Li Lyna Zhang, Jiahang Xu, Yujing Wang, Shaoguang Yan, Yunqing Xia, Yuqing Yang, Ting Cao, Hao Sun, Weiwei Deng, Qi Zhang, Mao Yang. (2023). "Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference." ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).
Xudong Wang, Li Lyna Zhang, Jiahang Xu, Quanlu Zhang, Yujing Wang, Yuqing Yang, Ningxin Zheng, Ting Cao, Mao Yang. (2023). "SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference." International Conference on Computer Vision (ICCV).
Chen Tang, Li Lyna Zhang, Huiqiang Jiang, Jiahang Xu, Ting Cao, Quanlu Zhang, Yuqing Yang, Zhi Wang, Mao Yang. (2023). "ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices." International Conference on Computer Vision (ICCV).
Yijia Zhang, Yibo Han, Shijie Cao, Guohao Dai, Youshan Miao, Ting Cao, Fan Yang, Ningyi Xu. (2023). "Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training." ECAI.
Huiqiang Jiang, Li Lyna Zhang, Yuang Li, Yu Wu, Shijie Cao, Ting Cao, Yuqing Yang, Jinyu Li, Mao Yang, Lili Qiu. (2023). "Accurate and Structured Pruning for Efficient Automatic Speech Recognition." Conference of the International Speech Communication Association (INTERSPEECH).
Jinrui Zhang, Huan Yang, Ju Ren, Deyu Zhang, Bangwen He, Youngki Lee, Ting Cao, Yuanchun Li, Yaoxue Zhang, Yunxin Liu. (2024). "HiMoDepth: Efficient Training-Free High-Resolution On-Device Depth Perception." IEEE Transactions on Mobile Computing (TMC), 23(5).
Chen Nie, Chenyu Tang, Jie Lin, Huan Hu, Chenyang Lv, Ting Cao, Weifeng Zhang, Li Jiang, Xiaoyao Liang, Weikang Qian, Yanan Sun, Zhezhi He. (2023). "VSPIM: SRAM Processing-in-Memory DNN Acceleration via Vector-Scalar Operations." IEEE Transactions on Computers (TC).
Jianyu Wei, Ting Cao, Shijie Cao, Shiqi Jiang, Shaowei Fu, Mao Yang, Yanyong Zhang, Yunxin Liu. (2023). "NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors." The 21st Annual International Conference on Mobile Systems, Applications and Services (MobiSys).
Rongjie Yi, Ting Cao, Ao Zhou, Xiao Ma, Shangguang Wang, Mengwei Xu. (2023). "Boosting DNN Cold Inference on Devices." The 21st Annual International Conference on Mobile Systems, Applications and Services (MobiSys).
Bin Lin, Ningxin Zheng, Lei Wang, Shijie Cao, Lingxiao Ma, Quanlu Zhang, Yi Zhu, Ting Cao, Jilong Xue, Yuqing Yang, Fan Yang. (2023). "Efficient GPU Kernels for N:M-SPARSE Weights in Deep Learning." Sixth Conference on Machine Learning and Systems (MLSys).
Yan Lu, Shiqi Jiang, Ting Cao, Yuanchao Shu. (2022). "Turbo: Opportunistic Enhancement for Edge Video Analytics." The 20th ACM Conference on Embedded Networked Sensor Systems (SenSys).
Ziyan Fu, Ju Ren, Yunxin Liu, Ting Cao, Deyu Zhang, Yuezhi Zhou, Yaoxue Zhang. (2022). "Hyperion: A Generic and Distributed Mobile Offloading Framework on OpenCL." The 20th ACM Conference on Embedded Networked Sensor Systems (SenSys).
Rendong Liang, Ting Cao, Jicheng Wen, Manni Wang, Yang Wang, Jianhua Zou, Yunxin Liu. (2022). "Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs." Proceedings of the 28th Annual International Conference on Mobile Computing and Networking (MobiCom).
Jinrui Zhang, Huan Yang, Ju Ren, Deyu Zhang, Bangwen He, Yuanchun Li, Ting Cao, Yaoxue Zhang, Yunxin Liu. (2022). "MobiDepth: Real-Time Depth Estimation Using On-Device Dual Cameras." Proceedings of the 28th Annual International Conference on Mobile Computing and Networking (MobiCom).
Li Lyna Zhang, Youkow Homma, Yujing Wang, Min Wu, Mao Yang, Ruofei Zhang, Ting Cao, Wei Shen. (2022). "SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad Relevance." ACM International Conference on Information and Knowledge Management (CIKM).
Fucheng Jia, Deyu Zhang, Ting Cao, Shiqi Jiang, Yunxin Liu, Ju Ren, Yaoxue Zhang. (2022). "CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices." 20th International Conference on Mobile Systems, Applications, and Services (MobiSys).
L. Zhang, S. Han, J. Wei, N. Zheng, T. Cao, Y. Yang, Y. Liu. (2021). "nn-Meter: towards accurate latency prediction of DNN inference on diverse edge devices." GetMobile: Mobile Computing and Communications, Research Highlights, 25(4): pp. 19-23.
Lei Chen, Jiacheng Zhao, Chenxi Wang, Ting Cao, John Zigman, Haris Volos, Onur Mutlu, Fang Lv, Xiaobing Feng, Guoqing Harry Xu, Huimin Cui. (2021). "Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories." ACM Transactions on Computer Systems (TOCS), Vol 39(1-4): pp. 1-38.
Manni Wang, Shaohua Ding, Ting Cao, Yunxin Liu, Fengyuan Xu. (2021). "AsyMo: Scalable and Efficient Deep-Learning Inference on Asymmetric Mobile CPUs." Proceedings of the 27th Annual International Conference on Mobile Computing and Networking (MobiCom).
L. Zhang, S. Han, J. Wei, N. Zheng, T. Cao, Y. Yang, Y. Liu. (2021). "nn-Meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices." 19th International Conference on Mobile Systems, Applications, and Services (MobiSys).
X. Tang, S. Han, L. Zhang, T. Cao, Y. Liu. (2021). "To Bridge Neural Network Design and Real-World Performance: A Behaviour Study for Neural Networks." Conference on Machine Learning and Systems (MLSys).
S. Jiang, L. Ran, T. Cao, Y. Xu, Y. Liu. (2020). "Profiling and optimizing deep learning inference on mobile GPUs." Proceedings of the 11th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys).
C. Wang, H. Cui, T. Cao, J. Zigman, H. Volos, O. Mutlu, F. Lv, X. Feng, and H. Xu. (2019). "Panthera: Holistic Memory Management for Big Data Processing over Hybrid Memories." ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI).