Boosting DNN Cold Inference on Devices
Published in The 21st Annual International Conference on Mobile Systems, Applications and Services (MobiSys), 2023
Deep Neural Network (DNN) inference on edge devices often suffers from significant cold-start latency when a model is first loaded or after a period of inactivity. This cold inference latency causes noticeable delays in user-facing applications and degrades the user experience. This paper presents a comprehensive approach to boosting DNN cold inference performance on edge devices. By combining memory management techniques, model prefetching strategies, and resource-aware scheduling, the proposed system significantly reduces cold-start latency across diverse edge devices and DNN architectures. Extensive evaluations on real-world mobile applications demonstrate substantial improvements in responsiveness without compromising inference accuracy or requiring model modifications.
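The abstract does not spell out the paper's mechanisms, so the sketch below is only a generic illustration of the cold-versus-warm gap it targets and of prefetching in its simplest form: loading a model and allocating its tensors before the first user request arrives. It uses the standard TensorFlow Lite Python API; the model file name is a placeholder, and nothing here reproduces the authors' system.

```python
import time
import numpy as np
import tensorflow as tf

MODEL_PATH = "model.tflite"  # hypothetical placeholder; any TFLite model works


def prefetch_interpreter(path: str) -> tf.lite.Interpreter:
    # Load the model and allocate tensors ahead of the first user request,
    # moving this part of the cold-start cost off the user-facing critical path.
    interpreter = tf.lite.Interpreter(model_path=path)
    interpreter.allocate_tensors()
    return interpreter


def timed_inference(interpreter: tf.lite.Interpreter) -> float:
    # Run one inference on zero-filled input and return latency in milliseconds.
    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
    start = time.perf_counter()
    interpreter.invoke()
    return (time.perf_counter() - start) * 1e3


if __name__ == "__main__":
    interpreter = prefetch_interpreter(MODEL_PATH)
    # The first invoke is typically slower (weights paged in, kernels warmed up),
    # even after tensors are allocated; later calls run at warm-inference speed.
    print(f"cold inference: {timed_inference(interpreter):.1f} ms")
    print(f"warm inference: {timed_inference(interpreter):.1f} ms")
```

Measuring the two prints back to back makes the gap the paper targets concrete: everything the first call pays beyond what the second call pays is cold-start overhead, which prefetching and warm-up can hide from the user.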
Recommended citation: Rongjie Yi, Ting Cao, Ao Zhou, Xiao Ma, Shangguang Wang, Mengwei Xu. (2023). "Boosting DNN Cold Inference on Devices." The 21st Annual International Conference on Mobile Systems, Applications and Services (MobiSys).
Download Paper