AdaNav: Adaptive Reasoning with Uncertainty for Vision-Language Navigation
Published in International Conference on Machine Learning (ICML), 2026
Vision-Language Navigation (VLN) requires agents to follow natural language instructions by grounding them in sequential visual observations over long horizons. Explicit reasoning could enhance temporal consistency and perception-action alignment, but reasoning at fixed steps often leads to suboptimal performance and unnecessary computation. To address this, we propose AdaNav, an uncertainty-based adaptive reasoning framework for VLN. At its core is the Uncertainty-Adaptive Reasoning Block (UAR), a lightweight plugin that dynamically triggers reasoning. We introduce Action Entropy as a policy prior for UAR and progressively refine it through a Heuristics-to-RL training method, enabling agents to learn difficulty-aware reasoning policies under the strict data limitations of embodied tasks. Results show that with only 6K training samples, AdaNav achieves substantial gains over closed-source models trained on million-scale data, improving success rate by 20% on R2R val-unseen, 11.7% on RxR-CE, and 11.4% in real-world scenes.
Recommended citation: Xin Ding, Jianyu Wei, Yifan Yang, Shiqi Jiang, Qianxi Zhang, Hao Wu, Fucheng Jia, Liang Mi, Yuxuan Yan, Weijun Wang, Yunxin Liu, Zhibo Chen, Ting Cao. (2026). "AdaNav: Adaptive Reasoning with Uncertainty for Vision-Language Navigation." ICML.
Download Paper
