The rapid advancement of robotics and deep learning has accelerated the adoption of Embodied AI, in which robots autonomously explore and reason in complex real-world environments. With the growing demand for domestic service robots, efficient navigation in unfamiliar settings has become increasingly crucial. Object Goal Navigation (OGN) is a fundamental task for this capability, requiring a robot to find and reach a user-specified object in an unknown environment. Solving OGN demands advanced perception, contextual reasoning, and effective exploration strategies. Recent Vision-Language Models (VLMs) and Large Language Models (LLMs) provide agents with external commonsense knowledge and reasoning capabilities. This paper poses the critical question: "Where should VLM/LLM knowledge be fused into Object Goal Navigation?" We categorize knowledge integration into three stages adapted from the Perception-Prediction-Planning paradigm, offering a structured survey of Object Goal Navigation approaches shaped by the VLM era. We conclude by discussing current dataset limitations and future directions, including further studies on socially interactive navigation and operation in mixed indoor-outdoor environments.