NaVLA^2: A Vision-Language-Audio-Action Model for Multimodal Instruction Navigation

Published in AAAI, 2026