NaVLA^2: A Vision-Language-Audio-Action Model for Multimodal Instruction Navigation

Published in AAAI, 2026