Design and Experimental Verification of Underwater Robot Environmental Perception and Decision-Making System Based on Multimodal Fusion

Authors

  • Fanyeqi Yang, Mechanical Design and Manufacture and Automation, Ulster College at Shaanxi University of Science & Technology, Xi’an, 710021, Shaanxi, China

Keywords:

Underwater Robot, Multimodal Fusion, Spatiotemporal Alignment, Graph Attention Network, Reinforcement Learning

Abstract

Underwater robots operating in complex environments (turbulence, light attenuation, acoustic interference) suffer from the limited perception of any single sensor and from autonomous decision-making that lacks robustness. This paper proposes a multimodal fusion-based environmental perception and decision-making system that integrates vision, sonar, inertial measurement unit (IMU), and water quality data. A spatiotemporal alignment model combining Lie group calibration with factor graph optimization corrects temporal drift and spatial registration errors across sensors. A semantic association module based on a graph attention network (GAT) performs structured cross-modal fusion and perceives key semantic regions. A three-layer "strategic-tactical-reflex" reinforcement learning decision architecture balances global planning accuracy against local response efficiency. Experiments on Select Dataset, UIED, and the Wanfang underwater decision test set, together with trials in indoor controlled pools and natural lakes, show that in typical complex scenarios (200 NTU turbidity, 0.5 m/s water flow) the system achieves 92.3% target recognition accuracy (28.6% higher than vision alone), a perception delay of 32.8 ms, a task completion rate above 90%, and a path efficiency of 0.87. These results indicate that the system substantially improves the perception and decision-making capabilities of underwater robots in complex environments, supporting applications such as marine resource exploration and underwater facility maintenance.
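
To make the idea of GAT-based cross-modal fusion concrete, the following is a minimal sketch of a single-head graph attention layer applied to four modality nodes. It is not the paper's implementation: the function name gat_layer, the toy feature vectors, the weight matrix W, the attention vector a, and the fully connected graph with self-loops are all assumptions chosen for illustration, and the weights are fixed by hand rather than learned.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU used for the raw attention scores."""
    return x if x > 0 else slope * x


def gat_layer(features, neighbors, W, a):
    """Single-head GAT-style layer over a small graph (illustrative only).

    features:  dict node -> list of floats (length F_in)
    neighbors: dict node -> list of nodes attended to (self-loop included)
    W:         F_out x F_in projection matrix, given as a list of rows
    a:         attention vector of length 2 * F_out
    Returns (output features per node, attention weights per node).
    """
    # Shared linear projection: h'_i = W h_i
    proj = {
        n: [sum(w * x for w, x in zip(row, h)) for row in W]
        for n, h in features.items()
    }
    out, attn = {}, {}
    for i, nbrs in neighbors.items():
        # Raw score e_ij = LeakyReLU(a^T [h'_i || h'_j])
        scores = [
            leaky_relu(sum(p * q for p, q in zip(a, proj[i] + proj[j])))
            for j in nbrs
        ]
        # Numerically stable softmax over the neighborhood of node i
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        alphas = [e / z for e in exps]
        attn[i] = dict(zip(nbrs, alphas))
        # Attention-weighted sum of projected neighbor features
        out[i] = [
            sum(al * proj[j][k] for al, j in zip(alphas, nbrs))
            for k in range(len(W))
        ]
    return out, attn


# Hypothetical 2-D embeddings, one per sensing modality.
features = {
    "vision": [0.9, 0.1],
    "sonar": [0.4, 0.6],
    "imu": [0.2, 0.3],
    "water_quality": [0.1, 0.8],
}
nodes = list(features)
# Assumed topology: every modality attends to every modality, itself included.
neighbors = {n: nodes for n in nodes}
W = [[1.0, 0.0], [0.0, 1.0]]   # identity projection, for readability
a = [0.5, -0.5, 1.0, 0.25]     # hand-picked attention vector

out, attn = gat_layer(features, neighbors, W, a)
for node in nodes:
    print(node, {k: round(v, 3) for k, v in attn[node].items()})
```

In this sketch each modality's fused feature is a convex combination of all projected modality features, so the attention weights indicate how strongly one sensor stream draws on another. A trained system would learn W and a, and would typically use several attention heads.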

Published

2025-11-30