Design and Experimental Verification of Underwater Robot Environmental Perception and Decision-Making System Based on Multimodal Fusion
Keywords:
Underwater Robot, Multimodal Fusion, Spatiotemporal Alignment, Graph Attention Network, Reinforcement Learning
Abstract
To address the limited perception of single sensors and the poor robustness of autonomous decision-making for underwater robots in complex environments (turbulence, light attenuation, acoustic interference), this paper proposes a multimodal fusion-based environmental perception and decision-making system that integrates vision, sonar, inertial measurement unit (IMU), and water quality data. The system has three components. First, a spatiotemporal alignment model combining Lie group calibration with factor graph optimization corrects multi-sensor temporal drift and spatial registration errors. Second, a graph attention network (GAT)-based semantic association module performs structured cross-modal fusion and identifies key semantic regions. Third, a three-layer "strategic-tactical-reflex" reinforcement learning decision architecture balances global planning accuracy against local response efficiency. Experiments on Select Dataset, UIED, and the Wanfang underwater decision test set, together with trials in indoor controlled pools and natural lakes, show that in typical complex scenarios (200 NTU turbidity, 0.5 m/s water flow) the system achieves 92.3% target recognition accuracy (28.6% higher than vision alone), a perception delay of 32.8 ms, a task completion rate above 90%, and a path efficiency of 0.87. These results indicate substantially improved perception and decision-making capabilities for underwater robots in complex environments, supporting applications such as marine resource exploration and underwater facility maintenance.
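The GAT-based cross-modal fusion mentioned in the abstract can be illustrated with a minimal sketch of single-head graph attention in its standard formulation: each edge score is LeakyReLU(a · [W h_i ‖ W h_j]), and scores are normalized by a softmax over each node's neighbors. The sketch assumes one graph node per modality. The node names, toy 2-D features, weight matrix `W`, and attention vector `a` are all hypothetical and are not taken from the paper, whose actual graph construction and network design are not given here.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU activation used for raw attention scores."""
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_attention(features, neighbors, W, a, slope=0.2):
    """Return alpha[i][j]: attention weight of node i on neighbor j.

    features:  dict node -> feature vector
    neighbors: dict node -> list of neighbor nodes (may include the node itself)
    W:         shared linear projection (list of rows)
    a:         attention vector, length = 2 * projected dimension
    """
    z = {node: matvec(W, h) for node, h in features.items()}
    alpha = {}
    for i, nbrs in neighbors.items():
        # Raw score for edge (i, j): LeakyReLU(a . [z_i || z_j])
        scores = [
            leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j])), slope)
            for j in nbrs
        ]
        # Numerically stable softmax over the neighborhood of i
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha[i] = {j: e / total for j, e in zip(nbrs, exps)}
    return alpha


if __name__ == "__main__":
    # Hypothetical toy features, one node per sensing modality.
    feats = {
        "vision": [0.9, 0.1],
        "sonar": [0.4, 0.8],
        "imu": [0.2, 0.3],
        "water_quality": [0.1, 0.6],
    }
    nodes = list(feats)
    nbrs = {n: nodes for n in nodes}   # fully connected, with self-loops
    W = [[1.0, 0.0], [0.0, 1.0]]       # identity projection (assumption)
    a = [0.5, -0.5, 1.0, 1.0]          # arbitrary attention vector (assumption)
    for node, weights in gat_attention(feats, nbrs, W, a).items():
        print(node, {k: round(v, 3) for k, v in weights.items()})
```

In a trained network `W` and `a` would be learned, so that the attention weights shift toward whichever modality is more informative for a given region; here they are fixed by hand only to show the mechanics.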
Published
2025-11-30
Section
Articles
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.