Abstract:
This study leverages the Semantic Segmentation of Underwater Imagery (SUIM) dataset, which comprises over 1,500
meticulously annotated images spanning eight object categories: vertebrate fish, invertebrate reefs, aquatic vegetation,
wreckage, human divers, robots, and the seafloor. The data were systematically collected through extensive oceanic
expeditions and collaborative experiments involving both human participants and robots.
both human participants and robots. The research extends its scope to evaluating cutting-edge semantic segmentation techniques,
employing established metrics to gauge their performance comprehensively. Additionally, we introduce a fully convolutional encoder decoder model designed with a dual purpose: to deliver competitive performance and computational efficiency. Notably, this model
boasts a remarkable accuracy of 88%, underscoring its proficiency in underwater image segmentation. This study elucidates the model’s
practical benefits across diverse applications such as visual serving, saliency prediction, and intricate scene comprehension. Crucially,
the use of the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) enhances input image quality,
strengthening the foundation on which the model's performance rests. By presenting both the model and the benchmark
dataset, this research establishes a solid groundwork for future work in underwater robot vision.