Abstract:
6D pose estimation is a key task in computer vision and robotics. Because existing RGB-D-based methods struggle to fully exploit the available feature information, this paper proposes an improved 6D pose estimation algorithm that combines the strengths of the YOLOv8n-seg and ResNet-UNet frameworks to extract and use multimodal information from both RGB images and point cloud data. Building on the PVN3D network, semantic segmentation of the RGB images is performed by the YOLOv8n-seg module, which captures more detailed scene features. The introduction of ResNet-UNet further improves detection accuracy through feature cascading and multiscale information fusion, and a customized loss function improves overall performance. Experimental results on the LineMOD dataset show a 2% improvement in mean precision, validating the effectiveness of the proposed algorithm.
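The fusion idea summarized above (per-point appearance features from the segmented RGB image concatenated with geometric features from the point cloud) can be sketched conceptually as follows. This is a minimal illustration, not the paper's implementation: the function names, feature dimensions, and the use of random placeholders in place of the YOLOv8n-seg and ResNet-UNet branches are all assumptions for exposition.

```python
import numpy as np

def extract_rgb_features(n_points, dim=64, seed=0):
    # Placeholder for the YOLOv8n-seg branch: per-point appearance
    # features sampled from the segmented RGB image (random stand-ins here).
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_points, dim))

def extract_point_features(xyz):
    # Placeholder for the geometric branch (ResNet-UNet side) operating on
    # the point cloud lifted from the depth map: raw coordinates plus
    # distance to the camera origin as a simple geometric cue.
    dist = np.linalg.norm(xyz, axis=1, keepdims=True)
    return np.concatenate([xyz, dist], axis=1)

def fuse(rgb_feat, geo_feat):
    # Dense per-point fusion: concatenate appearance and geometry channels,
    # mirroring the multimodal feature-fusion idea in the abstract.
    return np.concatenate([rgb_feat, geo_feat], axis=1)

n_points = 1024
xyz = np.random.default_rng(1).standard_normal((n_points, 3))
fused = fuse(extract_rgb_features(n_points), extract_point_features(xyz))
print(fused.shape)  # (1024, 68): 64 appearance + 4 geometric channels per point
```

In the actual method these fused per-point features would feed keypoint voting and pose fitting; the sketch only shows the channel-wise fusion step.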