RISeg: Robot Interactive Object Segmentation via Body Frame-Invariant Features

Abstract

The ability to segment unseen objects is critical for robots to autonomously perform manipulation tasks in new environments. While previous works train deep neural networks on large datasets to learn RGB/RGB-D feature embeddings for segmentation, cluttered scenes often yield inaccurate masks, especially undersegmentation. We introduce RISeg, a novel approach that improves upon static image-based segmentation masks through robot interaction and a designed body frame-invariant feature (BFIF). The key insight is that the spatial twists of frames randomly attached to the same rigid object, when transformed to a fixed reference frame, are equal, even though the frames themselves have differing linear and angular velocities. By identifying regions of segmentation uncertainty, strategically introducing object motion through minimal interactions (2-3 per scene), and matching BFIFs, RISeg significantly improves segmentation masks without relying on object singulation. On cluttered real-world tabletop scenes, RISeg achieves an average object segmentation accuracy of 80.7%, a 28.2% improvement over state-of-the-art static methods.

Publication
IEEE International Conference on Robotics and Automation (ICRA), 2024

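Supplementary Notes

The BFIF rests on a standard rigid-body fact: if a frame with pose P(t) is rigidly attached to an object with pose G(t), then P = G T for some constant offset T, and the frame's fixed-frame (spatial) twist Pdot @ inv(P) = Gdot @ inv(G) is the object's own twist, no matter where on the object the frame was attached. Below is a minimal numerical sanity check of this invariance in Python/NumPy. It is an illustrative sketch, not code from the paper; the helper names (`skew`, `random_pose`) and the convention of representing a twist as the 4x4 matrix Gdot @ inv(G) are assumptions made here for the demo.

```python
import numpy as np

def skew(w):
    """3x3 skew-symmetric matrix [w] such that [w] @ x == cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def random_pose(rng):
    """Random homogeneous transform in SE(3): rotation from QR, random translation."""
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]  # ensure a proper rotation (det = +1)
    T = np.eye(4)
    T[:3, :3] = Q
    T[:3, 3] = rng.standard_normal(3)
    return T

rng = np.random.default_rng(0)

# One rigid object with pose G(t) moving with spatial twist V_s,
# i.e. [V_s] = Gdot @ inv(G) as a 4x4 matrix built from (angular w, linear v).
w, v = rng.standard_normal(3), rng.standard_normal(3)
Vs = np.zeros((4, 4))
Vs[:3, :3], Vs[:3, 3] = skew(w), v

G = random_pose(rng)  # object pose at time t
Gdot = Vs @ G         # exact time derivative implied by the spatial twist

# Two frames rigidly attached to the object at arbitrary (random) offsets.
for T_offset in (random_pose(rng), random_pose(rng)):
    P = G @ T_offset                          # attached frame's pose
    Pdot = Gdot @ T_offset                    # its time derivative
    body_twist = np.linalg.inv(P) @ Pdot      # differs from frame to frame
    spatial_twist = Pdot @ np.linalg.inv(P)   # identical for every attached frame
    assert np.allclose(spatial_twist, Vs)     # recovers the object's twist: the BFIF

print("Both attached frames map to the same fixed-frame twist.")
```

Note how `body_twist` changes from frame to frame (each attached frame has its own linear and angular velocity), while `spatial_twist` is identical for both and equals the object's twist. Matching objects by this fixed-frame twist rather than by raw per-frame velocities is what allows randomly attached frames on the same rigid body to be grouped together.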

Howard Qian
Undergraduate student in Computer Science
Kejia Ren
Graduate student in Computer Science
Gaotian Wang
Graduate student in Computer Science