Learning to Segment Object Affordances on Synthetic Data for Task-oriented Robotic Handovers
Abstract
The ability to perform successful robot-to-human handovers has the potential to improve robot capabilities in circumstances involving symbiotic human-robot collaboration. Recent computer vision research has shown that object affordance segmentation can be trained on large hand-labeled datasets and performs well in task-oriented grasping pipelines. However, producing and training on such datasets can be time-consuming and resource-intensive. In this paper, we eliminate the need for such datasets by proposing a novel approach in which training occurs on a synthetic dataset that accurately transfers to real-world robotic manipulation scenarios. The synthetic training dataset contains 30,245 RGB images with ground-truth affordance masks and bounding boxes with class labels for each rendered object. The object set used for rendering consists of 21 object classes capturing 10 affordance classes. We propose a variant of AffordanceNet enhanced with domain randomization on the generated dataset to perform affordance segmentation without the need for fine-tuning on real-world data. Our approach outperforms the state-of-the-art method trained on synthetic data by 23% and achieves performance levels similar to other methods trained on massive, hand-labeled RGB datasets and fine-tuned on real images from the experimental setup. We demonstrate the effectiveness of our approach on a collaborative robot setup with an end-to-end robotic handover pipeline using various objects in real-world scenarios. Code, the synthetic training dataset, and supplementary material will be made publicly available.