PreAfford: Universal Affordance-Based Pre-Grasping for Diverse Objects and Environments

Kairui Ding 1 , Boyuan Chen 1 , Ruihai Wu 2 , Yuyang Li 3 , Zongzheng Zhang 1 , Huan-ang Gao 1 , Siqi Li 1 , Yixin Zhu 3 , Guyue Zhou 1,4 , Hao Dong 2 , Hao Zhao †1
1 Institute for AI Industry Research (AIR), Tsinghua University
2 CFCS, School of Computer Science, Peking University     3 Institute for Artificial Intelligence, Peking University
4 School of Vehicle and Mobility, Tsinghua University
† Indicates Corresponding Author

Demonstration Video of PreAfford.

Abstract

Robotic manipulation with two-finger grippers is challenged by objects lacking distinct graspable features. Traditional pre-grasping methods, which typically involve repositioning objects or utilizing external aids like table edges, are limited in their adaptability across different object categories and environments. To overcome these limitations, we introduce PreAfford, a novel pre-grasping planning framework that incorporates a point-level affordance representation and a relay training approach. Our method significantly improves adaptability, allowing effective manipulation across a wide range of environments and object types. When evaluated on the ShapeNet-v2 dataset, PreAfford not only enhances grasping success rates by 69% but also demonstrates its practicality through successful real-world experiments. These improvements highlight PreAfford's potential to redefine standards for robotic handling of complex manipulation tasks in diverse settings.

Introduction & Method

Illustration of PreAfford, demonstrating the application of a relay training paradigm where two synergistic modules cooperate to facilitate the manipulation of objects typically considered ungraspable. The pre-grasping module assesses environmental features such as edges, slopes, slots, and walls to propose strategic pre-grasping actions that enhance the likelihood of a successful grasp. Simultaneously, the grasping module evaluates these actions and provides feedback in the form of rewards, which are used to refine and optimize the pre-grasping strategies. Two color bars represent the pre-grasping and grasping phases, respectively, with the color intensity reflecting the calculated affordance values; higher values denote more optimal interaction conditions.


The framework of PreAfford. The framework consists of two main modules, each incorporating three networks: an affordance network, a proposal network, and a critic network. These networks respectively handle tasks of choosing the contact point, generating a proposal, and evaluating the proposal. PointNet++ (PN++) and MLP are employed to process point clouds and facilitate decision-making. During the inference phase, both modules collaborate to develop strategies for pre-grasping and grasping. In contrast, during the training phase, the grasping module generates rewards that are used to train the pre-grasping module, a process we refer to as relay.
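The pipeline within one module can be sketched in a few lines. The snippet below is a minimal, illustrative stand-in, not the released implementation: it replaces PointNet++ with a shared point-wise MLP, uses random (untrained) weights, and assumes a 3-D push direction as the action parameterization. The shapes and the affordance → proposal → critic flow follow the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(sizes):
    """Random-weight MLP (stand-in for the trained PN++/MLP heads)."""
    return [(rng.standard_normal((i, o)) * 0.1, np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers
    return x

N, D_FEAT = 1024, 64

# Per-point features from the scene point cloud (the paper uses
# PointNet++ here; a shared point-wise MLP stands in for it).
points = rng.standard_normal((N, 3))
point_feats = forward(make_mlp([3, 128, D_FEAT]), points)

# Affordance network: scores every point as a candidate contact point.
afford = forward(make_mlp([D_FEAT, 64, 1]), point_feats).squeeze(-1)
contact_idx = int(np.argmax(afford))  # choose the best contact point

# Proposal network: maps the chosen point's feature to an action
# (a unit 3-D push direction is an assumed parameterization).
action = forward(make_mlp([D_FEAT, 64, 3]), point_feats[contact_idx])
action /= np.linalg.norm(action) + 1e-8

# Critic network: rates the (contact point, action) pair; during relay
# training this score is supervised by rewards from the grasping module.
score = forward(make_mlp([D_FEAT + 3, 64, 1]),
                np.concatenate([point_feats[contact_idx], action]))
```

The per-point affordance vector `afford` is what the heatmaps in the figures visualize: higher values mark points where a pre-grasping push is predicted to make the subsequent grasp more likely to succeed.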

Results

Qualitative results. We demonstrate pre-grasping manipulation on training and testing categories in four scenarios—edge, slot, slope, and wall. Affordance maps highlight effective interaction areas, showing PreAfford's capability to devise suitable pre-grasping and grasping strategies for various object categories and scenes, including both seen and unseen objects.
Multi-feature scenario: PreAfford effectively handles scenes in which multiple environmental features are present simultaneously. (a) A complex environment; (b) the corresponding affordance heatmap.
Comparison with baselines. On test categories, pre-grasping increases grasping success rates by 52.9 percentage points, and the closed-loop strategy adds a further 16.4 points, averaged across all categories.
Setting                   Train object categories                  Test object categories
                          Edge   Wall   Slope  Slot   Multi  Avg.  Edge   Wall   Slope  Slot   Multi  Avg.
W/o pre-grasping           2.3    3.8    4.3    3.4    4.0    3.6   6.1    2.3    2.9    5.7    6.0    4.6
Random-direction push     21.6   10.3    6.4   16.8   18.1   14.6  24.9   17.2   12.1   18.4   23.0   19.1
Center-point push         32.5   23.7   40.5   39.2   39.0   35.0  25.1   17.4   28.0   30.2   21.5   24.4
Ours w/o closed-loop      67.2   41.5   58.3   76.9   63.6   61.5  56.4   37.3   62.6   75.8   55.4   57.5
Ours                      81.4   43.4   73.1   83.5   74.1   71.1  83.7   47.6   80.5   83.0   74.6   73.9
Real-world pre-grasping manipulations with affordance maps. Red areas in the maps indicate optimal pushing locations. Point clouds are captured by a Femto Bolt camera. (a) Moving a tablet to a table edge, (b) pushing a plate toward a wall, (c) pushing a keyboard up a slope, and (d) sliding a tablet into a slot.
Real-world experiment results. Experiments were conducted twice for each object in every scene, comparing direct grasping (without pre-grasping) to grasping after pre-grasping. Success rates are presented as percentages.
Setting                   Seen categories                          Unseen categories
                          Edge   Wall   Slope  Slot   Multi  Avg.  Edge   Wall   Slope  Slot   Multi  Avg.
W/o pre-grasping             0      0      0      0      0     0     10      0      5      0      0     3
With pre-grasping           70     45     80     90     85    74     80     30     75     90     85    72

BibTeX

If you find our work useful in your research, please consider citing:
@misc{ding2024preafford,
      title={PreAfford: Universal Affordance-Based Pre-Grasping for Diverse Objects and Environments}, 
      author={Kairui Ding and Boyuan Chen and Ruihai Wu and Yuyang Li and Zongzheng Zhang and Huan-ang Gao and Siqi Li and Yixin Zhu and Guyue Zhou and Hao Dong and Hao Zhao},
      year={2024},
      eprint={2404.03634},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}