Data Scaling Laws in Imitation Learning for Robotic Manipulation
1Tsinghua University, 2Shanghai Qi Zhi Institute, 3Shanghai Artificial Intelligence Laboratory

* indicates equal contributions


We show that with proper data scaling, a single-task policy can generalize well to any new environment and any new object within the same category. Remarkably, the robot can even be deployed zero-shot in a hot pot restaurant 🍲!


Abstract

Data scaling has revolutionized fields like natural language processing and computer vision, providing models with remarkable generalization capabilities. In this paper, we investigate whether similar data scaling laws exist in robotics, particularly in robotic manipulation, and whether appropriate data scaling can yield single-task robot policies that can be deployed zero-shot for any object within the same category in any environment. To this end, we conduct a comprehensive empirical study on data scaling in imitation learning. By collecting data across numerous environments and objects, we study how a policy's generalization performance changes with the number of training environments, objects, and demonstrations. Throughout our research, we collect over 40,000 demonstrations and execute more than 15,000 real-world robot rollouts under a rigorous evaluation protocol. Our findings reveal several intriguing results: the generalization performance of the policy follows a roughly power-law relationship with the number of environments and objects. The diversity of environments and objects is far more important than the absolute number of demonstrations; once the number of demonstrations per environment or object reaches a certain threshold, additional demonstrations have minimal effect. Based on these insights, we propose an efficient data collection strategy. With four data collectors working for one afternoon, we collect sufficient data to enable the policies for two tasks to achieve approximately 90% success rates in novel environments with unseen objects.



Power-Law Data Scaling Laws

The policy's generalization ability to new objects, new environments, or both scales approximately as a power law with the number of training objects, training environments, or training environment-object pairs, respectively. This is evidenced by the high correlation coefficients r shown in the video below.
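To make the power-law claim concrete, here is a minimal sketch of how such a relationship can be quantified: fit error ≈ a · n^b by linear regression in log-log space and report the correlation coefficient r of the log-transformed data. The numbers below are illustrative placeholders, not the paper's actual measurements.

```python
import numpy as np

def fit_power_law(n, error):
    """Fit error ≈ a * n^b via linear regression in log-log space.

    Returns (a, b, r), where r is the Pearson correlation of the
    log-transformed data (analogous to the r reported above).
    """
    log_n, log_e = np.log(n), np.log(error)
    b, log_a = np.polyfit(log_n, log_e, 1)  # slope b, intercept log(a)
    r = np.corrcoef(log_n, log_e)[0, 1]
    return np.exp(log_a), b, r

# Hypothetical data: policy error rate vs. number of training environments.
envs = np.array([1, 2, 4, 8, 16, 32])
error = np.array([0.80, 0.55, 0.38, 0.26, 0.18, 0.12])

a, b, r = fit_power_law(envs, error)
print(f"error ≈ {a:.2f} * n^{b:.2f}, |r| = {abs(r):.3f}")
```

On data that truly follows a power law, the points fall on a straight line in log-log coordinates, so |r| approaches 1; this is the sense in which r evidences the scaling law.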





Evaluation Videos

Building upon power-law data scaling laws, we propose an efficient data collection strategy. By collecting data from numerous environments (e.g., 32 environments), each featuring a unique manipulation object and 50 demonstrations, we can train a policy that generalizes effectively—achieving a 90% success rate—to any new environment and object. Below, we present sample rollouts from 8 unseen testing environments investigated in the paper.




More In-The-Wild Environments

We deployed the robot in various in-the-wild environments—including hot pot restaurants 🍲, cafés ☕, elevators 🛗, fountains ⛲, and other locations where data had not been previously collected. We found that the policy generalized surprisingly well!




Hardware Setup




Acknowledgments

The robot hardware is partially supported by Tsinghua ISR Lab. We would like to express our gratitude to Cheng Chi and Chuer Pan for their invaluable advice on UMI. We are also thankful to Linkai Wang for his assistance in setting up the movable platform. Additionally, we appreciate the thoughtful discussions and feedback provided by Tong Zhang, Ruiqian Nai, Geng Chen, Weijun Dong, Shengjie Wang, and Renhao Wang.



BibTeX

@misc{lin2024datascalinglawsimitation,
  title = {Data Scaling Laws in Imitation Learning for Robotic Manipulation},
  author = {Fanqi Lin and Yingdong Hu and Pingyue Sheng and Chuan Wen and Jiacheng You and Yang Gao},
  archivePrefix = {arXiv},
  eprint = {2410.18647},
  primaryClass = {cs.RO},
  url = {https://arxiv.org/abs/2410.18647},
  year = {2024}
}