Hi, thank you very much for releasing this great work and the codebase.
I have a question regarding Figure 4 in the paper. In the plot, the success rate at step 0 (before any RL training) is already around 0.2–0.6, depending on the task. However, in the franka_walkthrough.md documentation, it seems that the model is randomly initialized and there is no explicit pretraining or warm-start stage mentioned.
I would like to ask:
- How is the initial policy (at step 0) able to achieve a non-zero success rate?
- Did you use any kind of offline BC pretraining, imitation initialization, or previously collected policy checkpoints when generating the results in Figure 4?
- Or did the authors observe non-zero success rate even with a purely random-initialized policy?
(In my experiments, a random-init model makes the arm move unpredictably and has ~0 success rate.)
I am probably missing something, so any clarification would be greatly appreciated.
Thanks again for your excellent work!