Question about initial success rate in Figure 4 (0-step performance before training) #43

Hi, thank you very much for releasing this great work and the codebase.

I have a question about Figure 4 in the paper. In the plot, the success rate at **step 0** (before any RL training) is already around **0.2–0.6**, depending on the task. However, the [franka_walkthrough.md](url) documentation seems to indicate that the model is **randomly initialized**, and it does not mention any explicit pretraining or warm-start stage.

I would like to ask:

- **How is the initial policy (at step 0) able to achieve a non-zero success rate?**

- Did you use any kind of offline BC pretraining, imitation-based initialization, or previously trained policy checkpoints when generating the results in Figure 4?

- Or did you observe a non-zero success rate even with a purely randomly initialized policy?
(In my experiments, a randomly initialized model makes the arm move unpredictably and has a success rate of ~0.)

I am probably missing something, so any clarification would be greatly appreciated.
Thanks again for your excellent work!
