
Training loop structure #7

@etiennemontenegro

@noi01
This is your current training loop structure:

    for e in range(EPISODES):
        state = env.reset()
        state = np.reshape(state, [1, state_size])
        
        for time in range(500):
            # env.render() #no preview of environment
            action = agent.act(state)
            next_state, reward, done, _ = env.step(action)
            reward = reward if not done else -10
            next_state = np.reshape(next_state, [1, state_size])
            agent.remember(state, action, reward, next_state, done)
            state = next_state

  1. Is the inner loop meant to cap the number of steps the agent can take before the episode ends? If so, this limit should be moved inside the environment's `step` function. That would let the environment truncate the episode itself and give a reward reflecting that the agent did not reach the goal in time.

  2. The line `reward = reward if not done else -10` sets the reward to -10 whenever the episode is done. Is that really what you want? As I understand it, solving the environment should be a positive outcome for the agent. What is the goal of this line?
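For the first question, a minimal sketch of what moving the step limit into the environment could look like, assuming the old Gym-style API used in the snippet (`reset()` returns a state, `step(action)` returns `(next_state, reward, done, info)`). `ToyEnv`, `TimeLimitWrapper`, `max_steps` and `timeout_penalty` are hypothetical names for illustration, not part of this repo:

```python
class ToyEnv:
    """Hypothetical 1-D environment: the agent starts at 0 and must reach `goal`."""

    def __init__(self, goal=5):
        self.goal = goal
        self.pos = 0

    def reset(self):
        self.pos = 0
        return [float(self.pos)]

    def step(self, action):
        # action 1 moves right, anything else moves left (clamped at 0)
        self.pos = self.pos + 1 if action == 1 else max(self.pos - 1, 0)
        done = self.pos >= self.goal
        reward = 1.0 if done else 0.0
        return [float(self.pos)], reward, done, {}


class TimeLimitWrapper:
    """Hypothetical wrapper that ends the episode after `max_steps` steps.

    If the limit is hit before the goal, the episode is marked done,
    info["truncated"] is set to True and `timeout_penalty` is added to the reward.
    """

    def __init__(self, env, max_steps=500, timeout_penalty=-10.0):
        self.env = env
        self.max_steps = max_steps
        self.timeout_penalty = timeout_penalty
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.env.reset()

    def step(self, action):
        next_state, reward, done, info = self.env.step(action)
        self.steps += 1
        info = dict(info)
        info["truncated"] = False
        if not done and self.steps >= self.max_steps:
            # out of time: end the episode and penalize not reaching the goal
            done = True
            info["truncated"] = True
            reward += self.timeout_penalty
        return next_state, reward, done, info


# The training loop no longer needs its own `for time in range(500)` cap.
env = TimeLimitWrapper(ToyEnv(goal=5), max_steps=3)
state = env.reset()
done = False
while not done:
    state, reward, done, info = env.step(0)  # always moves left, never reaches the goal
print(env.steps, reward, info["truncated"])  # prints: 3 -10.0 True
```

Gym also ships a `gym.wrappers.TimeLimit` wrapper for the same purpose, which may be worth checking before writing a custom one.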
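For the second question: if the intent is to punish failure rather than every terminal step, the reward has to depend on why the episode ended. A minimal sketch, assuming the environment can report whether the goal was reached; `shape_reward`, `reached_goal`, `goal_bonus` and `failure_penalty` are hypothetical names and the values are placeholders:

```python
def original_reward(reward, done):
    # the line from the training loop: every terminal step becomes -10
    return reward if not done else -10


def shape_reward(reward, done, reached_goal, failure_penalty=-10.0, goal_bonus=10.0):
    """Hypothetical shaping that distinguishes success from failure at episode end."""
    if not done:
        return reward  # ordinary step: keep the environment's reward
    if reached_goal:
        return reward + goal_bonus  # episode solved: reinforce it
    return failure_penalty  # episode failed (or timed out): penalize it


print(original_reward(1.0, True))        # prints: -10 (solving the task is punished)
print(shape_reward(1.0, True, True))     # prints: 11.0
print(shape_reward(0.0, True, False))    # prints: -10.0
```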
