Training loop structure

@noi01
This is your current training loop structure:

```python
for e in range(EPISODES):
    state = env.reset()
    state = np.reshape(state, [1, state_size])

    for time in range(500):
        # env.render()  # no preview of environment
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        reward = reward if not done else -10
        next_state = np.reshape(next_state, [1, state_size])
        agent.remember(state, action, reward, next_state, done)
        state = next_state
```
1. Is the inner loop there to cap the number of steps the agent can take before the episode ends? If so, that limit should live inside the environment's step function. The environment could then truncate the episode itself and assign a reward reflecting that the agent did not reach the goal in time.

2. The reward assignment sets the reward to -10 whenever the environment reports `done`. Is that really what you want? From my understanding, solving the environment should be rewarded positively. What is the purpose of this line?
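
To make point 1 concrete, here is a rough sketch of how the step limit could move out of the training loop and into the environment. It assumes the environment follows the same `reset()` / `step()` 4-tuple interface used in the snippet above. `TimeLimitWrapper`, `timeout_reward`, `CountingEnv` and `run_episode` are hypothetical names I made up for illustration; they are not taken from your repository. Gym also ships a `TimeLimit` wrapper with a similar purpose, which may be a better fit than rolling your own.

```python
class TimeLimitWrapper:
    """Hypothetical wrapper: ends the episode after max_steps calls to step()."""

    def __init__(self, env, max_steps=500, timeout_reward=None):
        self.env = env
        self.max_steps = max_steps
        # Optional reward override applied only when the episode times out.
        self.timeout_reward = timeout_reward
        self._elapsed = 0

    def reset(self):
        self._elapsed = 0
        return self.env.reset()

    def step(self, action):
        next_state, reward, done, info = self.env.step(action)
        self._elapsed += 1
        info = dict(info)  # copy so the wrapped env's dict is not mutated
        # Truncation: the time budget ran out before the env ended on its own.
        truncated = (not done) and self._elapsed >= self.max_steps
        info["truncated"] = truncated
        if truncated:
            done = True
            if self.timeout_reward is not None:
                reward = self.timeout_reward
        return next_state, reward, done, info


class CountingEnv:
    """Toy stand-in environment (an assumption): never terminates by itself."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, False, {}


def run_episode(env):
    """Run one episode with a fixed action; no step counter needed here."""
    state = env.reset()
    total, steps, done, info = 0.0, 0, False, {}
    while not done:
        state, reward, done, info = env.step(0)
        total += reward
        steps += 1
    return steps, total, info["truncated"]
```

With this, the training loop becomes a plain `while not done:` loop, and `info["truncated"]` lets the agent tell a timeout apart from a real terminal state, so a penalty like the -10 in point 2 can be applied only in the case where it is actually intended.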


Training loop structure #7
