2021 Feedback and Suggestions #14

### General
- Is it fair to have the grade depend on just a (small) part of the lecture?
- Force the students to engage with the project sooner, not at the last minute
- Assign preparatory homework on RL, and schedule the RL lectures earlier in the semester (and perhaps more of them).
- Provide agents from past years as test competitors (shortly before submission?) or another form of benchmarking, so students can decide more easily whether an approach is worth pursuing.
- Allow teams to submit two agents (one safe bet using straightforward ML methods, one fancier agent using deep RL).
### Environment
- Automate acceptance testing via GitHub CI
- Environment could provide:
   - [ ] a feature telling which agent won the round
   - [x] generally more information about the opponents (e.g. their current score)
   - [ ] possibility to switch between training and evaluation more easily, e.g. using callable events without training
   - [x] ~~initialization at other than the standard starting state~~ -> Add custom `--scenario` and modify `build_arena`
   - [x] adjustable crate density, board size, and starting corner for training via command line option
   - [x] ~~possibility to pause an episode for inspection and later resume (instead of restart)~~ -> use step debugging
   - [ ] **or:** mention in the instructions that such things can be implemented during method development and should be undone for the final training/testing
- Potential bugs:
   - [x] The environment should also call the function `game_events_occurred()` for the last step before the game ends.
   - [x] "We observed, that the crate distribution is not completely random and that in general fewer crates are placed in the bottom right corner. Thereby, more free tiles can be found on the bottom right, giving the agent, that starts in this corner, an advantage because the probability to kill itself is considerably smaller. Our agent, for example, can play the game very well, when starting on the bottom right, but is rather bad when starting elsewhere."
- Modify environment for training
   - [x] ~~shorter thinking intervals or other speed-ups~~
   - [x] switch-off multi-threading for easier debugging and profiling
   - [ ] provide a passive environment that the agent can call (instead of the other way around)
      - allows easy parallel execution of several environments for faster training
      - GUI and logging are not needed during training
      - makes the training procedure compatible with `TFPyEnvironment` in https://github.com/tensorflow/agents, or use gym for compatibility with keras_rl
      - example implementation (files `items_fast.py`, `environment_fast.py`, `agents_fast.py`, `bomberman_adapter.py`) in https://gitlab.com/koetherminator/fml-project 
- ~~Use IntEnums instead of strings to speed up comparisons (?)~~
- Be closer to the original version of the game (e.g. drop several bombs simultaneously)
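
The "passive environment" suggestion above (the agent's training loop calls the environment, rather than the other way around) might look roughly like the sketch below. It is a toy stand-in, not the course framework: `PassiveCoinEnv`, `run_episode`, the observation keys, and the coin-only rules are all hypothetical names and simplifications chosen for illustration. It only shows the `reset()`/`step()` calling convention that makes it easy to run several environments side by side without GUI or logging.

```python
import random

# Hypothetical move set for this toy example (no bombs, no opponents).
MOVES = {"UP": (0, -1), "RIGHT": (1, 0), "DOWN": (0, 1), "LEFT": (-1, 0), "WAIT": (0, 0)}
ACTIONS = tuple(MOVES)


class PassiveCoinEnv:
    """Toy passive environment: the caller drives it via reset() and step()."""

    def __init__(self, size=5, n_coins=3, max_steps=50, seed=None):
        self.size = size
        self.n_coins = n_coins
        self.max_steps = max_steps
        self.rng = random.Random(seed)  # per-environment RNG for reproducibility

    def reset(self):
        """Start a new episode and return the first observation."""
        self.agent = (0, 0)
        cells = [(x, y) for x in range(self.size) for y in range(self.size)]
        free = [c for c in cells if c != self.agent]
        self.coins = set(self.rng.sample(free, self.n_coins))
        self.steps = 0
        return self._observe()

    def step(self, action):
        """Apply one action and return (observation, reward, done)."""
        dx, dy = MOVES[action]
        x = min(max(self.agent[0] + dx, 0), self.size - 1)  # clamp to the board
        y = min(max(self.agent[1] + dy, 0), self.size - 1)
        self.agent = (x, y)
        self.steps += 1
        reward = 0.0
        if self.agent in self.coins:
            self.coins.remove(self.agent)
            reward = 1.0
        done = not self.coins or self.steps >= self.max_steps
        return self._observe(), reward, done

    def _observe(self):
        return {"self": self.agent, "coins": sorted(self.coins), "step": self.steps}


def run_episode(env, policy):
    """Play one episode with policy(observation) -> action; return the total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


# Several independent environments can be driven from one plain loop.
envs = [PassiveCoinEnv(seed=s) for s in range(4)]
returns = [run_episode(env, lambda obs: random.choice(ACTIONS)) for env in envs]
```

Because the loop owns the control flow, the same `run_episode` could be handed to worker processes or wrapped by an adapter for an RL library; how that adapter looks depends on the library and is not shown here.
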
### Project instructions
- Remind students that the University logo must not be used in the report.
- Specify in more detail the grading criteria and requirements.
- Most articles in RL are about neural networks => point out the pre-NN literature (this would also put the project more in line with the rest of the lecture) and other recommended reading (e.g. about reward shaping).
- Split assignment into more fine-grained subtasks, e.g. task 1a: free coins under fixed crates
- Add more documentation about the game environment
   - Describe bomb behavior accurately (bombs are only dangerous for one step!). 
   - Collect crucial information (e.g. adjustable parameters, required Python version, number of coins created) in a table
   - Explain that self-play is best realized by multiple copies of the same agent (possibly plus some randomization to make behavior more diverse).
- Clarify that the environment can (and should!) be changed for debugging, profiling, and training -- just don't forget to undo the changes later on
   - timeout may be set to "infinity" to avoid interference with the debugger
   - board size, crate density etc. can be changed to create additional intermediate tasks
   - implement stop-and-resume for inspection  
- Explain symmetry of the game
   - reduce search space by exploiting symmetries
   - implement reward asymmetries to avoid undecided agents in symmetric situations
- Generally, give a few more tips on promising approaches.
- Provide a LaTeX template for the report
- Make it clearer that a mentoring tutor is available for questions
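
The symmetry tip above (reducing the search space by exploiting symmetries) can be illustrated with a small sketch. It assumes a square board stored as rows of cells indexed `[row][col]`, with `UP` meaning "row minus one"; the real framework's indexing may differ, and all function names here (`transform`, `canonical`, `action_map`, `to_original_action`) are hypothetical. The idea is to map each board to one canonical representative among its 8 rotations/reflections, let the agent decide on that canonical board, and translate the chosen move back.

```python
# Assumed convention: board[row][col]; moves given as (d_row, d_col).
DIRS = {"UP": (-1, 0), "RIGHT": (0, 1), "DOWN": (1, 0), "LEFT": (0, -1)}
BY_DELTA = {delta: name for name, delta in DIRS.items()}

# The 8 symmetries of a square: k quarter-turns, optionally followed by a mirror.
SYMMETRIES = [(k, flip) for flip in (False, True) for k in range(4)]


def rot90(board):
    """Rotate the board by 90 degrees counter-clockwise."""
    return tuple(zip(*board))[::-1]


def mirror(board):
    """Mirror the board left-right."""
    return tuple(tuple(row[::-1]) for row in board)


def transform(board, k, flip):
    """Apply k counter-clockwise quarter-turns, then an optional mirror."""
    board = tuple(tuple(row) for row in board)
    for _ in range(k):
        board = rot90(board)
    return mirror(board) if flip else board


def canonical(board):
    """Return (canonical board, symmetry used): the lexicographically smallest variant."""
    best = min(SYMMETRIES, key=lambda sym: transform(board, *sym))
    return transform(board, *best), best


def action_map(k, flip):
    """Derive how each move is renamed under a symmetry, using a 3x3 probe board."""
    mapping = {}
    for name, (dr, dc) in DIRS.items():
        probe = [[0] * 3 for _ in range(3)]
        probe[1 + dr][1 + dc] = 1  # mark the neighbour cell in that direction
        moved = transform(probe, k, flip)
        r, c = next((i, j) for i in range(3) for j in range(3) if moved[i][j])
        mapping[name] = BY_DELTA[(r - 1, c - 1)]
    return mapping


def to_original_action(action, sym):
    """Translate a move chosen on the canonical board back to the original board."""
    inverse = {new: old for old, new in action_map(*sym).items()}
    return inverse.get(action, action)  # direction-free actions stay unchanged
```

Deriving the move renaming from a probe board, instead of hard-coding a table, keeps it consistent with whatever `transform` does. The reward-asymmetry tip is a separate measure and is not covered by this sketch.
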
### Hardware
- Google Colab difficulties:
   - default Python version is only 3.7 => lots of extra work to install everything from scratch every time
   - only 2 hours of consecutive computing time
