2021 Feedback and Suggestions #14

@ukoethe

General

  • Is it fair to have the grade depend on just a (small) part of the lecture?
  • Force the students to engage with the project sooner, not at the last minute
  • Have a preparatory homework on RL, and have the RL lectures earlier in the semester, and perhaps more of them.
  • Provide agents from past years as test competitors (shortly before submission?) or another form of benchmarking, so that students can decide more easily whether an approach is worth pursuing.
  • Allow teams to submit two agents (one safe bet using straightforward ML methods, one more ambitious agent using deep RL).

Environment

  • Automate acceptance testing via GitHub CI
  • Environment could provide:
    • a feature telling which agent won the round
    • generally more information about the opponents (e.g. their current score)
    • possibility to switch between training and evaluation more easily, e.g. using callable events without training
    • initialization at other than the standard starting state -> Add custom --scenario and modify build_arena
    • adjustable crate density, board size, and starting corner for training via command line option
    • possibility to pause an episode for inspection and later resume (instead of restart) -> use step debugging
    • or: mention in the instructions that such things can be implemented during method development and should be undone for the final training/testing
  • Potential bugs:
    • The environment should also call game_events_occurred() for the last step before the game ends.
    • "We observed, that the crate distribution is not completely random and that in general
      fewer crates are placed in the bottom right corner. Thereby, more free tiles can be found
      on the bottom right, giving the agent, that starts in this corner, an advantage because the
      probability to kill itself is considerably smaller. Our agent, for example, can play the game
      very well, when starting on the bottom right, but is rather bad when starting elsewhere."
  • Modify environment for training
    • shorter thinking intervals or other speed-ups
    • switch off multi-threading for easier debugging and profiling
    • provide a passive environment that the agent can call (instead of the other way around)
      • allows easy parallel execution of several environments for faster training
      • GUI and logging are not needed during training
      • makes the training procedure compatible with TFPyEnvironment in https://github.com/tensorflow/agents, or use gym for compatibility with keras_rl
      • example implementation (files items_fast.py, environment_fast.py, agents_fast.py, bomberman_adapter.py) in https://gitlab.com/koetherminator/fml-project
  • Use IntEnums instead of strings to speed up comparisons (?)
  • Be closer to the original version of the game (e.g. drop several bombs simultaneously)
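
To illustrate the "passive environment" and IntEnum suggestions above, here is a minimal sketch of an environment that the training loop drives through reset() and step(), with no GUI, logging, or threads. Everything in it is hypothetical: PassiveCoinEnv, Action, MOVES, and run_episode are illustrative names for a toy coin-collection grid, not the framework's API and not the linked example implementation.

```python
import random
from enum import IntEnum


class Action(IntEnum):
    # Integer-valued actions: usable directly as array or table indices.
    UP = 0
    RIGHT = 1
    DOWN = 2
    LEFT = 3
    WAIT = 4


# (dx, dy) displacement per action; y grows downwards.
MOVES = {
    Action.UP: (0, -1),
    Action.RIGHT: (1, 0),
    Action.DOWN: (0, 1),
    Action.LEFT: (-1, 0),
    Action.WAIT: (0, 0),
}


class PassiveCoinEnv:
    """Toy grid with coins (hypothetical); the caller drives it step by step."""

    def __init__(self, size=5, n_coins=3, max_steps=50, seed=None):
        self.size = size
        self.n_coins = n_coins
        self.max_steps = max_steps
        self.rng = random.Random(seed)

    def reset(self):
        # Place the agent and the coins on distinct random cells.
        cells = [(x, y) for x in range(self.size) for y in range(self.size)]
        self.agent, *coins = self.rng.sample(cells, self.n_coins + 1)
        self.coins = set(coins)
        self.steps = 0
        return self._observation()

    def step(self, action):
        dx, dy = MOVES[Action(action)]
        x, y = self.agent
        # Clamp to the board: walking into the border leaves the agent in place.
        self.agent = (
            min(max(x + dx, 0), self.size - 1),
            min(max(y + dy, 0), self.size - 1),
        )
        self.steps += 1
        reward = 0.0
        if self.agent in self.coins:
            self.coins.remove(self.agent)
            reward = 1.0
        done = not self.coins or self.steps >= self.max_steps
        return self._observation(), reward, done

    def _observation(self):
        return {"agent": self.agent, "coins": frozenset(self.coins), "step": self.steps}


def run_episode(env, policy):
    """Run one episode; policy maps an observation to an Action."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


# Several independent environments can be stepped from the same training loop.
policy_rng = random.Random(0)
envs = [PassiveCoinEnv(seed=s) for s in range(4)]
returns = [run_episode(e, lambda obs: policy_rng.choice(list(Action))) for e in envs]
print(returns)
```

Because the caller owns the loop, the same object could in principle be wrapped for a gym-style or TF-Agents-style interface, and independent instances could be run in parallel; how much of the real game logic can be reused this way is left open here.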

Project instructions

  • Remind students that the University logo must not be used in the report.
  • Specify in more detail the grading criteria and requirements.
  • Most articles in RL are about neural networks => point out the pre-NN literature (this would also put the project more in line with the rest of the lecture) and other recommended reading (e.g. about reward shaping).
  • Split assignment into more fine-grained subtasks, e.g. task 1a: free coins under fixed crates
  • Add more documentation about the game environment
    • Describe bomb behavior accurately (bombs are only dangerous for one step!).
    • Collect crucial information (e.g. adjustable parameters, required Python version, number of coins created) in a table
    • Explain that self-play is best realized by multiple copies of the same agent (possibly plus some randomization to make behavior more diverse).
  • Clarify that the environment can (and should!) be changed for debugging, profiling, and training -- just don't forget to undo the changes later on
    • timeout may be set to "infinity" to avoid interference with the debugger
    • board size, crate density etc. can be changed to create additional intermediate tasks
    • implement stop-and-resume for inspection
  • Explain symmetry of the game
    • reduce search space by exploiting symmetries
    • implement reward asymmetries to avoid undecided agents in symmetric situations
  • Generally, give a few more tips on promising approaches.
  • Provide a LaTeX template for the report
  • Make it clearer that a mentoring tutor is available for questions
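
The point about exploiting symmetries can be made concrete with a small sketch. It assumes a square board stored as a tuple of row tuples with integer cell codes (an illustrative encoding, not the framework's state format) and maps all 8 rotations and reflections of a board to one canonical representative, so that a tabular method needs only one entry per equivalence class. The function names are hypothetical.

```python
def rotate_cw(grid):
    """Rotate a square grid (tuple of row tuples) by 90 degrees clockwise."""
    return tuple(zip(*grid[::-1]))


def mirror(grid):
    """Mirror a grid left to right."""
    return tuple(row[::-1] for row in grid)


def symmetric_variants(grid):
    """Return the 8 rotations and reflections of a square grid."""
    variants = []
    for start in (grid, mirror(grid)):
        current = start
        for _ in range(4):
            variants.append(current)
            current = rotate_cw(current)
    return variants


def canonical(grid):
    """Pick one fixed representative: the lexicographically smallest variant."""
    return min(symmetric_variants(grid))


# Illustrative cell codes: 0 = free, 1 = coin, 2 = agent.
top_left = (
    (2, 0, 0),
    (0, 1, 0),
    (0, 0, 0),
)
bottom_right = (
    (0, 0, 0),
    (0, 1, 0),
    (0, 0, 2),
)

# Both starting corners collapse to the same lookup key.
print(canonical(top_left) == canonical(bottom_right))
```

When states are canonicalized like this, the chosen action has to be transformed back with the inverse of the rotation or reflection that was applied; that bookkeeping is omitted from the sketch.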

Hardware

  • Google Colab difficulties:
    • default Python version is only 3.7 => lots of extra work to install everything from scratch every time
    • only 2 hours of consecutive computing time
