roma-vinn/GameBot-detection

GameBot detection

This project is the result of an internship at Samsung R&D Institute, taken as part of hands-on training at the Faculty of Mechanics and Mathematics of Taras Shevchenko National University of Kiev.

Plan

Our mentors provided us with a simple game containing a player, coins, monsters and simple distractions (overlapping trees).

game

Picking up coins earns gold, and killing monsters earns experience. Our practice consisted of several steps:

  • create a game bot to play this game,
  • collect data sets including both human and bot samples,
  • parse the data and extract some features,
  • visualize features,
  • write classifiers based on different algorithms,
  • apply a boosting approach,
  • try to discriminate between different people.

Game bot

The game bot was written through the collective effort of all group members. It is able:

  • to find effective ways to harvest a greater amount of gold/experience,
  • to make random "occasional" clicks in order to emulate a human play style,
  • to find ways out of dead ends.
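
The first two abilities can be illustrated with a minimal sketch. This is not the team's actual bot: the function name `choose_click`, the `(x, y)` coordinate convention and the `random_click_prob` parameter are all hypothetical, and a greedy nearest-target rule is assumed only for illustration.

```python
import math
import random


def choose_click(player, targets, bounds, random_click_prob=0.1, rng=random):
    """Pick the next click position for a hypothetical bot.

    player  -- (x, y) position of the player
    targets -- list of (x, y) positions of coins and monsters
    bounds  -- (width, height) of the playing field
    """
    width, height = bounds
    # Occasionally click somewhere random to imitate an imprecise human.
    # The same branch serves as a fallback when no target is visible.
    if not targets or rng.random() < random_click_prob:
        return (rng.uniform(0, width), rng.uniform(0, height))
    # Otherwise go greedily for the nearest target (an assumed strategy).
    return min(targets, key=lambda t: math.dist(player, t))
```

Escaping dead ends would need knowledge of the game map, so it is left out of this sketch.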

Collecting datasets

The data was collected by several members of our team and shared among us.

Parser and feature extraction

The data is parsed from two log files and split into time blocks of constant length. In the same pass, the following features are extracted:

  • clicks per minute,
  • gold per minute,
  • experience per minute,
  • average holding time per minute,
  • deviation of clicks,
  • trajectory length,
  • average distance between clicks,
  • average time between clicks.
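
As a rough illustration of the windowing step, the sketch below groups click records into fixed-length windows and computes a few of the listed features. The log format is not documented here, so the `(time_s, x, y)` record layout, the function name `window_features` and the feature keys are assumptions; trajectory length is approximated as the path through consecutive clicks.

```python
import math
from collections import defaultdict


def window_features(clicks, window_s=20.0):
    """Split (time_s, x, y) click records into fixed windows and
    compute a few per-window features (assumed record layout)."""
    windows = defaultdict(list)
    for t, x, y in sorted(clicks):
        windows[int(t // window_s)].append((t, x, y))

    features = {}
    for idx, pts in sorted(windows.items()):
        # Distances and time gaps between consecutive clicks in the window.
        dists = [math.dist(a[1:], b[1:]) for a, b in zip(pts, pts[1:])]
        gaps = [b[0] - a[0] for a, b in zip(pts, pts[1:])]
        features[idx] = {
            "cpm": len(pts) * 60.0 / window_s,  # clicks per minute
            "trajectory_length": sum(dists),    # path through the clicks
            "avg_click_distance": sum(dists) / len(dists) if dists else 0.0,
            "avg_time_between_clicks": sum(gaps) / len(gaps) if gaps else 0.0,
        }
    return features
```

Gold, experience and holding-time features would be computed in the same loop from the corresponding log fields.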

Visualization

Using the "permutation importance" method of feature selection (as implemented in the eli5 module), we found that the features 'gpm' (gold per minute), 'tbc' (time between clicks), 'epm' (experience per minute) and 'aht' (average holding time) have the highest importances. After applying the t-SNE algorithm, we saw that the data is separable, although bot behaviour is sometimes very similar to human behaviour.

tsne plot

More visualizations can be found in visualization.ipynb.
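
The idea behind permutation importance can be sketched without any library: shuffle one feature column at a time and measure how much accuracy drops. The project itself used eli5; the code below is only a simplified stand-in, and the names `accuracy`, `permutation_importance` and the `predict` callable are hypothetical.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) equals the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only this column, leaving the others untouched.
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [
                list(row[:col]) + [value] + list(row[col + 1:])
                for row, value in zip(X, shuffled)
            ]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores gets an importance of zero, while a feature the model relies on shows a clear drop in accuracy.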

Classification

I studied and tested several classification algorithms. In addition to the standard Logistic Regression, Naive Bayes, k-Nearest Neighbors, Decision Tree and Random Forest algorithms, I investigated:

  • the Extra Trees algorithm, which is quite similar to Random Forest but performed better with respect to overfitting,
  • and the Voting Classifier, which combines several simple models and is therefore more stable on "real-life" data, which can differ considerably from the training data.

In the end, I used the Voting Classifier with most of the simple models mentioned above. GridSearchCV was used to select the best parameters for the inner models. The best results were obtained with 20-second time windows.

accuracy for different time windows
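
A minimal scikit-learn sketch of this setup is shown below. The synthetic data from `make_classification`, the choice of inner models, the hard-voting mode and the parameter grid are all assumptions made for illustration; the actual configuration is in classification.ipynb.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the per-window feature table (8 features, binary label).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("knn", KNeighborsClassifier()),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ],
    voting="hard",  # majority vote over the predicted labels
)

# Parameters of inner models are addressed as "<name>__<param>".
param_grid = {
    "knn__n_neighbors": [3, 5, 7],
    "rf__max_depth": [3, None],
}
search = GridSearchCV(voter, param_grid, cv=3)
search.fit(X_train, y_train)
test_accuracy = search.score(X_test, y_test)
```

After fitting, `search.best_params_` holds the selected parameter combination and `test_accuracy` the accuracy on the held-out split.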

Boosting

Compared with AdaBoost, Bagging performed better and was therefore selected as the main algorithm for this step (strictly speaking, Bagging is an ensemble method rather than a boosting one). As the table shows, the results improved on large time windows and stayed the same on smaller ones.

boosting results

All classification results can be found in classification.ipynb.
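
A comparison of this kind could be set up roughly as follows. This is a sketch on synthetic data, assuming decision trees as the base model and 5-fold cross-validation; neither choice is taken from the project notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the per-window feature table.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)

models = {
    # Bagging: independent trees trained on bootstrap samples of the data.
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=30, random_state=0),
    # AdaBoost: weak learners trained sequentially on reweighted samples.
    "adaboost": AdaBoostClassifier(n_estimators=30, random_state=0),
}

# Mean cross-validated accuracy for each ensemble.
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in models.items()}
```

The `scores` dictionary then allows a direct comparison of the two ensembles on the same folds.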

Human vs Human

tsne for 2 people

Human discrimination unexpectedly turned out to be easier than "bot vs human". The probable reason is that different people have significantly different play styles, whereas our bot tries to copy more general human behavioural features, such as occasional clicks. Unfortunately, there were only a few human logs per person, so the 'perfect' accuracy on large time windows is not very conclusive.

boosting results

Human vs Human vs Human

tsne for 3 people

Discriminating between three people turned out to be a harder task, because two of them played too similarly (perhaps it was the same person playing differently), but the results are still good enough (except for the 5-second time window).

boosting results

Conclusions

In general, classifiers based only on the selected behavioural features are able to discriminate between bot and human well enough.

I came to the conclusion that the optimal time window size is 20 seconds: it is not so small as to cause high bias (which helps to prevent 'wrong blaming'), and not so large as to make our data 'blurry'.
