This project is the result of an internship at Samsung R&D Institute, completed as part of hands-on training at the Mechanics and Mathematics faculty of Taras Shevchenko National University of Kiev.
Our mentors provided us with a simple game containing a player, coins, monsters and simple distractions (overlapping trees).
Picking up coins earns gold, and killing monsters earns experience. Our practice consisted of several tasks:
- create a game bot to play this game,
- collect data sets including both human and bot samples,
- parse the data and extract some features,
- visualize features,
- write classifiers based on different algorithms,
- use a boosting approach,
- try to distinguish between different people.
The game bot was written through the collective effort of all group members. It is able:
- to find effective ways to harvest more gold/experience,
- to make random “occasional” clicks in order to emulate a human play style,
- to find ways out of dead ends.
The data was successfully collected by several members of our team and shared among us.
The data is parsed from two log files and split into time blocks of constant length. At the same time, I extract the following features:
- clicks per minute,
- gold per minute,
- experience per minute,
- average holding time per minute,
- deviation of clicks,
- trajectory length,
- average distance between clicks,
- average time between clicks.
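The windowing and feature extraction described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the real log format is not shown here, so the tuple layouts for clicks and gold/experience events, the function names and the short feature keys (`cpm`, `traj`, `adbc`, `dev`) are hypothetical. In particular, "deviation of clicks" is interpreted as the standard deviation of the time gaps between clicks, and trajectory length as the summed distance between consecutive click positions.

```python
import math
from statistics import mean, pstdev


def window_features(clicks, gold_events, exp_events, start, length):
    """Compute features for the time window [start, start + length).

    clicks: list of (press_time, release_time, x, y) tuples (assumed format)
    gold_events, exp_events: lists of (time, amount) tuples (assumed format)
    """
    end = start + length
    in_win = [c for c in clicks if start <= c[0] < end]
    per_minute = 60.0 / length  # scales a per-window count to a per-minute rate

    gold = sum(amount for t, amount in gold_events if start <= t < end)
    exp = sum(amount for t, amount in exp_events if start <= t < end)

    # Distances and time gaps between consecutive clicks inside the window
    pairs = list(zip(in_win, in_win[1:]))
    dists = [math.hypot(b[2] - a[2], b[3] - a[3]) for a, b in pairs]
    gaps = [b[0] - a[0] for a, b in pairs]

    return {
        "cpm": len(in_win) * per_minute,                       # clicks per minute
        "gpm": gold * per_minute,                              # gold per minute
        "epm": exp * per_minute,                               # experience per minute
        "aht": mean(c[1] - c[0] for c in in_win) if in_win else 0.0,  # holding time
        "dev": pstdev(gaps) if gaps else 0.0,                  # deviation (assumed: of gaps)
        "traj": sum(dists),                                    # trajectory length
        "adbc": mean(dists) if dists else 0.0,                 # avg distance between clicks
        "tbc": mean(gaps) if gaps else 0.0,                    # avg time between clicks
    }


def split_into_windows(clicks, gold_events, exp_events, length=20.0):
    """Cut a session into constant-length windows and featurize each one."""
    if not clicks:
        return []
    t0, t_end = clicks[0][0], clicks[-1][0]
    n_windows = int((t_end - t0) // length) + 1
    return [
        window_features(clicks, gold_events, exp_events, t0 + i * length, length)
        for i in range(n_windows)
    ]
```

Each returned dictionary corresponds to one row of the feature table; the window length is a parameter, so the same code can produce data sets for different window sizes.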
Using the “permutation importance” method of feature selection (as implemented by the eli5 module), we found that the features ‘gpm’ (gold per minute), ‘tbc’ (time between clicks), ‘epm’ (experience per minute) and ‘aht’ (average holding time) have the highest importances. After applying the t-SNE algorithm, we saw that the data is separable, although the bot's behaviour is sometimes very similar to a human's.
More visualizations can be found in visualization.ipynb.
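For readers unfamiliar with permutation importance, here is a minimal sketch of the idea. Note the assumptions: it uses scikit-learn's `permutation_importance` instead of eli5, a Random Forest as a stand-in model, and purely synthetic data in which the label depends only on the `gpm` column. None of these numbers come from the project's data set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the project's logs): four random features,
# with the label determined only by the "gpm" column.
rng = np.random.default_rng(0)
feature_names = ["cpm", "gpm", "epm", "aht"]
X = rng.normal(size=(400, len(feature_names)))
y = (X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle one column at a time on held-out data and measure the score drop.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
ranking = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A feature whose shuffling barely changes the score contributes little to the model, which is how the less informative features can be filtered out.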
I studied and tested different classification algorithms. In addition to the standard Logistic Regression, Naive Bayes, k-Nearest Neighbors, Decision Tree and Random Forest algorithms, I investigated:
- the Extra Trees algorithm, which is quite similar to Random Forest but was less prone to overfitting,
- and the Voting Classifier, which combines different simple models and as a result is more stable on “real-life” data, which can differ considerably from the training data.
In the end, I used the Voting Classifier with most of the simple models mentioned above. GridSearchCV was used to select the best parameters for the inner models. The best results were obtained with 20-second time windows.
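A Voting Classifier tuned with GridSearchCV might look roughly like the sketch below. The choice of inner models, the parameter grid and the synthetic data from `make_classification` are assumptions made for illustration; the configuration actually used is in classification.ipynb.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the per-window feature table (NOT the project's data).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hard voting: each inner model casts one vote and the majority class wins.
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ],
    voting="hard",
)

# Inner-model parameters are addressed as "<estimator name>__<parameter>".
param_grid = {
    "lr__C": [0.1, 1.0],
    "knn__n_neighbors": [3, 7],
    "rf__max_depth": [3, None],
}
search = GridSearchCV(voter, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Tuning the inner models through the ensemble, rather than one by one, selects the parameters that work best for the combined vote.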
Compared to AdaBoost, Bagging performed better and was therefore selected as the main ensemble algorithm for the boosting step. As the table shows, after this step the results improved on large time windows and stayed the same on smaller ones.
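A comparison of this kind can be set up with cross-validation, as in the hedged sketch below. The shallow decision tree used as the Bagging base model, the ensemble sizes and the synthetic data are assumptions; the scores it prints say nothing about the results reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the project's logs).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

candidates = {
    # Bagging: trees trained independently on bootstrap samples, then averaged.
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(max_depth=3), n_estimators=50, random_state=0
    ),
    # AdaBoost: weak learners trained sequentially, reweighting hard samples.
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each ensemble.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
print(scores, "->", best)
```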
All classification results can be found in classification.ipynb.
Distinguishing between humans unexpectedly turned out to be easier than “bot vs human”. The probable reason is that different people have significantly different play styles, whereas our bot tries to copy more general human behavioural features, such as occasional clicks. Unfortunately, there were only a few human logs belonging to any one person, so the ‘perfect’ accuracy on large time windows is not very conclusive.
Distinguishing between three people was a harder task, because two of them played too similarly (it may have been the same person playing differently), but the results are still good enough (except for the 5-second time window).
In general, classifiers based only on the selected behavioural features are able to distinguish bot from human well enough.
I came to the conclusion that the optimal time window size is 20 seconds: it is not so small that it causes high bias (which helps prevent ‘wrong blaming’), and not so large that it makes our data ‘blurry’.







