Project: Classification of HMX Crystal Structures using Machine Learning
This machine learning model classifies the glide planes in HMX crystals using a support vector machine (SVM) algorithm.
The procedure to obtain the dataset is as follows:
1. copy_folder.sh (bash) generates 7 folders, one for each of the 7 glide planes. These are the main folders that contain all the crystal structures.
2. run_single.sh (bash) supplies the user input required by gccm_single, a Fortran program that finds all possible orientations for a given glide plane. For example, to choose the glide plane (101), the values 1, 0 and 1 must be set for ‘Sx’, ‘Sy’ and ‘Sz’ in control.txt, which gccm_single reads when it is executed. run_single.sh takes care of this step automatically.
3. run_generate.sh generates 12 folders, one for each of the 12 orientations of each glide plane, and sets up the user input required by gccm_generate. Running run_generate.sh also executes gccm_generate, which reads the supplied information and files and creates the crystal structure of a unit cell for each orientation.
4. create_difsize_folder.sh creates 100 folders for the 100 crystal structures generated by the scripts below.
5. run_difsize_hmx.sh creates 100 crystal structures of different sizes by increasing the number of unit cells along the x and y axes from 1 to 10 (10 × 10 = 100 combinations). It also calculates the center of mass (COM) of each HMX molecule and prints the COM coordinates to an output file.
6. run_rotate.sh rotates each crystal structure 4 times, by 90° each time, to eliminate the effect of different sizes and to capture the symmetry information of the different structures.
7. run_img2num.sh renders every crystal structure as a 400 × 400 pixel image, reduces the resolution to 50 × 50 pixels, converts the image to a feature vector, and prints it to an output file.
8. run_getdata.sh greps the position information of each structure and converts it into a single feature vector.
9. run_sum.sh reads all the feature vectors and combines them into the dataset.
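Steps 5 to 7 can be sketched in Python as follows. This is only an illustrative sketch using NumPy, not the code in the repository's scripts: the function names, the rotation about the centroid, the bounding-box scaling and the block-averaging used for downsampling are all assumptions made for this example.

```python
import numpy as np

def center_of_mass(coords, masses):
    """Mass-weighted mean position of one molecule; coords has shape (n_atoms, 3)."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    return (coords * masses[:, None]).sum(axis=0) / masses.sum()

def rotate_90(points_xy, k):
    """Rotate 2D points counter-clockwise by k * 90 degrees about their centroid (assumed pivot)."""
    pts = np.asarray(points_xy, dtype=float)
    centre = pts.mean(axis=0)
    theta = k * np.pi / 2
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return (pts - centre) @ rot.T + centre

def rasterize(points_xy, size=400):
    """Mark each point on a size x size binary image scaled to the points' bounding box."""
    pts = np.asarray(points_xy, dtype=float)
    lo = pts.min(axis=0)
    span = pts.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for degenerate extents
    idx = np.rint((pts - lo) / span * (size - 1)).astype(int)
    img = np.zeros((size, size))
    img[idx[:, 1], idx[:, 0]] = 1.0  # row index = y, column index = x
    return img

def downsample(img, out=50):
    """Reduce a square image to out x out by averaging equal-sized blocks."""
    f = img.shape[0] // out
    return img.reshape(out, f, out, f).mean(axis=(1, 3))

def feature_vector(points_xy):
    """400 x 400 image -> 50 x 50 image -> flat vector of 2500 values."""
    return downsample(rasterize(points_xy)).ravel()

# Hypothetical COM positions of a small 2D slab, for illustration only.
coms = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.0, 2.0]])
# One feature vector per rotation (0, 90, 180 and 270 degrees assumed here).
features = np.stack([feature_vector(rotate_90(coms, k)) for k in range(4)])
```

Here `features` holds one row of 2500 values per rotated copy of the structure, which is the kind of row that step 9 would concatenate into the dataset.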
The dataset is available in the 'data' file.
To perform the classification:
main_svm.ipynb is the notebook that runs the SVM algorithm. It reads the dataset, prepares the data for the machine learning model, and performs data visualization, data splitting, hyperparameter tuning, training, prediction and analysis of the results.
Alternatively, main_cnn.ipynb and main_dt.ipynb perform the classification with a neural network (NN) and a decision tree, respectively.
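The SVM workflow described above (splitting, hyperparameter tuning, training, prediction, analysis) might look roughly like the sketch below, written with scikit-learn. It is not the content of main_svm.ipynb: the data are synthetic placeholders standing in for the real feature vectors, and the parameter grid, split ratio and scaling step are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the dataset: 7 classes (one per glide plane),
# each a noisy cluster around a random centre. Not the real data.
rng = np.random.default_rng(0)
n_classes, n_per_class, n_features = 7, 30, 40
centres = rng.normal(scale=5.0, size=(n_classes, n_features))
X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centres])
y = np.repeat(np.arange(n_classes), n_per_class)

# Data splitting: hold out 25% for testing, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hyperparameter tuning: cross-validated grid search over C and the kernel.
pipe = make_pipeline(StandardScaler(), SVC())
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X_train, y_train)  # training (refits the best model on all training data)

# Prediction and a simple results analysis.
y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("best parameters:", search.best_params_)
print("test accuracy:", accuracy)
```

To try this on the real dataset, `X` and `y` would be replaced by the feature vectors and glide-plane labels read from the 'data' file, whose exact layout is not assumed here.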