1. Data loading and preprocessing
The analyzed dataset contains data from 250 subjects. Each subject has its own directory, which contains the Scenario folder with the experiment log and the Data folder with BrainVision files.
The loading loop below starts by collecting the names of all subject folders in the directory given by param.path, which is an absolute path. Like the other configurable parameters of this experiment, param.path is set in the Param configuration class in the param.py file. The list of folder names is stored in the dirs variable, and the loop then reads the data of every subject. MNE reads BrainVision data through the file with the .vhdr extension. The name of this file differs between subjects, but each subject's Data folder contains only one file of this type, so it is found with the glob() function, which returns a list of paths matching a pattern. The path of the subject's .vhdr file is therefore stored in file_name_vhdr at index 0. If the folder contains no .vhdr file, loading moves on to the next subject.
The raw data is loaded into the raw variable with mne.io.read_raw_brainvision(), which takes the path of the .vhdr file and preload=True. Preloading reads the data into memory, which the subsequent filtering requires. The raw data is then filtered with high-pass and low-pass filters. All subjects must have the same number of channels, in our case 3, so that their epochs can be combined in MNE later. The EOG channel, which was recorded for about half of the subjects, is therefore removed from the raw data.
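The per-subject file lookup described above can be tried in isolation, without MNE or the real dataset. The following is a minimal sketch under stated assumptions: the helper name find_vhdr_files, the subject folder names and the file name recording.vhdr are hypothetical and only serve to build a throwaway directory tree.

```python
import os
import tempfile
from glob import glob


def find_vhdr_files(root_path):
    """Map each subject folder to its .vhdr header, skipping folders without one."""
    found = {}
    # next(os.walk(...)) yields only the top level: (root, dirs, files).
    _, subject_dirs, _ = next(os.walk(root_path))
    for folder_name in sorted(subject_dirs):
        pattern = os.path.join(root_path, folder_name, "Data", "*.vhdr")
        matches = glob(pattern)
        if not matches:
            continue  # no header file: move on to the next subject
        found[folder_name] = matches[0]
    return found


# Build a temporary tree with two hypothetical subjects, one without a header.
with tempfile.TemporaryDirectory() as tmp:
    for name, has_header in [("subject_a", True), ("subject_b", False)]:
        data_dir = os.path.join(tmp, name, "Data")
        os.makedirs(data_dir)
        if has_header:
            open(os.path.join(data_dir, "recording.vhdr"), "w").close()
    result = find_vhdr_files(tmp)
    found_subjects = sorted(result)
    found_names = [os.path.basename(path) for path in result.values()]

print(found_subjects, found_names)
```

The sketch only locates the files. The project's own loading loop, which also reads and filters the signal, follows.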
import os
from glob import glob

import mne

# Names of the subject folders directly under param.path
root, dirs, files = next(os.walk(param.path))
for folder_name in dirs:
    # Each Data folder holds exactly one BrainVision header file
    path_file_vhdr = os.path.join(param.path, folder_name, 'Data', '*.vhdr')
    file_name_vhdr = glob(path_file_vhdr)
    if len(file_name_vhdr) == 0:
        continue  # no .vhdr file: skip this subject
    raw = mne.io.read_raw_brainvision(file_name_vhdr[0], preload=True)
    raw.filter(l_freq=param.l_freq, h_freq=param.h_freq)
    # Drop the EOG channel where it was recorded, leaving 3 channels
    if raw.info['nchan'] == 4:
        raw = raw.drop_channels(['EOG'])
The subject's thought number is also stored in the Data folder, in a metadata text file. It is needed to extract the target epochs. The number is always on the third line of the metadata text file, and the code below reads it.
    # Still inside the per-subject loop
    path_file_txt = os.path.join(param.path, folder_name, 'Data', '*.txt')
    file_name_txt = glob(path_file_txt)
    if len(file_name_txt) == 0:
        continue  # no metadata file: skip this subject
    loaded_txt = open(file_name_txt[0], "r")
    text = loaded_txt.readlines()
    line = text[2]  # the third line holds the thought number
    event_id_target = int(line.split(": ")[1])
The name of the metadata text file is found in the same way as the name of the .vhdr file. The file is then opened and the thought number is read from the third line into the event_id_target variable. If the folder contains no metadata text file, loading moves on to the next subject.
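The parsing step can be checked on its own with a few lines of sample text. This is a minimal sketch, assuming only what the code above implies: the third line has the form "key: value" with an integer value. The helper name parse_thought_number and the sample keys and values are hypothetical and do not reflect the real metadata files.

```python
def parse_thought_number(lines):
    """Return the integer that follows ': ' on the third line."""
    if len(lines) < 3:
        raise ValueError("metadata has fewer than three lines")
    _, separator, value = lines[2].partition(": ")
    if not separator:
        raise ValueError("third line is not of the form 'key: value'")
    return int(value.strip())


# Hypothetical file content; only the 'key: value' shape of line 3 matters.
sample_lines = [
    "subject: 001\n",
    "age: 12\n",
    "number: 7\n",
]
event_id_target = parse_thought_number(sample_lines)
print(event_id_target)
```

Unlike the direct indexing used in the loop, the sketch raises a descriptive ValueError when the file is shorter than expected or the third line has a different shape, which may make malformed subjects easier to spot.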
Then the events (stimuli) are obtained from the raw signal and the target epochs are extracted, i.e. the epochs around the stimulus that carries the subject's thought number.
The events_loaded variable is a tuple: events_loaded[0] is the array of all stimuli with their times of occurrence in the signal, and events_loaded[1] maps their text descriptions to event numbers. The time interval around the stimulus from which the epochs are cut and the interval used for baseline correction are defined in the Param configuration class. In MNE, epochs are extracted by creating an mne.Epochs instance with these parameters; passing event_id=event_id_target ensures that only the target epochs are extracted. These epochs are labeled with the number of the stimulus around which they were extracted, but for classification they must be marked as target, i.e. as belonging to class 0. The last line of the code below does this.
    events_loaded = mne.events_from_annotations(raw)
    epochs_t_subject = mne.Epochs(raw, events=events_loaded[0], event_id=event_id_target,
                                  tmin=param.t_min, tmax=param.t_max, baseline=param.baseline)
    # Relabel the target epochs as class 0
    epochs_t_subject = mne.epochs.combine_event_ids(epochs_t_subject, [str(event_id_target)], {'target': 0})
After the target epochs, non-target epochs are extracted around a randomly selected stimulus that is not the subject's thought number. A random number different from the thought number is generated, and non-target epochs are extracted around this stimulus with the same parameters as the target epochs. These epochs belong to class 1, but because the stimulus numbers range from 1 to 9, they are labeled 11 at this stage, a value outside that range; 10 is subtracted later to obtain the class. The extracted target and non-target epochs are then appended to the global lists that gather the epochs of all subjects, and the open metadata text file is closed. The code below shows this step.
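As a side note on the selection step, the draw can also be written without a retry loop by choosing from the stimulus numbers that remain once the thought number is excluded. This is a swapped-in alternative shown as a minimal sketch, not the approach used in the project's code; the helper name pick_non_target and the seeded generator are assumptions made for illustration.

```python
import random


def pick_non_target(target, low=1, high=9, rng=random):
    """Pick a stimulus number in [low, high] that differs from target."""
    candidates = [n for n in range(low, high + 1) if n != target]
    return rng.choice(candidates)


# A seeded generator keeps the illustration reproducible.
rng = random.Random(0)
picks = [pick_non_target(3, rng=rng) for _ in range(1000)]
print(sorted(set(picks)))
```

Both variants select uniformly among the eight remaining stimulus numbers; the list-based form simply makes the exclusion explicit.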
    # Draw a stimulus number different from the thought number
    non_target_random = random.randint(1, 9)
    while event_id_target == non_target_random:
        non_target_random = random.randint(1, 9)
    epochs_n_subject = mne.Epochs(raw, events=events_loaded[0], event_id=non_target_random,
                                  tmin=param.t_min, tmax=param.t_max, baseline=param.baseline)
    # Relabel the non-target epochs as 11 (mapped to class 1 later)
    epochs_n_subject = mne.epochs.combine_event_ids(epochs_n_subject, [str(non_target_random)], {'nonTarget': 11})
    epochs_target.append(epochs_t_subject)
    epochs_non_target.append(epochs_n_subject)
    loaded_txt.close()
At this point, the target and non-target epochs are each held in a list of mne.Epochs instances. The mne.concatenate_epochs() function concatenates such a list into a single instance. Epochs with a peak-to-peak amplitude greater than 150 μV (the threshold is defined in the Param class) can then be removed from both sets with the drop_bad() method. Finally, the target and non-target epochs are combined into a single variable called epochs_all. The code below shows this process.
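The peak-to-peak criterion itself can be illustrated without MNE. The sketch below is a plain-Python stand-in, not MNE's implementation: an epoch is kept only if, on every channel, the difference between the largest and smallest sample does not exceed the threshold. The sample values are made up, and the threshold is written in volts on the assumption that the data is in volts, the unit in which MNE stores EEG signals.

```python
def peak_to_peak(samples):
    """Difference between the largest and smallest sample of one channel."""
    return max(samples) - min(samples)


def keep_epoch(epoch, threshold):
    """True if no channel of the epoch exceeds the peak-to-peak threshold."""
    return all(peak_to_peak(channel) <= threshold for channel in epoch)


threshold = 150e-6  # 150 μV expressed in volts

# Each epoch is a list of channels; each channel is a list of samples (volts).
epochs = [
    [[0.0, 20e-6, -30e-6], [5e-6, -5e-6, 10e-6]],  # 50 μV and 15 μV: kept
    [[0.0, 100e-6, -80e-6], [1e-6, 2e-6, 3e-6]],   # 180 μV on channel 1: dropped
]
kept = [epoch for epoch in epochs if keep_epoch(epoch, threshold)]
print(len(kept))
```

The project's own code, which applies the rejection through MNE, follows.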
epochs_target = mne.concatenate_epochs(epochs_target)
# Peak-to-peak rejection threshold for the EEG channels
reject = dict(eeg=param.amplitude)
epochs_target.drop_bad(reject=reject)
epochs_non_target = mne.concatenate_epochs(epochs_non_target)
epochs_non_target.drop_bad(reject=reject)
epochs_all = mne.concatenate_epochs([epochs_target, epochs_non_target])
The first part of the code below converts the signal units from V to μV using the mne.decoding.Scaler class. When the class is initialized, we can specify which channel type is scaled and by what constant; in our case the EEG channels are scaled by 1e6 to obtain μV. All epochs are then stored in the variable X as a 3D NumPy array of shape (epochs_count x channels_count x values_count). The fit_transform() method of the Scaler class, which performs the conversion to μV, accepts data in this form.
The second part of the code below creates the class labels with the to_categorical() function from the Keras library: [1, 0] for a target epoch, expressing membership in class 0, and [0, 1] for a non-target epoch. The two sets of labels are then stacked into one array.
# Scale the EEG channels from V to μV
scalings = dict(eeg=1e6)
scaler = mne.decoding.Scaler(epochs_all.info, scalings=scalings)
X = epochs_all.get_data()
X = scaler.fit_transform(X)
# One-hot labels: target 0 -> [1, 0], non-target 11 - 10 = 1 -> [0, 1]
out_t_labels = keras.utils.to_categorical(epochs_target.events[:, 2], 2)
out_n_labels = keras.utils.to_categorical(epochs_non_target.events[:, 2] - 10, 2)
out_labels = np.vstack((out_t_labels, out_n_labels))
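To make the label encoding concrete, the following minimal sketch reproduces the mapping in plain Python. The one_hot helper is a hypothetical stand-in for keras.utils.to_categorical, not the Keras implementation, and the event codes are made-up examples that follow the convention above (0 for target, 11 for non-target).

```python
def one_hot(class_ids, num_classes):
    """Encode each class id as a list with a single 1.0 at that index."""
    rows = []
    for class_id in class_ids:
        row = [0.0] * num_classes
        row[class_id] = 1.0
        rows.append(row)
    return rows


target_events = [0, 0, 0]     # event codes of three target epochs
non_target_events = [11, 11]  # event codes of two non-target epochs

out_t_labels = one_hot(target_events, 2)
# Subtracting 10 turns the temporary code 11 into class 1.
out_n_labels = one_hot([code - 10 for code in non_target_events], 2)
out_labels = out_t_labels + out_n_labels
print(out_labels)
```

The rows of out_labels are in the same order as the epochs in epochs_all, targets first and non-targets second, which is what keeps the labels aligned with X.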