(Code licensed under the MIT License)
For further details, please refer to the paper:
Hassan Hayat, Carles Ventura, Agata Lapedriza, “Modeling Subjective Affect Annotations with Multi-Task Learning”, Sensors, 2022, 22(14):5245. DOI: 10.3390/s22145245
This package was developed by Mr. Hassan Hayat (hhassan0@uoc.edu). For any inquiries regarding this package, please feel free to contact him. The package is provided free for academic use and is distributed at your own risk.
- Ubuntu Linux
- Python: 3.x.x
- GPU: With CUDA support
- Audio Features: VGGish
- Visual Features: I3D Model
- Textual Features: BERT-Base
- TensorFlow: 1.14
- Movie Clips Creation:
The COGNIMUSE dataset provides subtitle information with timestamps that indicate the start and end frames where a subtitle appears. These timestamps are used to create movie clips.
- Visual Feature Extraction:
An I3D model is used to extract visual features. The output of the model's 'Mixed_5c' layer represents the visual features of each frame. Each frame is resized to 224x224 pixels before being fed into the model.
- Audio Feature Extraction:
The dataset’s dyadic conversations are divided into five sessions. A pre-trained VGGish model is used to extract audio features for each utterance. The audio representation is taken from the output of the last convolutional layer. Please refer to the paper for further details.
- Textual Feature Extraction:
The BERT-Base model is employed to obtain textual semantics for each news headline. The representation is derived from the final layer (i.e., the 12th layer) of the model. Please refer to the paper for more details.
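The movie clip creation step described above can be sketched as follows. This is a minimal illustration, not the package's own code: it assumes SRT-style subtitle timestamps (`HH:MM:SS,mmm --> HH:MM:SS,mmm`) and a frame rate of 25 fps, and the names `to_milliseconds` and `clip_frame_ranges` are hypothetical. The actual COGNIMUSE subtitle layout and frame rate may differ.

```python
import math
import re

# Assumption: SRT-style timestamp lines; adjust the pattern to the real subtitle format.
TIMESTAMP = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_milliseconds(hours, minutes, seconds, millis):
    """Convert the four timestamp fields (as strings) to integer milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def clip_frame_ranges(subtitle_text, fps=25.0):
    """Return one (start_frame, end_frame) pair per subtitle timestamp line.

    The start is rounded down and the end is rounded up so that the clip
    covers the whole interval in which the subtitle is visible.
    """
    ranges = []
    for match in TIMESTAMP.finditer(subtitle_text):
        fields = match.groups()
        start_ms = to_milliseconds(*fields[:4])
        end_ms = to_milliseconds(*fields[4:])
        if end_ms <= start_ms:
            continue  # skip malformed or empty intervals
        start_frame = math.floor(start_ms * fps / 1000)
        end_frame = math.ceil(end_ms * fps / 1000)
        ranges.append((start_frame, end_frame))
    return ranges


if __name__ == "__main__":
    example = (
        "1\n00:00:01,000 --> 00:00:03,000\nHello\n\n"
        "2\n00:01:00,500 --> 00:01:02,000\nBye\n"
    )
    for start, end in clip_frame_ranges(example):
        print(f"clip frames {start}..{end}")
```

Each returned pair can then be used to cut the corresponding frames out of the movie before feature extraction.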
To train, validate, and test with multiple modalities in a single-task setting, run:
./st_main.py
To train, validate, and test with multiple modalities in a multi-task setting, run:
./mt_main.py