A multimodal deep learning framework that fuses visual features from EfficientNet and textual features from BERT to classify sentiment in image-text conversations using the MSCTD dataset.
natural-language-processing computer-vision deep-learning sentiment-analysis pytorch bert multimodal-learning feature-fusion efficientnet msctd-dataset
-
Updated
Apr 11, 2026 - Jupyter Notebook