This is a text classification task: binary sentiment classification.
Every document (a line in the data file) is a sentence extracted from social media (blogs).
The goal is to classify the sentiment of each sentence into "positive" or "negative".
Accuracy of this model: 97.91% on the testing data (99.37% on the training data)
- The training data contains 7086 sentences, already labeled with 1 (positive sentiment) or 0 (negative sentiment).
- The test data contains 33052 sentences that are unlabeled.
- The submission should be a .txt file with 33052 lines.
- In each line, there should be exactly one integer, 0 or 1, according to the classification results.
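
The submission format described above could be produced with a small helper like the one below. This is a minimal sketch; the function name `write_submission` and the file name in the usage note are hypothetical and not taken from the project.

```python
from pathlib import Path


def write_submission(predictions, path):
    """Write one 0/1 label per line, as the submission format requires."""
    labels = [int(p) for p in predictions]
    if any(label not in (0, 1) for label in labels):
        raise ValueError("every prediction must be 0 or 1")
    # One integer per line, each line newline-terminated.
    text = "".join(f"{label}\n" for label in labels)
    Path(path).write_text(text, encoding="utf-8")
    return len(labels)
```

For the real test set, calling `write_submission(predictions, "submission.txt")` should return 33052; checking that return value is a cheap guard against a truncated file.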
Text Analytics
NLP + ML (Naive Bayes Theorem)
Ensemble Learning
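
For reference, a Naive Bayes classifier over bag-of-words features is commonly written as below. The multinomial form with add-α (Laplace) smoothing is an assumption here, since the exact variant used in this project is not stated. In the formula, c is the class (0 or 1), w_1 … w_n are the words of a sentence, and V is the vocabulary.

```latex
\hat{c} = \arg\max_{c \in \{0, 1\}}
  \left[ \log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c) \right],
\qquad
P(w \mid c) = \frac{\operatorname{count}(w, c) + \alpha}
                   {\sum_{w' \in V} \operatorname{count}(w', c) + \alpha \lvert V \rvert}
```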
1. EDA and Pre-processing
- Bag-of-Words method
- Feature Extraction
- Remove low-frequency words
- Count the features
- Remove stop words
- Stemming
2. Balancing the unbalanced data classes
3. Naive Bayes Model for Sentiment Classification
- Dataset segmentation
- Model building
- Prediction test
- Classification report
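
The outline above can be sketched end to end with the standard library alone. Everything here is illustrative rather than the project's actual code: the tiny stop-word list, the `crude_stem` helper, the `min_count` threshold, random oversampling as the balancing method, and the toy sentences are all assumptions. Libraries such as scikit-learn (`CountVectorizer`, `MultinomialNB`) and NLTK (`PorterStemmer`) provide more robust equivalents.

```python
import math
import random
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "i", "it", "this", "of", "and", "to"}


def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def tokenize(sentence):
    """Lowercase, split into word tokens, drop stop words, then stem."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def build_vocabulary(token_lists, min_count=2):
    """Keep only words seen at least `min_count` times across the corpus."""
    totals = Counter(tok for tokens in token_lists for tok in tokens)
    return {word for word, n in totals.items() if n >= min_count}


def oversample(rows, labels, seed=0):
    """Balance classes by randomly duplicating minority-class rows."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def train_naive_bayes(token_lists, labels, vocabulary, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    class_docs = Counter(labels)
    word_counts = {c: Counter() for c in class_docs}
    for tokens, label in zip(token_lists, labels):
        word_counts[label].update(t for t in tokens if t in vocabulary)
    model = {}
    for c in class_docs:
        total = sum(word_counts[c].values()) + alpha * len(vocabulary)
        model[c] = {
            "log_prior": math.log(class_docs[c] / len(labels)),
            "log_likelihood": {
                w: math.log((word_counts[c][w] + alpha) / total)
                for w in vocabulary
            },
        }
    return model


def predict(model, tokens):
    """Return the class with the highest posterior log-probability."""
    def score(c):
        ll = model[c]["log_likelihood"]
        # Words outside the vocabulary are ignored.
        return model[c]["log_prior"] + sum(ll[t] for t in tokens if t in ll)
    return max(model, key=score)


# Toy data (made up for illustration): 1 = positive, 0 = negative.
data = [
    ("I love this movie", 1),
    ("love love love it", 1),
    ("what an awesome movie", 1),
    ("I hate this movie", 0),
    ("hate hate it", 0),
]
tokens = [tokenize(sentence) for sentence, _ in data]
labels = [label for _, label in data]
vocab = build_vocabulary(tokens, min_count=2)
bal_tokens, bal_labels = oversample(tokens, labels)
model = train_naive_bayes(bal_tokens, bal_labels, vocab)
print(predict(model, tokenize("I love it")))   # 1
print(predict(model, tokenize("hate this")))   # 0
```

The vocabulary is built before balancing so that duplicated minority-class rows do not inflate word frequencies, which matches the order of steps 1 and 2 in the outline.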
- Sample Data:
- Positive and Negative Class-wise Data Distribution Bar Plot:
- The Occurrences of Features (Column-wise):
- Positive and Negative Class-wise Data Distribution After Balancing Data Classes:
- The Table with Results:
- Confusion Matrix
- The Prediction Accuracy Score for Training Data: 99.37%
- The Prediction Accuracy Score for Testing Data: 97.91%
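
Since the provided test sentences are unlabeled, accuracy figures like those above presumably come from splitting the labeled training data into a training part and a held-out testing part; that reading is an assumption. The sketch below shows one way such a split, a confusion matrix, and the usual report metrics could be computed. The function names and the 80/20 default are hypothetical, not the project's own.

```python
import random


def split_dataset(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle indices and hold out `test_fraction` of the data for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [rows[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)


def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp), treating label 1 as the positive class."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def classification_report(y_true, y_pred):
    """Accuracy plus precision, recall, and F1 for the positive class."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Comparing the report on the training part with the report on the held-out part is what reveals overfitting; a small gap, as between 99.37% and 97.91%, suggests the model generalizes reasonably well.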





