Given a short text (e.g. a social-media post), classify it into one of 7 emotion categories:
| Label | Emotion |
|---|---|
| 0 | anger |
| 1 | disgust |
| 2 | fear |
| 3 | joy |
| 4 | neutral |
| 5 | sadness |
| 6 | surprise |
The primary evaluation metric is Macro-F1.
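As a reminder of what Macro-F1 rewards, here is a minimal pure-Python sketch. It is not the code in src/evaluate.py (which is not shown here); the function name `macro_f1` and the convention that a class with no true or predicted examples scores 0 are assumptions made for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=range(7)):
    """Unweighted mean of the per-class F1 scores over `labels`."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Assumption: a class that never appears in y_true or y_pred scores 0.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# A model that always predicts joy (3) reaches 80% accuracy on this toy data,
# but the missed sadness (5) class pulls Macro-F1 down.
print(round(macro_f1([3, 3, 3, 3, 5], [3, 3, 3, 3, 3], labels=[3, 5]), 3))  # 0.444
```

Because every class contributes equally to the average, rare emotions matter as much as frequent ones, which is why the class-imbalance options described later are worth trying.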
All data files are located in data/:
| File | Description |
|---|---|
| train.csv | Training set (columns: id, text, label) |
| valid.csv | Validation set (same format as train) |
| test_no_label.csv | Test set with labels withheld (columns: id, text) |
Install the dependencies:

    pip install -r requirements.txt

This covers the retained classical and transformer workflows in the repo. It also includes the plotting dependency used by notebooks/model_analysis_plots.ipynb.
Use python3 in the commands below if your environment does not expose a python alias. If python already points to Python 3 on your machine, either form is fine.
The current project keeps reusable scripts in src/, notebooks in notebooks/,
generated experiment artifacts in artifacts/, and submission-style prediction
CSVs in submissions/.
- src/train_tfidf_logistic_regression.py: best retained classical model.
- src/search_tfidf_models.py: validation search utility for TF-IDF + linear classifiers.
- src/train_transformer.py: main Hugging Face fine-tuning script for distilroberta-base.
- notebooks/train_transformer_colab.ipynb: Colab notebook for running the retained transformer workflow on Google Drive.
- notebooks/model_analysis_plots.ipynb: matplotlib-based analysis notebook for report figures and model comparison.

Directory layout:

    .
    ├── Project1_emotion_classification_Spring2026.pdf
    ├── data/
    │   ├── train.csv
    │   ├── valid.csv
    │   └── test_no_label.csv
    ├── src/
    │   ├── baselines/
    │   ├── evaluate.py
    │   ├── run_experiments.py
    │   ├── search_tfidf_models.py
    │   ├── train_tfidf_logistic_regression.py
    │   └── train_transformer.py
    ├── notebooks/
    ├── artifacts/
    │   ├── outputs/
    │   └── results/
    │       ├── baselines/
    │       ├── search/
    │       └── tfidf/
    └── submissions/

Use artifacts/results/ for validation predictions, metrics, and search output
used during model selection. Use artifacts/outputs/ for full transformer
experiment artifacts. Use submissions/ only for final test-set prediction CSVs.
All baseline scripts must be run from the project_1/ directory.
To run the MLP baseline:

    python3 src/baselines/mlp.py

This script writes validation predictions to artifacts/results/baselines/emb_mlp_valid_predictions.csv and test predictions to submissions/emb_mlp_pred.csv by default.

To run the Bi-RNN baseline:

    python3 src/baselines/rnn.py

This script writes validation predictions to artifacts/results/baselines/rnn_valid_predictions.csv and test predictions to submissions/rnn_pred.csv by default.

To evaluate predictions on the validation set, run:

    python3 src/evaluate.py --pred <path_to_pred.csv>

Example:

    python3 src/evaluate.py --pred artifacts/results/tfidf/tfidf_logreg_valid_predictions.csv

The script prints accuracy, macro-precision, macro-recall, and macro-F1, along with a per-class breakdown.

Note: The src/evaluate.py script evaluates against data/valid.csv. For the final test set, submit submissions/prediction.csv (columns: id, label) following the course submission instructions.

Train on train.csv, evaluate on valid.csv, and export both validation and test predictions:

    python3 src/train_tfidf_logistic_regression.py

This single command produces two output files:

- artifacts/results/tfidf/tfidf_logreg_valid_predictions.csv: validation predictions for src/evaluate.py
- submissions/prediction.csv: test-set predictions for submission formatting checks

It also saves summary metrics to:

- artifacts/results/tfidf/tfidf_train_metrics.json
- artifacts/results/tfidf/tfidf_valid_metrics.json

Validation predictions are written to artifacts/results/tfidf/tfidf_logreg_valid_predictions.csv, and test predictions are written to submissions/prediction.csv.

To train the final classical model on train.csv + valid.csv before generating the test submission:

    python3 src/train_tfidf_logistic_regression.py --train-on-all

With --train-on-all, the script still writes the validation prediction file, then retrains the classical model on the full labeled data (train.csv + valid.csv) before overwriting submissions/prediction.csv with the final test-set predictions.

To search TF-IDF feature and classifier combinations on validation:

    python3 src/search_tfidf_models.py

This writes the search table to artifacts/results/search/tfidf_search_results.csv and the best configuration summary to artifacts/results/search/tfidf_search_best.json.

Train one transformer experiment with the recommended setup:

    python3 src/train_transformer.py \
      --model-name distilroberta-base \
      --experiment-name distilroberta_weighted_seed42 \
      --loss-type weighted \
      --seed 42 \
      --train-batch-size 16 \
      --eval-batch-size 32 \
      --learning-rate 2e-5 \
      --num-train-epochs 4 \
      --max-length 128

The retained Colab notebook, notebooks/train_transformer_colab.ipynb, runs this same script-based workflow and is useful when you want GPU-backed training on Google Colab. The notebook reads both train/metrics.json and valid/metrics.json after a run so you can compare train-vs-validation performance directly.

Key options:
- --loss-type plain|weighted|focal
- --sampler-strategy none|weighted
- --merge-train-valid to retrain the final model on the full labeled set before generating test predictions
- --max-train-examples, --max-valid-examples, --max-test-examples for quick smoke tests
- --fp16 or --bf16 if your hardware supports mixed precision

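To make the three --loss-type values concrete, the sketch below computes a per-example loss in pure Python. It illustrates the general techniques only and is not taken from src/train_transformer.py: the helper names, the inverse-frequency weighting formula, and the focal parameter gamma=2.0 are all assumptions, and the script's batch reduction may differ.

```python
import math
from collections import Counter


def balanced_class_weights(labels, num_classes=7):
    """Inverse-frequency weights: n_samples / (num_classes * count_c)."""
    counts = Counter(labels)
    n = len(labels)
    # Assumption: classes absent from `labels` get weight 0.0.
    return [n / (num_classes * counts[c]) if counts[c] else 0.0
            for c in range(num_classes)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def example_loss(logits, target, loss_type="plain", class_weights=None, gamma=2.0):
    """Loss for one example, before any reduction over the batch."""
    p_target = softmax(logits)[target]
    cross_entropy = -math.log(p_target)
    if loss_type == "plain":
        return cross_entropy
    if loss_type == "weighted":
        # Rare classes carry larger weights, so their errors cost more.
        return class_weights[target] * cross_entropy
    if loss_type == "focal":
        # Down-weights examples the model already classifies confidently.
        return (1.0 - p_target) ** gamma * cross_entropy
    raise ValueError(f"unknown loss type: {loss_type}")


weights = balanced_class_weights([4, 4, 4, 4, 4, 1], num_classes=7)
print(example_loss([2.0, 0.5, 0.1, 0.1, 0.3, 0.1, 0.1], target=1,
                   loss_type="weighted", class_weights=weights))
```

Weighted loss changes how much each class counts, while focal loss changes how much each example counts; a weighted sampler (--sampler-strategy weighted) instead changes how often rare classes are drawn, so combining it with a weighted loss can over-correct.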
Use a tiny checkpoint and small subsets to validate the pipeline before launching a long run:

    python3 src/train_transformer.py \
      --model-name hf-internal-testing/tiny-random-roberta \
      --experiment-name smoke_tiny_roberta \
      --output-dir artifacts/outputs/smoke \
      --loss-type plain \
      --num-train-epochs 1 \
      --max-length 32 \
      --max-train-examples 64 \
      --max-valid-examples 64 \
      --max-test-examples 64

Artifacts are written under <output-dir>/<experiment_name>/. For the smoke-test command above, that means artifacts/outputs/smoke/smoke_tiny_roberta/.

Typical artifact files include:
- config.json
- class_weights.json
- train/metrics.json
- valid_predictions.csv
- test_predictions.csv
- valid/metrics.json
- valid/confusion_matrix.csv
- valid/probabilities.npz
- test/probabilities.npz
- model/

An experiment summary row is also appended to <output-dir>/experiment_summary.csv.
The summary CSV includes train and validation summary metrics, which makes it easier to spot train-vs-valid gaps when checking for overfitting.
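One way to inspect those gaps is to rank experiments by the difference between train and validation Macro-F1. The sketch below assumes hypothetical column names (`experiment_name`, `train_macro_f1`, `valid_macro_f1`); check the header of your own experiment_summary.csv and pass the real names if they differ.

```python
import csv


def macro_f1_gaps(summary_path,
                  name_col="experiment_name",    # hypothetical column name
                  train_col="train_macro_f1",    # hypothetical column name
                  valid_col="valid_macro_f1"):   # hypothetical column name
    """Return (experiment, train minus valid) pairs, largest gap first."""
    gaps = []
    with open(summary_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            gap = float(row[train_col]) - float(row[valid_col])
            gaps.append((row[name_col], gap))
    return sorted(gaps, key=lambda item: item[1], reverse=True)
```

Calling `macro_f1_gaps("artifacts/outputs/transformer/experiment_summary.csv")` would list the runs most likely to be overfitting first, provided the file exists at that path and uses the assumed columns.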
After choosing the best settings on validation:

    python3 src/train_transformer.py \
      --model-name distilroberta-base \
      --experiment-name distilroberta_weighted_full_seed42 \
      --output-dir artifacts/outputs/transformer \
      --loss-type weighted \
      --seed 42 \
      --merge-train-valid

This retrains on train.csv + valid.csv and writes the final test predictions to artifacts/outputs/transformer/distilroberta_weighted_full_seed42/test_predictions.csv.

Use the analysis notebook to generate matplotlib figures for:
- final model comparison
- TF-IDF train vs validation Macro-F1
- TF-IDF search trends
- confusion matrices
- per-class F1 comparison
- transformer train vs validation Macro-F1
- transformer validation curves when retained epoch outputs are available

Notebook: notebooks/model_analysis_plots.ipynb

The notebook can read the saved transformer train/metrics.json and valid/metrics.json files directly when those local outputs are available.

Your prediction file must be a CSV with exactly two columns: id and label.

    id,label
    eebbqej,4
    ed00q6i,4
    ...
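
Before submitting, a quick format check can catch header or label mistakes. The following is a minimal sketch, not part of the repo: the function name `check_submission` is hypothetical, and the duplicate-id and id-matching checks go beyond the stated two-column requirement as extra assumptions.

```python
import csv

VALID_LABELS = {str(i) for i in range(7)}  # labels 0..6 from the emotion table


def check_submission(path, expected_ids=None):
    """Return a list of problems found in a prediction CSV (empty if none)."""
    problems = []
    seen = set()
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header != ["id", "label"]:
            problems.append(f"unexpected header: {header}")
        for line_no, row in enumerate(reader, start=2):
            if len(row) != 2:
                problems.append(f"line {line_no}: expected 2 fields, got {len(row)}")
                continue
            row_id, label = row
            if label not in VALID_LABELS:
                problems.append(f"line {line_no}: label {label!r} is not in 0..6")
            if row_id in seen:
                problems.append(f"line {line_no}: duplicate id {row_id!r}")
            seen.add(row_id)
    # Optional (assumption): compare against the ids in data/test_no_label.csv.
    if expected_ids is not None and seen != set(expected_ids):
        problems.append("ids do not match the expected test ids")
    return problems
```

For example, `check_submission("submissions/prediction.csv")` returns an empty list when the header is exactly id,label and every label is an integer from 0 to 6.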