JTerZeus/classification-bankruptcy-ml

Open in Colab: https://colab.research.google.com/github/JTerZeus/classification-bankruptcy-ml/blob/main/notebooks/classification_project.ipynb

Bankruptcy Prediction using Machine Learning

This repository contains the complete implementation and analysis for a university assignment on classification problems, focusing on corporate bankruptcy prediction using financial indicators.

The project evaluates and compares multiple machine learning classifiers under class imbalance conditions and answers specific performance-related questions defined in the assignment.

Assignment Context

The dataset consists of financial ratios, binary activity indicators, company status (healthy or bankrupt), and the corresponding year for each company. Each row represents a different company.

The implementation follows the full assignment specification, including:

  • data loading from Excel,
  • exploratory data analysis and visualization,
  • missing value checks,
  • Min–Max normalization,
  • Stratified K-Fold cross-validation (k=4),
  • class imbalance handling with undersampling (3:1 ratio),
  • training and evaluation of multiple classification models,
  • generation of confusion matrices,
  • storage of experimental results in CSV/Excel format,
  • additional analysis and visualization using pivot tables in Excel.
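The preprocessing steps listed above (Min–Max normalization, stratified 4-fold splitting, 3:1 undersampling) could be combined roughly as in the sketch below. It is not the notebook's code. It runs on synthetic data and assumes that label 1 means bankrupt and 0 means healthy, that "3:1" means three healthy companies per bankrupt one, and that the scaler is fitted on the training fold only and undersampling is applied only to the training fold. The helper name `undersample` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import MinMaxScaler


def undersample(X, y, ratio=3, seed=0):
    """Keep every minority row (label 1) and at most `ratio` majority rows per minority row."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    n_keep = min(len(majority), ratio * len(minority))
    kept = rng.choice(majority, size=n_keep, replace=False)
    idx = np.sort(np.concatenate([minority, kept]))
    return X[idx], y[idx]


# Synthetic stand-in for the assignment data: 400 companies, 20 of them bankrupt.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 8))
y = np.zeros(400, dtype=int)
y[:20] = 1

skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
folds = []
for train_idx, test_idx in skf.split(X, y):
    # Fit the scaler on the training fold only, then apply it to both folds.
    scaler = MinMaxScaler().fit(X[train_idx])
    X_train = scaler.transform(X[train_idx])
    X_test = scaler.transform(X[test_idx])
    # Balance the training fold; the test fold keeps its original class ratio.
    X_bal, y_bal = undersample(X_train, y[train_idx], ratio=3)
    folds.append((X_bal, y_bal, X_test, y[test_idx]))
```

Leaving the test fold untouched keeps the evaluation close to the class distribution a deployed model would face. Whether the notebook makes the same choice should be checked against the notebook itself.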

Repository Structure

  • notebooks/
    Executable Jupyter notebook developed in Google Colab.
    This is the main implementation and contains all experiments, figures and outputs.

  • data/
    Input dataset provided by the assignment.

  • results/
    Output files generated from the Python code and further analyzed in Excel (e.g. balancedDataOutcomes.csv / .xlsx).

  • report/
    Final project report (PDF), written in Greek, answering all assignment questions.

Notebook Execution

The project was originally developed in Google Colab. The Jupyter notebook can be executed cell-by-cell in a notebook environment.

An exported .py version of the notebook is also included for reference, but the notebook is the recommended way to run the code.

Models Implemented

The following eight (8) classification models were trained and evaluated:

  • Linear Discriminant Analysis (LDA)
  • Logistic Regression
  • Decision Tree
  • Random Forest
  • k-Nearest Neighbors (k-NN)
  • Naive Bayes
  • Support Vector Machine (SVM)
  • XGBoost (additional model)
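As a rough illustration, the model list above could be set up as a dictionary of scikit-learn estimators. The hyperparameters here are library defaults and are assumptions, as is the choice of `GaussianNB` for Naive Bayes; the notebook's actual settings may differ. The function name `build_models` is hypothetical, and XGBoost is added only if the `xgboost` package is installed.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=0):
    """Return a name -> estimator mapping for the classifiers listed above."""
    models = {
        "LDA": LinearDiscriminantAnalysis(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "k-NN": KNeighborsClassifier(),
        "Naive Bayes": GaussianNB(),
        "SVM": SVC(random_state=seed),
    }
    try:
        # Optional dependency: skipped when xgboost is not installed.
        from xgboost import XGBClassifier
        models["XGBoost"] = XGBClassifier(random_state=seed)
    except ImportError:
        pass
    return models


models = build_models()
```

Each estimator exposes the same `fit` / `predict` interface, so a single loop over `models.items()` can train and score all of them on every fold.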

Evaluation Metrics

Model performance was evaluated on both training and test sets using:

  • Accuracy
  • Precision
  • Recall (Sensitivity)
  • F1 Score
  • ROC-AUC
  • Specificity (computed during Excel analysis)

Because the classes are imbalanced, accuracy can look high even when a model mostly predicts the majority class, so F1 Score was selected as the primary metric for model comparison.
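The threshold-based metrics above all follow from the four confusion-matrix counts. The sketch below shows the standard definitions in plain Python; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the project's results. ROC-AUC is omitted because it needs predicted scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion-matrix counts (positive class = bankrupt)."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Made-up counts for an imbalanced test fold: 10 bankrupt, 90 healthy companies.
example = classification_metrics(tp=8, fp=4, tn=86, fn=2)
```

With these illustrative counts, accuracy is 0.94 while F1 is about 0.73, which shows how accuracy can overstate performance on the minority class.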

Results

The Python code generates a CSV file (balancedDataOutcomes.csv) containing detailed results for all folds, models and datasets. This file was later converted to Excel and used to compute additional metrics and create comparison plots using pivot tables.
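The Excel pivot-table step has a close equivalent in pandas. The sketch below is only an illustration: the column names (`model`, `fold`, `dataset`, `f1`) and all values are placeholders, since the real layout of balancedDataOutcomes.csv is not described here, and an in-memory buffer stands in for the file.

```python
import io

import pandas as pd

# Placeholder rows: one row per (model, fold, dataset). Values are made up.
rows = [
    {"model": "Model A", "fold": 1, "dataset": "train", "f1": 0.8},
    {"model": "Model A", "fold": 1, "dataset": "test", "f1": 0.4},
    {"model": "Model A", "fold": 2, "dataset": "train", "f1": 0.6},
    {"model": "Model A", "fold": 2, "dataset": "test", "f1": 0.6},
    {"model": "Model B", "fold": 1, "dataset": "train", "f1": 0.9},
    {"model": "Model B", "fold": 1, "dataset": "test", "f1": 0.7},
    {"model": "Model B", "fold": 2, "dataset": "train", "f1": 0.7},
    {"model": "Model B", "fold": 2, "dataset": "test", "f1": 0.5},
]

# Write the per-fold results as CSV, then read them back as the analysis step would.
buffer = io.StringIO()
pd.DataFrame(rows).to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)

# Mean F1 per model, split by train/test, averaged over folds.
summary = loaded.pivot_table(index="model", columns="dataset", values="f1", aggfunc="mean")
```

Comparing the `train` and `test` columns of `summary` side by side is one way to spot models that overfit the balanced training folds.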

The final report summarizes the results and answers:

  1. which model performs best overall, and
  2. whether the required performance constraints are satisfied.

References

[1] Scikit-learn Developers. Model Evaluation: Classification Metrics.
https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics

[2] XGBoost Developers. XGBoost Documentation.
https://xgboost.readthedocs.io/en/stable/

[3] Wikipedia contributors. Confusion matrix.
https://en.wikipedia.org/wiki/Confusion_matrix
