Predicting an Insurance Premium: Assur'Aimant US Expansion

This project aims to help Assur'Aimant, a French insurer, estimate insurance premiums for its expansion in the United States. Currently, manual premium estimation is costly and time-consuming. This project uses machine learning to predict premiums based on customer demographics.

Project Context

Assur'Aimant wants to modernize its insurance premium estimation process for the US market. We were commissioned to develop an AI solution capable of accurately predicting premiums based on customer characteristics. This project includes exploratory data analysis (EDA) and the construction of a predictive model.

Data

The data collected from Assur'Aimant in Houston includes the following information:

BMI: Body Mass Index (the healthy range is 18.5 to 24.9).
Sex: Gender of the subscriber (male or female).
Age: Age of the primary beneficiary.
Children: Number of dependent children covered by insurance.
Smoker: Smoking status (smoker or non-smoker).
Region: Region of residence in the United States (Northeast, Southeast, Southwest, Northwest).
Charges: Insurance premium billed (target variable).

Objectives

Exploratory Data Analysis (EDA): Understanding data, identifying trends, outliers and relationships between variables. This includes:
- Missing and duplicate values check (with missingno).
- Outlier detection.
- Univariate and bivariate analysis.
- Correlation analysis.
- Hypothesis validation with statistical tests.
- Visualizations with seaborn (box plots, violin plots, etc.).
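
The first checks in this list can be sketched with plain pandas, used here in place of missingno. This is a minimal illustration, not the notebooks' actual code: the rows are made up, the lowercase column names are an assumption about the real file, and the 1.5 × IQR rule is one common outlier criterion among several.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; column names follow the Data section
# in lowercase, which is an assumption about the real file.
df = pd.DataFrame({
    "age": [19, 18, 28, 28, 33, 32, 54],
    "sex": ["female", "male", "male", "male", "male", "male", "female"],
    "bmi": [27.9, 33.8, 33.0, 33.0, np.nan, 28.9, 30.8],
    "children": [0, 1, 3, 3, 0, 0, 3],
    "smoker": ["yes", "no", "no", "no", "no", "no", "yes"],
    "region": ["southwest", "southeast", "southeast", "southeast",
               "northwest", "northwest", "northeast"],
    "charges": [16900.0, 1700.0, 4400.0, 4400.0, 22000.0, 3900.0, 63800.0],
})

# Missing and duplicate values check.
missing_per_column = df.isna().sum()
n_duplicates = int(df.duplicated().sum())
df = df.drop_duplicates().reset_index(drop=True)

# Outlier detection on the target with the 1.5 * IQR rule.
q1, q3 = df["charges"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["charges"] < q1 - 1.5 * iqr) | (df["charges"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

# A simple bivariate view: mean charges per smoking status.
mean_charges_by_smoker = df.groupby("smoker")["charges"].mean()

print(missing_per_column)
print(f"duplicates removed: {n_duplicates}")
print(outliers[["age", "smoker", "charges"]])
print(mean_charges_by_smoker)
```

On the real data, the same `groupby` result is a natural input for the seaborn box and violin plots mentioned above.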
Predictive Modeling: Building a machine learning model to predict insurance premiums. This includes:
- Creation of a base model (Dummy Model).
- Data separation (80% training, 20% test).
- Data preparation (logarithmic transformation if necessary, management of random_state and seed).
- Model selection (sklearn: Linear Regression, Lasso, Ridge, ElasticNet or any model that performs best).
- Model evaluation (R², RMSE).
- Pre-processing (Standardization, encoding of categorical variables with sklearn.pipeline.Pipeline).
- Optimization (PolynomialFeatures, GridSearchCV, RandomizedSearchCV).
- Analysis and interpretation of results (importance of variables).
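
The modeling steps above could be wired together roughly as follows. This is a hedged sketch rather than the code in the notebooks: the data is synthetic (the relationship between features and charges is invented), Ridge stands in for whichever model performs best, and the seed, grid values, and step names are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, StandardScaler

SEED = 42  # fixed seed so the synthetic data and the split are reproducible

# Synthetic stand-in data (NOT the Assur'Aimant dataset).
rng = np.random.default_rng(SEED)
n = 400
X = pd.DataFrame({
    "age": rng.integers(18, 65, n),
    "bmi": rng.normal(30, 6, n).round(1),
    "children": rng.integers(0, 5, n),
    "sex": rng.choice(["male", "female"], n),
    "smoker": rng.choice(["yes", "no"], n, p=[0.2, 0.8]),
    "region": rng.choice(["northeast", "southeast", "southwest", "northwest"], n),
})
y = np.exp(7 + 0.03 * X["age"] + 0.01 * X["bmi"]
           + 1.5 * (X["smoker"] == "yes") + rng.normal(0, 0.2, n))

# Data separation: 80% training, 20% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED)

# Pre-processing: polynomial features + standardization for numeric columns,
# one-hot encoding for categorical columns.
numeric = ["age", "bmi", "children"]
categorical = ["sex", "smoker", "region"]
preprocess = ColumnTransformer([
    ("num", Pipeline([("poly", PolynomialFeatures(include_bias=False)),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
pipeline = Pipeline([("preprocess", preprocess), ("regressor", Ridge())])

# Logarithmic transformation of the target, inverted automatically at predict time.
model = TransformedTargetRegressor(regressor=pipeline, func=np.log, inverse_func=np.exp)

# Optimization: small grid over polynomial degree and regularization strength.
param_grid = {
    "regressor__preprocess__num__poly__degree": [1, 2],
    "regressor__regressor__alpha": [0.1, 1.0, 10.0],
}
search = GridSearchCV(model, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

# Base model: always predicts the mean of the training target.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)

def evaluate(estimator):
    """Return (R², RMSE) on the held-out test set."""
    pred = estimator.predict(X_test)
    return r2_score(y_test, pred), float(np.sqrt(mean_squared_error(y_test, pred)))

baseline_r2, baseline_rmse = evaluate(baseline)
model_r2, model_rmse = evaluate(search.best_estimator_)
print(f"Dummy: R2={baseline_r2:.3f}, RMSE={baseline_rmse:.1f}")
print(f"Ridge: R2={model_r2:.3f}, RMSE={model_rmse:.1f}")
print("Best parameters:", search.best_params_)
```

Keeping the pre-processing inside the pipeline means the scaler and encoder are fitted on the training folds only, which avoids leaking test information during the grid search.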
Streamlit Application: Develop an interactive application allowing:
- User data entry.
- Real-time insurance charge prediction.
- Use of a pre-trained model exported in .pkl.
- Integration of pre-processing pipelines.
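
The export and prediction steps behind such an application might look like the sketch below. It is an assumption-laden illustration, not the contents of app.py: the training rows are made up, `predict_charge` is a hypothetical helper, and the pickle round trip is done in memory so the example does not write files. In the real project the model would be written to and read from model/model.pkl, and Streamlit input widgets would supply the arguments.

```python
import pickle

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up training rows for illustration only (not the project's data).
train = pd.DataFrame({
    "age": [19, 18, 28, 33, 32, 54, 46, 60],
    "sex": ["female", "male", "male", "male", "male", "female", "female", "male"],
    "bmi": [27.9, 33.8, 33.0, 22.7, 28.9, 30.8, 33.4, 25.8],
    "children": [0, 1, 3, 0, 0, 3, 1, 0],
    "smoker": ["yes", "no", "no", "no", "no", "yes", "no", "no"],
    "region": ["southwest", "southeast", "southeast", "northwest",
               "northwest", "northeast", "southeast", "northeast"],
    "charges": [16900.0, 1700.0, 4400.0, 22000.0, 3900.0, 63800.0, 8200.0, 28900.0],
})
features = ["age", "sex", "bmi", "children", "smoker", "region"]

# The pre-processing lives inside the pipeline, so it is exported with the model.
pipeline = Pipeline([
    ("preprocess", ColumnTransformer([
        ("num", StandardScaler(), ["age", "bmi", "children"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["sex", "smoker", "region"]),
    ])),
    ("regressor", LinearRegression()),
])
pipeline.fit(train[features], train["charges"])

# Export and reload; in-memory here, a .pkl file in the real project.
payload = pickle.dumps(pipeline)
loaded_model = pickle.loads(payload)

def predict_charge(model, age, sex, bmi, children, smoker, region):
    """Hypothetical helper: build a one-row frame from user input and predict."""
    row = pd.DataFrame([{
        "age": age, "sex": sex, "bmi": bmi,
        "children": children, "smoker": smoker, "region": region,
    }])
    return float(model.predict(row)[0])

user_input = dict(age=40, sex="female", bmi=26.5, children=2,
                  smoker="no", region="southeast")
print(predict_charge(loaded_model, **user_input))
```

Only load pickle files you trust, and keep the scikit-learn version pinned in requirements.txt consistent with the one used for export, since pickled estimators are not guaranteed to load across versions.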

Tools and Technologies

Python
pandas, numpy
scikit-learn
seaborn, missingno
streamlit

Project files

app.py - Streamlit Application
notebooks/data_cleaning.ipynb - Data cleaning
notebooks/data_analysis.ipynb - Exploratory Data Analysis (EDA)
notebooks/data_model.ipynb - Model development and testing (model used for the Streamlit app)
notebooks/analysis_vk.ipynb - Cleaning / EDA / model building and testing
model/model.pkl - Exported trained model
README.md - This file
requirements.txt - Dependencies & packages
asset - Folder containing figures produced by the analysis

How to Run

Follow these steps to execute the project:

Ensure Python is installed on your system.
Clone this repository to your local machine:

git clone https://github.com/MichAdebayo/simplon_insurance_price_prediction.git

Navigate to the project directory:

cd simplon_insurance_price_prediction

Install the required dependencies:

pip install -r requirements.txt

Run the Streamlit application:

streamlit run app.py

Note:

If you only want to try the app without cloning the repo, you can do so using this link. This is possible because the application has been deployed on Streamlit Cloud.

Project Workflow

graph TD
    A[Load data] --> B(Exploratory Data Analysis);
    B --> C{Train Model};
    C -- Linear Regression --> D[Evaluation];
    C -- Lasso --> D;
    C -- Linear SVR --> D;
    C -- ElasticNet --> D;
    D --> E[Optimisation];
    E --> F[Streamlit Application];
    F --> G[Deployment on Cloud];

Authors

Michael ADEBAYO [GitHub]
Khadija AASSI [GitHub]
