Project-PlayZone Technobel-Anonymisation-données-lactation

Project Description

This project was developed for Eleveo with the objective of generating synthetic lactation data that closely resembles real-world datasets while ensuring proper anonymization.

To achieve this, a Python script is used to read the production database, extract key statistical characteristics, and generate new data that preserves the original distributions and proportions.
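The read, summarize, and regenerate flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the helper names `column_stats` and `synthesize` are hypothetical, the in-memory table stands in for the production database, and the five MILK values are made-up placeholders. It also assumes a single numeric column can be approximated by a normal distribution.

```python
import random
import sqlite3
import statistics


def column_stats(conn, table, column):
    """Return (mean, standard deviation) of a numeric column.

    Table and column names are interpolated directly, so they must be
    trusted constants, never user input.
    """
    rows = conn.execute(
        f"SELECT {column} FROM {table} WHERE {column} IS NOT NULL"
    ).fetchall()
    values = [row[0] for row in rows]
    return statistics.mean(values), statistics.stdev(values)


def synthesize(mean, std, n, seed=None):
    """Draw n values from a normal distribution with the given mean and std."""
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(n)]


# Stand-in for the production database (placeholder values, not real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CL_LAITLACT (MILK REAL)")
conn.executemany(
    "INSERT INTO CL_LAITLACT VALUES (?)",
    [(v,) for v in (7200.0, 8100.0, 6900.0, 7800.0, 7500.0)],
)

mean, std = column_stats(conn, "CL_LAITLACT", "MILK")
synthetic = synthesize(mean, std, n=1000, seed=42)
```

No original row is copied into `synthetic`; only the two summary statistics are carried over.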

Project Goal

The project aims to:

Ensure full anonymization of real data
Preserve statistical properties such as mean and standard deviation for all lactation parameters
Generate a SQLite database that maintains the original data distribution and proportions

Key Questions

Is the generated data fully anonymized?
Does the generated data accurately preserve the statistical characteristics of the original dataset?
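The second question can be checked numerically by comparing summary statistics of an original column with its generated counterpart. The sketch below is one possible check, assuming a relative tolerance of 5% is acceptable; the function name `preserves_stats`, the tolerance, and the sample values are illustrative assumptions, not part of the project.

```python
import statistics


def preserves_stats(original, generated, rel_tol=0.05):
    """Report whether mean and std of `generated` stay within rel_tol of `original`."""
    checks = {}
    for name, fn in (("mean", statistics.mean), ("std", statistics.stdev)):
        ref, new = fn(original), fn(generated)
        checks[name] = abs(new - ref) <= rel_tol * abs(ref)
    return checks


# Placeholder values, not real lactation data.
original = [10.0, 12.0, 14.0, 16.0, 18.0]
close = [10.2, 11.9, 14.1, 15.8, 18.1]      # similar mean and spread
shifted = [20.0, 22.0, 24.0, 26.0, 28.0]    # same spread, different mean

close_report = preserves_stats(original, close)
shifted_report = preserves_stats(original, shifted)
```

A check like this covers only the mean and standard deviation; it says nothing about the shape of the distribution or about anonymization.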

Technologies Used

SQLite 3 database
Python (scikit-learn)
Power BI

Project phases

EDA

This section outlines the most significant statistical findings derived from the Exploratory Data Analysis (EDA) phase.

65% of the living cows belong to breed "4".

Milk quantity per lactation follows a normal distribution.

Number of lactations for each living cow

Data Collection

SQLite database provided by the client

Data Cleaning

N/A in this project

Copied tables

Three tables are directly copied from the input database to the output database:

Breed: contains all breeds present in the database
ETAPE_CTRL_TEST: contains all possible steps in a lactation control process
CTRL_TYPE: contains all possible types of lactation control
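Copying these reference tables unchanged can be done with the standard `sqlite3` module alone. The sketch below is one way to do it, not necessarily how the project script does it: the function name `copy_tables` and the `id`/`label` columns in the demo are hypothetical, and the demo uses in-memory databases with placeholder rows.

```python
import sqlite3

COPIED_TABLES = ("Breed", "ETAPE_CTRL_TEST", "CTRL_TYPE")


def copy_tables(src, dst, tables=COPIED_TABLES):
    """Recreate each table in dst with its original schema, then copy all rows."""
    for table in tables:
        # Reuse the stored CREATE TABLE statement so column types and keys are kept.
        (ddl,) = src.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        dst.execute(ddl)
        rows = src.execute(f"SELECT * FROM {table}").fetchall()
        if rows:
            placeholders = ",".join("?" * len(rows[0]))
            dst.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    dst.commit()


# Demo with hypothetical columns and placeholder rows.
src = sqlite3.connect(":memory:")
for name in COPIED_TABLES:
    src.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, label TEXT)")
src.executemany("INSERT INTO Breed VALUES (?, ?)", [(4, "breed 4"), (7, "breed 7")])
src.execute("INSERT INTO CTRL_TYPE VALUES (1, 'type A')")

dst = sqlite3.connect(":memory:")
copy_tables(src, dst)
copied_breeds = dst.execute("SELECT * FROM Breed ORDER BY id").fetchall()
```

These tables hold only reference values (breeds, control steps, control types), which is why they can be copied without generating anything.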

Generated Data

Two tables must be generated to ensure anonymization:

EXPLOITATION

The desired number of farms is created so that their distribution across municipalities matches the input database. The municipality is identified by the first two digits of the postal code (see https://fr.wikipedia.org/wiki/Code_postal_en_Belgique).

Input DB

Generated output data
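Generating farms in proportion to the input distribution of postal-code prefixes can be sketched as weighted random sampling. This is an illustrative sketch under assumptions: the function names `prefix_weights` and `generate_farm_prefixes` are hypothetical, and the postal codes listed are placeholders rather than values from the client database.

```python
import random
from collections import Counter


def prefix_weights(postal_codes):
    """Share of entries per two-digit postal-code prefix."""
    counts = Counter(str(code)[:2] for code in postal_codes)
    total = sum(counts.values())
    return {prefix: n / total for prefix, n in counts.items()}


def generate_farm_prefixes(weights, n_farms, seed=None):
    """Draw one prefix per synthetic farm, following the input proportions."""
    rng = random.Random(seed)
    prefixes = sorted(weights)
    return rng.choices(prefixes, weights=[weights[p] for p in prefixes], k=n_farms)


# Placeholder postal codes standing in for the farms of the input database.
input_codes = ["5000", "5100", "4000", "5590", "6900", "4020", "5030", "5170"]
weights = prefix_weights(input_codes)

farms = generate_farm_prefixes(weights, n_farms=10_000, seed=1)
generated = prefix_weights(farms)
```

Comparing `weights` with `generated` gives the same kind of input versus output comparison as the two charts in this section.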

IDENTANV

Animals are generated for the desired number of farms so that the number of cows per postal code and per herd follows the distribution in the input database.

Input DB

Generated output data

The match is not perfect: random sampling does not always reproduce the input distribution faithfully. The client accepted this limitation.
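One way to generate herd sizes that follow the input distribution per postal-code prefix is to resample observed sizes and perturb them slightly. The sketch below is an assumption about how this could be done, not the project's method: the function names, the 10% jitter, and the herd sizes are all hypothetical.

```python
import random
from collections import defaultdict


def herd_sizes_by_prefix(herds):
    """Group observed herd sizes by two-digit postal prefix.

    `herds` is an iterable of (prefix, number_of_cows) pairs.
    """
    sizes = defaultdict(list)
    for prefix, n_cows in herds:
        sizes[prefix].append(n_cows)
    return dict(sizes)


def generate_herd_size(prefix, observed, rng, jitter=0.1):
    """Pick an observed herd size for this prefix and perturb it by up to +/- jitter."""
    base = rng.choice(observed[prefix])
    return max(1, round(base * rng.uniform(1 - jitter, 1 + jitter)))


# Placeholder herds, not real farm data.
input_herds = [("50", 80), ("50", 120), ("40", 60), ("40", 45)]
observed = herd_sizes_by_prefix(input_herds)

rng = random.Random(7)
synthetic_herds = [
    (prefix, generate_herd_size(prefix, observed, rng))
    for prefix in ("50", "40")
    for _ in range(200)
]
```

Because every value comes from a random draw, small samples can drift away from the input distribution, which is consistent with the imperfect match noted above.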

Milk production data generation for entire production (CL_LAITLACT)

All data were generated with GradientBoostingRegressor models, where MILK depends on:

Breed
NOLACT
Veil Month
Veil Duration

MG and PROT depend on MILK.

The lowest accuracy among the models is 65%.

That model does not capture the importance of NOLACT and relies only on Breed, but the result was good enough for the customer.
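The chained setup described above (MILK predicted from four features, then MG predicted from MILK) can be sketched with scikit-learn's `GradientBoostingRegressor`. This is a minimal sketch under stated assumptions: the training data is random placeholder data with an invented linear relationship, the variable names are illustrative, and default hyperparameters are used because the README does not state the real ones. Note that `score` returns the R² coefficient; whether the "accuracy" quoted above refers to R² is not stated in the source.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 500

# Placeholder training data (NOT real lactation records).
breed = rng.integers(1, 6, size=n)
nolact = rng.integers(1, 8, size=n)
veil_month = rng.integers(1, 13, size=n)
veil_duration = rng.integers(250, 400, size=n)
milk = (
    5000 + 400 * breed + 150 * nolact + 5 * veil_duration
    + rng.normal(0, 200, size=n)
)
mg = 0.04 * milk + rng.normal(0, 10, size=n)

# Model 1: MILK from Breed, NOLACT, Veil Month, Veil Duration.
X = np.column_stack([breed, nolact, veil_month, veil_duration])
milk_model = GradientBoostingRegressor(random_state=0).fit(X, milk)

# Model 2: MG from MILK (PROT would follow the same pattern).
mg_model = GradientBoostingRegressor(random_state=0).fit(milk.reshape(-1, 1), mg)

# Generate values for 10 new synthetic animals.
X_new = np.column_stack([
    rng.integers(1, 6, size=10),
    rng.integers(1, 8, size=10),
    rng.integers(1, 13, size=10),
    rng.integers(250, 400, size=10),
])
milk_new = milk_model.predict(X_new)
mg_new = mg_model.predict(milk_new.reshape(-1, 1))

r2 = milk_model.score(X, milk)  # R^2 on the training data
```

Inspecting `milk_model.feature_importances_` is one way to see whether a fitted model leans almost entirely on a single feature, as reported above for Breed.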

Milk production data generation during control (CL_LAITCTRL)

Results

Anonymization achieved
Acceptable accuracy (65% for the weakest model)
Data consistency accepted by customer
Data Solution accepted and used by customer
