Salierimra/04_Project_PlayZone_Anonymisation_Lactation_Datas

Project-PlayZone Technobel-Anonymisation-données-lactation

Project Description

This project was developed for Eleveo with the objective of generating synthetic lactation data that closely resembles real-world datasets while ensuring proper anonymization.

To achieve this, a Python script is used to read the production database, extract key statistical characteristics, and generate new data that preserves the original distributions and proportions.

Project Goal

The project aims to:

  • Ensure full anonymization of real data
  • Preserve statistical properties such as mean and standard deviation for all lactation parameters
  • Generate a SQLite database that maintains the original data distribution and proportions
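The second goal can be checked numerically. The following is a minimal sketch, assuming a relative tolerance of 5% and randomly generated placeholder values; the helper names `summarize` and `stats_preserved` are hypothetical and the numbers are not taken from the project data.

```python
import numpy as np

def summarize(values):
    """Return (mean, sample standard deviation) of a 1-D sequence."""
    arr = np.asarray(values, dtype=float)
    return arr.mean(), arr.std(ddof=1)

def stats_preserved(real, synthetic, rel_tol=0.05):
    """True if the synthetic mean and std are within rel_tol of the real ones."""
    real_mean, real_std = summarize(real)
    syn_mean, syn_std = summarize(synthetic)
    mean_ok = abs(syn_mean - real_mean) <= rel_tol * abs(real_mean)
    std_ok = abs(syn_std - real_std) <= rel_tol * real_std
    return bool(mean_ok and std_ok)

# Placeholder values only: an arbitrary "real" column and a synthetic column
# drawn from a normal distribution with the same mean and std.
rng = np.random.default_rng(0)
real_milk = rng.normal(loc=8000.0, scale=1500.0, size=5000)
synthetic_milk = rng.normal(
    loc=real_milk.mean(), scale=real_milk.std(ddof=1), size=5000
)
preserved = stats_preserved(real_milk, synthetic_milk)
```

In practice the same check would be repeated for each lactation parameter, with a tolerance agreed with the client.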

Key Questions

  • Is the generated data fully anonymized?
  • Does the generated data accurately preserve the statistical characteristics of the original dataset?

Technologies Used

  • SQLite 3 database
  • Python (scikit-learn)
  • Power BI

Project phases

EDA

This section outlines the most significant statistical findings derived from the Exploratory Data Analysis (EDA) phase.

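65% of the live cows belong to breed "4".

[Figure: breed distribution of live cows]

Milk quantity per lactation follows a normal distribution.

[Figure: distribution of milk quantity per lactation]

Number of lactations per live cow

[Figure: number of lactations per live cow]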

Data Collection

SQLite database provided by the client.

Data Cleaning

N/A in this project

Copied tables

Three tables are directly copied from the input database to the output database:

  • Breed: contains all breeds present in the database
  • ETAPE_CTRL_TEST: contains all possible steps in a lactation control process
  • CTRL_TYPE: contains all possible types of lactation control
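A direct copy of this kind can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions, not the project's script: the `copy_tables` helper is hypothetical, and the two-column schema used in the demo is invented so that the example runs on in-memory databases.

```python
import sqlite3

COPIED_TABLES = ["Breed", "ETAPE_CTRL_TEST", "CTRL_TYPE"]

def copy_tables(src, dst, tables):
    """Copy schema and rows of each named table from src to dst."""
    for table in tables:
        # Reuse the original CREATE TABLE statement stored in sqlite_master.
        row = src.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        if row is None:
            raise ValueError(f"table {table!r} not found in source database")
        dst.execute(row[0])
        rows = src.execute(f'SELECT * FROM "{table}"').fetchall()
        if rows:
            placeholders = ", ".join("?" * len(rows[0]))
            dst.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})', rows
            )
    dst.commit()

# Demo with an invented schema (the real column layout is not shown here).
src = sqlite3.connect(":memory:")
for name in COPIED_TABLES:
    src.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, label TEXT)")
src.executemany("INSERT INTO Breed VALUES (?, ?)", [(1, "A"), (4, "B")])

dst = sqlite3.connect(":memory:")
copy_tables(src, dst, COPIED_TABLES)
copied = dst.execute("SELECT * FROM Breed ORDER BY id").fetchall()
```

These tables can be copied as-is because they hold only reference values (breeds, control steps, control types) and no farm or animal records.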

Generated Data

Two tables must be generated to ensure anonymization:

EXPLOITATION

Creates the desired number of farms while respecting the input database's distribution across municipalities, identified by the first two digits of the postal code (see https://fr.wikipedia.org/wiki/Code_postal_en_Belgique).

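Input DB
[Figure: distribution of farms by postal-code prefix in the input database]
Output generated data
[Figure: distribution of farms by postal-code prefix in the generated data]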
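One way to respect the postal-code distribution is to sample two-digit prefixes in proportion to their share in the input. The sketch below assumes this weighted-sampling approach; the function names and the sample postal codes are illustrative, not taken from the client database.

```python
import numpy as np

def postal_prefix_shares(postal_codes):
    """Return the two-digit prefixes and their share of the input farms."""
    prefixes = [str(code)[:2] for code in postal_codes]
    values, counts = np.unique(prefixes, return_counts=True)
    return values, counts / counts.sum()

def sample_farm_prefixes(postal_codes, n_farms, seed=None):
    """Draw n_farms prefixes with probabilities equal to the input shares."""
    rng = np.random.default_rng(seed)
    values, shares = postal_prefix_shares(postal_codes)
    return rng.choice(values, size=n_farms, p=shares)

# Illustrative input: eight farms, three of them with prefix "69".
input_codes = ["4000", "4020", "4500", "5000", "6900", "6940", "6950", "1300"]
generated = sample_farm_prefixes(input_codes, n_farms=10_000, seed=42)
share_69 = float((generated == "69").mean())
```

Because the draw is random, the generated shares approach the input shares only as the number of farms grows; small outputs can deviate noticeably.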

IDENTANV

Generates the records for the desired number of farms while respecting the input database's distribution of the number of cows per postal code and per herd.

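Input DB
[Figures: number of cows per postal code and per herd in the input database]
Output generated data

The match is not perfect, because random sampling does not always reproduce the input distribution exactly. The client accepted this.

[Figures: number of cows per postal code and per herd in the generated data]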
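The gap between the input and generated distributions can be put into a single number. The sketch below uses the total variation distance, which is a metric chosen here for illustration and not one the project is known to have used; the herd sizes are invented.

```python
from collections import Counter

def proportions(values):
    """Map each distinct value to its relative frequency."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}

def total_variation_distance(input_values, generated_values):
    """Half the sum of absolute differences between the two frequency tables.

    0.0 means identical distributions, 1.0 means no overlap at all.
    """
    p = proportions(input_values)
    q = proportions(generated_values)
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Invented herd sizes: one herd of 20 cows became a herd of 35 in the output.
input_herd_sizes = [20, 20, 35, 35, 35, 60, 60, 120]
generated_herd_sizes = [20, 35, 35, 35, 35, 60, 60, 120]
gap = total_variation_distance(input_herd_sizes, generated_herd_sizes)
```

A threshold on such a distance could make "close enough" explicit when reviewing a generated database with the client.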

Milk production data generation for entire production (CL_LAITLACT)

All data were generated with scikit-learn's GradientBoostingRegressor. MILK depends on:

  • Breed
  • NOLACT
  • Veil Month
  • Veil Duration

MG and PROT depend on MILK.
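The two-stage dependency described above (MILK from the four features, then MG and PROT from MILK) can be sketched with scikit-learn as follows. All training values are invented so that the example runs; Breed is treated as a plain integer for simplicity; and the score shown is R² from `score`, which is an assumption, since the way accuracy was measured in the project is not stated.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Placeholder features standing in for Breed, NOLACT, Veil Month, Veil Duration.
breed = rng.integers(1, 6, size=n)
nolact = rng.integers(1, 8, size=n)
veil_month = rng.integers(1, 13, size=n)
veil_duration = rng.integers(250, 400, size=n)
X = np.column_stack([breed, nolact, veil_month, veil_duration])

# Invented targets, only so that the sketch runs end to end.
milk = 5000 + 400 * breed + 150 * nolact + rng.normal(0, 300, size=n)
mg = 0.04 * milk + rng.normal(0, 10, size=n)
prot = 0.033 * milk + rng.normal(0, 8, size=n)

# Stage 1: MILK from the four features, scored on a held-out split.
X_train, X_test, milk_train, milk_test = train_test_split(
    X, milk, random_state=0
)
milk_model = GradientBoostingRegressor(random_state=0).fit(X_train, milk_train)
milk_r2 = milk_model.score(X_test, milk_test)

# Stage 2: MG and PROT from MILK alone.
milk_col = milk.reshape(-1, 1)
mg_model = GradientBoostingRegressor(random_state=0).fit(milk_col, mg)
prot_model = GradientBoostingRegressor(random_state=0).fit(milk_col, prot)

# Generate one synthetic record: predict MILK, then feed it to stage 2.
new_X = np.array([[4, 2, 3, 305]])
new_milk = milk_model.predict(new_X)
new_mg = mg_model.predict(new_milk.reshape(-1, 1))
new_prot = prot_model.predict(new_milk.reshape(-1, 1))
```

Chaining the models this way keeps MG and PROT consistent with the generated MILK value rather than predicting each quantity independently.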

The lowest accuracy among the models is 65%:

  • That model does not capture the importance of NOLACT and relies only on Breed, but this was good enough for the customer.

Milk production data generation during control (CL_LAITCTRL)

Results

  • Anonymization achieved
  • Accuracy acceptable (65% for the weakest model)
  • Data consistency accepted by the customer
  • Data solution accepted and used by the customer

Use
