Reinforcement Learning for Inventory Management

Overview

This notebook focuses on inventory optimization using Reinforcement Learning (RL). You will code an optimization algorithm that identifies the most effective order-placement policy.

Task

Implement your RL algorithm in your_algorithm.py. You are free to choose any optimization algorithm, but please follow the template to ensure compatibility with the evaluation platform that will be used to grade the assignment.

Use the notebook ML4CE_RL_INV_CW.ipynb to execute and assess the performance of your algorithm. Learning curves and reward distribution plots are provided to analyse the training process and compare your algorithm against different benchmark policies, respectively.

Project Structure

ReinforcementLearning/
├── ML4CE_RL_INV_CW.ipynb               # Main notebook
├── README.md                           # This file
├── algorithms/                         # Algorithm implementations
│   ├── __init__.py
│   ├── reinforce.py                    # REINFORCE with baseline
│   ├── simulated_annealing.py          # Simulated Annealing (SA) algorithm
│   ├── heuristic_policy.py             # Heuristic (s,S) policy
│   └── your_algorithm.py               # Your algorithm template
├── benchmarking/                       # Auxiliary files for performance evaluation
│   ├── policy_REINFORCE_with_baseline.py     # Pretrained policy using REINFORCE with baseline
│   ├── policy_SA.py                    # Pretrained policy using SA
│   └── test_demand_dataset.pickle      # Test dataset
├── ML4CE_RL_environment.py             # RL environment
├── common.py                           # Auxiliary functions
├── utils.py                            # Plotting and data management functions
├── requirements.txt                    # Python environment
└── SCstructure.png                     # Environment diagram

RL Environment

This project focuses on the three-echelon supply chain depicted in SCstructure.png.

Description

The supplier is an olive oil producer whose bottles are sold in different stores around the city. Because of the distance between the production facilities and the stores, the company owns a distribution centre (DC) in the vicinity of the city. Retailers sell the product directly to customers and place replenishment orders to maintain sufficient stock levels. Likewise, the DC must hold enough inventory to supply the stores, and it places replenishment orders directly with the manufacturing company. The challenge is to develop a re-order policy for each participant, since each stage faces uncertainty in the demand of the stage that follows it.

Assumptions

Customer demand is modeled as a random variable following a Poisson distribution.
Production facilities have immediate access to an unlimited supply of raw materials.
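
As an illustration of the first assumption, the sketch below draws a Poisson demand trajectory for one episode using only the standard library (Knuth's multiplication method). The function names, the mean demand of 5 units per day, and the seed are hypothetical values chosen for the example; they are not taken from ML4CE_RL_environment.py.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) sample with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    # Multiply uniforms until the running product drops below exp(-rate).
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_demand_path(rate, n_days, seed=0):
    """Return a list of daily customer demands for one episode."""
    rng = random.Random(seed)
    return [sample_poisson(rate, rng) for _ in range(n_days)]


# Hypothetical values: mean demand of 5 units/day over a 28-day (4-week) horizon.
demand = sample_demand_path(rate=5.0, n_days=28, seed=42)
print(demand)
```

Fixing the seed makes the trajectory reproducible, which is convenient when comparing policies on the same demand realisation.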

What happens during an episode?

Each episode spans a time horizon of 4 weeks. At each day (time step $t$):

1. The DC and the retailers place replenishment orders.
2. The DC and the retailers receive orders from their respective suppliers after the corresponding lead time, and update both on-hand and pipeline inventory.
3. Each stage satisfies the demand of its customers according to current inventory levels.
   1. Backlogged sales take priority over the orders arriving in the current period $t$.
   2. Then, the orders placed by the retailers in the current period are fulfilled with the remaining available inventory.
   3. Finally, the backlog of each retailer is updated.
4. Profit is evaluated as the difference between the sales revenue and the costs incurred across the entire supply chain (i.e., delivery fees, variable order costs, holding costs, unfulfilled demand penalties and excess capacity costs).
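
The sequence above can be sketched for a single stage as follows. This is a simplified illustration, not the logic of ML4CE_RL_environment.py: the function name, the parameter values, and the reduced profit formula (revenue minus holding cost and backlog penalty only, with backlogged units generating revenue when they are eventually served) are all assumptions made for the example.

```python
def step_stage(on_hand, backlog, arriving, demand,
               price=2.0, holding_cost=0.1, backlog_penalty=0.5):
    """Advance one stage by one day; return (on_hand, backlog, profit).

    All cost parameters are hypothetical. Delivery fees, variable order
    costs and excess capacity costs are omitted for brevity.
    """
    # Receive the replenishment order that arrives today.
    on_hand += arriving

    # Backlogged sales are served first.
    served_backlog = min(on_hand, backlog)
    on_hand -= served_backlog
    backlog -= served_backlog

    # New demand is served with whatever inventory remains.
    served_new = min(on_hand, demand)
    on_hand -= served_new

    # Unmet demand is added to the backlog.
    backlog += demand - served_new

    # Simplified profit: revenue minus holding and backlog penalties.
    revenue = price * (served_backlog + served_new)
    profit = revenue - holding_cost * on_hand - backlog_penalty * backlog
    return on_hand, backlog, profit


# 3 units on hand, 4 backlogged, 5 arriving, 6 newly demanded.
result = step_stage(on_hand=3, backlog=4, arriving=5, demand=6)
print(result)  # (0, 2, 15.0)
```

In this example the 8 available units cover the 4 backlogged units and 4 of the 6 newly demanded units, so 2 units are carried over as backlog.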

Elements of the Markov Decision Process (MDP)

Action space: the agent must decide the number of units each retailer or DC reorders at each time step.
State space: states represent the inventory position at each time step, which is the difference between the total inventory and the backlog.
Reward: the agent tries to maximize the profit of the supply chain.
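
To make the state and action definitions concrete, the sketch below computes an inventory position and maps it to an order quantity with an (s,S) rule, the kind of heuristic listed in the project structure. It is not the code in heuristic_policy.py: the function names and the thresholds are hypothetical, and counting pipeline inventory as part of the total inventory is an assumption.

```python
def inventory_position(on_hand, pipeline, backlog):
    """State: total inventory minus backlog.

    Assumption: total inventory = on-hand stock + orders still in transit.
    """
    return on_hand + pipeline - backlog


def s_S_order(position, s, S):
    """Action: order up to S when the position falls below s, else nothing."""
    return S - position if position < s else 0


# Hypothetical thresholds for one retailer.
position = inventory_position(on_hand=4, pipeline=3, backlog=2)
order = s_S_order(position, s=8, S=20)
print(position, order)  # 5 15
```

An RL agent replaces the fixed rule `s_S_order` with a learned mapping from the inventory positions of all stages to their order quantities.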
