This project implements a Monte Carlo simulation engine for estimating business cash flows using historical data.
It models uncertainty in daily revenue and costs by sampling from empirically derived probability distributions and aggregating outcomes across many simulation runs. This produces a distribution of outcomes from which we can estimate expected revenue, expected cash position and the probability of negative cash flow.
This implementation focuses on efficiency, memory locality and scalable throughput, enabling large-scale Monte Carlo (MC) analysis.
Businesses often rely on deterministic forecasts, which produce a single outcome. Real-world dynamics, by contrast, are often stochastic and subject to variability.
The goals are:
- Model uncertainty
- Generate a distribution of outcomes
- Estimate risk metrics like the probability of negative cash flow
This project provides a computationally efficient implementation that scales MC analysis.
The project was built to explore low-level systems design for simulation performance. Concepts covered include:
- Memory layout
- RNG
- Parallel execution
- Cache efficiency
The code is walked through below in order, with an explanation of each choice.
We include:
- `iostream` for `std::cout`, used to print results
- `vector` for containers
- `cmath` for square roots
- `random` for random number generation
- `chrono` for timing the simulation
- `omp.h` for OpenMP parallelization
`mean_stdev_generation` computes both the mean and the standard deviation and returns them as a pair. We set the function up this way so that both statistics reuse the same data, rather than having two separate functions. The input is taken by reference to avoid copying the data, which is expensive. The length is obtained from the size of the vector.
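As a rough illustration of that interface, here is a minimal sketch of a fused, single-pass function. Only the name `mean_stdev_generation` comes from the project. The body, the choice of the sample (n - 1) standard deviation, and the handling of empty input are assumptions made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch only: returns {mean, standard deviation} from one pass over the data.
// The input is a const reference, so the vector is never copied.
std::pair<double, double> mean_stdev_generation(const std::vector<int>& data) {
    const std::size_t n = data.size();  // length comes from the vector itself
    if (n == 0) return {0.0, 0.0};      // assumption: empty input yields zeros

    double mean = 0.0;
    double m2 = 0.0;  // running sum of squared deviations from the mean
    std::size_t count = 0;
    for (int x : data) {
        ++count;
        const double delta = x - mean;  // deviation from the previous mean
        mean += delta / static_cast<double>(count);
        m2 += delta * (x - mean);       // second factor uses the updated mean
    }
    // Assumption: sample (n - 1) variance; a single point has zero spread.
    const double variance = (n > 1) ? m2 / static_cast<double>(n - 1) : 0.0;
    return {mean, std::sqrt(variance)};
}
```

A caller can unpack the result with structured bindings (C++17), for example `auto [m, s] = mean_stdev_generation(values);`.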
- Random seed - The seed for the RNG
- Forecast length - The number of forward days in the forecast
- Simulation budget - The number of Monte Carlo simulations to run
- Unit price - The price of a single unit sold
- Initial cash - The cash currently available
- `std::mt19937 rng(random_seed)` initializes the random number generator.
- The two vector containers hold the historical data to be used.
- Next is the computation of the means and standard deviations:
// Historical data
std::vector<int> sales = {50,28,40,38,70,39,29,10,38}; // sales vector
std::vector<int> expenses = {20,10,25,10,30,18,9,2,23}; // expenses vector
// computes the means and standard deviations using the updated fused function
auto [sales_mean, sales_std] = mean_stdev_generation(sales);
auto [expenses_mean, expenses_std] = mean_stdev_generation(expenses);

- `#pragma omp parallel reduction(+:total_sales, total_cash, negative_count)` - This is responsible for the accumulation. Each thread accumulates its own totals, and there is a final merge at the end.
- Within the parallel region, we initialize the RNG and the normal distributions once per thread. This version avoids initializing them inside the loop, which would add overhead on every simulation.
- `#pragma omp for schedule(static)` - This parallelizes the MC loop.
- `sales_sum` and `expenses_sum` - These initialize the running sales and expenses totals, which accumulate over the days of each simulation.
- `for (int day = 0; day < forecast_length; day++)` - This loops over the forecast days within each simulation. Parallelizing here would only add overhead, so I avoided it. Because the days of a simulation are processed together on one thread, this also gives better cache locality.
- `s = std::max(0.0, s)` and `e = std::max(0.0, e)` - These clamp the sampled values so that neither can fall below 0.
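To make the inner loop concrete, here is a minimal sketch of a single simulation run. The function name `simulate_one_run` and the `RunTotals` struct are hypothetical, introduced only for illustration. The sampling and clamping follow the steps described above.

```cpp
#include <algorithm>
#include <random>

// Hypothetical container for the totals of one simulation run.
struct RunTotals {
    double sales_sum;
    double expenses_sum;
};

// Sketch only: simulates `forecast_length` days for one run. The RNG and the
// distributions are passed in so they can be created once per thread rather
// than once per simulation.
RunTotals simulate_one_run(std::mt19937& rng,
                           std::normal_distribution<double>& sales_dist,
                           std::normal_distribution<double>& expenses_dist,
                           int forecast_length) {
    RunTotals totals{0.0, 0.0};
    for (int day = 0; day < forecast_length; day++) {
        double s = sales_dist(rng);     // sampled sales for this day
        double e = expenses_dist(rng);  // sampled expenses for this day
        s = std::max(0.0, s);           // a normal draw can be negative
        e = std::max(0.0, e);
        totals.sales_sum += s;
        totals.expenses_sum += e;
    }
    return totals;
}
```

Keeping the day loop sequential means each run touches only a few local variables, which is consistent with the locality argument above.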
Next we have the weekly revenue calculation and the final cash, after which the accumulation is done:
// accumulates directly, the total sales, and final cash using the reduction
total_sales += sales_sum;
total_cash += final_cash;
if (final_cash < 0.0) negative_count++;

We then compute the remaining final statistics, ready for output:
double mean_weekly_sales = total_sales / simulation_budget;
double mean_final_cash = total_cash / simulation_budget;
double prob_negative = static_cast<double>(negative_count) / simulation_budget * 100.0;

For performance profiling, we ran:
perf stat -r 20 -e task-clock,cycles,instructions,cache-references,cache-misses,branches,branch-misses ./weekly_cashflow

| Configuration | Time (s) | Speedup vs Python |
|---|---|---|
| Python baseline | 0.210 | 1.00× |
| C++ — initial implementation | 0.151 | 1.39× |
| C++ — restructured implementation | 0.088 | 2.38× |
The restructured implementation produced a 1.71× improvement over the initial C++ version through three compounding changes:
Eliminated intermediate storage: The initial version stored all 100K simulation results in a vector of structs, then made a second OpenMP pass to compute statistics. The current version accumulates directly into OpenMP reduction variables, with no intermediate allocation and no second traversal.
Fixed parallelization structure: In the initial version, the `#pragma omp parallel for` directive was placed outside the simulation loop body, leaving the MC loop running sequentially. The current version uses a correctly structured parallel region: one `#pragma omp parallel` block initialises per-thread RNG and distributions, with `#pragma omp for schedule(static)` distributing simulation iterations across threads.
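The structure described above can be sketched roughly as follows. This is not the project's actual source: the function name `run_simulations`, the `McSummary` struct, the per-thread seeding scheme (seed plus thread id) and the final-cash formula are all assumptions made for illustration. The accumulator names match the ones used in the walkthrough.

```cpp
#include <algorithm>
#include <random>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical result type for the aggregated statistics.
struct McSummary {
    double mean_sales;
    double mean_final_cash;
    double prob_negative;  // in percent
};

// Sketch only; assumes simulation_budget > 0.
McSummary run_simulations(int simulation_budget, int forecast_length,
                          double sales_mean, double sales_std,
                          double expenses_mean, double expenses_std,
                          double unit_price, double initial_cash,
                          unsigned random_seed) {
    double total_sales = 0.0;
    double total_cash = 0.0;
    long long negative_count = 0;

    // One parallel region: each thread gets private copies of the three
    // accumulators, which OpenMP sums together when the region ends.
    #pragma omp parallel reduction(+:total_sales, total_cash, negative_count)
    {
        int thread_id = 0;
    #ifdef _OPENMP
        thread_id = omp_get_thread_num();
    #endif
        // Per-thread RNG and distributions, created once, outside the loop.
        // Assumption: threads are decorrelated by offsetting the seed.
        std::mt19937 rng(random_seed + static_cast<unsigned>(thread_id));
        std::normal_distribution<double> sales_dist(sales_mean, sales_std);
        std::normal_distribution<double> expenses_dist(expenses_mean, expenses_std);

        // Worksharing loop: iterations are split statically across threads.
        #pragma omp for schedule(static)
        for (int sim = 0; sim < simulation_budget; sim++) {
            double sales_sum = 0.0;
            double expenses_sum = 0.0;
            for (int day = 0; day < forecast_length; day++) {
                sales_sum += std::max(0.0, sales_dist(rng));
                expenses_sum += std::max(0.0, expenses_dist(rng));
            }
            // Assumption: revenue is units sold times unit price.
            const double final_cash =
                initial_cash + sales_sum * unit_price - expenses_sum;
            total_sales += sales_sum;
            total_cash += final_cash;
            if (final_cash < 0.0) negative_count++;
        }
    }

    const double n = static_cast<double>(simulation_budget);
    return {total_sales / n, total_cash / n,
            static_cast<double>(negative_count) / n * 100.0};
}
```

Compiled without OpenMP support, the pragmas are ignored and the same code runs on a single thread, which is a convenient way to check the results. Note that with this seeding scheme the sampled values depend on the number of threads, so runs are only reproducible for a fixed thread count.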
Welford online algorithm for statistics: The initial mean/stdev computation made two passes over the data: one for the mean, one for the standard deviation. The current implementation uses Welford's online algorithm, a single pass that updates the mean and variance incrementally. This eliminates the redundant traversal and improves numerical stability for inputs clustered near large means.
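For reference, the standard Welford recurrences are written out below. The symbols are introduced here for illustration: $x_k$ is the $k$-th data point, $\mu_k$ the running mean and $S_k$ the running sum of squared deviations, with $\mu_0 = 0$ and $S_0 = 0$. The last line assumes the sample (n - 1) form. Whether the project uses the sample or the population form is not stated, and the population form would divide by $n$ instead.

```latex
\begin{aligned}
\delta_k &= x_k - \mu_{k-1}, \\
\mu_k    &= \mu_{k-1} + \frac{\delta_k}{k}, \\
S_k      &= S_{k-1} + \delta_k \,(x_k - \mu_k), \\
\sigma   &= \sqrt{\frac{S_n}{n-1}} \qquad (n > 1).
\end{aligned}
```

Each update uses only the previous $\mu$ and $S$, which is why a single traversal is enough. Working with deviations from the running mean avoids subtracting two large, nearly equal sums, which is where the stability benefit comes from.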
git clone https://github.com/tommygrammar/weekly_cashflow_simulator
cd weekly_cashflow_simulator
mkdir build
cd build
cmake ..
make
./weekly_cashflow