agilpartida/avocado-analysis

Analysis of US avocado sales and prices (2015–2020)

This project analyzes weekly avocado sales and pricing data in the United States between 2015 and 2020. The objective is to understand price elasticity, compare organic vs. conventional product behavior, identify regional patterns, and evaluate a simple forecasting approach using time series modelling.

The dataset contains 33,045 records and was obtained from Kaggle (Avocado Prices 2020). All analysis was performed in RStudio.

Project Objectives

  • Quantify the relationship between price and sales volume for organic and conventional avocados.
  • Examine regional pricing behavior, using Albany as a case study.
  • Detect outliers in price and volume and assess their potential impact.
  • Build a baseline 12‑week price forecast using an ARIMA model.
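The outlier-detection objective can be illustrated with a minimal sketch. It assumes the 1.5 × IQR rule, which is close to what `boxplot()` uses for its whiskers; the README does not state which rule the script applies. `flag_outliers` is a hypothetical helper and the prices are made up, not taken from the dataset.

```r
# Hypothetical helper: flag values outside the 1.5 * IQR fences.
# Assumption: the script's actual outlier rule is not documented in the README.
flag_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - k * iqr | x > q[2] + k * iqr
}

# Synthetic weekly prices (illustrative only, not from the CSV)
prices <- c(1.10, 1.25, 1.30, 1.35, 1.40, 1.45, 1.50, 1.55, 3.20)

outlier <- flag_outliers(prices)
prices[outlier]  # only the 3.20 week lies outside the fences
```

The same function can be applied per group (for example, separately for organic and conventional rows), so that each product type is judged against its own typical range.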

A few things that stood out

  • Organic avocados hold demand better: a 10% price increase was associated with roughly a 7.7% drop in volume, versus around 13.2% for conventional avocados. That suggests organic buyers are less price-sensitive.
  • Albany prices are relatively stable. The time series doesn't show wild swings, which could be useful for inventory planning.
  • There are clear outliers in price and volume, especially on the organic side. In some weeks prices or volumes fall far outside the typical range, which would be worth investigating further if this were a live business question.

None of these numbers are meant as universal rules; they come from a linear model on log-transformed data and should be interpreted with that in mind.
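As a rough illustration of that kind of model, the sketch below fits a log-log regression per product type on synthetic data. The column names (`type`, `price`, `volume`), the helper `fit_elasticity`, and the "true" elasticities are all assumptions made for the example; they are not the dataset's values or the script's actual code. In a log-log model the slope on `log(price)` is the price elasticity.

```r
set.seed(42)

# Synthetic data with made-up elasticities (NOT the README's estimates)
n <- 200
type <- rep(c("conventional", "organic"), each = n / 2)
price <- runif(n, 0.8, 2.5)
true_elasticity <- ifelse(type == "organic", -0.8, -1.3)
volume <- exp(10 + true_elasticity * log(price) + rnorm(n, sd = 0.1))
sales <- data.frame(type, price, volume)

# Hypothetical helper: slope of log(volume) on log(price) = elasticity
fit_elasticity <- function(df) {
  fit <- lm(log(volume) ~ log(price), data = df)
  unname(coef(fit)["log(price)"])
}

elasticities <- sapply(split(sales, sales$type), fit_elasticity)

# Percent change in volume implied by a 10% price increase
pct_change <- (1.10^elasticities - 1) * 100
round(pct_change, 1)
```

The README does not say whether its percentages were computed with this exact conversion or with the simpler approximation of multiplying the elasticity by 10%, so the last step is one plausible reading rather than a reproduction.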

How the project is organized

avocado-analysis/
│
├── data/
│   └── avocado-updated-2020.csv
│
├── scripts/
│   └── avocado-analysis.R
│
├── imgs/
│   ├── albany_organic_price_forecast_3months.png
│   ├── albany_organic_prices_decomposition.png
│   ├── average_price_boxplot.png
│   ├── boxplot_precios.png
│   ├── price_by_type_boxplot.png
│   ├── series_temporales.png
│   └── total_volume_boxplot.png
│
└── README.md

The R script covers the full pipeline: data loading, cleaning, exploration, regression models, and the ARIMA forecast. All plots were generated with ggplot2, and some were made interactive with plotly during the analysis phase.

Forecast example

The chart below shows the 12-week forecast for organic avocado prices in Albany. The model captures the general price level well, though the forecast intervals widen as the horizon grows, as expected.

[Figure: Albany organic price forecast]

It's a simple univariate ARIMA fitted automatically. With more domain context (seasonality of supply, weather events) the forecast could be improved, but for a first pass it gives a useful baseline.
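A minimal sketch of a 12-week univariate forecast is shown below. To stay dependency-free it uses base R's `stats::arima()` with a fixed ARIMA(1,1,0) order on a synthetic series, which is a stand-in and not the project's method: the README says the model was fitted automatically with the forecast package, and the exact call is not given.

```r
set.seed(1)

# Synthetic weekly price series (illustrative, not the Albany data)
prices <- ts(1.60 + cumsum(rnorm(156, sd = 0.03)), frequency = 52)

# Assumption: fixed order chosen for illustration, not selected automatically
fit <- arima(prices, order = c(1, 1, 0))

# 12-step-ahead point forecasts with standard errors
fc <- predict(fit, n.ahead = 12)

# Approximate 95% forecast interval around each point forecast
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se
width <- as.numeric(upper - lower)

data.frame(week = 1:12,
           forecast = as.numeric(fc$pred),
           lower = as.numeric(lower),
           upper = as.numeric(upper))
```

The interval width grows with each week ahead because forecast errors accumulate in a differenced model, which matches the widening bands described for the chart.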

Tools and packages

Everything was written in R. The main libraries used:

  • Data handling: readr, dplyr, tidyr
  • Visualization: ggplot2, plotly
  • Modeling: stats::lm(), forecast (for ARIMA)
  • Summary statistics: base R functions like cor(), summary(), boxplot()

Why this project exists

I wanted to practice working with a real dataset end-to-end, from messy CSV to a presentable result, and to show how an analyst might approach a retail pricing question. No dashboard is included here; the output is the R script itself and the plots it produces.

The dataset is from 2020 and will not be updated, so the findings are a snapshot of that period.

About

Exploratory analysis and modelling of avocado sales data (organic and conventional) in multiple US markets using R. The project includes outlier detection, correlation analysis, calculation of price-sales elasticities and price forecasting using time series models.
