Project for exploring store sales data, training forecasting models, and running exploratory notebooks and a small app.
- app.py - main application entry (small demo / analysis runner).
- train.csv, store.csv - core datasets used for modeling and analysis.
- rossman.csv, rossman.ipynb, rossmann.ipynb - alternative dataset and notebooks.
- requirements.txt - Python dependencies for this project.
See the repository root for all files.
This repository contains preprocessing, exploratory analysis, and modeling artifacts for a store-sales forecasting task (Rossmann-style dataset). It includes Jupyter notebooks for interactive exploration and an app.py script for quick demos.
- Python 3.8+ recommended
- A Conda environment named data_scientist is used in this workspace (optional but recommended)
Install dependencies:
conda activate data_scientist
pip install -r requirements.txt
If you prefer venv:
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
- train.csv - primary training data (historical sales per store/day).
- store.csv - store metadata (store type, assortment, competition info).
- rossman.csv - alternate or combined dataset (if present).
Place any additional CSVs in the project root or point your notebook to the appropriate path.
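As a rough illustration of how the two core files fit together, the sketch below reads both CSVs with pandas and attaches the store metadata to each sales row. It assumes pandas is among the installed dependencies and that both files share a store identifier column; the column name `Store` and the helper name `load_sales` are assumptions made for this example, so adjust them to match the actual files.

```python
import pandas as pd


def load_sales(train_source, store_source, nrows=None, key="Store"):
    """Read both CSVs and attach store metadata to each sales row.

    `key` is an assumed shared column name; change it to match the files.
    `nrows` limits how many training rows are read, for quick iteration.
    """
    train = pd.read_csv(train_source, nrows=nrows)
    store = pd.read_csv(store_source)
    # A left join keeps every sales row. validate="many_to_one" raises an
    # error if a store id appears more than once in the metadata table.
    return train.merge(store, on=key, how="left", validate="many_to_one")
```

For example, `load_sales("train.csv", "store.csv", nrows=10_000)` would read only the first 10,000 training rows, which can be handy when the full file is slow to load.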
- Activate environment and install dependencies (see Requirements).
- Run the notebooks for exploration:
jupyter notebook
# or
jupyter lab
- Run the demo app (if app.py is present and runnable):
# Example: run the script directly
python app.py
# If it is a Streamlit app (check the imports at the top of app.py), run it with Streamlit:
streamlit run app.py
Adjust the command depending on the app type; inspect app.py for details.
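One way to inspect the app type without opening the file by hand is to check whether the script imports Streamlit. The sketch below uses only the standard library; the helper names `imports_module` and `suggest_command` are hypothetical and not part of this repository.

```python
import ast


def imports_module(source: str, module: str = "streamlit") -> bool:
    """Return True if the source code imports `module` or one of its submodules."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports such as "from . import x" have module=None.
            names = [node.module or ""]
        else:
            continue
        if any(n == module or n.startswith(module + ".") for n in names):
            return True
    return False


def suggest_command(path: str = "app.py") -> str:
    """Suggest a run command based on whether the script imports Streamlit."""
    with open(path, encoding="utf-8") as fh:
        source = fh.read()
    if imports_module(source):
        return f"streamlit run {path}"
    return f"python {path}"
```

Calling `suggest_command("app.py")` from the project root returns the command string to try first. It is only a heuristic: a script that imports Streamlit conditionally or through another module would need a manual look.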
rossman.ipynb / rossmann.ipynb - exploratory analysis and experiments.
Open them with Jupyter to explore preprocessing, feature engineering and model training steps.
- Ensure dependencies are installed.
- Open the main notebook used for training (look for train.csv usage).
- Follow notebook cells sequentially and re-run training/evaluation cells.
For script-based training (if present), run:
python train.py
(If train.py is not present, use the provided notebooks to run training.)
- Keep datasets in the repository root, or update the file paths used in the notebooks and scripts.
- If you encounter ModuleNotFoundError, ensure the active Python interpreter matches the environment where requirements.txt was installed.
- If datasets are large, consider using a sample subset for quick iteration.
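When chasing a ModuleNotFoundError, it can help to confirm which interpreter is running and which packages it can actually see. The sketch below is a minimal check using only the standard library; `missing_modules` is a hypothetical helper, and the module names passed to it are whatever top-level packages the notebooks import.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names the active interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def describe_interpreter():
    """Return the path of the running Python executable."""
    return sys.executable
```

Running `describe_interpreter()` inside a notebook cell shows whether the kernel points at the environment where the dependencies were installed, and `missing_modules(["pandas", "numpy"])` lists any of the given names that are not importable from it.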