This project analyzes hotel booking data to uncover patterns in cancellations, revenue trends, seasonality, and customer behavior. The outcome is an executive-style, interactive dashboard built in Google Sheets to support data-driven decision-making for hotel management teams.
| 📊 | Overall booking performance |
| 🚫 | Cancellation risk drivers |
| 💰 | Revenue and pricing trends |
| 📡 | Channel and customer segment performance |
|
Context: The hospitality industry faces high revenue volatility due to booking cancellations and fluctuating seasonal demand. Core Problem: High cancellation rates and inconsistent booking behavior reduce revenue predictability and operational efficiency. |
Objective — Use historical booking data to:
- ✅ Reduce cancellation rates
- ✅ Improve revenue stability
- ✅ Optimize pricing and deposit policies
- ✅ Identify high-value, stable customer segments
| "How can hotel management reduce cancellations and maximize revenue using booking behavior insights?" |
| Property | Details |
|---|---|
| 📁 Dataset Name | Hotel Booking Dataset |
| 📐 Type | Structured transactional data |
| 📏 Rows | ~8,700 (post-cleaning) |
| 📊 Columns | 30+ |
| 📅 Time Period | 2015–2017 |
| 🔗 Source | Approved academic dataset (imported into Google Sheets) |
Key Attributes:
| Category | Fields |
|---|---|
| 🏷️ Booking Status | is_canceled, reservation_status |
| 📅 Time Features | arrival_date_year, arrival_date_month |
| 👤 Guest Details | adults, children, babies, country |
| 📡 Channel Info | market_segment, distribution_channel |
| 💵 Revenue Proxy | adr |
All cleaning and preprocessing were performed in Google Sheets as per capstone requirements.
| Step | Description |
|---|---|
| 🔁 Duplicate Removal | Duplicate booking records removed using built-in deduplication |
| ❓ Missing Values | Numeric fields → 0 · Categorical fields → "Unknown" · "NULL" standardized |
| 🔢 Data Type Standardization | Converted numeric fields stored as text into proper numeric format |
| Negative ADR values flagged and handled based on business logic | |
| ✂️ Text Normalization | Trimmed whitespace and standardized category labels |
| 🌍 Country Mapping | ISO country codes mapped to full country names for dashboard readability |
📝 A detailed Logs/Audit sheet documents each transformation step for traceability.
Derived features created to support KPI and dashboard analysis:
| Feature | Formula / Logic |
|---|---|
| 👥 Total Guests | adults + children + babies |
| 🌙 Total Stay Length | weekday nights + weekend nights |
| 👨👩👧 Family Flag | Family vs Non-Family bookings |
| 💰 Revenue (Derived) | ADR × Total Stay Length |
| 📅 Month Number | For chronological sorting of monthly trends |
These KPIs provide an executive snapshot of booking performance and revenue stability.
Pivot tables were created in Google Sheets to support dashboard visualizations:
- 📊 Cancellation Rate by Market Segment
- 📈 Monthly Revenue & Booking Trends
- 💵 ADR by Hotel Type and Month
- ⏱️ Lead Time Group vs Cancellation %
- 🏷️ Deposit Type vs Cancellation %
- 👤 Customer Type Performance
These pivots serve as the data source for all charts in the dashboard.
The final dashboard presents decision-ready insights for non-technical stakeholders.
| 🔹 | Component | Description |
|---|---|---|
| 📋 | KPI Cards | Bookings, Cancellation Rate, Revenue, ADR |
| 📈 | Line Chart | Revenue trend by month |
| 📊 | Bar Charts | Cancellation by market segment, deposit type |
| 📉 | Column Chart | Lead time vs cancellation |
| 🎛️ | Filters / Slicers | Hotel type, year, market segment, customer type |
🎨 The dashboard is designed with a clean, executive layout for quick interpretation.
| 🔴 | Group and OTA bookings exhibit the highest cancellation rates. |
| 🟠 | Long lead-time bookings show higher cancellation risk. |
| 🟢 | Revenue peaks during specific seasonal periods. |
| 🔵 | Repeat guests demonstrate lower cancellation probability. |
| # | Recommendation |
|---|---|
| 1️⃣ | Introduce stricter or non-refundable policies for high-risk segments |
| 2️⃣ | Adjust pricing dynamically during peak demand months |
| 3️⃣ | Promote direct and corporate channels to reduce cancellation risk |
| 4️⃣ | Offer loyalty benefits to repeat guests to improve booking stability |
| 🚀 Future Scope | |
|---|---|
| Revenue derived using ADR (approximation) | Predictive cancellation modeling |
| No external market or competitor data | Time-series forecasting of demand |
| Static historical analysis (no real-time feed) | Real-time dashboard integration |
| — | Deeper segmentation using ML models |
| Resource | Link |
|---|---|
| 📊 Google Sheets (Dashboard & Analysis) | Open Spreadsheet |
| 🎤 Presentation (PPT) | Open on Canva |
| 📝 Project Report (PDF) | View Report |