Task: Detect and count individual trees and orchards in very high-resolution satellite imagery using Machine Learning and Deep Learning.
A diversity of horticulture plantations is grown commercially. This project presents an end-to-end pipeline from raw 4-band GeoTIFF satellite imagery all the way to per-pixel density-map predictions to detect and count individual trees and orchards using a custom PyTorch U-Net trained with density map regression.
This project was built as part of the GeoSpatial AI Challenge held at Visvesvaraya National Institute of Technology (VNIT), Nagpur, where our team secured 2nd Runner Up.
Challenge Problem Statement (PDF)
- Harsh Shinde - Lead
- Aditya Singh - Member
- Ankit Kirtane - Member
We started with two 4-band high-resolution satellite GeoTIFF images (Blue, Green, Red, NIR):
| Role | File | Purpose |
|---|---|---|
| Train | tree_orchard_count_new0.tif |
Model training and annotation |
| Test | tree_orchard_count_new1.tif |
Held-out evaluation and inference |
Train Image (tree_orchard_count_new0.tif) |
Test Image (tree_orchard_count_new1.tif) |
|---|---|
![]() |
![]() |
The original GeoTIFFs are far too large to process or annotate directly. We use tile_images.py to slice them into manageable 512 x 512 patches with overlap.
| Parameter | Value |
|---|---|
| Tile size | 512 x 512 px |
| Overlap | 128 px |
| Band mapping | R = B3, G = B2, B = B1 |
| Output format | PNG |
| Train tiles | 153 |
| Test tiles | 40 |
Images are split into train/ and test/ based on the SPLIT_MAP configuration.
| Train Tiles | Test Tiles |
|---|---|
![]() |
![]() |
The PNG tiles from Step 1 were uploaded to Roboflow for manual annotation.
- Class annotated:
tree/orchards - Annotation type: Bounding boxes around each individual tree/orchard
- Export format: Pascal VOC (XML)
Once downloaded, prepare_dataset.py reads the VOC XML files and generates binary segmentation / point masks:
- A small filled circle is drawn at each bounding-box center.
- This produces clean, non-overlapping point annotations ideal for density-map regression.
To train on the full 4-band multispectral data (not just the visual RGB PNGs), we go back to the original GeoTIFFs.
tile_and_mask_tif.pytiles the source images into 512 x 512 GeoTIFF tiles preserving:- All 4 spectral bands (B, G, R, NIR)
- Geospatial metadata (CRS, affine transform)
- Simultaneously generates matching binary mask GeoTIFFs from the VOC XML bounding-box annotations.
- Tiles with less than 30% valid pixels are automatically skipped.
To verify the annotations have been accurately mapped back onto the original GeoTIFFs, we generated previews overlaid with the annotated masks (green dots represent individual trees):
| Train Data Previews | Test Data Previews |
|---|---|
![]() |
![]() |
Output structure:
dataset/
├── image/
│ ├── train/ (153 tiles — 4-band GeoTIFF)
│ └── test/ (40 tiles — 4-band GeoTIFF)
└── annotation/
├── train/ (153 masks — 1-band uint8 GeoTIFF)
└── test/ (40 masks — 1-band uint8 GeoTIFF)
The complete processed dataset including the 4-band image tiles, point-mask annotation tiles, and the original full-scene TIF files has been published to Hugging Face.
Dataset Link: harshinde/tree-orchard-detection
Instead of traditional object detection, we use density map regression. Each point annotation is convolved with a Gaussian kernel (sigma = 2) to produce a smooth density map. The integral (pixel sum) of the predicted density map directly gives the tree count.
A lightweight 3-level U-Net built in PyTorch:
Below is a sample from the training set showing the RGB image, ground-truth mask, and the corresponding Gaussian density map:
Each prediction panel shows:
- Original RGB - the input satellite tile rendered as true-color.
- Predicted Density Map - heatmap where the sum gives the predicted tree count.
- Overlay - predicted density heatmap blended on top of the original image, highlighting detected tree locations.
- Python 3.10+
- CUDA-compatible GPU (recommended)
cd code/src
pip install -r requirements.txtcd code/src
python train.pyThe best model checkpoint (lowest validation MAE) is automatically saved to best_model.pth.
cd code/src
python predict.pyThis runs full-image inference on the target test GeoTIFF and outputs inference_output.png containing the RGB render, predicted density map, and an overlay visualization.
Kaggle - https://www.kaggle.com/code/harshshinde8/tree-orchard-detection
| Notebook | Description |
|---|---|
| code.ipynb | Complete end-to-end workflow - data loading, model definition, training, evaluation, and visualization |
| inference.ipynb | Load the pre-trained best_model.pth and run inference on any GeoTIFF - no training required |
flowchart LR
A["4-Band GeoTIFFs"] --> B["tile_images.py"]
B --> C["Roboflow Annotation"]
C --> D["prepare_dataset.py"]
D --> E["tile_and_mask_tif.py"]
E --> F["Hugging Face Dataset"]
E --> G["U-Net Training"]
G --> H["Inference and Counting"]










