MayaScan

LiDAR terrain-anomaly detection for archaeological review

MayaScan is a Python geospatial pipeline for turning raw LAZ/LAS point clouds into ranked candidate features for analyst review. It builds terrain rasters, detects positive-relief regions, scores them with interpretable geomorphic metrics, clusters them spatially, and exports GIS-ready outputs. The project includes both a command-line workflow and a Streamlit app for running and reviewing results in one place.

MayaScan is designed for triage, not confirmation. It highlights terrain anomalies that may deserve a closer look; archaeological interpretation and field validation still require expert review.

Visible architecture at Caracol, Belize

LiDAR can reveal large-scale architecture hidden beneath vegetation

Highlights

Converts LAZ/LAS input into DTM, LRM, and density rasters
Detects region-level candidate features instead of relying on centroid-only logic
Supports overlap-aware multi-threshold consensus to reduce one-threshold artifacts
Scores candidates with interpretable components such as density, relief, prominence, compactness, solidity, and area
Uses DBSCAN to group candidates into possible settlement patterns
Exports CSV, GeoJSON, KML, Markdown, PDF, and HTML outputs
Includes a Streamlit review app with presets, diagnostics, labeling, comparison mode, and ZIP export

Designed For

MayaScan is currently tuned for:

low-relief tropical landscapes
subtle platforms and mounds, roughly 0.3-2.0 m of relief
tile-by-tile exploratory analysis and ranking

Responsible Use

MayaScan identifies terrain anomalies, not confirmed archaeological sites.
All outputs should be treated as review aids and checked by domain experts.
Coordinate data and derived products should be handled carefully to reduce the risk of disturbance or looting.
This repository includes a single demonstration tile at data/lidar/sample.laz for reproducible testing.
The project intentionally avoids publishing curated site interpretations or sensitive location outputs.

Installation

Requirements

Python 3.10+
PDAL installed at the system level
Python packages from requirements.txt

Current package minimums:

numpy>=1.23, scipy>=1.9, pandas>=1.5
rasterio>=1.3, pyproj>=3.4, shapely>=2.0
scikit-learn>=1.2, matplotlib>=3.6, reportlab>=3.6, streamlit>=1.30

Install PDAL:

macOS: brew install pdal
Ubuntu: sudo apt install pdal
Windows (conda): conda install -c conda-forge pdal

Install Python dependencies:

python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
pip install -r requirements.txt

Recommended sanity checks:

pdal --version
python -c "import rasterio, pyproj, scipy, streamlit"

Quick Start

The repository includes a smoke-test tile at data/lidar/sample.laz.

Streamlit app

streamlit run app.py

Then:

Use data/lidar/sample.laz or upload your own .laz/.las file
Choose a preset; Balanced (Recommended) is the default starting point
Enter a run name and click Run MayaScan
Review the map, ranked candidates, diagnostics, and score breakdown in the Results tab
Optionally compare presets or add analyst labels (likely, unlikely, unknown)

CLI

Minimal run:

python maya_scan.py \
  -i data/lidar/sample.laz \
  --name caracol_sample_test \
  --overwrite \
  --try-smrf

Show all options:

python maya_scan.py --help

Outputs are written to:

runs/<run_name>/

The HTML report is written to:

runs/<run_name>/report.html

Advanced CLI example

python maya_scan.py \
  -i data/lidar/sample.laz \
  --name caracol_sample_test \
  --overwrite \
  --try-smrf \
  --pos-thresh auto:p96 \
  --min-density auto:p60 \
  --density-sigma 40 \
  --max-slope-deg 20 \
  --consensus-percentiles 95,96,97 \
  --consensus-min-support 2 \
  --consensus-radius-m 12 \
  --min-peak 0.50 \
  --min-area-m2 25 \
  --max-area-m2 1200 \
  --min-extent 0.38 \
  --max-aspect 3.5 \
  --edge-buffer-m 10 \
  --min-spacing-m 15 \
  --min-prominence 0.10 \
  --min-compactness 0.12 \
  --min-solidity 0.50 \
  --cluster-eps auto \
  --min-samples 4 \
  --report-top-n 30 \
  --label-top-n 60

How It Works

Ground model: PDAL converts the point cloud into a DTM raster. Optional SMRF classification can be applied first.
Local relief model: MayaScan computes a multi-scale LRM by subtracting a broader smoothed surface from a finer one.
Region detection: connected positive-relief regions are extracted and cleaned up morphologically.
Consensus support: optional multi-threshold runs match regions across percentile thresholds using raster overlap, with centroid distance as a secondary guard, and keep candidates with enough support.
Region metrics: each candidate region gets area, peak relief, prominence, extent, aspect ratio, compactness, solidity, and size metrics.
Density modeling: a smoothed feature-density surface is built and sampled at the region level.
Post-filtering: regions are filtered by density, shape, slope, edge proximity, and spacing to reduce noise and duplicates.
Scoring and clustering: remaining candidates are ranked, clustered with DBSCAN, and annotated with distance to the densest member of their assigned cluster.
Reporting: the pipeline writes GIS exports, plots, reports, and run metadata for reproducibility.
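
The local relief and region detection steps above can be sketched with SciPy. This is a minimal illustration, not MayaScan's implementation: the Gaussian filter, the single pair of scales, the sigma values (in pixels), and the function names `local_relief_model` and `positive_regions` are all assumptions chosen for the example.

```python
import numpy as np
from scipy import ndimage


def local_relief_model(dtm, fine_sigma=2.0, broad_sigma=12.0):
    """Fine-scale smoothed surface minus a broader smoothed surface."""
    fine = ndimage.gaussian_filter(dtm, sigma=fine_sigma)
    broad = ndimage.gaussian_filter(dtm, sigma=broad_sigma)
    return fine - broad


def positive_regions(lrm, threshold, min_pixels=4):
    """Label connected positive-relief regions after a 3 x 3 opening."""
    mask = lrm > threshold
    mask = ndimage.binary_opening(mask, structure=np.ones((3, 3), dtype=bool))
    labels, count = ndimage.label(mask)
    sizes = np.bincount(labels.ravel(), minlength=count + 1)
    keep = [i for i in range(1, count + 1) if sizes[i] >= min_pixels]
    return labels, keep


# Synthetic 100 x 100 DTM: a gentle regional slope plus one 1 m mound.
yy, xx = np.mgrid[0:100, 0:100]
mound = np.exp(-((xx - 50) ** 2 + (yy - 50) ** 2) / (2 * 4.0 ** 2))
dtm = 0.01 * xx + 1.0 * mound

lrm = local_relief_model(dtm)
labels, keep = positive_regions(lrm, threshold=np.percentile(lrm, 96))
```

On this synthetic tile the mound survives as a labeled region, while the regional slope is largely removed by the subtraction. Smoothing near the raster border can also produce spurious relief, which is one reason an edge buffer is useful.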

Key Parameters

Detection

--pos-thresh auto:p96: Sets the positive-relief threshold in LRM space. Higher percentiles usually produce fewer, stronger candidates.
--min-density auto:p60: Sets the density threshold used for filtering and scoring.
--density-sigma 40: Controls how broadly the candidate-density surface is smoothed.
--max-slope-deg 20: Rejects regions whose q75 (75th-percentile) slope exceeds this limit, in degrees.
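
Values such as auto:p96 read as "use the 96th percentile of the data". A minimal sketch of how such a spec could be resolved is shown here; the helper name `resolve_threshold` is hypothetical, and which pixels MayaScan actually includes in the percentile (for example, all finite cells versus positive cells only) is not stated in this README.

```python
import numpy as np


def resolve_threshold(spec, values):
    """Resolve 'auto:pNN' to the NN-th percentile of values, else parse a float."""
    text = str(spec).strip().lower()
    if text.startswith("auto:p"):
        percentile = float(text[len("auto:p"):])
        data = np.asarray(values, dtype=float)
        data = data[np.isfinite(data)]  # ignore nodata cells stored as NaN
        return float(np.percentile(data, percentile))
    return float(text)  # a fixed numeric threshold such as "0.25"


median_relief = resolve_threshold("auto:p50", [0.0, 1.0, 2.0, 3.0, 4.0, np.nan])
fixed_relief = resolve_threshold("0.25", [])
```

Because the threshold is derived from each tile's own distribution, the same percentile can map to different absolute relief values on different tiles.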

Consensus

--consensus-percentiles 95,96,97: Runs candidate extraction at multiple thresholds.
--consensus-min-support 2: Requires support from at least this many thresholded runs, including the primary run.
--consensus-radius-m 12: Sets the centroid-distance guard used when counting cross-threshold support; matches still require real raster overlap.
--no-consensus: Disables consensus filtering entirely.
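
The support count described above can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: the function name `support_counts` is hypothetical, the radius is expressed in pixels instead of meters, and a match is taken to mean "at least one overlapping pixel and centroids within the radius".

```python
import numpy as np
from scipy import ndimage


def support_counts(primary_mask, other_masks, radius_px):
    """Per primary region, count the threshold runs that contain a matching region."""
    primary, n_primary = ndimage.label(primary_mask)
    others = [ndimage.label(mask)[0] for mask in other_masks]
    support = {}
    for region_id in range(1, n_primary + 1):
        region = primary == region_id
        center = np.argwhere(region).mean(axis=0)
        count = 1  # the primary run always supports its own region
        for other in others:
            # Candidate matches must share at least one pixel with the region.
            for hit in np.unique(other[region & (other > 0)]):
                other_center = np.argwhere(other == hit).mean(axis=0)
                if np.linalg.norm(center - other_center) <= radius_px:
                    count += 1
                    break
        support[region_id] = count
    return support


relief = np.zeros((20, 20))
relief[2:6, 2:6] = 1.0      # strong feature
relief[12:16, 12:16] = 0.5  # weaker feature

support = support_counts(
    relief > 0.4,                      # primary threshold
    [relief > 0.3, relief > 0.8],      # lower and higher thresholds
    radius_px=3,
)
```

Here the strong feature is supported by all three runs and the weaker one by two, so a minimum support of 3 would keep only the strong feature.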

Shape cleanup

--min-peak, --min-area-m2, --max-area-m2: Remove features that are too weak, too small, or too large.
--min-extent, --max-aspect: Suppress elongated or poorly filled regions.
--min-prominence, --min-compactness, --min-solidity: Remove regions that look weak, linear, or fragmented.
--edge-buffer-m, --min-spacing-m: Reduce tile-edge artifacts and near-duplicate detections.
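
This README does not give the exact formulas behind the shape filters. As a rough guide, the sketch below uses common raster definitions: extent as region area over bounding-box area, aspect as the longer bounding-box side over the shorter, and solidity as region area over convex-hull area. The function name `shape_metrics` is hypothetical, and MayaScan's own definitions may differ. Compactness is omitted because it depends on how the perimeter is measured.

```python
import numpy as np
from scipy.spatial import ConvexHull


def shape_metrics(mask):
    """Extent, aspect ratio, and solidity of one boolean region mask (pixel units)."""
    rows, cols = np.nonzero(mask)
    area = rows.size
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    # Build the hull from pixel corner points so a filled rectangle has solidity 1.
    corners = np.concatenate(
        [np.column_stack([rows + dr, cols + dc]) for dr in (0, 1) for dc in (0, 1)]
    )
    hull_area = ConvexHull(corners).volume  # for 2-D input, .volume is the area
    return {
        "extent": area / (height * width),
        "aspect": max(height, width) / min(height, width),
        "solidity": area / hull_area,
    }


block = np.zeros((10, 12), dtype=bool)
block[2:6, 2:10] = True   # filled 4 x 8 rectangle

ell = np.zeros((6, 6), dtype=bool)
ell[1:5, 1:5] = True
ell[1:3, 3:5] = False     # cut one 2 x 2 corner to make an L shape

block_metrics = shape_metrics(block)
ell_metrics = shape_metrics(ell)
```

A filled rectangle scores 1.0 on both extent and solidity, while the L shape scores lower on both, which is the kind of region a --min-extent or --min-solidity filter is meant to catch.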

Clustering and reporting

--cluster-eps auto: Sets the DBSCAN radius (eps) in meters, either a fixed value or auto. With auto, eps is estimated from the k-distance knee, with a percentile fallback.
--min-samples: Sets the minimum number of candidates needed to form a cluster.
--report-top-n, --label-top-n: Control how many candidates are emphasized in reports and KML labels.
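
One common way to estimate eps from a k-distance curve is sketched below. It is an assumption-laden illustration, not MayaScan's routine: the function name `estimate_eps`, the chord-based knee rule, and the 90th-percentile fallback are choices made for this example.

```python
import numpy as np


def estimate_eps(points, k=4, fallback_percentile=90.0):
    """Estimate a DBSCAN eps from the knee of the sorted k-th neighbor distances."""
    pts = np.asarray(points, dtype=float)
    if len(pts) <= k:
        return None  # too few candidates to define a k-distance curve
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    # Column 0 of each sorted row is the point itself, so column k is the k-th neighbor.
    kdist = np.sort(np.sort(dists, axis=1)[:, k])
    span = kdist[-1] - kdist[0]
    if span <= 0:
        return float(np.percentile(kdist, fallback_percentile))
    # Knee: the point on the normalized curve that lies farthest below the chord.
    x = np.linspace(0.0, 1.0, len(kdist))
    y = (kdist - kdist[0]) / span
    knee = int(np.argmax(x - y))
    if knee in (0, len(kdist) - 1):
        return float(np.percentile(kdist, fallback_percentile))
    return float(kdist[knee])


cluster_a = [(i, j) for i in range(5) for j in range(2)]
cluster_b = [(100 + i, j) for i in range(5) for j in range(2)]
outliers = [(50, 50), (50, -60)]
eps = estimate_eps(cluster_a + cluster_b + outliers, k=4)
```

The resulting value could then be passed to scikit-learn as `DBSCAN(eps=eps, min_samples=4)`. With two tight groups and two distant outliers, the knee falls at the largest within-group distance, so the outliers end up as noise.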

Outputs

Each run writes a folder under runs/<run_name>/. Common outputs include:

dtm.tif, lrm.tif, mound_density.tif
candidates.csv
candidates.geojson, candidates.kml
report.md, report.pdf, report.html
html/img/ candidate cutouts for the HTML report
plots/ diagnostic plots and histograms
run_params.json with resolved settings, thresholds, accounting, and runtimes
candidate_labels.csv when analyst labeling is used

The Streamlit app can also prepare a ZIP archive of run outputs. Across runs, MayaScan appends summary information to runs/manifest.csv.

Candidate exports include clustering fields such as cluster_id and dist_to_core_km, where dist_to_core_km is the distance to the densest candidate within the same cluster.
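
The dist_to_core_km definition can be illustrated with a short NumPy sketch. It assumes projected coordinates in meters, a per-candidate density value, and the scikit-learn convention that DBSCAN noise is labeled -1. The function name `dist_to_core_km` and the choice to leave noise candidates as NaN are assumptions for this example.

```python
import numpy as np


def dist_to_core_km(x_m, y_m, density, cluster_id):
    """Distance in km from each candidate to the densest member of its cluster."""
    x_m = np.asarray(x_m, dtype=float)
    y_m = np.asarray(y_m, dtype=float)
    density = np.asarray(density, dtype=float)
    cluster_id = np.asarray(cluster_id)
    out = np.full(len(x_m), np.nan)
    for cid in np.unique(cluster_id):
        if cid < 0:
            continue  # noise points have no cluster core
        members = np.flatnonzero(cluster_id == cid)
        core = members[np.argmax(density[members])]
        out[members] = np.hypot(x_m[members] - x_m[core],
                                y_m[members] - y_m[core]) / 1000.0
    return out


distances = dist_to_core_km(
    x_m=[0, 3000, 0, 10],
    y_m=[0, 4000, 0, 10],
    density=[0.9, 0.2, 0.5, 0.7],
    cluster_id=[0, 0, -1, 1],
)
```

In this example the first candidate is the core of cluster 0, the second sits 5 km from it, the third is noise, and the fourth is the only member of cluster 1.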

Scoring and Run Quality

By default, candidates are ranked with this multiplicative score:

Score =
  Density^1.00
  x PeakRelief^1.00
  x Extent^0.35
  x Support^0.40
  x Prominence^0.75
  x Compactness^0.20
  x Solidity^0.20
  x Area^0.50

The score is only meaningful within a run. It is a ranking signal, not a calibrated probability.
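
The multiplicative form can be written out directly. The exponents below are the defaults listed above; the component key names and the assumption that every component is already a non-negative, normalized value are illustrative, since this README does not describe how each component is scaled.

```python
import math

# Default exponents from the score definition; key names are illustrative.
WEIGHTS = {
    "density": 1.00,
    "peak_relief": 1.00,
    "extent": 0.35,
    "support": 0.40,
    "prominence": 0.75,
    "compactness": 0.20,
    "solidity": 0.20,
    "area": 0.50,
}


def candidate_score(components, weights=WEIGHTS):
    """Product of each component raised to its exponent."""
    return math.prod(
        max(float(components[name]), 0.0) ** exponent
        for name, exponent in weights.items()
    )


baseline = {name: 1.0 for name in WEIGHTS}
larger = {**baseline, "area": 4.0}    # area exponent 0.50, so the score doubles
empty = {**baseline, "density": 0.0}  # any zero component zeroes the score
```

Because the score is a product, a candidate that is very weak on any single component ranks low regardless of its other values, and components with small exponents (such as compactness and solidity) influence the ranking less than density or peak relief.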

The Streamlit app also reports a simple run-quality heuristic based on five checks:

8 <= candidates <= 250
at least one non-noise cluster
top_score >= 2.0
median_score >= 0.35
noise / candidates <= 0.70

Quality badges such as Strong, Moderate, and Weak/noisy are meant for triage only.
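
The five checks can be expressed as a small function. This sketch assumes noise candidates carry the cluster label -1 (the scikit-learn DBSCAN convention) and uses a hypothetical name, `run_quality_checks`. How the app maps passed checks to the Strong, Moderate, and Weak/noisy badges is not documented here, so the sketch only reports which checks pass.

```python
from statistics import median


def run_quality_checks(scores, cluster_ids):
    """Evaluate the five run-quality checks; returns a dict of booleans."""
    n = len(scores)
    noise = sum(1 for cid in cluster_ids if cid == -1)
    return {
        "candidate_count": 8 <= n <= 250,
        "has_cluster": any(cid != -1 for cid in cluster_ids),
        "top_score": n > 0 and max(scores) >= 2.0,
        "median_score": n > 0 and median(scores) >= 0.35,
        "noise_fraction": n > 0 and noise / n <= 0.70,
    }


good_run = run_quality_checks(
    scores=[2.5, 1.0, 0.8, 0.6, 0.5, 0.4, 0.4, 0.3, 0.2, 0.1],
    cluster_ids=[0, 0, 0, 0, 1, 1, 1, 1, -1, -1],
)
sparse_run = run_quality_checks(scores=[1.0, 1.0, 1.0], cluster_ids=[-1, -1, -1])
```

The first run passes all five checks. The second passes only the median-score check, because it has too few candidates, no cluster, a low top score, and nothing but noise.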

Data Sources

Public LiDAR datasets can be downloaded from OpenTopography.

Typical workflow:

Download LAZ tiles for an area of interest
Place them under data/lidar/
Run MayaScan on the local files

MayaScan currently works on local input files only. No API key is required.

Limitations

MayaScan does not confirm archaeological features; it only prioritizes anomalies.
Scores are relative within a run and should not be interpreted as probabilities.
Strict consensus settings can suppress isolated true positives.
False positives are more common in rugged terrain, modern earthworks, and heavily modified landscapes.
Output quality depends on point-cloud quality, ground classification quality, and parameter choice.
Analyst labels are review metadata, not training labels.
The current workflow is mainly tuned for single-tile or tile-at-a-time analysis.

Future Work

multi-tile regional analysis
linear-feature detection
automated parameter adaptation from diagnostics and analyst feedback

Repository Layout

MayaScan/
├── app.py
├── maya_scan.py
├── README.md
├── requirements.txt
├── LICENSE
├── assets/
│   ├── mayascan_logo.svg
│   ├── caracol_caana.png
│   └── aguada_fenix_lidar.png
└── data/
    └── lidar/
        ├── .gitkeep
        └── sample.laz

Generated outputs under runs/ and local LiDAR files under data/lidar/ are typically gitignored, except for the bundled sample tile.

Tech Stack

Python
NumPy, SciPy, Pandas
Rasterio, PyProj, Shapely
PDAL
scikit-learn
Matplotlib, ReportLab
Streamlit

Development Note

Large language models were used for prototyping, debugging support, and documentation refinement. Method choices, parameter interpretation, and project validation were reviewed manually.

License

This project is licensed under the MIT License. See LICENSE for details.

Author

James Adelhelm
Software Developer on the Data Ingest team at AccuWeather.

MayaScan is an independent personal research and software project driven by an interest in Maya history. It is not affiliated with, endorsed by, or sponsored by AccuWeather.

Image Credits

Caana, Caracol (Belize)
Photo by Devon Jones - Wikimedia Commons
License: CC BY-SA 3.0
https://commons.wikimedia.org/wiki/File:Caracol-Temple.jpg

Aguada Fenix LiDAR
Courtesy of Takeshi Inomata - Wikimedia Commons
License: CC BY-SA 4.0
https://commons.wikimedia.org/wiki/File:Aguada_F%C3%A9nix_1.jpg
