LiDAR terrain-anomaly detection for archaeological review
MayaScan is a Python geospatial pipeline for turning raw LAZ/LAS point clouds into ranked candidate features for analyst review. It builds terrain rasters, detects positive-relief regions, scores them with interpretable geomorphic metrics, clusters them spatially, and exports GIS-ready outputs. The project includes both a command-line workflow and a Streamlit app for running and reviewing results in one place.
MayaScan is designed for triage, not confirmation. It highlights terrain anomalies that may deserve a closer look; archaeological interpretation and field validation still require expert review.
Visible architecture at Caracol, Belize
LiDAR can reveal large-scale architecture hidden beneath vegetation
- Converts LAZ/LAS input into DTM, LRM, and density rasters
- Detects region-level candidate features instead of relying on centroid-only logic
- Supports overlap-aware multi-threshold consensus to reduce one-threshold artifacts
- Scores candidates with interpretable components such as density, relief, prominence, compactness, solidity, and area
- Uses DBSCAN to group candidates into possible settlement patterns
- Exports CSV, GeoJSON, KML, Markdown, PDF, and HTML outputs
- Includes a Streamlit review app with presets, diagnostics, labeling, comparison mode, and ZIP export
MayaScan is currently tuned for:
- low-relief tropical landscapes
- subtle platforms and mounds, roughly 0.3-2.0 m of relief
- tile-by-tile exploratory analysis and ranking
- MayaScan identifies terrain anomalies, not confirmed archaeological sites.
- All outputs should be treated as review aids and checked by domain experts.
- Coordinate data and derived products should be handled carefully to reduce the risk of disturbance or looting.
- This repository includes a single demonstration tile at data/lidar/sample.laz for reproducible testing.
- The project intentionally avoids publishing curated site interpretations or sensitive location outputs.
- Python 3.10+
- PDAL installed at the system level
- Python packages from requirements.txt
Current package minimums:
- numpy>=1.23, scipy>=1.9, pandas>=1.5
- rasterio>=1.3, pyproj>=3.4, shapely>=2.0
- scikit-learn>=1.2, matplotlib>=3.6, reportlab>=3.6, streamlit>=1.30
Install PDAL:
- macOS: brew install pdal
- Ubuntu: sudo apt install pdal
- Windows (conda): conda install -c conda-forge pdal
Install Python dependencies:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Recommended sanity checks:
pdal --version
python -c "import rasterio, pyproj, scipy, streamlit"
The repository includes a smoke-test tile at data/lidar/sample.laz.
streamlit run app.py
Then:
- Use data/lidar/sample.laz or upload your own .laz/.las file
- Choose a preset; Balanced (Recommended) is the default starting point
- Enter a run name and click Run MayaScan
- Review the map, ranked candidates, diagnostics, and score breakdown in the Results tab
- Optionally compare presets or add analyst labels (likely, unlikely, unknown)
Minimal run:
python maya_scan.py \
-i data/lidar/sample.laz \
--name caracol_sample_test \
--overwrite \
--try-smrf
Show all options:
python maya_scan.py --help
Outputs are written to:
runs/<run_name>/
The HTML report is written to:
runs/<run_name>/report.html
Advanced CLI example
python maya_scan.py \
-i data/lidar/sample.laz \
--name caracol_sample_test \
--overwrite \
--try-smrf \
--pos-thresh auto:p96 \
--min-density auto:p60 \
--density-sigma 40 \
--max-slope-deg 20 \
--consensus-percentiles 95,96,97 \
--consensus-min-support 2 \
--consensus-radius-m 12 \
--min-peak 0.50 \
--min-area-m2 25 \
--max-area-m2 1200 \
--min-extent 0.38 \
--max-aspect 3.5 \
--edge-buffer-m 10 \
--min-spacing-m 15 \
--min-prominence 0.10 \
--min-compactness 0.12 \
--min-solidity 0.50 \
--cluster-eps auto \
--min-samples 4 \
--report-top-n 30 \
--label-top-n 60
- Ground model: PDAL converts the point cloud into a DTM raster. Optional SMRF classification can be applied first.
- Local relief model: MayaScan computes a multi-scale LRM by subtracting a broader smoothed surface from a finer one.
- Region detection: connected positive-relief regions are extracted and cleaned up morphologically.
- Consensus support: optional multi-threshold runs match regions across percentile thresholds using raster overlap, with centroid distance as a secondary guard, and keep candidates with enough support.
- Region metrics: each candidate region gets area, peak relief, prominence, extent, aspect ratio, compactness, solidity, and size metrics.
- Density modeling: a smoothed feature-density surface is built and sampled at the region level.
- Post-filtering: regions are filtered by density, shape, slope, edge proximity, and spacing to reduce noise and duplicates.
- Scoring and clustering: remaining candidates are ranked, clustered with DBSCAN, and annotated with distance to the densest member of their assigned cluster.
- Reporting: the pipeline writes GIS exports, plots, reports, and run metadata for reproducibility.
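The local relief and region detection steps above can be sketched as follows. This is a minimal illustration, not MayaScan's implementation: the Gaussian kernels, the sigma values, the 3x3 opening, and the function names `local_relief_model` and `positive_relief_regions` are all assumptions chosen for the example.

```python
import numpy as np
from scipy import ndimage


def local_relief_model(dtm, fine_sigma_px=2.0, broad_sigma_px=12.0):
    """Return fine-smoothed minus broad-smoothed elevation (assumed Gaussian kernels)."""
    fine = ndimage.gaussian_filter(dtm, sigma=fine_sigma_px)
    broad = ndimage.gaussian_filter(dtm, sigma=broad_sigma_px)
    return fine - broad


def positive_relief_regions(lrm, threshold):
    """Label connected regions above `threshold` after a 3x3 morphological opening."""
    mask = ndimage.binary_opening(lrm > threshold, structure=np.ones((3, 3), dtype=bool))
    labels, n_regions = ndimage.label(mask)
    return labels, n_regions


# Synthetic 200 x 200 tile: a gentle slope plus one mound about 1 m high.
yy, xx = np.mgrid[0:200, 0:200]
dtm = 0.005 * xx + np.exp(-((xx - 100) ** 2 + (yy - 100) ** 2) / (2 * 6.0 ** 2))
lrm = local_relief_model(dtm)
labels, n_regions = positive_relief_regions(lrm, threshold=0.3)
print(n_regions, bool(labels[100, 100]))
```

Subtracting the broad surface removes the regional slope, so only the mound survives the threshold. The opening step drops isolated pixels before labeling.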
- --pos-thresh auto:p96 sets the positive-relief threshold in LRM space. Higher percentiles usually produce fewer, stronger candidates.
- --min-density auto:p60 sets the density threshold used for filtering and scoring.
- --density-sigma 40 controls how broadly the candidate-density surface is smoothed.
- --max-slope-deg 20 rejects steep regions using the q75 slope statistic.
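One plausible reading of a value such as auto:p96 is "use the 96th percentile of the raster". The sketch below resolves such a spec under that assumption. The helper name `resolve_threshold` and the choice to take the percentile over all finite pixels are hypothetical; the actual population MayaScan uses is not stated here.

```python
import numpy as np


def resolve_threshold(spec, values):
    """Resolve 'auto:pNN' to the NN-th percentile of finite values, else parse a number."""
    if isinstance(spec, str) and spec.startswith("auto:p"):
        pct = float(spec[len("auto:p"):])
        finite = values[np.isfinite(values)]  # ignore nodata encoded as NaN
        return float(np.percentile(finite, pct))
    return float(spec)


# Toy raster values 0..100 plus one NaN nodata cell.
values = np.append(np.arange(101.0), np.nan)
print(resolve_threshold("auto:p96", values))
print(resolve_threshold("0.5", values))
```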
- --consensus-percentiles 95,96,97 runs candidate extraction at multiple thresholds.
- --consensus-min-support 2 requires support from at least this many thresholded runs, including the primary run.
- --consensus-radius-m 12 sets the centroid-distance guard used when counting cross-threshold support; matches still require real raster overlap.
- --no-consensus disables consensus filtering entirely.
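The consensus idea (raster overlap first, centroid distance as a guard, then a minimum vote count) can be illustrated with boolean masks. This is a sketch under stated assumptions rather than the project's code: `consensus_support` is a hypothetical helper, and the radius is given in pixels, so a value in meters would first be divided by the raster cell size.

```python
import numpy as np
from scipy import ndimage


def consensus_support(primary_mask, other_masks, radius_px):
    """Map each primary region id to the number of thresholded runs supporting it."""
    primary_labels, n_primary = ndimage.label(primary_mask)
    support = {}
    for region_id in range(1, n_primary + 1):
        region = primary_labels == region_id
        cy, cx = ndimage.center_of_mass(region.astype(float))
        votes = 1  # the primary run counts toward its own support
        for mask in other_masks:
            other_labels, _ = ndimage.label(mask)
            # Regions in the other run that share at least one pixel with this one.
            hits = np.unique(other_labels[region])
            for hit in hits[hits > 0]:
                oy, ox = ndimage.center_of_mass((other_labels == hit).astype(float))
                if np.hypot(oy - cy, ox - cx) <= radius_px:
                    votes += 1
                    break  # at most one vote per run
        support[region_id] = votes
    return support


primary = np.zeros((20, 20), dtype=bool)
primary[2:6, 2:6] = True      # region 1: persists at stricter thresholds
primary[12:16, 12:16] = True  # region 2: appears only in the primary run
stricter_a = np.zeros_like(primary)
stricter_a[3:6, 3:6] = True
stricter_b = np.zeros_like(primary)
stricter_b[3:5, 3:5] = True

support = consensus_support(primary, [stricter_a, stricter_b], radius_px=3.0)
kept = [rid for rid, votes in support.items() if votes >= 2]
print(support, kept)
```

With a minimum support of 2, region 2 is dropped because no other threshold overlaps it. This is also how strict settings can suppress an isolated true positive.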
- --min-peak, --min-area-m2, --max-area-m2 remove features that are too weak, too small, or too large.
- --min-extent, --max-aspect suppress elongated or poorly filled regions.
- --min-prominence, --min-compactness, --min-solidity remove regions that look weak, linear, or fragmented.
- --edge-buffer-m, --min-spacing-m reduce tile-edge artifacts and near-duplicate detections.
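To make the shape filters concrete, here is a small sketch that computes four of the metrics for a polygon outline. The formulas are common textbook definitions and are assumptions here, not confirmed MayaScan formulas: extent is area over bounding-box area, aspect is the longer bounding-box side over the shorter, compactness is 4πA/P², and solidity is area over convex-hull area. All function names are illustrative.

```python
import math


def polygon_area(pts):
    """Shoelace area of a simple polygon given as (x, y) vertices."""
    pairs = zip(pts, pts[1:] + pts[:1])
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)) / 2.0


def polygon_perimeter(pts):
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:] + pts[:1]))


def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(points):
        chain = []
        for p in points:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(pts[::-1])


def shape_metrics(pts):
    area = polygon_area(pts)
    xs, ys = zip(*pts)
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    return {
        "extent": area / (width * height),
        "aspect": max(width, height) / min(width, height),
        "compactness": 4 * math.pi * area / polygon_perimeter(pts) ** 2,
        "solidity": area / polygon_area(convex_hull(pts)),
    }


# L-shaped outline: a 4 x 4 square with a 2 x 2 corner removed.
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
metrics = shape_metrics(l_shape)
print(metrics)
```

The notch lowers extent, compactness, and solidity while leaving the aspect ratio at 1, which is why several shape filters are used together rather than one.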
- --cluster-eps sets the DBSCAN radius in meters, either as a fixed value or as auto. With auto, eps is estimated from the k-distance knee with a percentile fallback.
- --min-samples sets the minimum candidates needed to form a cluster.
- --report-top-n, --label-top-n control how many candidates are emphasized in reports and KML labels.
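A k-distance knee estimate with a percentile fallback might look like the sketch below. The knee rule used here (the point on the sorted k-distance curve farthest below the chord joining its endpoints), the fallback percentile, and the name `estimate_eps` are assumptions for illustration; MayaScan's exact rule may differ.

```python
import numpy as np


def estimate_eps(xy, k=4, fallback_pct=90.0):
    """Estimate a DBSCAN radius from the sorted k-th nearest-neighbor distances."""
    xy = np.asarray(xy, dtype=float)
    if len(xy) <= k:
        raise ValueError("need more than k points")
    # Dense pairwise distances; acceptable for a few thousand candidates.
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Column 0 after sorting is the point itself, so column k is the k-th neighbor.
    kdist = np.sort(np.sort(dist, axis=1)[:, k])
    n = kdist.size
    if kdist[-1] == kdist[0]:
        return float(np.percentile(kdist, fallback_pct))
    # Normalize the curve and find the point farthest below the end-to-end chord.
    x = np.linspace(0.0, 1.0, n)
    y = (kdist - kdist[0]) / (kdist[-1] - kdist[0])
    knee = int(np.argmax(x - y))
    if knee in (0, n - 1):
        return float(np.percentile(kdist, fallback_pct))
    return float(kdist[knee])


# Two 5 x 4 grids of candidates at 10 m spacing, plus four isolated outliers.
grid = [(10.0 * i, 10.0 * j) for i in range(5) for j in range(4)]
points = grid + [(x + 500.0, y) for x, y in grid]
points += [(2000.0, 2000.0), (-1500.0, 800.0), (900.0, -1800.0), (-2000.0, -2000.0)]
eps = estimate_eps(points, k=4)
print(eps)
```

The resulting value could then be passed as `eps` to scikit-learn's `DBSCAN`, together with `min_samples`.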
Each run writes a folder under runs/<run_name>/. Common outputs include:
- dtm.tif, lrm.tif, mound_density.tif
- candidates.csv
- candidates.geojson, candidates.kml
- report.md, report.pdf, report.html
- html/img/ candidate cutouts for the HTML report
- plots/ diagnostic plots and histograms
- run_params.json with resolved settings, thresholds, accounting, and runtimes
- candidate_labels.csv when analyst labeling is used
The Streamlit app can also prepare a ZIP archive of run outputs. Across runs, MayaScan appends summary information to runs/manifest.csv.
Candidate exports include clustering fields such as cluster_id and dist_to_core_km, where dist_to_core_km is the distance to the densest candidate within the same cluster.
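As an illustration of how dist_to_core_km could be derived, the sketch below picks the densest member of each cluster and measures planar distance to it. The field names `x_m`, `y_m`, and `density`, the use of projected coordinates in meters, and the convention that a negative `cluster_id` marks noise (as in DBSCAN's -1 label) are assumptions for this example.

```python
import math


def add_dist_to_core_km(candidates):
    """Annotate each candidate with the distance to its cluster's densest member."""
    cores = {}
    for cand in candidates:
        cid = cand["cluster_id"]
        if cid < 0:  # assumed noise label; noise points have no core
            continue
        if cid not in cores or cand["density"] > cores[cid]["density"]:
            cores[cid] = cand
    for cand in candidates:
        core = cores.get(cand["cluster_id"])
        if core is None:
            cand["dist_to_core_km"] = None
        else:
            dx = cand["x_m"] - core["x_m"]
            dy = cand["y_m"] - core["y_m"]
            cand["dist_to_core_km"] = math.hypot(dx, dy) / 1000.0
    return candidates


rows = add_dist_to_core_km([
    {"cluster_id": 0, "density": 0.9, "x_m": 0.0, "y_m": 0.0},
    {"cluster_id": 0, "density": 0.5, "x_m": 300.0, "y_m": 400.0},
    {"cluster_id": -1, "density": 0.2, "x_m": 5000.0, "y_m": 5000.0},
])
print([row["dist_to_core_km"] for row in rows])
```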
By default, candidates are ranked with this multiplicative score:
Score =
Density^1.00
x PeakRelief^1.00
x Extent^0.35
x Support^0.40
x Prominence^0.75
x Compactness^0.20
x Solidity^0.20
x Area^0.50
The score is only meaningful within a run. It is a ranking signal, not a calibrated probability.
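The multiplicative form can be written as a short function. The exponents below are copied from the formula above; the dictionary keys and the assumption that every component is already a positive, normalized number are illustrative, since the normalization step is not described here.

```python
import math

# Exponents from the default score; the key names are hypothetical.
WEIGHTS = {
    "density": 1.00,
    "peak_relief": 1.00,
    "extent": 0.35,
    "support": 0.40,
    "prominence": 0.75,
    "compactness": 0.20,
    "solidity": 0.20,
    "area": 0.50,
}


def score(components, weights=WEIGHTS):
    """Product of each component raised to its exponent."""
    return math.prod(components[name] ** exponent for name, exponent in weights.items())


baseline = {name: 1.0 for name in WEIGHTS}
larger = dict(baseline, area=4.0)     # 4x the area, exponent 0.5
denser = dict(baseline, density=4.0)  # 4x the density, exponent 1.0
print(score(baseline), score(larger), score(denser))
```

Because the exponent on area is 0.50, quadrupling area only doubles the score, while quadrupling density quadruples it. Any component near zero also pulls the whole product toward zero, which is the main behavioral difference from an additive score.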
The Streamlit app also reports a simple run-quality heuristic based on five checks:
- 8 <= candidates <= 250
- at least one non-noise cluster
- top_score >= 2.0
- median_score >= 0.35
- noise / candidates <= 0.70
Quality badges such as Strong, Moderate, and Weak/noisy are meant for triage only.
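The five checks can be expressed directly in code. The function and argument names are hypothetical. How the number of passed checks maps to the Strong, Moderate, and Weak/noisy badges is not specified here, so the sketch stops at counting.

```python
def run_quality_checks(n_candidates, n_clusters, top_score, median_score, n_noise):
    """Evaluate the five run-quality checks and return them by name."""
    return {
        "candidate_count": 8 <= n_candidates <= 250,
        "has_cluster": n_clusters >= 1,
        "top_score": top_score >= 2.0,
        "median_score": median_score >= 0.35,
        # Guard against division by zero when a run yields no candidates.
        "noise_fraction": n_candidates > 0 and n_noise / n_candidates <= 0.70,
    }


good = run_quality_checks(n_candidates=40, n_clusters=3, top_score=2.6,
                          median_score=0.5, n_noise=12)
poor = run_quality_checks(n_candidates=5, n_clusters=0, top_score=1.0,
                          median_score=0.2, n_noise=5)
print(sum(good.values()), sum(poor.values()))
```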
Public LiDAR datasets can be downloaded from OpenTopography.
Typical workflow:
- Download LAZ tiles for an area of interest
- Place them under data/lidar/
- Run MayaScan on the local files
MayaScan currently works on local input files only. No API key is required.
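For tile-at-a-time work, a small driver can build one CLI call per local file. The sketch below uses only flags shown earlier (-i, --name, --overwrite). The helper names and the choice to reuse each file stem as the run name are assumptions.

```python
import sys
from pathlib import Path


def build_command(tile, script="maya_scan.py"):
    """Build a MayaScan CLI call for one tile, using the file stem as the run name."""
    tile = Path(tile)
    return [sys.executable, script, "-i", tile.as_posix(),
            "--name", tile.stem, "--overwrite"]


def plan_runs(lidar_dir="data/lidar"):
    """Return one command per .laz/.las file found directly under `lidar_dir`."""
    tiles = sorted(p for p in Path(lidar_dir).glob("*")
                   if p.suffix.lower() in {".laz", ".las"})
    return [build_command(tile) for tile in tiles]


example = build_command("data/lidar/sample.laz")
print(" ".join(example[1:]))
```

Each command list could be passed to `subprocess.run(cmd, check=True)` to execute the runs in sequence.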
- MayaScan does not confirm archaeological features; it only prioritizes anomalies.
- Scores are relative within a run and should not be interpreted as probabilities.
- Strict consensus settings can suppress isolated true positives.
- False positives are more common in rugged terrain, modern earthworks, and heavily modified landscapes.
- Output quality depends on point-cloud quality, ground classification quality, and parameter choice.
- Analyst labels are review metadata, not training labels.
- The current workflow is mainly tuned for single-tile or tile-at-a-time analysis.
Possible future directions:
- multi-tile regional analysis
- linear-feature detection
- automated parameter adaptation from diagnostics and analyst feedback
MayaScan/
├── app.py
├── maya_scan.py
├── README.md
├── requirements.txt
├── LICENSE
├── assets/
│   ├── mayascan_logo.svg
│   ├── caracol_caana.png
│   └── aguada_fenix_lidar.png
└── data/
    └── lidar/
        ├── .gitkeep
        └── sample.laz
Generated outputs under runs/ and local LiDAR files under data/lidar/ are typically gitignored, except for the bundled sample tile.
- Python
- NumPy, SciPy, Pandas
- Rasterio, PyProj, Shapely
- PDAL
- scikit-learn
- Matplotlib, ReportLab
- Streamlit
Large language models were used for prototyping, debugging support, and documentation refinement. Method choices, parameter interpretation, and project validation were reviewed manually.
This project is licensed under the MIT License. See LICENSE for details.
James Adelhelm
Software Developer on the Data Ingest team at AccuWeather.
MayaScan is an independent personal research and software project driven by an interest in Maya history. It is not affiliated with, endorsed by, or sponsored by AccuWeather.
Caana, Caracol (Belize)
Photo by Devon Jones - Wikimedia Commons
License: CC BY-SA 3.0
https://commons.wikimedia.org/wiki/File:Caracol-Temple.jpg
Aguada Fenix LiDAR
Courtesy of Takeshi Inomata - Wikimedia Commons
License: CC BY-SA 4.0
https://commons.wikimedia.org/wiki/File:Aguada_F%C3%A9nix_1.jpg

