Merged
166 changes: 144 additions & 22 deletions .github/copilot-instructions.md
@@ -1,39 +1,161 @@
# Copilot Instructions — Tissot

Tissot is a visual-first geospatial diagnostics engine written in Rust with Python bindings (PyO3).
See your distortion. Named after Tissot's indicatrix.

## Project Context

- **Language**: Rust (2024 edition, 1.85+) with Python bindings via PyO3/maturin
- **Purpose**: Projection distortion analysis, cartographic linting, spatial diffing, data quality checks
- **Key crates**: geo, proj, geozero, gdal, clap, axum, askama, serde, thiserror, anyhow (CLI only)
- **Philosophy**: Visual-first output (browser maps), zero-config to start, autofix capability
- **Purpose**: Projection distortion analysis, cartographic linting, spatial diffing, data quality checks, cloud-native format validation
- **Philosophy**: Visual-first output (browser maps default), zero-config to start, autofix capability
- **License**: MIT OR Apache-2.0
- **Repo**: https://github.com/chrislyonsKY/tissot

## Code Conventions
## Architecture — Six Engines + Supporting Modules

- Library code: use `thiserror` for errors, propagate with `?`, never `unwrap()` or `expect()`
- CLI binary: may use `anyhow` for error handling
- Logging: use `log` crate, never `println!` in library code
- Geometry: always use `geo` crate types (Point, LineString, Polygon)
- CRS: always use `proj` crate for coordinate transforms
- Serialization: `serde` with derive macros
- Testing: every module has `#[cfg(test)] mod tests`
- Formatting: `cargo fmt`, clippy with `-D warnings`
### Core (`src/core/`)
- `types.rs` — Domain, Severity, Finding, CheckContext, Layer, Feature, CrsInfo, Config, BoundingBox, Schema
- `rule.rs` — The `Rule` trait (central abstraction). All checkers implement this. Requires Send + Sync for Rayon parallelism.
- `registry.rs` — Collects, stores, and filters rules by domain/tags/config
- `config.rs` — Loads .tissot.yml, merges with env vars and CLI flags. Zero-config default.

## Architecture
### X-Ray Engine (`src/xray/`) — HERO FEATURE
- `distortion.rs` — Jacobian-based Tissot parameter computation (semimajor, semiminor, area scale, angular distortion)
- `heatmap.rs` — IDW interpolation of distortion values into a continuous grid
- `ellipse.rs` — Generates Tissot ellipse polygons as GeoJSON-serializable geo::Polygon
- `recommend.rs` — CRS recommendation engine: evaluates candidates against actual data distortion
- `sampling.rs` — Stratified grid sampling for large datasets (≤1K: all, ≤50K: 500, >50K: 1000)
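
The Jacobian-based computation listed for `distortion.rs` can be sketched as follows. This is a minimal, std-only illustration and not the code in the repository: `TissotParams`, `tissot_at`, the unit-sphere assumption, and the finite-difference step are all hypothetical choices made here. It estimates the Jacobian by central differences and then applies the standard Snyder relations for the semi-axes, area scale, and maximum angular distortion. Plain `(f64, f64)` pairs stand in for `geo` types only to keep the sketch dependency-free.

```rust
/// Tissot indicatrix parameters at one sample point.
#[derive(Debug, Clone, Copy)]
pub struct TissotParams {
    pub semimajor: f64,
    pub semiminor: f64,
    pub area_scale: f64,
    /// Maximum angular distortion, in radians.
    pub angular_distortion: f64,
}

/// Hypothetical helper (not Tissot's API): estimates the indicatrix at
/// (`lon`, `lat`), both in radians, for a projection of the unit sphere.
/// `project` maps (lon, lat) to planar (x, y). Not valid at the poles,
/// where cos(lat) is zero.
pub fn tissot_at<F>(project: F, lon: f64, lat: f64) -> TissotParams
where
    F: Fn(f64, f64) -> (f64, f64),
{
    let d = 1e-6; // central-difference step, in radians (an assumption)
    let (xe, ye) = project(lon + d, lat);
    let (xw, yw) = project(lon - d, lat);
    let (xn, yn) = project(lon, lat + d);
    let (xs, ys) = project(lon, lat - d);

    // Jacobian of the projection, one pair of partials per geographic axis.
    let (dx_dlon, dy_dlon) = ((xe - xw) / (2.0 * d), (ye - yw) / (2.0 * d));
    let (dx_dlat, dy_dlat) = ((xn - xs) / (2.0 * d), (yn - ys) / (2.0 * d));

    // Scale factors along the meridian (h) and along the parallel (k).
    let cos_lat = lat.cos();
    let h = dx_dlat.hypot(dy_dlat);
    let k = dx_dlon.hypot(dy_dlon) / cos_lat;

    // Jacobian determinant over cos(lat): the area scale, h * k * sin(theta').
    let area_scale = ((dy_dlat * dx_dlon - dx_dlat * dy_dlon) / cos_lat).abs();

    // Sum and difference of the semi-axes (Snyder's a' and b').
    let sum_sq = h * h + k * k;
    let axis_sum = (sum_sq + 2.0 * area_scale).max(0.0).sqrt();
    let axis_diff = (sum_sq - 2.0 * area_scale).max(0.0).sqrt();

    TissotParams {
        semimajor: (axis_sum + axis_diff) / 2.0,
        semiminor: (axis_sum - axis_diff) / 2.0,
        area_scale,
        angular_distortion: 2.0 * (axis_diff / axis_sum).min(1.0).asin(),
    }
}
```

For a conformal projection such as spherical Mercator, the two semi-axes come out equal and the angular distortion is close to zero. For an equirectangular projection at 60° latitude, the semi-axes are roughly 2 and 1.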

Diagnostic rules implement the `Rule` trait (see src/core/rule.rs). The trait requires:
- `id()`, `name()`, `domain()`, `default_severity()`
- `check(&self, ctx: &CheckContext) -> Vec<Finding>`
- Optional: `can_fix()` and `fix()` for autofix support
- Optional: `score_weight()` for quality score calculation
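
To make the trait shape above concrete, here is a minimal, std-only sketch. The method names and the `Send + Sync` bound come from this document; everything else is an assumption for illustration: the trimmed `CheckContext` and `Finding` stand-ins, the default return values of `can_fix()` and `score_weight()`, and the `EmptyDataset` demo rule. `fix()` is left out because its signature is not given here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Domain {
    DataQuality,
    Projection,
    Cloud,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Info,
    Warning,
    Error,
}

/// Stand-in for the real `CheckContext`, which carries layers and config.
pub struct CheckContext<'a> {
    pub file_path: &'a str,
    pub feature_count: usize,
}

/// Stand-in for the real `Finding`, trimmed to three fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub rule_id: String,
    pub severity: Severity,
    pub message: String,
}

/// `Send + Sync` lets boxed rules be shared across worker threads.
pub trait Rule: Send + Sync {
    fn id(&self) -> &str;
    fn name(&self) -> &str;
    fn domain(&self) -> Domain;
    fn default_severity(&self) -> Severity;
    fn check(&self, ctx: &CheckContext<'_>) -> Vec<Finding>;

    /// Optional autofix flag; the `false` default is an assumption.
    fn can_fix(&self) -> bool {
        false
    }

    /// Optional score weight; the 1.0 default is an assumption.
    fn score_weight(&self) -> f64 {
        1.0
    }
}

/// Hypothetical demo rule: warns when a dataset has no features.
pub struct EmptyDataset;

impl Rule for EmptyDataset {
    fn id(&self) -> &str {
        "demo/empty-dataset"
    }

    fn name(&self) -> &str {
        "Empty Dataset"
    }

    fn domain(&self) -> Domain {
        Domain::DataQuality
    }

    fn default_severity(&self) -> Severity {
        Severity::Warning
    }

    fn check(&self, ctx: &CheckContext<'_>) -> Vec<Finding> {
        if ctx.feature_count > 0 {
            return Vec::new();
        }
        vec![Finding {
            rule_id: self.id().to_string(),
            severity: self.default_severity(),
            message: format!("'{}' contains no features", ctx.file_path),
        }]
    }
}
```

Because every rule sits behind the same trait, a registry can hold `Vec<Box<dyn Rule>>` and filter it by `domain()` before calling `check()`.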
### Checkers (`src/checkers/`)
Five diagnostic domains, each containing rules implementing the Rule trait:

Six subsystems: X-Ray engine, Checker engine, Fix engine, Score engine, Visual report server, IO layer.
**data_quality/** — Structural integrity
- `null_geometry.rs` — "data/null-geometry" (Error)
- `duplicate_geometry.rs` — "data/duplicate-geometry" (Warning)
- `schema_validation.rs` — "data/schema-validate" (Error)
- `extent_bounds.rs` — "data/extent-bounds" (Warning)
- `topology_gaps.rs` — "data/topology-gaps" (Warning, fixable)
- `topology_overlaps.rs` — "data/topology-overlaps" (Warning, fixable)
- `self_intersection.rs` — "data/self-intersection" (Error)

**projection/** — CRS suitability
- `area_distortion.rs` — "proj/area-distortion" (Warning >5%, Error >10%, fixable)
- `distance_distortion.rs` — "proj/distance-distortion" (Warning)
- `datum_mismatch.rs` — "proj/datum-mismatch" (Error)
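
The two-tier threshold on "proj/area-distortion" could be expressed roughly as below. This is a sketch under stated assumptions: the function name is hypothetical, and the percentage is taken here to mean the absolute deviation of the area scale factor from 1.0, which this document does not spell out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Warning,
    Error,
}

/// Hypothetical helper: maps an area scale factor (1.0 = no distortion)
/// to the severity tiers listed above. Returns `None` at or below 5%.
pub fn area_distortion_severity(area_scale: f64) -> Option<Severity> {
    // Assumed definition: percent deviation of the area scale from 1.0.
    let percent = (area_scale - 1.0).abs() * 100.0;
    if percent > 10.0 {
        Some(Severity::Error)
    } else if percent > 5.0 {
        Some(Severity::Warning)
    } else {
        None
    }
}
```

Under this definition both inflation (scale 1.25) and shrinkage (scale 0.85) are reported as errors.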

**cloud/** — Cloud-native format validation (aligned with CNG Formats Guide)
- `format_recommendation.rs` — "cloud/format-recommendation" (Info)
- `crs_metadata.rs` — "cloud/crs-metadata" (Error)
- `multi_file_integrity.rs` — "cloud/multi-file-integrity" (Error)
- `spatial_index.rs` — "cloud/spatial-index" (Warning) [Phase 2]
- `compression.rs` — "cloud/compression" (Info) [Phase 2]
- `file_size.rs` — "cloud/file-size" (Warning) [Phase 2]

**cartography/** — Visual/perceptual map quality [Phase 3]
**diff/** — Change detection between versions [Phase 2]

### Score Engine (`src/score/`)
- `calculator.rs` — Weighted average: Projection 0.25, DataQuality 0.30, Accessibility 0.20, CloudReadiness 0.20, Classification 0.05. Per-category: 100 minus penalties (Error: -15, Warning: -5, Info: -1). Letter grade A-F.
- `categories.rs` — ScoreCategory struct with name, weight, severity counts, computed score
- `badge.rs` — SVG badge generation: "Tissot Score: 87/100 — B" with color coding
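
As a worked illustration of the scoring rules above, the sketch below computes per-category scores and the weighted total. The weights and penalties are the ones listed for `calculator.rs`; the struct layout, the clamp at zero, and the letter-grade cutoffs (90/80/70/60) are assumptions, chosen only so that the "87/100" badge example maps to a B.

```rust
/// Illustrative category record; field names are assumptions.
pub struct ScoreCategory {
    pub name: &'static str,
    pub weight: f64,
    pub errors: u32,
    pub warnings: u32,
    pub infos: u32,
}

impl ScoreCategory {
    pub fn new(name: &'static str, weight: f64, errors: u32, warnings: u32, infos: u32) -> Self {
        Self { name, weight, errors, warnings, infos }
    }

    /// 100 minus penalties (Error: 15, Warning: 5, Info: 1).
    /// Clamping at zero is an assumption.
    pub fn score(&self) -> f64 {
        let penalty = 15.0 * f64::from(self.errors)
            + 5.0 * f64::from(self.warnings)
            + f64::from(self.infos);
        (100.0 - penalty).max(0.0)
    }
}

/// Weighted average of the category scores.
pub fn overall_score(categories: &[ScoreCategory]) -> f64 {
    let total_weight: f64 = categories.iter().map(|c| c.weight).sum();
    if total_weight == 0.0 {
        return 0.0;
    }
    let weighted: f64 = categories.iter().map(|c| c.weight * c.score()).sum();
    weighted / total_weight
}

/// Letter grade with assumed cutoffs of 90, 80, 70 and 60.
pub fn letter_grade(score: f64) -> char {
    match score {
        s if s >= 90.0 => 'A',
        s if s >= 80.0 => 'B',
        s if s >= 70.0 => 'C',
        s if s >= 60.0 => 'D',
        _ => 'F',
    }
}
```

For example, category scores of 95, 75, 100, 93 and 100 give 0.25·95 + 0.30·75 + 0.20·100 + 0.20·93 + 0.05·100 = 89.85, which is a B under the assumed cutoffs.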

### Profile (`src/profile/`)
- `summary.rs` — ProfileSummary: file path, format, size, layer count, per-layer stats (features, geometry type, CRS, extent, fields, null count)
- `format_info.rs` — Format detection from extension, cloud-optimized flag, CNG guide URL

### Explain (`src/explain/`)
- `crs_database.rs` — Curated EPSG lookup table (20+ entries: 4326, 3857, 3089, 2205, UTM zones, Albers, Lambert, etc.) with preservation properties and plain-English descriptions
- `properties.rs` — explain_crs() returns CrsExplanation with projection family, preservation properties, warnings, recommended use

### Fix Engine (`src/fix/`) [Phase 2]
- `reproject.rs`, `topology.rs`, `symbology.rs`, `schema.rs`

### IO Layer (`src/io/`) — Geozero-first (DL-004)
- Primary: geozero + shapefile + flatgeobuf crates (pure Rust, Wasm-compatible)
- Optional: gdal crate behind `--features gdal` flag
- `wasm.rs` — Byte-array IO for browser target (DL-005)

### Report (`src/report/`)
- `visual/server.rs` — Local axum server, opens browser automatically
- `visual/xray_map.rs`, `findings_map.rs`, `score_dashboard.rs`, `diff_slider.rs`, `watch_dashboard.rs`, `combined_report.rs`, `profile_card.rs`, `benchmark_card.rs`
- `terminal.rs` — Rich terminal output (secondary to visual)
- `json.rs` — Machine-readable JSON
- `sarif.rs` — SARIF for CI/CD (Phase 2)

## CLI Commands (Phase 1)

```
tissot xray <file> # Projection distortion → browser map
tissot check <file> # Diagnostic linting → browser findings map
tissot check <file> --domain cloud # Cloud optimization rules only
tissot score <file> # Quality score → browser dashboard
tissot profile <file> # Dataset summary → terminal
tissot explain <epsg|file> # CRS reference → terminal
tissot --terminal # Any command: suppress browser, terminal only
tissot --json # Any command: machine-readable JSON output
```

## Key Crates

| Crate | Purpose |
|-------|---------|
| geo, geo-types | Geometry primitives and algorithms |
| proj | CRS transforms (bundled_proj feature) |
| geozero | Zero-copy format IO |
| shapefile | Pure Rust .shp reader |
| flatgeobuf | Pure Rust .fgb reader |
| gdal | Optional GDAL fallback |
| clap 4 | CLI with derive API |
| serde, serde_json | Serialization |
| serde_yaml | Config file parsing |
| thiserror | Library error types |
| anyhow | CLI error handling (main.rs only) |
| log, env_logger | Logging (never println! in library) |
| rstar | R-tree spatial indexing |
| axum | Local web server for visual reports |
| askama | HTML template engine |
| wasm-bindgen | Wasm↔JS bridge (DL-005) |
| rayon | Parallel rule execution |

## Code Conventions — ALWAYS FOLLOW

### Rust
- Edition 2024, minimum Rust 1.85
- `thiserror` for library errors, `anyhow` for CLI binary ONLY
- Propagate errors with `?` — NEVER `unwrap()` or `expect()` in library code
- `log` crate for output — NEVER `println!` in library code
- All geometry via `geo` crate types — NEVER raw coordinate tuples
- All CRS operations via `proj` crate — NEVER hand-rolled transforms
- `serde` with `#[derive(Serialize, Deserialize)]` on all public data structs
- All checker rules implement the `Rule` trait — no standalone diagnostic functions
- Every module has `#[cfg(test)] mod tests`
- `cargo fmt` and `cargo clippy -- -D warnings` must pass

### Findings
- ALWAYS include `geometry: Some(...)` when the finding has a spatial location
- ALWAYS include a `suggestion` with an actionable fix recommendation
- Set `fixable: true` only when `tissot fix` can resolve the issue
- Link to CNG Formats Guide URLs in cloud rule suggestions
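
A finding that follows these conventions might look like the sketch below. The `Finding` field names mirror the struct literal used in `duplicate_geometry.rs` in this PR; the `Geometry` enum is a std-only stand-in for `geo::Geometry`, and the helper name, message wording, and suggestion text are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Info,
    Warning,
    Error,
}

#[derive(Debug, Clone, PartialEq)]
pub enum SpatialLocation {
    Layer { name: String },
}

/// Std-only stand-in for `geo::Geometry` (assumption for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub enum Geometry {
    Point { x: f64, y: f64 },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub rule_id: String,
    pub severity: Severity,
    pub message: String,
    pub location: Option<SpatialLocation>,
    pub geometry: Option<Geometry>,
    pub metric: Option<f64>,
    pub suggestion: Option<String>,
    pub fixable: bool,
}

/// Hypothetical helper: builds a "data/self-intersection" finding located
/// at the point where two edges of a feature cross.
pub fn self_intersection_finding(layer: &str, feature_id: &str, x: f64, y: f64) -> Finding {
    Finding {
        rule_id: "data/self-intersection".to_string(),
        severity: Severity::Error,
        message: format!("Feature '{feature_id}' in layer '{layer}' intersects itself"),
        location: Some(SpatialLocation::Layer { name: layer.to_string() }),
        // The finding has a spatial location, so geometry is always set.
        geometry: Some(Geometry::Point { x, y }),
        metric: None,
        // Always give the user an actionable next step.
        suggestion: Some(
            "Repair the ring so its edges no longer cross, then re-run `tissot check`".to_string(),
        ),
        // The rule list does not mark this rule as fixable.
        fixable: false,
    }
}
```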

### Visual Reports
- Browser is the PRIMARY output — `--terminal` is the opt-out
- MapLibre GL JS for all map rendering
- Self-contained HTML — NO CDN dependencies, must work offline
- Dark theme default

### Python Bindings
- Thin wrapper only — ALL computation stays in Rust
- Accept/return WKT, WKB, GeoJSON strings for geometry interop
- Maintain .pyi type stubs for every public function

## What NOT To Do

- Don't use raw coordinate tuples — use geo crate types
- Don't make terminal the default output — visual map is always default
- Don't require config for first run — zero-config must work
- Don't use `unwrap()` or `expect()` in library code
- Don't put computation logic in Python bindings
- Don't hardcode CRS/EPSG codes — use config or auto-detection
- Don't add CDN dependencies to HTML reports — everything must work offline
- Don't put computation logic in Python bindings — Rust only
- Don't use CDN-hosted assets in visual reports
- Don't assume WGS 84 — always read CRS from data source
- Don't add dependencies without justification
- Don't generate boring reports — every visual output should make someone want to screenshot it
1 change: 1 addition & 0 deletions Cargo.toml
@@ -42,6 +42,7 @@ open = "5"

# Utilities
inventory = "0.3"
rstar = "0.12"

# IO formats
geojson = "0.24"
180 changes: 180 additions & 0 deletions src/checkers/data_quality/duplicate_geometry.rs
@@ -0,0 +1,180 @@
//! Rule: Detect features with identical geometry (duplicates by geometry hash).

use crate::core::rule::{CheckContext, Domain, Finding, Rule, RuleEntry, Severity, SpatialLocation};
use std::collections::HashMap;

/// Flags features that share identical geometry, keyed by each geometry's `Debug` string.
///
/// Unlike `DuplicateFeatures`, which uses a broader check, this rule targets
/// geometry-level duplicates only. The match is exact: geometries that differ in
/// vertex order or by floating-point noise produce different keys and are not reported.
pub struct DuplicateGeometry;

impl Default for DuplicateGeometry {
    fn default() -> Self {
        Self
    }
}

impl Rule for DuplicateGeometry {
    fn id(&self) -> &str {
        "data/duplicate-geometry"
    }

    fn name(&self) -> &str {
        "Duplicate Geometry"
    }

    fn domain(&self) -> Domain {
        Domain::DataQuality
    }

    fn default_severity(&self) -> Severity {
        Severity::Warning
    }

    fn check(&self, ctx: &CheckContext) -> Vec<Finding> {
        let mut findings = Vec::new();

        for layer in ctx.layers {
            // Key geometries by their `Debug` string, an exact-match stand-in for WKT.
            let mut seen: HashMap<String, Vec<usize>> = HashMap::new();

            for (idx, feature) in layer.features.iter().enumerate() {
                if let Some(ref geom) = feature.geometry {
                    let key = format!("{geom:?}");
                    seen.entry(key).or_default().push(idx);
                }
            }

            // Note: `HashMap` iteration order is unspecified, so the order of
            // findings can vary between runs.
            for indices in seen.values() {
                if indices.len() > 1 {
                    let dup_count = indices.len();
                    // Label each duplicate by its feature id, falling back to its index.
                    let labels: Vec<String> = indices
                        .iter()
                        .map(|&i| {
                            layer.features[i]
                                .id
                                .clone()
                                .unwrap_or_else(|| format!("#{i}"))
                        })
                        .collect();

                    // Attach the first duplicate's geometry for map rendering.
                    let first_geom = layer.features[indices[0]].geometry.clone();

                    findings.push(Finding {
                        rule_id: self.id().to_string(),
                        severity: self.default_severity(),
                        message: format!(
                            "{dup_count} features with identical geometry in layer '{}': [{}]",
                            layer.name,
                            labels.join(", ")
                        ),
                        location: Some(SpatialLocation::Layer {
                            name: layer.name.clone(),
                        }),
                        geometry: first_geom,
                        metric: Some(dup_count as f64),
                        suggestion: Some(
                            "Review and remove duplicate geometries, or run `tissot fix --dedup`"
                                .into(),
                        ),
                        fixable: false,
                    });
                }
            }
        }

        findings
    }

    fn score_weight(&self) -> f64 {
        0.6
    }
}

inventory::submit! {
    RuleEntry {
        factory: || Box::new(DuplicateGeometry),
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::core::config::Config;
    use crate::core::rule::{Feature, Layer};
    use std::collections::HashMap;

    #[test]
    fn detects_duplicate_geometries() {
        let point = geo::Geometry::Point(geo::Point::new(1.0, 2.0));
        let layer = Layer {
            name: "test".into(),
            crs: Some("EPSG:4326".into()),
            features: vec![
                Feature {
                    id: Some("a".into()),
                    geometry: Some(point.clone()),
                    properties: HashMap::new(),
                },
                Feature {
                    id: Some("b".into()),
                    geometry: Some(point.clone()),
                    properties: HashMap::new(),
                },
                Feature {
                    id: Some("c".into()),
                    geometry: Some(geo::Geometry::Point(geo::Point::new(9.0, 9.0))),
                    properties: HashMap::new(),
                },
            ],
            bounds: None,
        };

        let config = Config::default();
        let ctx = CheckContext {
            layers: &[layer],
            config: &config,
            file_path: "test.geojson",
        };

        let rule = DuplicateGeometry;
        let findings = rule.check(&ctx);
        assert_eq!(findings.len(), 1);
        assert_eq!(findings[0].severity, Severity::Warning);
        assert!(findings[0].geometry.is_some());
        assert!(findings[0].message.contains("2 features"));
    }

    #[test]
    fn no_findings_when_unique() {
        let layer = Layer {
            name: "test".into(),
            crs: Some("EPSG:4326".into()),
            features: vec![
                Feature {
                    id: Some("a".into()),
                    geometry: Some(geo::Geometry::Point(geo::Point::new(1.0, 2.0))),
                    properties: HashMap::new(),
                },
                Feature {
                    id: Some("b".into()),
                    geometry: Some(geo::Geometry::Point(geo::Point::new(3.0, 4.0))),
                    properties: HashMap::new(),
                },
            ],
            bounds: None,
        };

        let config = Config::default();
        let ctx = CheckContext {
            layers: &[layer],
            config: &config,
            file_path: "test.geojson",
        };

        let rule = DuplicateGeometry;
        assert!(rule.check(&ctx).is_empty());
    }
}
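
The grouping step in `check` can be isolated into a small, std-only sketch. It is not part of this PR and makes several assumptions: the `Coord` struct stands in for `geo` coordinates, geometries are reduced to flat coordinate lists, and the key is built from the exact bit patterns of each coordinate instead of a formatted string. Sorting the groups at the end is one way to make the output order reproducible, since `HashMap` iteration order is unspecified.

```rust
use std::collections::HashMap;

/// Std-only stand-in for a `geo` coordinate (assumption for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Coord {
    pub x: f64,
    pub y: f64,
}

/// Exact-match key: the bit pattern of every coordinate, in vertex order.
fn geometry_key(coords: &[Coord]) -> Vec<(u64, u64)> {
    coords
        .iter()
        .map(|c| (c.x.to_bits(), c.y.to_bits()))
        .collect()
}

/// Hypothetical helper: returns the index groups of features that share
/// identical coordinates. Features without geometry are skipped, and
/// groups are returned in order of their first index.
pub fn duplicate_groups(features: &[Option<Vec<Coord>>]) -> Vec<Vec<usize>> {
    let mut seen: HashMap<Vec<(u64, u64)>, Vec<usize>> = HashMap::new();

    for (idx, geom) in features.iter().enumerate() {
        if let Some(coords) = geom {
            seen.entry(geometry_key(coords)).or_default().push(idx);
        }
    }

    let mut groups: Vec<Vec<usize>> = seen
        .into_values()
        .filter(|indices| indices.len() > 1)
        .collect();
    // Indices within a group are already ascending; sorting the groups
    // makes the result independent of hash iteration order.
    groups.sort();
    groups
}
```

As with the `Debug`-string key, this only catches exact matches: `0.0` and `-0.0` have different bit patterns, and reversed vertex order yields a different key.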