labelwatch

An observatory for ATProto's labeling infrastructure. Monitors labeler behavior over time and flags integrity-risk patterns (rate spikes, drift, synchronized activity, boundary instability). It does not judge content or truth; it produces inspectable receipts about governance infrastructure behavior.

What it does

Discovers labelers via batch enumeration (listReposByCollection), a Jetstream sidecar that watches app.bsky.labeler.service records in real time, and a backstop scrape of curated labeler lists. All three channels feed into a single registry with evidence-based classification.

Ingests label events from com.atproto.label.queryLabels across all discovered labelers. Events are normalized, hashed (SHA-256), and stored in SQLite. Multi-ingest handles labelers that run their own endpoints.

Detects anomalies with four rules (rate spike, flip-flop, target concentration, churn index), all with warm-up gating to suppress false positives during labeler startup. Alerts include receipt hashes for auditability.

Derives labeler state with four independent signals: regime state (warming_up / stable / bursty / degraded / ...), auditability risk (0-100), inference risk (0-100), and temporal coherence (0-100). Four dials, not one trust score.

Analyzes boundary instability between labelers: label family normalization, Jensen-Shannon divergence (JSD), contradiction edges, shared-target overlap. Domain classification (moderation / metadata / novelty / political) separates real conflict from badge-ecosystem orthogonality.

Generates reports as static HTML + JSON sites: census, triage views (Active/Alerts/New/Opaque/All), per-labeler pages with evidence expanders, volume badges, discovery health cards, and boundary analysis.

Serves label climate via HTTP: per-DID reporting showing which labelers apply what labels, daily time series, top values, and example posts. Rate limited, disk cached, concurrency gated.
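To make the divergence measure used in boundary analysis concrete, here is a minimal sketch of Jensen-Shannon divergence between two labelers' label-family distributions. The function name, the `Counter` inputs, and the family names are illustrative assumptions, not labelwatch's actual code.

```python
import math
from collections import Counter


def jensen_shannon_divergence(p_counts: Counter, q_counts: Counter) -> float:
    """JSD in bits between two count distributions: 0 = identical, 1 = disjoint."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        raise ValueError("both distributions need at least one observation")
    jsd = 0.0
    for key in set(p_counts) | set(q_counts):
        p = p_counts[key] / p_total  # Counter returns 0 for missing keys
        q = q_counts[key] / q_total
        m = (p + q) / 2              # mixture distribution
        if p > 0:
            jsd += 0.5 * p * math.log2(p / m)
        if q > 0:
            jsd += 0.5 * q * math.log2(q / m)
    return jsd


# Hypothetical label-family counts for two labelers.
labeler_a = Counter({"spam": 30, "nsfw": 10})
labeler_b = Counter({"spam": 30, "nsfw": 10})
labeler_c = Counter({"badge": 5})

same = jensen_shannon_divergence(labeler_a, labeler_b)      # 0.0
disjoint = jensen_shannon_divergence(labeler_a, labeler_c)  # 1.0
```

With base-2 logarithms the value is bounded in [0, 1], which makes it easy to compare across labeler pairs.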

Quick start

python -m venv .venv
. .venv/bin/activate
pip install -e .

# Configure
cp config/config.toml.example config.toml
# Edit config.toml with your settings

# One-shot commands
labelwatch ingest --config config.toml
labelwatch scan --config config.toml
labelwatch report --format html --out report/ --now max

# Continuous operation
labelwatch run --config config.toml --db labelwatch.db

Architecture

Three systemd services, one SQLite database (WAL mode):

                                 ┌──────────────────────┐
                                 │  Jetstream           │
                                 │  (labeler.service    │
                                 │   records)           │
                                 └──────────┬───────────┘
                                            │
┌──────────────────┐            ┌───────────▼───────────┐
│  ATProto Service │            │  Discovery Stream     │
│  (queryLabels)   │            │  (discovery_stream.py)│
└────────┬─────────┘            │  + backstop scrape    │
         │                      └───────────┬───────────┘
         │ HTTP polling                     │
         ▼                                  ▼
┌──────────────────┐    ┌───────────────────────────────┐
│  Ingest          │───▶│  SQLite DB (schema v21, WAL)  │
│  (ingest.py)     │    │                               │
│  multi-ingest    │    │  label_events   labelers      │
└──────────────────┘    │  alerts         evidence      │
                        │  discovery_events             │
┌──────────────────┐    │  boundary_edges/targets       │
│  Rules + Scan    │───▶│  derived_author_day           │
│  (rules.py,      │    │  derived_author_labeler_day   │
│   scan.py)       │    └───────────────┬───────────────┘
│  receipted alerts│                    │
└──────────────────┘                    │
                                        ▼
┌──────────────────┐    ┌───────────────────────────────┐
│  Derive          │    │  Report        │  Climate API │
│  (derive.py)     │    │  (report.py)   │  (server.py) │
│  regime state    │    │  HTML + JSON   │  /v1/climate │
│  risk scores     │    │  static site   │  rate limited│
│  coherence       │    └────────────────┴──────────────┘
└──────────────────┘

Services

| Service | Purpose | Resources (memory / CPU) |
| --- | --- | --- |
| `labelwatch.service` | Main loop: ingest, scan, derive, report | 2GB / 50% CPU |
| `labelwatch-discovery.service` | Jetstream sidecar for real-time labeler discovery | 256MB / 10% CPU |
| `labelwatch-api.service` | HTTP API: climate, whatsonme (`/v1/*`) | 512MB / 25% CPU |

CLI

# Ingestion & scanning
labelwatch ingest --config config.toml       # Fetch label events
labelwatch scan --config config.toml         # Run detection rules
labelwatch run --config config.toml          # Continuous loop: ingest, scan, derive, report

# Discovery
labelwatch discover --config config.toml     # Batch labeler discovery
labelwatch discover --backstop              # Scrape labeler-lists.bsky.social
labelwatch discover-stream                   # Jetstream sidecar (runs continuously)

# Reporting
labelwatch report --format html --out report/    # Static HTML site
labelwatch report --alerts --since 24h           # Recent alerts
labelwatch report --labeler did:plc:...          # Single labeler

# Climate & account labels
labelwatch climate --did did:plc:...         # Generate climate report (CLI)
labelwatch whatsonme did:plc:...             # Account labels via queryLabels
labelwatch whatsonme @alice.bsky.social      # Also accepts @handles
labelwatch serve --port 8423                 # Start HTTP server

# Inspection
labelwatch labelers                          # List discovered labelers
labelwatch labelers --class declared         # Filter by visibility class
labelwatch census                            # Classification census
labelwatch coverage-delta                    # Upstream vs registry comparison
labelwatch reclassify --dry-run              # Preview reclassification

# Maintenance
labelwatch db-optimize                       # Run ANALYZE + query planner

API

| Endpoint | Purpose |
| --- | --- |
| `GET /health` | Health check |
| `GET /v1/climate/{did_or_handle}` | Label climate report (local ingest data) |
| `GET /v1/whatsonme/{did_or_handle}` | Account-level labels via network queryLabels |

Both /v1/climate/ and /v1/whatsonme/ accept DIDs or @handles. Query params: format=json|html, window=N (climate only), sources=did1,did2 (whatsonme only).

Rate limited, disk cached (climate), concurrency gated. Kill switch via CLIMATE_API_DISABLED=1.
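As an illustration of calling the climate endpoint, the sketch below builds a request URL and fetches JSON with the standard library. The base URL assumes a local `labelwatch serve --port 8423`; the helper names are hypothetical, and the response shape is not documented here, so the result is returned as parsed JSON without further interpretation.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8423"  # assumption: server started locally on port 8423


def climate_url(did_or_handle: str, window: Optional[int] = None, fmt: str = "json") -> str:
    """Build a /v1/climate/ URL with the documented format and window parameters."""
    params = {"format": fmt}
    if window is not None:
        params["window"] = str(window)
    # Keep ':' and '@' readable so DIDs and @handles appear as-is in the path.
    path = urllib.parse.quote(did_or_handle, safe=":@")
    return f"{BASE_URL}/v1/climate/{path}?{urllib.parse.urlencode(params)}"


def fetch_climate(did_or_handle: str, window: Optional[int] = None):
    """Fetch and parse the JSON report (requires a running server)."""
    with urllib.request.urlopen(climate_url(did_or_handle, window), timeout=10) as resp:
        return json.load(resp)


url = climate_url("did:plc:example1", window=7)
# url == "http://localhost:8423/v1/climate/did:plc:example1?format=json&window=7"
```

Because the API is rate limited, a client like this would reasonably back off and retry on error responses rather than polling in a tight loop.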

Configuration

Create a config.toml (see config/config.toml.example):

db_path = "labelwatch.db"
service_url = "https://bsky.social"
labeler_dids = ["did:plc:example1", "did:plc:example2"]

window_minutes = 15
baseline_hours = 24
spike_k = 10.0
min_current_count = 50
flip_flop_window_hours = 24
max_events_per_scan = 200000

discovery_enabled = true
discovery_interval_hours = 24
boundary_enabled = true
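To show how the spike-related settings could interact, here is a rough sketch of a rate-spike check. It assumes the baseline is an average per-window count over `baseline_hours`; the real rule may compute its baseline differently, and warm-up gating is not modeled. The function name is hypothetical.

```python
def is_rate_spike(
    current_count: int,
    baseline_count: int,
    window_minutes: int = 15,
    baseline_hours: int = 24,
    spike_k: float = 10.0,
    min_current_count: int = 50,
) -> bool:
    """Return True if the current window exceeds spike_k times the baseline rate."""
    # Ignore low-volume windows so tiny labelers do not trigger alerts.
    if current_count < min_current_count:
        return False
    windows_in_baseline = baseline_hours * 60 / window_minutes
    expected_per_window = baseline_count / windows_in_baseline
    return current_count > spike_k * expected_per_window


# 960 events over 24h is 10 per 15-minute window, so the threshold is 100.
spike = is_rate_spike(current_count=101, baseline_count=960)   # True
quiet = is_rate_spike(current_count=100, baseline_count=960)   # False
```

The `min_current_count` floor matters most when the baseline is near zero, where any multiplier would otherwise be exceeded by a handful of events.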

Detection rules

| Rule | What it detects |
| --- | --- |
| `label_rate_spike` | Label rate exceeds baseline by a factor of `spike_k` (default 10.0) |
| `flip_flop` | Apply → negate → re-apply on same (uri, val) within window |
| `target_concentration` | Herfindahl-Hirschman index (HHI) on target distribution indicates fixation on few targets |
| `churn_index` | Jaccard distance of target sets across adjacent windows |

All rules include warm-up gating and collect evidence hashes for auditability.
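The two set- and distribution-based measures named above can be sketched in a few lines. This is a generic illustration of HHI and Jaccard distance, not the project's rule code; thresholds, windowing, and the function names are assumptions left to the caller.

```python
from collections import Counter


def hhi(target_counts: Counter) -> float:
    """Herfindahl-Hirschman index: sum of squared shares, 1/n (even) to 1.0 (one target)."""
    total = sum(target_counts.values())
    if total == 0:
        return 0.0
    return sum((n / total) ** 2 for n in target_counts.values())


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|: 0.0 for identical sets, 1.0 for disjoint sets."""
    union = a | b
    if not union:
        return 0.0  # two empty windows are treated as identical
    return 1.0 - len(a & b) / len(union)


# Hypothetical targets labeled in two adjacent windows.
even = hhi(Counter({"t1": 5, "t2": 5, "t3": 5, "t4": 5}))   # 0.25
fixated = hhi(Counter({"t1": 20}))                          # 1.0
churn = jaccard_distance({"t1", "t2"}, {"t2", "t3"})        # 1 - 1/3
```

A high HHI suggests labels concentrated on few targets, while a high Jaccard distance between adjacent windows suggests the target set is turning over quickly.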

Labeler classification

Three-axis classification from structured evidence:

Visibility: declared / protocol_public / observed_only / unresolved
Reachability: accessible / auth_required / down / unknown
Auditability: high / medium / low

Sticky evidence fields (observed_as_src, has_labeler_service, etc.) are never downgraded by transient probe failures.
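One way to read the sticky-field rule is as a latch: once a field has been observed true, a later failed probe cannot reset it. The sketch below assumes the sticky fields are booleans and names only the two fields mentioned above; the merge function and the `reachability` key are hypothetical.

```python
# Only the two fields named in this README; the real list is longer.
STICKY_FIELDS = ("observed_as_src", "has_labeler_service")


def merge_evidence(stored: dict, probe: dict) -> dict:
    """Merge a new probe result into stored evidence without downgrading sticky fields."""
    merged = dict(stored)
    for key, value in probe.items():
        if key in STICKY_FIELDS:
            # Latch: true stays true even if this probe failed to confirm it.
            merged[key] = bool(stored.get(key)) or bool(value)
        else:
            # Non-sticky fields simply reflect the latest probe.
            merged[key] = value
    return merged


stored = {"has_labeler_service": True, "reachability": "accessible"}
failed_probe = {"has_labeler_service": False, "reachability": "down"}
merged = merge_evidence(stored, failed_probe)
# merged == {"has_labeler_service": True, "reachability": "down"}
```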

Schema

SQLite with WAL mode. Current version: v21. Key tables:

| Table | Purpose |
| --- | --- |
| `label_events` | Append-only ingested labels (SHA-256 deduped) |
| `labelers` | Registry with classification, regime state, risk scores, volume stats |
| `alerts` | Detection results with receipt hashes |
| `labeler_evidence` | Append-only classification evidence |
| `discovery_events` | Jetstream/batch/backstop discovery audit trail |
| `boundary_edges` | Cross-labeler contradiction/divergence edges |
| `derived_author_day` | Rollup: label counts per author per day |
| `derived_author_labeler_day` | Rollup: label counts per author/labeler/day |
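As a rough illustration of SHA-256 deduplication in an append-only table, the sketch below hashes a canonical JSON encoding of a label event and relies on a primary key to reject repeats. The event fields (`src`, `uri`, `val`, `neg`, `cts`) follow the ATProto label definition; the `label_events_demo` table, its columns, and the helper names are hypothetical and do not reflect labelwatch's real schema or normalization.

```python
import hashlib
import json
import sqlite3


def event_hash(event: dict) -> str:
    """SHA-256 over canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def store_event(conn: sqlite3.Connection, event: dict) -> bool:
    """Insert the event; return False if an identical event is already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO label_events_demo (event_hash, payload) VALUES (?, ?)",
        (event_hash(event), json.dumps(event, sort_keys=True)),
    )
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE label_events_demo (event_hash TEXT PRIMARY KEY, payload TEXT NOT NULL)"
)
event = {
    "src": "did:plc:example1",
    "uri": "at://did:plc:example2/app.bsky.feed.post/abc",
    "val": "spam",
    "neg": False,
    "cts": "2024-01-01T00:00:00Z",
}
first = store_event(conn, event)                                  # True
second = store_event(conn, dict(reversed(list(event.items()))))   # False: same fields, other key order
```

Sorting keys before hashing is what makes the digest independent of field order, so the same event fetched twice maps to the same row.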

Related projects

driftwatch — reference ATProto labeler with drift detection, longitudinal tracking, and a decision ledger. Labelwatch watches labeler behavior; driftwatch watches information drift. Same observatory family.

What this system is and is not

Is: An observatory for ATProto's labeling infrastructure. Monitors labeler behavior, boundary instability, and hosting-locus distribution. Produces inspectable receipts about governance infrastructure.

Is not: A moderation tool, a content filter, a trust authority, or a user profiler. It does not decide who is right. It measures how the governance layer behaves.

Populations and coverage:

Label events come from queryLabels across all discovered labelers.
Hosting-locus enrichment comes from the driftwatch facts bridge. Coverage depends on resolver progress and may be partial — the card always states this.
"Coverage" always specifies a denominator. Actor coverage (unique DIDs) and event coverage (label events) answer different questions.

Host family is not operator identity. The hosting locus card shows where labeled targets are hosted, not who operates the infrastructure or why.

Design constraints

Aggregate-first, NOT profile-first
Observation only — does not moderate content, judge truth, or emit labels
No ML classifiers, no LLM-in-the-loop
Receipt hashing for auditability (SHA-256, not cryptographic signing)
Four independent risk dials, not one collapsed trust score
Descriptive language only — "concentration anomaly" not "bot farm"

License

Unless otherwise noted, this repository is licensed under MIT OR Apache-2.0, at your option. Contributions are accepted under the same terms.
