Agent Briefing: bin-time-crawler

Mission Overview

bin-time-crawler automates waste collection schedule extraction for Australian councils. The current implementation targets Glen Eira City Council by downloading GeoJSON datasets, validating them, and producing structured crawl results that downstream systems can consume.

Architecture Snapshot

cmd/crawler/: CLI entry point. Parses flags, sets up dependencies, orchestrates a crawl run.
internal/application/: Application services (NewCrawlService, Run) coordinating crawlers and persistence.
internal/domain/: Domain contracts such as council.CrawlResult, council.Repository, and the crawling.Crawler interface.
internal/infrastructure/: Adapters for concrete councils, logging, persistence, configuration, and validation.
- crawling/gleneira/: Glen Eira-specific crawler, dataset configuration, payload builders, and GeoJSON parsing helpers.
- crawling/bininfo/: HTML scraper that normalises council waste guidance into structured payloads reused by crawlers.
- crawling/registry/: Central index of dataset endpoints and public-facing support URLs per council.
- config/: Default runtime configuration values.
- logging/: Structured logger abstraction.
- persistence/: Filesystem-backed implementation for saving crawl outputs (the only backend at present).

Tech Stack & Dependencies

Language: Go (go 1.25.1).
HTTP Client: Standard library net/http with council-specific endpoints.
JSON Handling: encoding/json for dataset decoding.
HTML Parsing: golang.org/x/net/html v0.44.0 for extracting bin guidance from Glen Eira web pages with custom user-agent headers.
Validation: Custom internal/infrastructure/validation rules applied to remote datasets.
Persistence: Filesystem repository writing JSON output to output/.
Logging: Structured logging abstraction with optional stdout fallback.
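
As a rough illustration of how the JSON handling and validation entries above fit together, the sketch below decodes a GeoJSON FeatureCollection with encoding/json and rejects features that lack required properties. The type names, function names, and the "zone" field are assumptions made for illustration; they are not the project's actual types or the Glen Eira schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// featureCollection is a hypothetical, minimal view of a GeoJSON dataset.
type featureCollection struct {
	Type     string    `json:"type"`
	Features []feature `json:"features"`
}

// feature keeps only the properties map; geometry is ignored in this sketch.
type feature struct {
	Properties map[string]any `json:"properties"`
}

// decodeFeatureCollection parses raw bytes and checks the top-level type.
func decodeFeatureCollection(data []byte) (featureCollection, error) {
	var fc featureCollection
	if err := json.Unmarshal(data, &fc); err != nil {
		return fc, fmt.Errorf("decode dataset: %w", err)
	}
	if fc.Type != "FeatureCollection" {
		return fc, fmt.Errorf("unexpected GeoJSON type %q", fc.Type)
	}
	return fc, nil
}

// validateRequired returns an error for the first feature that is missing
// (or has a null value for) any of the required property names.
func validateRequired(fc featureCollection, required []string) error {
	for i, f := range fc.Features {
		for _, name := range required {
			if v, ok := f.Properties[name]; !ok || v == nil {
				return fmt.Errorf("feature %d: missing required field %q", i, name)
			}
		}
	}
	return nil
}

func main() {
	// "zone" is an invented field name, used only for this example.
	raw := []byte(`{"type":"FeatureCollection","features":[{"properties":{"zone":"A"}}]}`)
	fc, err := decodeFeatureCollection(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if err := validateRequired(fc, []string{"zone"}); err != nil {
		fmt.Println("validation failed, aborting crawl:", err)
		return
	}
	fmt.Println("dataset valid, features:", len(fc.Features))
}
```

Returning on the first missing field matches the stated behaviour that validation failures abort the crawl; a real validator might instead collect all failures before reporting.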

Operational Constraints

Crawlers must respect the HTTP and run timeouts configured in config.Config.
Glen Eira dataset schemas (config.go) define required fields; validation failures abort the crawl.
CrawlService.Run() enforces input location validity and ensures each result includes RetrievedAt and QueryLocation if applicable.
File system paths (logs/, output/) must exist or be creatable by the executable.
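
The timeout and path constraints above could be applied at startup roughly as follows. This is a minimal sketch: runtimeConfig and its fields are hypothetical stand-ins for config.Config, whose real shape is not shown here, and the helper names are invented for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"path/filepath"
	"time"
)

// runtimeConfig is a hypothetical stand-in for config.Config.
type runtimeConfig struct {
	HTTPTimeout time.Duration // per-request limit
	RunTimeout  time.Duration // limit for the whole crawl run
	OutputDir   string
}

// newHTTPClient builds a client whose requests fail once HTTPTimeout elapses.
func newHTTPClient(cfg runtimeConfig) *http.Client {
	return &http.Client{Timeout: cfg.HTTPTimeout}
}

// ensureWritableDir creates dir if needed and confirms a file can be written
// there by creating and removing a short-lived probe file.
func ensureWritableDir(dir string) error {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return fmt.Errorf("create %s: %w", dir, err)
	}
	probe, err := os.CreateTemp(dir, ".probe-*")
	if err != nil {
		return fmt.Errorf("write to %s: %w", dir, err)
	}
	name := probe.Name()
	if err := probe.Close(); err != nil {
		return err
	}
	return os.Remove(name)
}

func main() {
	cfg := runtimeConfig{
		HTTPTimeout: 10 * time.Second,
		RunTimeout:  time.Minute,
		OutputDir:   filepath.Join(os.TempDir(), "bin-time-crawler-demo", "output"),
	}
	if err := ensureWritableDir(cfg.OutputDir); err != nil {
		fmt.Println("startup check failed:", err)
		return
	}
	client := newHTTPClient(cfg)
	fmt.Println("HTTP timeout:", client.Timeout)
}
```

Checking the directories before the crawl starts means a permissions problem surfaces immediately rather than after the datasets have been downloaded.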

Agent Conduct Rules

Preserve existing logging and validation behaviour when extending crawlers.
Do not introduce new dependencies without confirming compatibility with go.mod.
Follow existing package layout (internal/application, internal/domain, internal/infrastructure).
Ensure new crawlers implement crawling.Crawler and respect the contract used by CrawlService.
Tests and CLI runs should avoid overloading council endpoints: use appropriate timeouts and handle cancellation signals.

Extension Guidelines

When adding a council, mirror the structure of internal/infrastructure/crawling/gleneira/.
Reuse shared helpers (payload.go, geojson.go) or create council-specific equivalents.
Integrate new adapters via cmd/crawler/main.go and internal/application/crawl_service.go.

This document serves as a quick-start reference for agentic AI systems contributing to bin-time-crawler.
