bin-time-crawler automates waste collection schedule extraction for Australian councils. The current implementation targets Glen Eira City Council by downloading GeoJSON datasets, validating them, and producing structured crawl results that downstream systems can consume.
- `cmd/crawler/`: CLI entry point. Parses flags, sets up dependencies, orchestrates a crawl run.
- `internal/application/`: Application services (`NewCrawlService`, `Run`) coordinating crawlers and persistence.
- `internal/domain/`: Domain contracts such as `council.CrawlResult`, `council.Repository`, and the `crawling.Crawler` interface.
- `internal/infrastructure/`: Adapters for concrete councils, logging, persistence, configuration, validation, etc.
  - `crawling/gleneira/`: Glen Eira-specific crawler, dataset configuration, payload builders, and GeoJSON parsing helpers.
  - `crawling/bininfo/`: HTML scraper that normalises council waste guidance into structured payloads reused by crawlers.
  - `crawling/registry/`: Central index of dataset endpoints and public-facing support URLs per council.
  - `config/`: Default runtime configuration values.
  - `logging/`: Structured logger abstraction.
  - `persistence/`: Currently filesystem-backed implementation for saving crawl outputs.
- Language: Go (`go 1.25.1`).
- HTTP Client: Standard library `net/http` with council-specific endpoints.
- JSON Handling: `encoding/json` for dataset decoding.
- HTML Parsing: `golang.org/x/net/html v0.44.0` for extracting bin guidance from Glen Eira web pages with custom user-agent headers.
- Validation: Custom `internal/infrastructure/validation` rules applied to remote datasets.
- Persistence: Filesystem repository writing JSON output to `output/`.
- Logging: Structured logging abstraction with optional stdout fallback.
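To illustrate how `encoding/json` decoding and required-field validation can fit together, here is a minimal sketch. The types `featureCollection` and `feature`, the function `validateDataset`, and the property names `zone` and `day` are all hypothetical; the project's actual rules live in `internal/infrastructure/validation` and may be structured differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// featureCollection is a minimal, hypothetical view of a GeoJSON FeatureCollection.
type featureCollection struct {
	Type     string    `json:"type"`
	Features []feature `json:"features"`
}

// feature keeps only the properties map; geometry is ignored in this sketch.
type feature struct {
	Type       string         `json:"type"`
	Properties map[string]any `json:"properties"`
}

// validateDataset decodes raw GeoJSON and checks that every feature carries
// all required property keys. Any failure is returned as an error so the
// caller can abort the crawl.
func validateDataset(raw []byte, required []string) (*featureCollection, error) {
	var fc featureCollection
	if err := json.Unmarshal(raw, &fc); err != nil {
		return nil, fmt.Errorf("decode geojson: %w", err)
	}
	if fc.Type != "FeatureCollection" {
		return nil, fmt.Errorf("unexpected top-level type %q", fc.Type)
	}
	for i, f := range fc.Features {
		var missing []string
		for _, key := range required {
			if _, ok := f.Properties[key]; !ok {
				missing = append(missing, key)
			}
		}
		if len(missing) > 0 {
			return nil, fmt.Errorf("feature %d missing fields: %s", i, strings.Join(missing, ", "))
		}
	}
	return &fc, nil
}

func main() {
	// Property names here are assumptions for illustration only.
	raw := []byte(`{"type":"FeatureCollection","features":[{"type":"Feature","properties":{"zone":"A","day":"Monday"}}]}`)
	fc, err := validateDataset(raw, []string{"zone", "day"})
	if err != nil {
		fmt.Println("validation failed:", err)
		return
	}
	fmt.Println("features:", len(fc.Features))
}
```

Returning an error on the first invalid feature mirrors the "validation failures abort the crawl" behaviour described in this document; collecting all failures before returning would be an equally reasonable design.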
- Crawler must respect configured HTTP and run timeouts (`config.Config`).
- Glen Eira dataset schemas (`config.go`) define required fields; validation failures abort the crawl.
- `CrawlService.Run()` enforces input location validity and ensures each result includes `RetrievedAt` and `QueryLocation` if applicable.
- File system paths (`logs/`, `output/`) must exist or be creatable by the executable.
- Preserve existing logging and validation behaviour when extending crawlers.
- Do not introduce new dependencies without confirming compatibility with `go.mod`.
- Follow existing package layout (`internal/application`, `internal/domain`, `internal/infrastructure`).
- Ensure new crawlers implement `crawling.Crawler` and respect the contract used by `CrawlService`.
- Tests or CLI runs should avoid network overload: use appropriate timeouts and handle cancellation signals.
- When adding councils, mirror `internal/infrastructure/crawling/gleneira/` structure.
- Reuse shared helpers (`payload.go`, `geojson.go`) or create council-specific equivalents.
- Integrate new adapters via `cmd/crawler/main.go` and `internal/application/crawl_service.go`.
This document serves as a quick-start reference for agentic AI systems contributing to bin-time-crawler.