Agent Briefing: bin-time-crawler

Mission Overview

bin-time-crawler automates waste collection schedule extraction for Australian councils. The current implementation targets Glen Eira City Council by downloading GeoJSON datasets, validating them, and producing structured crawl results that downstream systems can consume.

Architecture Snapshot

  • cmd/crawler/: CLI entry point. Parses flags, sets up dependencies, orchestrates a crawl run.
  • internal/application/: Application services (NewCrawlService, Run) coordinating crawlers and persistence.
  • internal/domain/: Domain contracts such as council.CrawlResult, council.Repository, and the crawling.Crawler interface.
  • internal/infrastructure/: Adapters for concrete councils, logging, persistence, configuration, validation, etc.
    • crawling/gleneira/: Glen Eira-specific crawler, dataset configuration, payload builders, and GeoJSON parsing helpers.
    • crawling/bininfo/: HTML scraper that normalises council waste guidance into structured payloads reused by crawlers.
    • crawling/registry/: Central index of dataset endpoints and public-facing support URLs per council.
    • config/: Default runtime configuration values.
    • logging/: Structured logger abstraction.
    • persistence/: Persistence adapters; currently a filesystem-backed implementation for saving crawl outputs.
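
To make the layering above concrete, here is a minimal sketch of how an application service might sit between a crawler and a repository. The interface method names (Crawl, Save), the CrawlResult fields other than RetrievedAt and QueryLocation, the string-typed location, and the in-memory stubs are all assumptions made for illustration; the real definitions live in internal/domain/ and internal/application/ and may differ.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
	"time"
)

// CrawlResult is a hypothetical stand-in for council.CrawlResult.
// Only RetrievedAt and QueryLocation are named in the briefing.
type CrawlResult struct {
	Council       string
	QueryLocation string
	RetrievedAt   time.Time
	Payload       map[string]string
}

// Crawler is an assumed shape for the crawling.Crawler interface.
type Crawler interface {
	Crawl(ctx context.Context, location string) (CrawlResult, error)
}

// Repository is an assumed shape for council.Repository.
type Repository interface {
	Save(ctx context.Context, result CrawlResult) error
}

// CrawlService coordinates one crawler and one repository.
type CrawlService struct {
	crawler Crawler
	repo    Repository
}

func NewCrawlService(c Crawler, r Repository) *CrawlService {
	return &CrawlService{crawler: c, repo: r}
}

// Run validates the location, crawls, fills in metadata, then persists.
func (s *CrawlService) Run(ctx context.Context, location string) (CrawlResult, error) {
	if strings.TrimSpace(location) == "" {
		return CrawlResult{}, errors.New("location must not be empty")
	}
	result, err := s.crawler.Crawl(ctx, location)
	if err != nil {
		return CrawlResult{}, fmt.Errorf("crawl %q: %w", location, err)
	}
	if result.RetrievedAt.IsZero() {
		result.RetrievedAt = time.Now().UTC()
	}
	if result.QueryLocation == "" {
		result.QueryLocation = location
	}
	if err := s.repo.Save(ctx, result); err != nil {
		return CrawlResult{}, fmt.Errorf("save result: %w", err)
	}
	return result, nil
}

// stubCrawler and memoryRepo are in-memory doubles for the infrastructure layer.
type stubCrawler struct{}

func (stubCrawler) Crawl(_ context.Context, _ string) (CrawlResult, error) {
	return CrawlResult{Council: "Glen Eira", Payload: map[string]string{"bin": "recycling"}}, nil
}

type memoryRepo struct{ saved []CrawlResult }

func (m *memoryRepo) Save(_ context.Context, r CrawlResult) error {
	m.saved = append(m.saved, r)
	return nil
}

// runDemo wires the stubs together and returns the result and the save count.
func runDemo(location string) (CrawlResult, int, error) {
	repo := &memoryRepo{}
	svc := NewCrawlService(stubCrawler{}, repo)
	res, err := svc.Run(context.Background(), location)
	return res, len(repo.saved), err
}

func main() {
	res, saved, err := runDemo("1 Example Street")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(res.Council, res.QueryLocation, saved)
}
```

Keeping the service dependent only on interfaces is what lets a filesystem repository or a council-specific crawler be swapped in without touching the application layer.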

Tech Stack & Dependencies

  • Language: Go (go 1.25.1).
  • HTTP Client: Standard library net/http with council-specific endpoints.
  • JSON Handling: encoding/json for dataset decoding.
  • HTML Parsing: golang.org/x/net/html (module golang.org/x/net v0.44.0) for extracting bin guidance from Glen Eira web pages; requests send a custom User-Agent header.
  • Validation: Custom internal/infrastructure/validation rules applied to remote datasets.
  • Persistence: Filesystem repository writing JSON output to output/.
  • Logging: Structured logging abstraction with optional stdout fallback.
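
As an illustration of the JSON handling listed above, the following sketch decodes a small GeoJSON FeatureCollection with encoding/json. The struct names, the decodeCollection helper, and the property keys (zone, collection_day) are hypothetical; the actual Glen Eira dataset fields are defined in the repository and are not reproduced here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FeatureCollection models only the parts of a GeoJSON document used here.
type FeatureCollection struct {
	Type     string    `json:"type"`
	Features []Feature `json:"features"`
}

// Feature keeps properties as a generic map and leaves geometry undecoded.
type Feature struct {
	Type       string          `json:"type"`
	Properties map[string]any  `json:"properties"`
	Geometry   json.RawMessage `json:"geometry"`
}

// decodeCollection parses raw GeoJSON and rejects non-collection documents.
func decodeCollection(raw string) (FeatureCollection, error) {
	var fc FeatureCollection
	if err := json.Unmarshal([]byte(raw), &fc); err != nil {
		return FeatureCollection{}, fmt.Errorf("decode geojson: %w", err)
	}
	if fc.Type != "FeatureCollection" {
		return FeatureCollection{}, fmt.Errorf("unexpected geojson type %q", fc.Type)
	}
	return fc, nil
}

// sample is made-up data; the property keys are placeholders.
const sample = `{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"zone": "A", "collection_day": "Tuesday"},
      "geometry": {"type": "Point", "coordinates": [145.04, -37.89]}
    }
  ]
}`

func main() {
	fc, err := decodeCollection(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range fc.Features {
		fmt.Println(f.Properties["zone"], f.Properties["collection_day"])
	}
}
```

Leaving Geometry as json.RawMessage defers geometry parsing until a caller actually needs it, which keeps the decoding step cheap for property-only checks.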

Operational Constraints

  • Crawler must respect configured HTTP and run timeouts (config.Config).
  • Glen Eira dataset schemas (config.go) define required fields; validation failures abort the crawl.
  • CrawlService.Run() validates the input location and ensures each result includes RetrievedAt and, where applicable, QueryLocation.
  • File system paths (logs/, output/) must exist or be creatable by the executable.
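
The "validation failures abort the crawl" constraint can be sketched as a required-field check that runs before any output is produced. This is a simplified illustration, not the project's internal/infrastructure/validation package: the function names, the ErrValidation sentinel, and the field names are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrValidation is a hypothetical sentinel that callers can match with errors.Is.
var ErrValidation = errors.New("dataset validation failed")

// validateRequired reports the first feature that lacks a required field
// or carries a nil value for it.
func validateRequired(features []map[string]any, required []string) error {
	for i, props := range features {
		var missing []string
		for _, field := range required {
			if v, ok := props[field]; !ok || v == nil {
				missing = append(missing, field)
			}
		}
		if len(missing) > 0 {
			return fmt.Errorf("%w: feature %d missing %s",
				ErrValidation, i, strings.Join(missing, ", "))
		}
	}
	return nil
}

// crawl stops before producing any output when validation fails.
func crawl(features []map[string]any, required []string) (int, error) {
	if err := validateRequired(features, required); err != nil {
		return 0, err
	}
	return len(features), nil
}

func main() {
	required := []string{"zone", "collection_day"} // placeholder field names
	features := []map[string]any{
		{"zone": "A", "collection_day": "Tuesday"},
		{"zone": "B"}, // missing collection_day
	}
	if _, err := crawl(features, required); err != nil {
		fmt.Println("aborting crawl:", err)
	}
}
```

Wrapping a sentinel error lets the CLI distinguish schema problems from network failures while still reporting which feature and field caused the abort.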

Agent Conduct Rules

  • Preserve existing logging and validation behaviour when extending crawlers.
  • Do not introduce new dependencies without confirming compatibility with go.mod.
  • Follow existing package layout (internal/application, internal/domain, internal/infrastructure).
  • Ensure new crawlers implement crawling.Crawler and respect the contract used by CrawlService.
  • Tests or CLI runs should avoid network overload: use appropriate timeouts and handle cancellation signals.
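
One way to honour the timeout and cancellation rule is to derive a single context from OS signals and a run deadline, then check it between units of work. The sketch below uses only the standard library; crawlAll, runWithBudget, the visit callback, and the 30-second default are hypothetical and are not taken from config.Config.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// defaultRunTimeout is an illustrative value, not the project's configured default.
const defaultRunTimeout = 30 * time.Second

// crawlAll visits locations in order and stops as soon as ctx is done.
// It returns how many locations completed and the reason for stopping early.
func crawlAll(ctx context.Context, locations []string,
	visit func(context.Context, string) error) (int, error) {
	for i, loc := range locations {
		if err := ctx.Err(); err != nil {
			return i, err
		}
		if err := visit(ctx, loc); err != nil {
			return i, err
		}
	}
	return len(locations), nil
}

// runWithBudget runs a no-op crawl under a time budget; handy for tests.
func runWithBudget(locations []string, budget time.Duration) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()
	return crawlAll(ctx, locations, func(context.Context, string) error { return nil })
}

func main() {
	// Cancel the run on Ctrl+C or SIGTERM.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	// Bound the whole run with a deadline on top of signal handling.
	ctx, cancel := context.WithTimeout(ctx, defaultRunTimeout)
	defer cancel()

	done, err := crawlAll(ctx, []string{"location-1", "location-2"},
		func(ctx context.Context, loc string) error {
			fmt.Println("visiting", loc) // a real visit would pass ctx to the HTTP request
			return nil
		})
	fmt.Println("completed:", done, "error:", err)
}
```

Passing the same context into each HTTP request (for example via http.NewRequestWithContext) means an interrupt or an expired deadline also aborts requests that are already in flight.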

Extension Guidelines

  • When adding councils, mirror internal/infrastructure/crawling/gleneira/ structure.
  • Reuse shared helpers (payload.go, geojson.go) or create council-specific equivalents.
  • Integrate new adapters via cmd/crawler/main.go and internal/application/crawl_service.go.
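
The guidelines above can be sketched as a new adapter that satisfies the crawler contract and is selected by a council key. Everything here is illustrative: the Crawler signature, the trimmed CrawlResult, the exampleCrawler type, the key names, and the map-based selection are assumptions, and the way cmd/crawler/main.go actually wires adapters may differ.

```go
package main

import (
	"context"
	"fmt"
)

// CrawlResult is a trimmed, hypothetical stand-in for council.CrawlResult.
type CrawlResult struct {
	Council       string
	QueryLocation string
}

// Crawler is an assumed shape for the crawling.Crawler interface.
type Crawler interface {
	Crawl(ctx context.Context, location string) (CrawlResult, error)
}

// exampleCrawler is a placeholder adapter for a council that does not exist.
type exampleCrawler struct{ council string }

// Crawl honours cancellation before doing any work, then returns a stub result.
func (c exampleCrawler) Crawl(ctx context.Context, location string) (CrawlResult, error) {
	if err := ctx.Err(); err != nil {
		return CrawlResult{}, err
	}
	return CrawlResult{Council: c.council, QueryLocation: location}, nil
}

// Compile-time check that the adapter satisfies the interface.
var _ Crawler = exampleCrawler{}

// newCrawlers maps hypothetical council keys to their adapters.
func newCrawlers() map[string]Crawler {
	return map[string]Crawler{
		"gleneira": exampleCrawler{council: "Glen Eira City Council"},
		"example":  exampleCrawler{council: "Example Council (hypothetical)"},
	}
}

// selectCrawler resolves a council key, failing loudly on unknown keys.
func selectCrawler(crawlers map[string]Crawler, key string) (Crawler, error) {
	c, ok := crawlers[key]
	if !ok {
		return nil, fmt.Errorf("unknown council %q", key)
	}
	return c, nil
}

// crawlFor selects an adapter by key and runs it for one location.
func crawlFor(key, location string) (CrawlResult, error) {
	c, err := selectCrawler(newCrawlers(), key)
	if err != nil {
		return CrawlResult{}, err
	}
	return c.Crawl(context.Background(), location)
}

func main() {
	res, err := crawlFor("example", "1 Example Street")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(res.Council, res.QueryLocation)
}
```

The `var _ Crawler = exampleCrawler{}` line makes the compiler reject an adapter that drifts from the interface, which catches contract breaks before CrawlService ever runs.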

This document serves as a quick-start reference for agentic AI systems contributing to bin-time-crawler.