From edge diagnostics to comprehensive web endpoint intelligence.
Foundation for all subsequent phases. Establishes the probe architecture and output formatting.
- TLS diagnostics (chain, expiry, ciphers, ALPN)
- HTTP protocol detection (HTTP/1.1, H2, H3)
- Security headers analysis (HSTS, CSP, COOP/COEP, etc.)
- Latency breakdown (DNS, TCP, TLS handshake, TTFB)
- Output formats (pretty, JSON, compact)
- CLI with skip flags and timeout configuration
Understand what an endpoint serves and how it's structured.
Discover the structure of a web property without invasive crawling.
Implementation: src/checks/discovery.rs
- Parse and analyze `robots.txt`
- Extract allowed/disallowed paths
- Identify sitemap references
- Detect crawl-delay directives
- Parse `sitemap.xml` and sitemap index files
- Extract URL list with lastmod dates
- Handle gzipped sitemaps
- Respect sitemap limits (50k URLs)
- Probe common paths (configurable)
- `/api`, `/graphql`, `/health`, `/metrics`
- `/.well-known/*` endpoints
- `/favicon.ico`, `/manifest.json`
- Report discovered structure
- Path tree visualization
- Response codes per path
- Content-type distribution
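The robots.txt step above could be sketched roughly as follows. This is a minimal illustration, not the contents of src/checks/discovery.rs: `RobotsSummary` and `parse_robots` are hypothetical names, and the sketch merges rules from all user-agent groups instead of tracking them per agent.

```rust
/// Hypothetical summary of a robots.txt file; field names are illustrative only.
#[derive(Debug, Default, PartialEq)]
pub struct RobotsSummary {
    pub allowed: Vec<String>,
    pub disallowed: Vec<String>,
    pub sitemaps: Vec<String>,
    pub crawl_delay: Option<f64>,
}

/// Parse robots.txt text, collecting directives across all user-agent groups.
pub fn parse_robots(body: &str) -> RobotsSummary {
    let mut summary = RobotsSummary::default();
    for raw in body.lines() {
        // Drop trailing comments, then surrounding whitespace.
        let line = raw.split('#').next().unwrap_or("").trim();
        // Split on the first colon only, because sitemap URLs contain colons.
        let Some((key, value)) = line.split_once(':') else { continue };
        let value = value.trim();
        // An empty value (e.g. "Disallow:") carries no path, so skip it.
        if value.is_empty() {
            continue;
        }
        match key.trim().to_ascii_lowercase().as_str() {
            "allow" => summary.allowed.push(value.to_string()),
            "disallow" => summary.disallowed.push(value.to_string()),
            "sitemap" => summary.sitemaps.push(value.to_string()),
            "crawl-delay" => {
                // Keep the most recent value that parses as a number.
                if let Ok(delay) = value.parse::<f64>() {
                    summary.crawl_delay = Some(delay);
                }
            }
            _ => {} // User-agent and unknown directives are ignored here.
        }
    }
    summary
}
```

A fuller version would probably keep rules grouped by user agent, since allow/disallow rules only apply to the group they appear in.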
Analyze what's actually served at each discovered path.
Implementation: src/checks/content.rs
- Content-type detection
- MIME type from headers
- Actual content sniffing for mismatches
- Charset detection
- Response analysis
- Size (raw and compressed)
- Compression ratio and algorithm
- Cache headers (age, max-age, etag)
- Redirect chain analysis
- Hop count and destinations
- HTTP vs HTTPS redirects
- Redirect loop detection
- Resource hints
- Preload/prefetch directives
- Resource priorities
- New CLI flags: `--discover`, `--content-analysis`
- Discovery report section in output
- Content summary table
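As an illustration of the redirect chain analysis listed above, the sketch below computes hop count, loop detection, and HTTPS-to-HTTP downgrades from an already-collected list of URLs. `RedirectReport` and `analyze_redirects` are assumed names for this example; how the chain is actually gathered is left out.

```rust
use std::collections::HashSet;

/// Hypothetical report for one redirect chain.
#[derive(Debug, PartialEq)]
pub struct RedirectReport {
    /// Number of redirects followed (URLs visited minus one).
    pub hops: usize,
    /// True if any URL appears more than once in the chain.
    pub has_loop: bool,
    /// Number of hops that go from https:// to http://.
    pub downgrades: usize,
}

/// `chain` is the ordered list of URLs visited, starting with the requested URL.
pub fn analyze_redirects(chain: &[&str]) -> RedirectReport {
    let mut seen = HashSet::new();
    let mut has_loop = false;
    for url in chain {
        // `insert` returns false when the URL was already visited.
        if !seen.insert(*url) {
            has_loop = true;
        }
    }
    // Compare each URL with its successor to find scheme downgrades.
    let downgrades = chain
        .windows(2)
        .filter(|pair| pair[0].starts_with("https://") && pair[1].starts_with("http://"))
        .count();
    RedirectReport {
        hops: chain.len().saturating_sub(1),
        has_loop,
        downgrades,
    }
}
```

Comparing raw strings is a simplification; a real check would likely normalize URLs (case of host, default ports, trailing slashes) before testing for repeats.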
Fingerprint the technology stack powering an endpoint.
Implementation: src/checks/techstack.rs
- Server identification
- Parse `Server` header
- Fingerprint by behavior (error pages, defaults)
- Version detection where exposed
- CDN detection
- Cloudflare, Fastly, Akamai, CloudFront, Vercel, Netlify
- Edge location identification
- Cache status headers
- Hosting signals
- IP geolocation (continent/country)
- ASN identification
- Cloud provider detection (AWS, GCP, Azure)
- Framework fingerprinting
- Next.js, Nuxt, Remix, SvelteKit (SSR frameworks)
- Rails, Django, Laravel, Express (backend)
- React, Vue, Angular (SPA indicators)
- CMS detection
- WordPress, Drupal, Ghost, Contentful
- Headless CMS indicators
- JavaScript library detection
- Major libraries from script patterns
- Build tool signatures (Webpack, Vite, esbuild)
- API technology hints
- GraphQL introspection (if enabled)
- REST/OpenAPI indicators
- gRPC-Web detection
- Create extensible fingerprint format (YAML/TOML)
- Header patterns
- HTML/JS patterns
- Cookie patterns
- Error page signatures
- Tech stack summary section
- Confidence scores per detection
- Version information where available
- `--techstack` flag for isolated runs
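One way the header-pattern rules and per-detection confidence scores might fit together is sketched below. The `HeaderRule` and `Detection` types, the `detect` function, and the confidence values are all assumptions made for illustration; the planned YAML/TOML fingerprint format is not defined in this document, so rules are written as plain Rust values here.

```rust
/// Hypothetical fingerprint rule: a header name plus a substring to look for
/// in its value. An empty `contains` matches on header presence alone.
pub struct HeaderRule {
    pub technology: &'static str,
    pub header: &'static str,
    pub contains: &'static str,
    /// Illustrative confidence in the range 0.0 to 1.0.
    pub confidence: f32,
}

#[derive(Debug, PartialEq)]
pub struct Detection {
    pub technology: String,
    pub confidence: f32,
}

/// Match response headers against the rules, keeping the highest confidence
/// seen for each technology.
pub fn detect(headers: &[(&str, &str)], rules: &[HeaderRule]) -> Vec<Detection> {
    let mut found: Vec<Detection> = Vec::new();
    for rule in rules {
        let needle = rule.contains.to_ascii_lowercase();
        // Header names are case-insensitive; values are compared lowercased.
        let hit = headers.iter().any(|(name, value)| {
            name.eq_ignore_ascii_case(rule.header)
                && value.to_ascii_lowercase().contains(&needle)
        });
        if !hit {
            continue;
        }
        match found.iter_mut().find(|d| d.technology == rule.technology) {
            Some(existing) => existing.confidence = existing.confidence.max(rule.confidence),
            None => found.push(Detection {
                technology: rule.technology.to_string(),
                confidence: rule.confidence,
            }),
        }
    }
    found
}
```

Taking the maximum is only one possible way to combine evidence; combining several weak signals into a stronger score would be an equally reasonable design.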
Evaluate how well the endpoint is optimized for search engines and AI agents.
Implementation: src/checks/seo.rs
- Meta tags audit
- Title (length, presence, uniqueness hint)
- Description (length, presence)
- Canonical URL
- Robots meta directives
- Open Graph / Social
- og:title, og:description, og:image
- Twitter Card metadata
- Image dimensions and accessibility
- Technical SEO
- Mobile viewport configuration
- Structured heading hierarchy (H1-H6)
- Internal linking signals
- Hreflang for internationalization
- Performance signals
- Core Web Vitals hints (from headers/resource loading)
- Render-blocking resources
- Image optimization signals
Implementation: src/checks/aeo.rs
Prepare endpoints for the age of AI agents and LLM-powered search.
- Structured data analysis
- JSON-LD presence and validity
- Schema.org types used
- Rich snippet eligibility
- AI-readiness signals
- `llms.txt` detection and parsing
- `.well-known/ai-plugin.json` (ChatGPT plugins)
- Clean, extractable content structure
- Content accessibility
- Text-to-markup ratio
- Semantic HTML usage
- Accessibility hints (alt text, ARIA)
- API discoverability
- OpenAPI/Swagger documentation
- GraphQL schema exposure
- Developer documentation links
- SEO score with breakdown
- AEO readiness rating
- Prioritized recommendations
- `--seo` and `--aeo` flags
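The text-to-markup ratio mentioned under content accessibility could be approximated as below. This is a deliberately naive sketch under stated assumptions: `text_to_markup_ratio` is a hypothetical name, the ratio is defined here as non-whitespace text bytes divided by total bytes, and script or style contents are counted as text because no real HTML parsing is done.

```rust
/// Fraction of the document's bytes that are non-whitespace text outside
/// `<...>` tags. Returns 0.0 for empty input.
pub fn text_to_markup_ratio(html: &str) -> f64 {
    if html.is_empty() {
        return 0.0;
    }
    let mut in_tag = false;
    let mut text_bytes = 0usize;
    for ch in html.chars() {
        match ch {
            '<' => in_tag = true,
            '>' if in_tag => in_tag = false,
            // Count visible characters that are not inside a tag.
            c if !in_tag && !c.is_whitespace() => text_bytes += c.len_utf8(),
            _ => {}
        }
    }
    text_bytes as f64 / html.len() as f64
}
```

For `<p>hi</p>` this gives 2 text bytes out of 9 total. A production check would more likely use an HTML parser so that scripts, styles, and comments can be excluded.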
Make Sango production-ready and integrate with the broader ecosystem.
- HTML report output (`-f html`)
- Markdown report output (`-f markdown`)
- Diff mode for before/after comparisons
- Baseline/threshold configuration
- Connection pooling for multi-path probes
- Parallel check execution tuning
- Result caching for repeated runs
- Streaming output for long-running checks
- Config file support (`.sango.toml`)
- Custom fingerprint rules
- Ignore patterns for known issues
- Severity threshold customization
- GitHub Actions workflow
- Pre-commit hook support
- CI/CD exit code conventions
- Webhook notifications
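Severity thresholds and CI/CD exit codes could interact along these lines. The `Severity` levels, the `exit_code` function, and the choice of 0 for pass and 1 for fail are assumptions for this sketch; the roadmap does not fix any of them.

```rust
/// Hypothetical severity levels, declared from least to most severe so that
/// the derived ordering can be used for threshold comparisons.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Info,
    Warning,
    Error,
    Critical,
}

/// Returns 1 if any finding is at or above `threshold`, otherwise 0.
pub fn exit_code(found: &[Severity], threshold: Severity) -> i32 {
    if found.iter().any(|severity| *severity >= threshold) {
        1
    } else {
        0
    }
}
```

A binary could pass the result to `std::process::exit`, which lets a CI job fail on errors while tolerating warnings by setting the threshold to `Severity::Error`.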
Sango should enable users to answer these questions with a single command:
- Is this endpoint healthy? - TLS valid, protocols modern, headers secure
- What does this endpoint serve? - Content types, structure, paths
- What technology powers it? - Server, framework, CDN, CMS
- How discoverable is it? - SEO score, AEO readiness, structured data
- What should I fix first? - Prioritized, actionable recommendations
- Crawling: Sango probes; it doesn't crawl. Discovery is bounded.
- Vulnerability scanning: Security headers yes, CVE detection no.
- Performance testing: Latency measurement yes, load testing no.
- Monitoring: Point-in-time assessment, not continuous monitoring.
- Mutation: Sango never writes, posts, or modifies anything.
Each phase is designed to be implemented incrementally. New check modules follow the established pattern:
- Create `src/checks/{name}.rs`
- Define result struct with `issues: Vec<Issue>`
- Implement async check function
- Add to probe orchestration
- Add output formatting
- Add CLI flags
- Write tests
See existing check modules for reference implementations.
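For orientation, a check module following the steps above might look roughly like this. Everything here is assumed for illustration: the real `Issue` type likely has different fields, `ExampleResult` and `analyze_example` are hypothetical, and the function is shown as a synchronous analysis step so the sketch needs no async runtime. In an actual module, an async check function would fetch the response and then run logic like this on it.

```rust
/// Assumed shape of the shared issue type; the real `Issue` may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Issue {
    pub code: &'static str,
    pub message: String,
}

/// Result struct for a hypothetical `src/checks/example.rs` module.
#[derive(Debug, Default)]
pub struct ExampleResult {
    pub hsts_max_age: Option<u64>,
    pub issues: Vec<Issue>,
}

/// Analysis step only: inspects already-fetched response headers for HSTS.
pub fn analyze_example(headers: &[(&str, &str)]) -> ExampleResult {
    let mut result = ExampleResult::default();
    let hsts = headers
        .iter()
        .find(|(name, _)| name.eq_ignore_ascii_case("strict-transport-security"))
        .map(|(_, value)| *value);
    match hsts {
        None => result.issues.push(Issue {
            code: "hsts-missing",
            message: "Strict-Transport-Security header is not set".to_string(),
        }),
        Some(value) => {
            // Directive names are case-insensitive, so compare lowercased.
            for part in value.split(';') {
                let part = part.trim().to_ascii_lowercase();
                if let Some(seconds) = part.strip_prefix("max-age=") {
                    result.hsts_max_age = seconds.trim_matches('"').parse().ok();
                }
            }
            if result.hsts_max_age.is_none() {
                result.issues.push(Issue {
                    code: "hsts-no-max-age",
                    message: "HSTS header has no valid max-age directive".to_string(),
                });
            }
        }
    }
    result
}
```

Keeping the analysis separate from network I/O makes the "Write tests" step straightforward, since the function can be exercised with literal header lists.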