Integration plan for new capabilities. Ordered by impact/effort ratio.
- Install `firecrawl-cli` globally, set `FIRECRAWL_API_KEY` in env
- Build a skill teaching the agent when/how to use `firecrawl` commands
- Commands available: `firecrawl scrape`, `firecrawl crawl`, `firecrawl search`, `firecrawl map`, `firecrawl agent`, `firecrawl browser`
- Upgrades `/evolve` (better signal collection via `firecrawl search`), `/absorb` (crawl entire doc sites), `/learn` (clean scrape of any page), general chat (web research via `firecrawl agent`)
- No MCP server needed: the CLI is callable directly from `claude -p` via Bash
- Install: `npx -y firecrawl-cli@latest init --all --browser`
- Free tier: 500 credits. Hobby: $16/mo (3k credits)
- Source: https://docs.firecrawl.dev/sdks/cli
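A skill that shells out to the CLI could wrap the subcommands listed above in a small helper. This is a minimal sketch, not the project's actual skill: `build_firecrawl_command` and `run_firecrawl` are hypothetical names, and passing the target (URL or query) as a single positional argument is an assumption to verify against the CLI docs.

```python
import os
import subprocess

# Subcommands named in the plan above.
FIRECRAWL_SUBCOMMANDS = {"scrape", "crawl", "search", "map", "agent", "browser"}


def build_firecrawl_command(subcommand: str, target: str) -> list[str]:
    """Return an argv list for one firecrawl call (hypothetical helper).

    Assumption: the target (URL or query) is passed as one positional argument.
    """
    if subcommand not in FIRECRAWL_SUBCOMMANDS:
        raise ValueError(f"unknown firecrawl subcommand: {subcommand}")
    return ["firecrawl", subcommand, target]


def run_firecrawl(subcommand: str, target: str, timeout: int = 120) -> str:
    """Run the CLI and return stdout; fail early if the API key is missing."""
    if not os.environ.get("FIRECRAWL_API_KEY"):
        raise RuntimeError("FIRECRAWL_API_KEY is not set")
    result = subprocess.run(
        build_firecrawl_command(subcommand, target),
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout
```

Passing an argv list instead of a shell string keeps URLs and queries from being interpreted by the shell.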
- Build a skill that calls Cloudflare Browser Rendering REST API
- Use for bulk crawling in `/evolve` signal collection as a free fallback
- Free tier: 5 crawls/day, 100 pages/crawl, 10 min browser/day. Paid: $0.09/hr, 100k pages
- AI extraction built-in (Workers AI or BYO model)
- Sitemap-aware, robots.txt compliant, R2 caching
- MCP option: `@cloudflare/playwright-mcp` for interactive browser automation
- Source: https://developers.cloudflare.com/browser-rendering/rest-api/crawl/
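Because the free tier is capped at 5 crawls per day and 100 pages per crawl, a fallback needs some way to decide which sites fit the daily budget. The sketch below shows one possible policy (smallest sites first, one crawl per site). The function name, the `CrawlPlan` type, and the per-site page estimates are all hypothetical; only the two limits come from the plan above.

```python
from dataclasses import dataclass

# Free-tier limits quoted in the plan above.
FREE_CRAWLS_PER_DAY = 5
FREE_PAGES_PER_CRAWL = 100


@dataclass
class CrawlPlan:
    cloudflare: list[str]   # sites that fit today's free Cloudflare budget
    deferred: list[str]     # sites left for another backend or another day


def plan_free_crawls(site_pages: dict[str, int]) -> CrawlPlan:
    """Assign sites to the free tier, smallest first, one crawl per site."""
    plan = CrawlPlan(cloudflare=[], deferred=[])
    for site, pages in sorted(site_pages.items(), key=lambda kv: kv[1]):
        fits = pages <= FREE_PAGES_PER_CRAWL
        if fits and len(plan.cloudflare) < FREE_CRAWLS_PER_DAY:
            plan.cloudflare.append(site)
        else:
            plan.deferred.append(site)
    return plan
```

Sites in `deferred` could be routed to Firecrawl or retried the next day.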
- Claude Code already reads images natively via the Read tool
- Wire Telegram photo handler (`_handle_photo`) to send images to `claude -p`
- Use cases: screenshot analysis, diagram understanding, OCR, UI inspection
- Cost: ~$0.004 per 1MP image (~1,300 extra input tokens, estimated as width × height / 750)
- Effort: Low — already built into Claude, just need to pipe photos through
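Piping a photo through could look roughly like the following, assuming the handler has already saved the Telegram photo to a local file. Both function names and the prompt wording are illustrative assumptions, not the actual `_handle_photo` implementation.

```python
import subprocess
from pathlib import Path


def build_vision_command(image_path: Path, caption: str = "") -> list[str]:
    """Build a `claude -p` call that asks the agent to read a saved photo.

    The prompt wording is illustrative; the agent is expected to open the
    file itself with its Read tool.
    """
    question = caption or "Describe this image."
    prompt = f"Read the image at {image_path} and respond to: {question}"
    return ["claude", "-p", prompt]


def analyze_photo(image_path: Path, caption: str = "", timeout: int = 120) -> str:
    """Run the command and return the agent's text reply."""
    result = subprocess.run(
        build_vision_command(image_path, caption),
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout.strip()
```

The temporary image file should be deleted once the reply has been sent.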
- Add `/speak` command: convert agent response to voice message in Telegram
- `pip install edge-tts`: free, 300+ neural voices, 40+ languages
- Telegram voice messages use Opus/OGG format (need MP3 -> OGG conversion)
- Risk: unofficial Microsoft API, could break
- Effort: Low (`pip install` + ~30 lines of code)
- Source: https://github.com/rany2/edge-tts
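The synthesis and MP3 -> OGG conversion steps could be sketched as below. This assumes `ffmpeg` with libopus support is installed on the host. The function names, output file names, and default voice are illustrative choices, not the contents of the project's `voice.py`.

```python
import subprocess
from pathlib import Path


def build_ogg_command(mp3_path: Path, ogg_path: Path) -> list[str]:
    """ffmpeg argv that re-encodes MP3 to Opus in an OGG container."""
    return ["ffmpeg", "-y", "-i", str(mp3_path), "-c:a", "libopus", str(ogg_path)]


async def synthesize_voice(text: str, out_dir: Path,
                           voice: str = "en-US-AriaNeural") -> Path:
    """Synthesize text with edge-tts, then convert it for Telegram."""
    import edge_tts  # imported lazily so this module loads without the package

    mp3_path = out_dir / "reply.mp3"
    ogg_path = out_dir / "reply.ogg"
    await edge_tts.Communicate(text, voice).save(str(mp3_path))
    subprocess.run(build_ogg_command(mp3_path, ogg_path), check=True)
    return ogg_path
```

From synchronous code this would be called as `asyncio.run(synthesize_voice("Hello", Path("/tmp")))`, and the returned `.ogg` file sent as the voice message.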
- Run `/absorb` and `/evolve` in isolated Docker containers
- Prevent absorbed code from touching the host filesystem
- Build image with Claude CLI pre-installed
- Effort: Medium — Dockerfile + orchestration
- Cost: Free (your own infra)
- Best for: <5 concurrent agents on a single server
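The orchestration side can stay small if each pipeline run becomes a single `docker run` call. The sketch below mounts only one working directory so absorbed code cannot reach the rest of the host filesystem. The image name `agent-sandbox:latest`, the `/workspace` mount point, and the memory cap are assumptions for illustration; passing API credentials into the container is not shown.

```python
from pathlib import Path


def build_docker_command(workdir: Path, prompt: str,
                         image: str = "agent-sandbox:latest") -> list[str]:
    """docker run argv that confines the agent to one mounted directory.

    `workdir` should be an absolute host path. `agent-sandbox:latest` is a
    hypothetical image with the Claude CLI pre-installed.
    """
    return [
        "docker", "run", "--rm",          # remove the container when it exits
        "--memory", "2g",                 # cap memory per container
        "-v", f"{workdir}:/workspace",    # only this host directory is visible
        "-w", "/workspace",
        image,
        "claude", "-p", prompt,
    ]
```

The resulting list can be passed straight to `subprocess.run`.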
- Agent-focused Firecracker microVMs, <500ms startup
- Used by Perplexity, Manus, Hugging Face
- SDK: `sandbox.commands.run('claude -p "..."')`
- Hobby: free ($100 credit). Pro: $150/mo
- Best for: multi-tenant isolation, productionizing
- Source: https://e2b.dev
- 61.5k stars, <90ms sandbox creation, open-source
- Snapshots (save/resume state), stateful, multi-region
- $200 free compute included
- Best for: long-running stateful agents, computer-use scenarios
- Source: https://daytona.io
- Upgrade from Edge TTS when we need reliability or commercial compliance
- ElevenLabs: best quality, $5-99/mo, startup grant available
- OpenAI TTS: $15/1M chars, outputs Opus directly (Telegram native)
- Source: https://elevenlabs.io, https://platform.openai.com/docs/guides/text-to-speech
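For a rough comparison against the flat ElevenLabs tiers, the per-character OpenAI price can be turned into a monthly estimate. The reply length and message volume below are made-up inputs; only the $15 per 1M characters figure comes from the list above.

```python
OPENAI_TTS_USD_PER_MILLION_CHARS = 15.0  # price quoted in the plan above


def openai_tts_cost(chars: int) -> float:
    """Estimated USD cost of synthesizing `chars` characters."""
    return chars / 1_000_000 * OPENAI_TTS_USD_PER_MILLION_CHARS


# Hypothetical workload: 500-character replies, 1,000 replies per month.
per_reply = openai_tts_cost(500)
per_month = openai_tts_cost(500 * 1_000)
```

Under these assumptions a reply costs about $0.0075 and a month of 1,000 replies about $7.50.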
- Abstract Docker/E2B/Daytona behind a common interface
- Config toggle: `sandbox.backend: docker | e2b | daytona`
- Per-pipeline sandbox policy (e.g. `/absorb` always sandboxed, chat runs local)
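One way to shape the common interface is a small protocol with one adapter per backend and a lookup driven by config. Everything below is a hypothetical sketch: the class names, the config layout, and the image name are assumptions, and the E2B and Daytona adapters are left out.

```python
import subprocess
from typing import Protocol


class SandboxBackend(Protocol):
    def run(self, command: list[str]) -> str: ...


class LocalBackend:
    """Runs directly on the host; the 'chat runs local' case."""

    def run(self, command: list[str]) -> str:
        done = subprocess.run(command, capture_output=True, text=True, check=True)
        return done.stdout


class DockerBackend:
    """Wraps the command in `docker run`; the image name is hypothetical."""

    def __init__(self, image: str = "agent-sandbox:latest") -> None:
        self.image = image

    def wrap(self, command: list[str]) -> list[str]:
        return ["docker", "run", "--rm", self.image, *command]

    def run(self, command: list[str]) -> str:
        return LocalBackend().run(self.wrap(command))


# Hypothetical config mirroring `sandbox.backend` plus a per-pipeline policy.
CONFIG = {
    "sandbox": {"backend": "docker"},
    "pipelines": {"/absorb": "sandboxed", "/evolve": "sandboxed", "chat": "local"},
}

BACKENDS = {"docker": DockerBackend}  # e2b and daytona adapters would register here


def backend_for(pipeline: str, config: dict = CONFIG) -> SandboxBackend:
    """Pick the configured sandbox for sandboxed pipelines, else run locally."""
    if config["pipelines"].get(pipeline, "sandboxed") == "sandboxed":
        return BACKENDS[config["sandbox"]["backend"]]()
    return LocalBackend()
```

Pipelines missing from the policy default to sandboxed, so a newly added pipeline cannot run on the host by accident.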
| Item | Priority | Status |
|---|---|---|
| Firecrawl CLI + Skill | P0 | Done: CLI installed, skill created. Needs `firecrawl login` to authenticate. |
| Cloudflare /crawl Skill | P0 | Done: skill created. Needs `CF_API_TOKEN` + `CF_ACCOUNT_ID` in `.env`. |
| Vision (photo handler) | P1 | Done: `_handle_photo` wired with cleanup, reply context, TTS. |
| Document handler | P1 | Done: `_handle_document` for PDFs, code files, text files. |
| Edge TTS / `/speak` | P1 | Done: `voice.py` with edge-tts + whisper.cpp STT. |
| Docker Sandbox | P2 | Planned |
| E2B | P3 | Planned |
| Daytona | P3 | Planned |
| ElevenLabs / OpenAI TTS | P3 | Planned |
| Container Abstraction | P3 | Planned |