Product State: Monitor

This file describes Monitor's current user-visible truth. It should be updated whenever the repository gains, removes, or materially changes a workflow, capability, limitation, or user-facing behavior.

1. Product Summary

Monitor is an internal prototype CLI utility for a single user working on Windows 11 with Ubuntu WSL. Its job is to periodically capture the Windows desktop, summarize what is visible, and preserve a text record of what the user appears to be doing over time.

Today, Monitor provides a Bash CLI at src/monitor.sh, a Windows desktop screenshot helper, fake-command tests, project-specific documentation, local Codex configuration, and an implementation ExecPlan. The first CLI is best understood as a local prototype.

2. Users and Jobs To Be Done

The primary user is a developer or operator working locally in WSL on a Windows 11 machine. They want a lightweight way to reconstruct their recent activity from periodic screenshots and summaries without manually journaling every context switch.

The near-term contributor is a human or coding agent extending the first CLI. They need clear project boundaries, dependency expectations, storage rules, and validation commands.

3. Current Capabilities

Windows/WSL Desktop Screenshot Helper

Core: scripts/win-screenshot captures the full Windows virtual desktop from a WSL checkout and saves it as a PNG. It requires Windows 11, powershell.exe, and wslpath.
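
The stated requirements can be checked before a capture is attempted. The sketch below is illustrative only: missing_commands and preflight_screenshot are hypothetical names, not functions taken from scripts/win-screenshot, and the check covers only the two commands named above.

```shell
# Hypothetical helper: print every required command that is not on PATH.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
    fi
  done
}

# Hypothetical preflight for the helper's stated requirements.
# Returns non-zero and lists what is missing on stderr.
preflight_screenshot() {
  local missing
  missing="$(missing_commands powershell.exe wslpath)"
  if [ -n "$missing" ]; then
    printf 'Missing required commands:\n%s\n' "$missing" >&2
    return 1
  fi
}
```

Calling preflight_screenshot before a capture would turn a missing dependency into an early, readable error rather than a failed screenshot.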

Project-Specific Bootstrap Documentation

Core: The root control documents now describe Monitor rather than the generic template. They define the current product state, roadmap, architecture, code style, terminal design expectations, and agent workflow.

First Implementation ExecPlan

Core: plans/00-first-monitor-cli.md describes the first runnable CLI implementation and records validation results, decisions, and follow-up context.

Monitor CLI

Core: src/monitor.sh is a foreground Bash CLI. During a run it:

  • loads .env settings and parses flags;
  • creates a human-readable timestamped run directory;
  • captures screenshots and invokes Codex CLI image analysis with a detailed evidence-gathering prompt;
  • writes per-screenshot summaries with human-readable datetime filenames;
  • updates current-summary.md with a second detailed Codex text pass after each successful screenshot;
  • updates activity-summary.md and appends to index.tsv;
  • prints a live terminal dashboard, and prints the complete final current-summary.md after a bounded invocation finishes.

Capture failures, analyzer failures, and current-summary failures are recorded as local run artifacts instead of silently disappearing.
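
As a rough illustration of the run directory and index bookkeeping described above, the sketch below creates a timestamped directory and appends tab-separated rows. The function names, the .monitor/runs base path, the timestamp format, the column order, and the ok status are all assumptions made for this example. Only index.tsv and the capture_failed status are named in this document.

```shell
# Hypothetical: create a timestamped run directory and print its path.
# $1: base directory (assumed default; the real layout may differ).
new_run_dir() {
  local base="${1:-.monitor/runs}" stamp
  stamp="$(date '+%Y-%m-%d_%H-%M-%S')"
  mkdir -p "$base/$stamp"
  printf '%s\n' "$base/$stamp"
}

# Hypothetical: append one row to index.tsv.
# Assumed columns: timestamp, status, screenshot path, summary path.
append_index_row() {
  local run_dir="$1" status="$2" screenshot="$3" summary="$4"
  printf '%s\t%s\t%s\t%s\n' \
    "$(date '+%Y-%m-%d %H:%M:%S')" "$status" "$screenshot" "$summary" \
    >> "$run_dir/index.tsv"
}
```

Because failures are written as rows too, each capture attempt leaves a line in the index whether or not analysis happened.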

Codex CLI Analyzer

Core: Monitor invokes Codex CLI with screenshot image input, then invokes Codex CLI again with text input to maintain the run-level current summary. The prompts are sent over stdin and ask for detailed Markdown while avoiding reproduction of visible secret-like values. Routing analysis through Codex CLI avoids direct OpenAI Platform API calls and does not require an OPENAI_API_KEY, but it still consumes Codex plan capacity or credits.
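
The stdin hand-off can be sketched without committing to a particular analyzer command line. In the example below, run_analyzer and build_image_prompt are hypothetical names and the prompt wording is invented for illustration. The actual Codex CLI invocation and its flags are not specified in this document, so the analyzer is passed in as an argument.

```shell
# Hypothetical prompt builder. The wording is an assumption; the real prompt
# is described above only as detailed and evidence-gathering.
build_image_prompt() {
  cat <<'EOF'
Describe the visible desktop activity in the attached screenshot as detailed Markdown.
Do not reproduce passwords, tokens, keys, or other secret-like values.
EOF
}

# Hypothetical wrapper: send the prompt to an analyzer command over stdin.
# $1 is the prompt text; the remaining arguments are the command to run.
run_analyzer() {
  local prompt="$1"
  shift
  printf '%s\n' "$prompt" | "$@"
}
```

Passing the analyzer as arguments keeps the wrapper usable with any stand-in command, for example run_analyzer "$(build_image_prompt)" cat to echo the prompt back.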

Fake-Command Test Harness

Core: tests/monitor_cli_test.sh verifies CLI behavior with fake screenshot and fake analyzer commands. These tests do not capture the real desktop and do not consume Codex usage.
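
The fake-command idea can be illustrated with a small stub generator. This is a sketch of the general technique, not the contents of tests/monitor_cli_test.sh: make_fake_command and the fake-analyzer name are hypothetical.

```shell
# Hypothetical: create an executable stub that ignores its input and prints
# canned output. $1: directory, $2: command name, $3: text to print.
make_fake_command() {
  local dir="$1" name="$2" output="$3"
  mkdir -p "$dir"
  # Store the canned output next to the stub so quoting is never an issue.
  printf '%s\n' "$output" > "$dir/$name.out"
  cat > "$dir/$name" <<'EOF'
#!/bin/sh
# Fake command: discard any prompt sent over stdin, then print canned output.
cat >/dev/null
cat "$0.out"
EOF
  chmod +x "$dir/$name"
}
```

A test could place the stub directory at the front of PATH, or call the stub by its full path, so that no real desktop capture or Codex usage occurs.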

4. Core Workflows

Repository Bootstrap And Planning

A contributor reads the root control documents, verifies the lightweight repository checks, and continues from the ordered ExecPlan in plans/. This workflow currently ends with a project-specific repository ready for implementation.

Manual Desktop Screenshot Capture

A WSL user can run scripts/win-screenshot [output.png] to capture the Windows desktop. The script prints the saved PNG path on success. This helper is available now and is intended for reuse by the Monitor CLI.

Monitoring Run

The main workflow starts with bash src/monitor.sh. The CLI captures a screenshot every 60 seconds by default, asks Codex CLI to summarize each image in detail, asks Codex CLI to refresh the run-level current summary after each successful image analysis, updates deterministic rollup files, and prints a terminal dashboard.

The dashboard shows bounded progress as x/y, unbounded progress as x/∞, percentage complete, time to the next screenshot, estimated time to completion, output paths, the latest analysis excerpt, and the current summary excerpt. activity-summary.md keeps a fuller deterministic detail section for the latest screenshot and current summary.

The default run continues until stopped with Ctrl-C; --once and --count N provide bounded runs. After a bounded run finishes, the terminal prints the complete final current-summary.md so the user can read the run narrative without opening a file. Capture failures in count-based runs produce capture_failed index rows and failure summaries so the run record explains why analysis did not happen.
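
The progress figures shown on the dashboard follow from simple arithmetic. The sketch below is one possible formatting, with hypothetical function names. It uses a total of 0 to mean an unbounded run, which is a convention of this example only, and it estimates completion time from the capture interval alone, ignoring analysis time.

```shell
# Hypothetical: format progress as x/y with a percentage, or x/∞ when unbounded.
# $1: screenshots completed, $2: total planned (0 means unbounded in this sketch).
format_progress() {
  local done_count="$1" total="$2"
  if [ "$total" -gt 0 ]; then
    printf '%s/%s (%s%%)\n' "$done_count" "$total" "$(( $done_count * 100 / $total ))"
  else
    printf '%s/∞\n' "$done_count"
  fi
}

# Hypothetical: seconds until a bounded run completes, assuming a fixed interval.
# $3 defaults to the 60-second interval described above.
eta_seconds() {
  local done_count="$1" total="$2" interval="${3:-60}"
  printf '%s\n' "$(( ($total - $done_count) * $interval ))"
}
```

For example, format_progress 2 5 prints 2/5 (40%), and eta_seconds 2 5 prints 180.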

CLI Validation

A contributor can run tests/monitor_cli_test.sh to verify the core loop without live Codex usage. A live Windows 11 WSL smoke test uses bash src/monitor.sh --once after codex login.

5. Product Constraints and Known Limits

Monitor is a local internal prototype, not a production monitoring product.

The first implementation is scoped to Windows 11 with Ubuntu WSL. Native Linux, macOS, browser-only capture, and remote desktop capture are not current targets.

Screenshots can contain highly sensitive information. Run artifacts belong under .monitor/, which is ignored by git. The first version does not include automatic redaction.
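
Because the safety of run artifacts depends on .monitor/ staying ignored, a contributor may want to confirm that before a run. The sketch below wraps git check-ignore. The function name is hypothetical, and nothing in this document says Monitor performs this check itself.

```shell
# Hypothetical: succeed when git treats the given path as ignored.
# $1: repository root, $2: path relative to that root.
is_git_ignored() {
  local repo="$1" path="$2"
  git -C "$repo" check-ignore -q "$path"
}
```

A guard such as is_git_ignored . .monitor/ || exit 1 could run before a monitoring session so that a misconfigured checkout fails fast rather than exposing screenshots to version control.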

The first version avoids direct OpenAI API usage. Codex CLI analysis still sends screenshots and generated summaries to OpenAI through the user's Codex session and is governed by Codex plan limits, credits, and data controls. More detailed summaries make the local Markdown artifacts more sensitive, so .monitor/ must remain ignored and local.

There is no background daemon, service manager, cron integration, web UI, CI/CD, packaging, release process, or multi-user support yet.

6. Non-Goals

Monitor is not spyware or an employee surveillance system.

Monitor is not trying to infer private intent beyond visible desktop activity summaries.

Monitor is not implementing direct OpenAI API calls in the first version.

Monitor is not yet a cross-platform screenshot system, web dashboard, analytics service, or team collaboration product.

7. Relationship To Other Control Documents

  • README.md explains setup, dependencies, validation, and the Codex CLI billing distinction.
  • ROADMAP.md captures intended future direction.
  • ARCHITECTURE.md explains the planned CLI structure and boundaries.
  • CODESTYLE.md defines source and documentation conventions.
  • DESIGN.md defines terminal UX expectations and future visual-review rules.
  • PLANS.md defines the ExecPlan format.
  • AGENTS.md defines repository-specific agent instructions.

8. Open Questions

  • Should a later version support a fully local OCR or local vision-model analyzer to avoid OpenAI services entirely?
  • Should Monitor eventually delete screenshots after summarization, or is retaining PNGs necessary for auditability?
  • What level of redaction is acceptable before screenshots are analyzed or stored?