When AI builds tools, it defaults to writing deterministic rules (regex, if/else, templates) instead of using its own reasoning. This skill catches that — before you waste time building it.
You: "I'm building a tool that reviews survey scales"
AI (without this skill):      AI (with this skill):
┌──────────────────────┐      ┌──────────────────────┐
│ if (score > 0.8) ✅  │      │ LLM analyzes each    │
│ regex.test(content)  │      │ item with rubric,    │
│ template.replace()   │      │ cites evidence,      │
│ if/else chains       │      │ scores confidence    │
└──────────────────────┘      └──────────────────────┘
❌ AI-decorated               ✅ AI-native
⏱️ 30 seconds from install to first review
Option A — One-line install (recommended):
curl -fsSL https://raw.githubusercontent.com/gtskevin/ai-native-review/main/install.sh | bash

Auto-installs the skill and its companion design rule for Codex and/or Claude Code, whichever is detected.
Option B — Manual install for Codex:
git clone https://github.com/gtskevin/ai-native-review.git
mkdir -p ~/.codex/skills/ai-native-review
cp ai-native-review/SKILL.md ~/.codex/skills/ai-native-review/
cp -R ai-native-review/rules ~/.codex/skills/ai-native-review/
cp -R ai-native-review/examples ~/.codex/skills/ai-native-review/
rm -rf ai-native-review

For Claude Code, replace ~/.codex with ~/.claude.
Then invoke the skill:
Claude Code: /ai-native-review
Codex: Review this design for AI-native quality
Describe the tool you're building, or point the AI to an existing app, Skill, or codebase.
Expected output:
✅ Task decomposition: 8 subtasks tagged (🧠5 ⚙️2 👤1)
✅ Decision tree: 5 judgment tasks tested
✅ Anti-pattern scan: 3 findings (P0: 1, P1: 2)
✅ Report generated with provenance + priority matrix
For every subtask that requires judgment, the skill runs a 4-point decision tree:
| Test | Question | If Yes | If No |
|---|---|---|---|
| 🔴 Delete Test | Remove the AI call. Still works? | AI-decorated ❌ | Continue |
| 🔴 Upgrade Test | Better model = better output? | Continue | Not AI-driven ❌ |
| 🟡 Context Test | AI gets enough context? | Continue | Context-starved |
| 🔵 Soft Rule Test | Prompt beats hardcoded rule? | AI-native ✅ | Disguised rule ❌ |
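
The order of the four tests can be written down as a short function. This is a minimal Python sketch, not the skill's implementation: it assumes the yes/no answers have already been produced by a reviewer (human or model), and `SubtaskAnswers` and `classify` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SubtaskAnswers:
    """Yes/no answers to the four tests for one judgment subtask (hypothetical type)."""
    works_without_ai: bool    # Delete Test: still works with the AI call removed?
    better_model_helps: bool  # Upgrade Test: better model = better output?
    enough_context: bool      # Context Test: does the AI get enough context?
    prompt_beats_rule: bool   # Soft Rule Test: does a prompt beat a hardcoded rule?


def classify(answers: SubtaskAnswers) -> str:
    """Walk the tests in table order and return the first verdict reached."""
    if answers.works_without_ai:
        return "AI-decorated"
    if not answers.better_model_helps:
        return "Not AI-driven"
    if not answers.enough_context:
        return "Context-starved"
    return "AI-native" if answers.prompt_beats_rule else "Disguised rule"


# A subtask that passes every test in the table.
print(classify(SubtaskAnswers(False, True, True, True)))
```

The function only encodes the sequence of the table. Answering each question is itself a judgment call and is assumed to happen before this step.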
AI-native design has two complementary dimensions. A tool can have a strong AI reasoning core while still forcing users through a rigid click maze. It can also offer a chat box while its core decisions remain hardcoded. Review both.
| Dimension | Core Question | Healthy Pattern |
|---|---|---|
| AI-native intelligence | Does the model make the judgment calls that require understanding? | LLM judgment + evidence + confidence |
| AI-native interaction | Can users express the outcome they want while the system plans and executes the workflow under appropriate human control? | Goal-oriented input + tool use + visible progress + confirmation |
Reference interaction architecture:
User goal in natural language
↓
AI clarifies missing constraints
↓
AI proposes or silently forms a bounded plan
↓
AI uses tools and reports progress
↓
Low-risk, reversible actions → execute
High-risk or irreversible actions → ask for confirmation
↓
User reviews, corrects, overrides, or continues the conversation
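
The gating step in the flow above could look roughly like the following. This is a minimal Python sketch under stated assumptions: `Action`, `needs_confirmation`, `run_plan`, and the `confirm` callback are hypothetical names, and deciding whether an action is high-risk or reversible is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One planned step (hypothetical type for illustration)."""
    name: str
    high_risk: bool
    reversible: bool


def needs_confirmation(action: Action) -> bool:
    """High-risk or irreversible actions are gated; the rest run directly."""
    return action.high_risk or not action.reversible


def run_plan(plan: List[Action], confirm: Callable[[Action], bool]) -> List[str]:
    """Execute a plan, asking `confirm` before gated actions, and return an action log."""
    log = []
    for action in plan:
        if needs_confirmation(action) and not confirm(action):
            log.append(f"skipped: {action.name}")
            continue
        log.append(f"executed: {action.name}")
    return log


plan = [
    Action("draft summary", high_risk=False, reversible=True),
    Action("delete records", high_risk=True, reversible=False),
]
# Here the user declines every confirmation request.
print(run_plan(plan, confirm=lambda action: False))
```

Returning a log rather than printing inside the loop keeps the "visible progress" part of the flow separate from execution, so a UI can render it however it likes.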
Interaction review questions:
| Area | Review Question | Good Evidence |
|---|---|---|
| Intent | Can users describe the result they want without knowing internal steps, APIs, or menus? | Natural-language goal, contextual defaults |
| Orchestration | Can the model choose the next action or tool based on the current state? | Dynamic planning and tool selection |
| Visibility | Can users see what the system is doing and what changed? | Progress, evidence, action log, explicit outcomes |
| Control | Can users correct, refine, interrupt, undo, or take over? | Follow-up conversation, edit, retry, cancel, rollback |
| Safety | Are consequential actions gated appropriately? | Confirmation, scoped permissions, human handoff |
| Interface fit | Is conversation used where it reduces friction, while structured UI remains available where precision helps? | Chat + previews, forms, tables, or approval widgets |
Not every AI-native interface needs to be conversational. Natural language is valuable when the user's desired outcome is easier to describe than the steps needed to achieve it; it should not replace every button, form, or workspace.
| Scenario | Recommended Interaction |
|---|---|
| Ambiguous, multi-step goal | Natural-language goal + AI planning |
| Repetitive low-risk workflow | One command or event trigger + visible execution |
| Precise structured input | Form, table, or direct manipulation, optionally assisted by AI |
| High-stakes action | AI proposes → human confirms |
| Creative or preference-heavy work | Conversational iteration + editable artifacts |
The goal is not "chat everywhere." The goal is to reduce unnecessary procedural work while preserving user understanding and control.
| Section | What It Tells You |
|---|---|
| Task Decomposition | Every subtask tagged: 🧠 judgment / ⚙️ code / 👤 human |
| Findings + Provenance | Which test found what, with code-level evidence |
| Interaction Audit | Whether users can state goals, supervise execution, and recover from mistakes |
| Priority Matrix | P0 (fix now) → P3 (someday) |
| "Is AI-Native Worth It?" | Cost-benefit per subtask |
| Drift Risk | Where future devs will be tempted to "add a quick rule" |
| Migration Path | If AI-decorated: phased plan with rollback |
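
To make the Findings + Provenance and Priority Matrix sections concrete, here is a minimal Python sketch of how findings might be grouped and summarized. The `Finding` fields, the function names, and the sample evidence strings are all assumptions for illustration; the skill's actual report format may differ.

```python
from collections import Counter
from dataclasses import dataclass

PRIORITIES = ("P0", "P1", "P2", "P3")  # P0 = fix now ... P3 = someday


@dataclass
class Finding:
    """One review finding (hypothetical shape)."""
    priority: str     # one of PRIORITIES
    source_test: str  # provenance: which test produced the finding
    evidence: str     # code-level evidence


def priority_matrix(findings):
    """Group findings by priority, most urgent first."""
    matrix = {p: [] for p in PRIORITIES}
    for finding in findings:
        matrix[finding.priority].append(finding)
    return matrix


def summarize(findings):
    """Build a one-line summary of counts per priority level."""
    counts = Counter(f.priority for f in findings)
    parts = [f"{p}: {counts[p]}" for p in PRIORITIES if counts[p]]
    return f"{len(findings)} findings ({', '.join(parts)})"


# Made-up findings, only to exercise the functions.
findings = [
    Finding("P1", "Context Test", "prompt omits the scale definition"),
    Finding("P0", "Delete Test", "regex.test(content) decides item quality"),
    Finding("P1", "Soft Rule Test", "if (score > 0.8) hardcodes the verdict"),
]
print(summarize(findings))
```

With these three sample findings, `summarize` returns `3 findings (P0: 1, P1: 2)`.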
Reviewing a questionnaire-checking Skill (full report):
| Subtask | Nature | AI-native? | Context? |
|---|---|---|---|
| Excel structure extraction | ⚙️ Precision | N/A — correct use of code | — |
| Scale classification | 🧠 Judgment | ✅ | ✅ |
| Translation equivalence | 🧠 Judgment | ✅ | |
| Item-writing error detection | 🧠 Judgment | ✅ | ✅ |
| Respondent experience | 🧠 Judgment | ✅ | ✅ |
Result: 🟢 5/5 capability allocation correct. 🟡 2 improvements added to report.
| Anti-Pattern | Example | Verdict |
|---|---|---|
| AI orchestration + rule execution | AI designs flow, regex does checking | ❌ |
| Regex replacing judgment | regex.test(content) for quality | ❌ |
| Template replacing generation | template.replace('{x}', val) | ❌ |
| if-else for business decisions | if (score > 0.8) good | ❌ |
| Keyword matching for semantics | text.includes('keyword') | ❌ |
| Unstructured AI output | No JSON schema, no confidence | ❌ |
| Click maze for a goal-oriented task | User manually executes steps the agent could plan | ❌ |
| Unbounded autonomy | Agent takes consequential actions without confirmation | ❌ |
Domain exemptions: Compliance, security, real-time, data masking, financial calculation — rules are correct here.
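
For the "Unstructured AI output" row, one possible remedy is to require a structured reply and reject anything else. The sketch below is a minimal Python example using only the standard library; the field names `verdict`, `evidence`, and `confidence` and the function `parse_review` are assumptions, not a schema defined by the skill.

```python
import json


def parse_review(raw: str) -> dict:
    """Parse a model reply and reject it unless it has the expected structure."""
    data = json.loads(raw)  # raises a ValueError subclass on non-JSON text
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    verdict = data.get("verdict")
    evidence = data.get("evidence")
    confidence = data.get("confidence")
    if not isinstance(verdict, str) or not verdict:
        raise ValueError("verdict must be a non-empty string")
    if not isinstance(evidence, list) or not evidence:
        raise ValueError("evidence must be a non-empty list")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return data


# A made-up reply, only to show the accepted shape.
reply = '{"verdict": "double-barreled item", "evidence": ["item 7"], "confidence": 0.82}'
print(parse_review(reply)["confidence"])
```

The validation here is ordinary code, which fits the table: the judgment stays with the model, while checking the shape of its answer is a precision task.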
| Capability | AI-Native Review | Manual Review | Lint Rules |
|---|---|---|---|
| Catches "AI-decorated" anti-pattern | ✅ 4-point test | ❌ | |
| Structured findings with evidence | ✅ Provenance tagged | ❌ | |
| Priority matrix (P0→P3) | ✅ Auto-generated | ❌ Manual | ❌ |
| Cost-benefit "worth it?" analysis | ✅ Per subtask | ❌ Rarely done | ❌ |
| Drift risk markers | ✅ Future-proofing | ❌ | ❌ |
| Domain exemption awareness | ✅ Built-in | ❌ | |
| Goal-oriented interaction review | ✅ Control-aware | ❌ | |
| Source | Contribution |
|---|---|
| OpenAI — A practical guide to building agents | Goal-oriented agents, tools, guardrails, human intervention |
| Anthropic — Building effective agents | Workflows vs. agents, feedback loops, meaningful oversight |
| Microsoft — Copilot agent UI guidelines | Conversation as the source of intent and control, progressive UI |
| Microsoft — Guidelines for Human-AI Interaction | Efficient invocation, correction, explanation, global controls |
| Google PAIR — Feedback + Control | Balance automation and user control |
🤔 Does this work with Codex or only Claude Code?
It works with both. The installer detects Codex and Claude Code and installs the same SKILL.md bundle for either environment. In Claude Code, use /ai-native-review. In Codex, ask it to review a design for AI-native quality.
🛠️ What kind of tools can I review?
Any tool that involves AI making decisions: apps, Skills, agents, pipelines, evaluators, content processors, recommendation systems. Use it before building, during implementation, or after a working product exists.
📊 How is this different from just asking "is my design good?"
That gets you opinions. This gives you a structured 4-point test with evidence, priority ranking, cost-benefit analysis, and migration paths. Repeatable and auditable.
⏱️ How long does a review take?
2-5 minutes depending on project complexity. The skill decomposes, tests, and generates a full report in one pass.
🔍 Is this only for design-time review?
No. Design-time review is the cheapest moment to catch architecture mistakes, but you can also use the skill during implementation or to audit an existing app, Skill, or tool. Fully automated CI/CD enforcement is not included yet because semantic findings still need contextual judgment.
💬 Does every AI-native app need a chat interface?
No. Natural language is useful when users can state the desired result more easily than they can navigate the procedure. Structured UI remains better for precise input, review, comparison, and approval. A strong AI-native product often combines conversation with contextual widgets or editable artifacts.
Found a new anti-pattern? Want to add domain-specific rules? Contributions welcome!
ai-native-review/
├── SKILL.md # The review skill (main)
├── rules/
│ └── ai-native-design.md # Companion design rule
├── examples/
│ └── ob-scale-review-report.md # Real review report
├── tests/
│ ├── test_dual_dimension_framework.sh
│ ├── test_install_compatibility.sh
│ └── test_install_smoke.sh
├── install.sh
├── README.md
├── README.zh-CN.md
├── .github/
│ ├── assets/ # Visual assets
│ ├── ISSUE_TEMPLATE/ # Issue templates
│ ├── PULL_REQUEST_TEMPLATE.md
│ └── social-preview.svg # og:image
├── CONTRIBUTING.md
├── CODE_OF_CONDUCT.md
└── SECURITY.md
Built with ❤️ by @gtskevin · Build AI-native, not AI-decorated 🧠