What is arthai?

Observability · Experimental, limited preview

⚠ Experimental — limited preview. Observability is in active development. Expect rough edges, breaking changes between releases, and gaps in coverage. Feedback welcome at productive@getarth.ai.

The toolkit ships an OTEL hook that emits a span for every prompt, tool call, agent spawn, skill invocation, and stop event. Paired with Claude Code's native OTEL, those spans also carry cost (USD) and token data. Both streams flow into the same local Arth Intelligence container.

Prerequisites

Docker Desktop installed and running
Ports 4319, 3100, 5432 available on localhost (engine, dashboard, postgres)
The sentinel plugin installed (or prime, which includes sentinel)

Setup (one time)

/plugin install sentinel@arthai

/otel-setup

The skill verifies Docker is running, writes ~/.arthai/docker-compose.yml, starts the engine + dashboard + Postgres + Watchtower auto-updater, and writes the OTEL env vars to your Claude Code settings. With project-only scope that file is .claude/settings.local.json (project-local, git-ignored); with global scope it is ~/.claude/settings.json:

CLAUDE_CODE_ENABLE_TELEMETRY=1
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4319
OTEL_EXPORTER_OTLP_PROTOCOL=http/json
OTEL_METRICS_EXPORTER=otlp
OTEL_LOGS_EXPORTER=otlp
OTEL_TRACES_EXPORTER=otlp
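
For orientation, here is a minimal sketch of what the resulting env block might look like inside the settings file, plus a quick check for the telemetry flag. The six names and values come from the list above. The top-level `env` key layout and both helper names (`print_otel_env_block`, `native_otel_enabled`) are assumptions for illustration, not the skill's actual implementation.

```shell
# Hypothetical helper: prints a JSON "env" block holding the six variables
# listed above. The exact layout written by /otel-setup is an assumption.
print_otel_env_block() {
  cat <<'EOF'
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4319",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/json",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_TRACES_EXPORTER": "otlp"
  }
}
EOF
}

# Hypothetical helper: succeeds if the given settings file enables native OTEL.
native_otel_enabled() {
  grep -q '"CLAUDE_CODE_ENABLE_TELEMETRY"[[:space:]]*:[[:space:]]*"1"' "$1"
}
```

Running `native_otel_enabled .claude/settings.local.json && echo enabled` is one way to confirm the flag landed in the file you expect.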

Why `CLAUDE_CODE_ENABLE_TELEMETRY=1` is required

Two telemetry streams, complementary:

| Stream | Source | Carries |
| --- | --- | --- |
| Trace spans | toolkit `otel-telemetry` hook | session, prompt, tool calls, agent spawns, skill invocations, stop events |
| Cost + token metrics | Claude Code native OTEL (env-gated) | per-call cost USD, input/output/cache tokens, model |

If native OTEL is off, traces still flow but the dashboard's cost and token columns stay empty.

Verify

After /otel-setup finishes, start a new Claude Code session in the project, run any prompt, then visit:

Dashboard: http://localhost:3100 — Sessions, Traces, Insights tabs
Engine health: curl -s http://localhost:4319/api/health | jq .

If the dashboard shows your session with non-empty cost, you're done.

What survives a reboot

After a Mac reboot or Docker Desktop restart:

✅ Env vars in .claude/settings.local.json — file on disk, picked up on next session
✅ Compose file at ~/.arthai/docker-compose.yml — file on disk
✅ Trace data in the arthai_data Docker volume — preserved across container restarts
✅ Engine + DB + Watchtower containers — auto-restart because the compose template sets restart: unless-stopped on every service
⚠ Docker Desktop itself — depends on a per-user OS toggle (Settings → General → "Start Docker Desktop when you log in"). We can't set this for you.

Quick verify after a reboot:

docker ps --filter 'name=arthai'

Should show three running containers (arthai-intelligence, arthai-db, arthai-watchtower). If any are missing:

docker compose -f ~/.arthai/docker-compose.yml up -d
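
The post-reboot check can also be scripted. The sketch below reports which of the three expected containers are absent from a list of running names. The container names come from this section; the helper name `missing_arthai_containers` is hypothetical.

```shell
# Prints the expected container names that are absent from the list of
# running container names read on stdin (one name per line).
missing_arthai_containers() {
  running=$(cat)
  for name in arthai-intelligence arthai-db arthai-watchtower; do
    printf '%s\n' "$running" | grep -qx "$name" || printf '%s\n' "$name"
  done
}

# Usage (requires Docker):
#   docker ps --filter 'name=arthai' --format '{{.Names}}' | missing_arthai_containers
```

Empty output means all three containers are up. Any printed name is a container to bring back with the compose command above.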

Migration for existing customers (set up before the reboot-durability fix landed) — your engine and DB may have RestartPolicy: no. One-line fix, no data loss:

docker update --restart unless-stopped arthai-db arthai-intelligence
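
To see whether the migration applies to you before running it, one option is to read each container's restart policy with `docker inspect`. This is a sketch: the helper name `containers_needing_migration` is hypothetical, and treating an empty policy string the same as `no` is an assumption about older Docker versions.

```shell
# Hypothetical helper: prints the names of containers whose restart policy is
# still "no" (an empty value is treated the same way). Containers that cannot
# be inspected are skipped.
containers_needing_migration() {
  for name in "$@"; do
    policy=$(docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' "$name" 2>/dev/null) || continue
    case "$policy" in
      no|"") printf '%s\n' "$name" ;;
    esac
  done
}

# Usage (requires Docker):
#   for c in $(containers_needing_migration arthai-db arthai-intelligence); do
#     docker update --restart unless-stopped "$c"
#   done
```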

Or re-run /otel-setup → Local. The skill detects the legacy compose, prints the migration command, and the new template overwrites ~/.arthai/docker-compose.yml with the right policy on every service.

Updating & disabling

The Watchtower sidecar pulls the latest arthai/intelligence:latest image once a day. Trace data lives in the arthai_data Docker volume and is preserved across updates. To force an update now:

docker compose -f ~/.arthai/docker-compose.yml pull && docker compose -f ~/.arthai/docker-compose.yml up -d

To disable telemetry: export OTEL_DISABLED=true or remove the env block from .claude/settings.local.json. To opt out of auto-restart: docker update --restart no arthai-db arthai-intelligence arthai-watchtower.

What Can I Do?

Pick an intent to see the recommended workflow.

Learn

Set up your project and learn the codebase

/calibrate → /onboard

calibrate · onboard · scan · wiki-knowledge-base · welcome

Build

Plan features, implement with an AI team, ship PRs

/planning my-feature

planning · implement · qa · pr · precheck · review-pr

Fix

Debug bugs, repair CI, triage incidents

/fix #42 or /incident "site down"

fix · ci-fix · incident · sre

Test

4-layer QA, E2E generation, visual regression, 8 agents

/qa commit or /qa full

qa · qa-e2e-gen · qa-visual · qa-challenger · qa-domain

Automate

Autonomous mode and event-driven remediation

/autopilot

autopilot · deploy-announce · ci-fix

Learn Your Project

New to a project? /calibrate deep-scans your codebase — architecture, conventions, stack, domain model. Then /onboard gives you a prioritized briefing: what’s broken, what’s waiting, what to work on.

/calibrate → /onboard → start working

/calibrate

Scans source code, detects platform (web, mobile, CLI), maps integrations, recommends MCP servers and agents. Writes .claude/project-profile.md. Run once per project.

/onboard

Gathers git state, GitHub PRs/issues, environment health. Presents 4-tier briefing: Fix first → Waiting on you → Continue → Available. Suggests what to work on.

Build a Feature

Full feature development with an adversarial planning team, parallel implementation agents, QA, and automated PR creation.

/planning my-feature → /implement my-feature → /qa commit → /pr

/planning

Spawns PM (Opus) + Architect (Opus) + Devil's Advocate (Sonnet). 3 rounds of structured debate. Produces a hardened plan. Modes: --lite, --fast, full.

/implement

Reads the plan, spawns backend + frontend + QA + red-team agents in parallel. Builds, tests, creates PR-ready diff.

/qa commit

Targeted QA on changed files — unit tests, regressions, acceptance criteria.

/pr

Stages, commits, pushes, creates GitHub PR with QA results.

Fix a Bug

6-step formal pipeline. Root cause analysis, scope lock, behavior contract, fix, verify, PR.

/fix "description" → RCA → scope lock → contract → implement → verify → /pr

Hotfix mode

Use --hotfix for production emergencies (skips non-essential steps).

Severity levels

Use --severity critical|high|medium to set priority.

Fix CI

Auto-reads CI failure, diagnoses root cause, patches, resubmits. 3 retry attempts. Discord alert if all fail.

/ci-fix → reads logs → diagnoses → patches → resubmits (up to 3x)

/ci-fix details

Reads CI logs from GitHub Actions, diagnoses the root cause, applies a targeted patch, and resubmits. Retries up to 3 times before sending a Discord alert. Supports ci, staging, and prod targets.
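
The bounded-retry behavior described above can be pictured as a small loop. This is an illustrative sketch, not the skill's actual code: `retry_up_to` and the step and alert commands named in the usage comment are hypothetical.

```shell
# Runs the given command up to max_attempts times; returns 0 on the first
# success, or 1 after every attempt has failed (the point at which an alert
# would be sent).
retry_up_to() {
  max_attempts=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Usage, with hypothetical step and alert commands:
#   retry_up_to 3 patch_and_resubmit || send_discord_alert
```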

Quality Assurance

Four-layer test strategy with 8 specialized agents. Commit mode for fast checks, full mode for comprehensive validation, plus opt-in E2E generation and visual regression.

Four-Layer Test Strategy

1. Baseline Tests

Existing test suites — regression anchor, same every run

2. Generated Scenarios

Fresh every run — thinks like real users based on the diff

3. Property-Based Tests

Infer invariants from code changes, test with random/edge-case inputs

4. Coverage Audit

Reviews if existing tests still match the codebase

Modes

/qa: Commit mode. Targeted checks on changed files (~1-3 min)
/qa full: Comprehensive. All checks across the full codebase (~10-20 min)
/qa staging: Health + smoke + E2E against deployed staging
/qa prod: Read-only health + smoke against production
/qa e2e-gen: Generate exploratory Playwright tests for changed components (opt-in)
/qa visual: Computer-use visual regression at desktop + mobile viewports (opt-in)

QA Agents (8)

qa sonnet
Orchestrator — spawns sub-agents, collects results, produces report

qa-e2e sonnet
Playwright E2E tests for user workflows

qa-e2e-gen sonnet
Generates exploratory Playwright tests from diffs

qa-visual sonnet
Computer-use visual regression (desktop + mobile)

qa-challenger sonnet
Adversarial red-teaming of test plans

qa-domain sonnet
Domain logic validation (state machines, constraints)

qa-baseline-updater sonnet
Manages test snapshots and golden files

qa-test-promoter haiku
Promotes generated tests that caught bugs to baselines

Typical Flow

/qa commit → 4-layer analysis → report → /qa e2e-gen (opt-in) → /qa visual (opt-in)

Related skills

/qa-incident — log a QA incident from a known issue
/qa-learn — review QA knowledge base stats, prune stale entries
/ci-fix — auto-remediate CI failures (3 retry attempts)

Ship Code

/precheck runs tests locally in ~30s. /qa validates changed files. /pr creates the PR.

/precheck → /qa commit → /pr

/precheck

Runs lint, type-check, and test suite locally. Catches CI failures in ~30 seconds instead of the 4-minute GitHub Actions round-trip.

/qa commit

Targeted QA on changed files only — fast validation before shipping.

/pr

Stages, commits with a conventional message, pushes, and opens a GitHub PR with QA summary in the description.

Autonomous Mode

Fully autonomous. Picks highest-priority unassigned issue, plans, implements, QAs, PRs. Stops for merge approval.

/autopilot → picks issue → plans → implements → waits for merge → repeats

/autopilot details

Starts a continuous loop: fetches the highest-priority unassigned GitHub issue, creates a plan, implements, runs QA, and opens a PR. Pauses for human merge approval before picking the next issue. No other intervention is required between cycles.

Research & Knowledge

Build curated topic wikis. Init scaffolds, ingest processes sources, query synthesizes answers, lint health-checks.

/wiki-knowledge-base init topic → ingest → query → lint

init

Scaffolds a new wiki directory structure for a topic with index, pages, and metadata files.

ingest

Processes source material (docs, URLs, text) and adds structured entries to the wiki.

query

Synthesizes answers from wiki content with source citations.

lint

Health-checks the wiki — finds stale entries, broken references, coverage gaps.

Operations

Health checks, log tailing, deploy watching, incident triage, server restarts.

/sre status: Health check across all services and infrastructure
/sre logs: Tail and analyze logs from running services
/sre watch: Watch an active deployment for issues
/incident: Classify severity, diagnose in parallel, route to the right fix skill
/restart: Discover, restart, and validate local dev servers with health checks

Event-Driven Monitors

Instead of polling ("check CI every 5 minutes"), monitors sleep until something happens and then wake the toolkit to respond automatically. Zero API calls while idle — you only pay when an event fires.

How it works

/calibrate detects your stack (GitHub Actions, Railway, Sentry...) → writes config to .claude/monitors/ (you add the webhook URL on the platform) → platform event fires and the matching skill auto-runs (/ci-fix, /sre, /qa, /fix)

Each monitor is a JSON config in .claude/monitors/. Calibrate generates them from templates in monitors/ (repo root), adapted to your project's platform and branch.

Available monitor templates

github-ci.json

When CI fails on any non-default branch, /ci-fix automatically reads the failure log, diagnoses the issue, patches, and resubmits. Up to 3 retries with different strategies.

Event: workflow_run.conclusion == failure

deploy-health.json

When a deploy fails or a service crashes, /sre debug investigates logs, checks health endpoints, and attempts remediation before paging you.

Event: status == FAILED or CRASHED

staging-qa.json

When staging deploys successfully, /qa staging automatically runs the full QA suite against the live staging environment. No manual trigger needed.

Event: status == SUCCESS and environment == staging

runtime-errors.json

When runtime errors exceed a threshold (e.g., 5+ occurrences of a new error), /fix auto-runs with the error details. Includes 60-min cooldown and deduplication by fingerprint.

Event: occurrences >= 5 and status == unresolved
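
As an illustration of how such an event rule plus cooldown could be evaluated, here is a sketch of the runtime-errors condition. The threshold, status, and 60-minute cooldown come from the description above. The function name and its argument layout are hypothetical, and the real monitor config format is not shown here.

```shell
# Hypothetical rule check for the runtime-errors monitor described above.
# Args: occurrences, status, seconds since this fingerprint last fired.
# Succeeds only when the threshold is met, the error is unresolved, and the
# 60-minute cooldown has elapsed.
should_trigger_fix() {
  occurrences=$1
  status=$2
  seconds_since_last=$3
  [ "$occurrences" -ge 5 ] || return 1
  [ "$status" = "unresolved" ] || return 1
  [ "$seconds_since_last" -ge 3600 ] || return 1
  return 0
}
```

For example, `should_trigger_fix 7 unresolved 5400` succeeds, while the same error seen again ten minutes after the last run does not.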

Setup

# Step 1: Run /calibrate — it auto-detects your stack and generates
# the right monitor configs in .claude/monitors/
/calibrate

# Step 2: Calibrate shows which monitors it generated:
#   ✓ .claude/monitors/github-ci.json (GitHub Actions detected)
#   ✓ .claude/monitors/deploy-health.json (Railway detected)
#   ✗ runtime-errors.json (no Sentry/Datadog detected)

# Step 3: Add webhook URLs on your platforms
#   GitHub:  Repo Settings → Webhooks → paste monitor endpoint URL
#   Railway: Project Settings → Webhooks → paste monitor endpoint URL
#   Sentry:  Settings → Integrations → Webhooks

# Step 4: Set webhook secrets in your environment
export GITHUB_WEBHOOK_SECRET="your-secret-here"
export DEPLOY_WEBHOOK_SECRET="your-secret-here"

# Done. Events fire → toolkit responds automatically.

Safety: Monitors include built-in loop guards — after 3 failed auto-fix attempts on the same branch, the monitor suspends itself and sends a Discord alert instead of retrying forever.
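
A loop guard of this kind can be pictured as a per-branch failure counter. The sketch below is an assumption about the general shape, not the toolkit's implementation: the function name, the state directory, and the `retry` / `suspend` outputs are all hypothetical.

```shell
# Hypothetical loop guard: counts failed auto-fix attempts per branch in a
# state directory and reports when the monitor should suspend itself.
# Args: state_dir, branch. Prints "retry" or "suspend".
record_failed_attempt() {
  state_dir=$1
  # Replace "/" so branch names like feature/x map to a single file name.
  counter_file="$state_dir/$(printf '%s' "$2" | tr '/' '_').count"
  count=0
  if [ -f "$counter_file" ]; then
    count=$(cat "$counter_file")
  fi
  count=$((count + 1))
  printf '%s\n' "$count" > "$counter_file"
  if [ "$count" -ge 3 ]; then
    echo suspend
  else
    echo retry
  fi
}
```

Keeping the counter keyed by branch means a failing feature branch cannot suspend auto-fixes for unrelated branches.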

Re-running: If you add Sentry or change deploy platforms later, run /calibrate rescan to regenerate monitors for the new stack.

Consulting

Full consulting engagement pipeline. Discovery, assessment, opportunity mapping, solution design, deliverables.

/client-discovery → /consulting → /opportunity-map → /solution-architect → /deliverable-builder

/client-discovery

Runs an AI readiness assessment for a client — current capabilities, gaps, maturity level.

/consulting

Full engagement management with phase tracking: discovery, assess, propose, design, deliver, track.

/opportunity-map

Maps AI opportunities by ROI and implementation effort. Recommends target initiatives.

/solution-architect

Designs technical AI solutions with architecture diagrams, tech stack, timeline, and risk analysis.

/deliverable-builder

Generates client-ready deliverables: final reports, board decks, implementation guides, training plans.

Design

Add --design to planning for UX research and design critique before implementation.

/planning --design → design thinker + critic → /implement

Design workflow details

The --design flag adds UX researcher and design critic agents to the planning team. They conduct user journey mapping, heuristic evaluation, and accessibility review before any code is written.

All Skills (60)

This grid is for finding a skill by category or keyword. Already know which one you want? Skill cards with an “Open guide” link jump to that skill's full Skill Guide: usage, examples, and edge cases.


All Agents (38)

Bundles (13)

Bundles are curated packages of agents, skills, and hooks that work together. Install one bundle to get a complete workflow. Bundles compose — install multiple without conflicts.

Which bundle should I pick?

Research + knowledge: atlas
Design-driven development: canvas
Product strategy: compass
AI consulting toolkit: counsel
Autonomous orchestration: cruise
Full development workflow: forge
Everything: prime
Deep QA: prism
Arth Router: router
Surgical bug fixing: scalpel
Operations and reliability: sentinel
Safety guardrails: shield
Project setup and onboarding: spark

All bundles

Hooks (29)

Lifecycle hooks fire automatically at key moments. No manual invocation needed.

What is arthai?

arthai is a development toolkit that turns Claude Code into a coordinated engineering team — specialized agents, slash-command workflows, and automatic guardrails, installed as Claude Code plugins and adapted to your codebase.

The problem it solves

Claude Code out of the box is one generalist assistant. It's powerful, but:

Everything runs at premium cost. A file lookup costs the same model tier as an architecture decision.
Workflows live in your head. Plan → implement → QA → PR is a sequence you re-type and re-forget every time, and steps get skipped.
No guardrails. Nothing stops a destructive command, an accidental edit to a migration file, or a push to main.
No visibility. When a session ends, you have no record of what the AI did, which agents ran, or what it cost.

What arthai adds

| Layer | What it is | Example |
| --- | --- | --- |
| Agents | Specialists with the right model tier for the job: QA, SRE, frontend, backend, planning, research | A Haiku agent handles file searches at 1/60th the cost of Opus |
| Skills | Slash-command workflows that encode multi-step engineering sequences | `/fix #42` runs root-cause analysis, scope lock, fix, regression tests |
| Hooks | Automatic behaviors at session lifecycle events | A guard blocks `rm -rf` and force-pushes before they execute |
| Calibration | `/calibrate` deep-learns your codebase and seeds a knowledge base; the knowledge graph builds from it automatically at session start and workflows query it | Fixes match your conventions; QA understands your domain rules |
| Observability | Arth Intelligence, a local dashboard of every session, tool call, agent spawn, and dollar | See exactly what the AI did and what it cost, on your machine |

Who it's for

Engineers shipping with Claude Code daily who want repeatable workflows instead of improvised prompting.
Teams that care about cost — the toolkit routes routine work to cheaper models automatically.
Anyone who needs an audit trail of what AI did in their codebase.

It also includes role-based flows (/welcome, /wizard) for non-engineers on the team.

What it is NOT

Not a replacement for Claude Code — it's installed into Claude Code as plugins.
Not a cloud service — everything runs locally; observability data never leaves your machine.
Not all-or-nothing — install focused bundles (QA only, ops only, consulting only) or prime for everything.

How the pieces fit

flowchart LR
  You([You]) --> CC[Claude Code]
  CC --> P[arthai plugins]
  P --> TR["Triage router (hook) picks the cheapest capable route"]
  TR --> SK["Skills /planning → /implement → /qa → /pr"]
  SK --> AG["Agents tiered Haiku · Sonnet · Opus"]
  P -. OTEL hook .-> AI[("Arth Intelligence local Docker dashboard")]
  CC -. native OTEL: cost + tokens .-> AI

Next steps

Getting started — install in ~10 minutes
Plugin catalog — pick a bundle
Arth Intelligence — turn on the dashboard
FAQ

Getting Started

New here? This guide takes you from nothing to a working toolkit in ~10 minutes (Steps 1–6), then observability (Step 7, optional), calibration (Step 8), and a fresh session (Step 9).
No toolkit — just the arth CLI? Do Steps 1–2, then Installing the Arth CLI. It gives you telemetry, Explain, and the Cloud Orchestrator from a terminal without the Claude Code toolkit.
Already installed? Skip the walkthrough — update via the FAQ update flow, connect another repo to observability via the Arth Intelligence guide, or fully remove it via the Uninstall guide.

Prerequisites

Claude Code installed and working
Node.js 22+ — for npx arthai-activate
Git
macOS or Linux (Windows via WSL)
A valid Arth AI license key (ARTH-XXXX-XXXX-XXXX-XXXX)

Step 1: Get a license key and repo access

Email productive@getarth.ai with your GitHub username to request a license key.

You'll receive:

A license key (ARTH-XXXX-XXXX-XXXX-XXXX)
A collaborator invite to the arthai-marketplace private repo on GitHub

Accept the GitHub invite before continuing — Step 3 requires repo access.

Just want the no-toolkit CLI? If you're not using the Claude Code toolkit and only want the arth command (telemetry + Explain + Cloud Orchestrator from a terminal), do Steps 1–2, then jump straight to Installing the Arth CLI. You can skip the toolkit steps (3–7).

Step 2: Activate your key

Run this in your terminal (not inside Claude Code):

npx arthai-activate ARTH-XXXX-XXXX-XXXX-XXXX

This stores your key at ~/.arthai/license. You only need to do this once.

Step 3: Add the plugin marketplace

Inside Claude Code:

/plugin marketplace add ArthTech-AI/arthai-marketplace

Step 4: Choose and install a bundle

Start with prime — the everything bundle. Includes all agents, skills, and hooks:

/plugin install prime@arthai

If you want a smaller, focused bundle instead, see the plugin catalog for all available bundles.

Step 5: Enable auto-updates

Run /plugin, open the Marketplaces tab, select arthai, and choose Enable auto-update. This keeps your toolkit up to date automatically — new agents, skills, and bug fixes land without manual intervention.

Step 6: Reload and verify

/reload-plugins

Then fully quit and reopen Claude Code. A running session loads its command list at startup, so a newly added command (e.g. /cloud-setup after an update) won't appear from /reload-plugins alone — only a full restart (quit the app/CLI completely, not /clear) rebuilds the command set. This is the most common reason a freshly installed skill "isn't there."

Step 7: Enable observability (optional)

You can skip this step and jump to Step 8: Calibrate. The toolkit works without observability. Come back here whenever you want a dashboard view of what Claude Code is doing — sessions, tool calls, agent spawns, cost.

New to this? Start with the Arth Intelligence guide — what the dashboard shows, why you'd want it, and what data stays on your machine. This step is the detailed install walkthrough.

⚠️ Experimental — limited preview. Observability is in active development. Expect rough edges, breaking changes between releases, and gaps in coverage. Feedback welcome at productive@getarth.ai.

See what Claude Code did — every tool call, agent spawn, and workflow phase visualized in a dashboard.

Two telemetry streams, both required for full data:
Toolkit hook — adds session/prompt/tool/agent/skill spans (installed by sentinel below).
Claude Code native OTEL — adds cost USD, input/output/cache tokens, and model to those spans (gated by the env var CLAUDE_CODE_ENABLE_TELEMETRY=1).
/otel-setup turns both on for you. Without CLAUDE_CODE_ENABLE_TELEMETRY=1, the dashboard's cost and token columns stay empty — that's the most common "why is my dashboard half-broken?" question.

Prerequisites

Docker Desktop installed and running on your machine (download here)
Verify Docker is working: open your terminal and run docker info. If you see version info, you're good. If you see an error, open Docker Desktop first.
Ports 4319, 3100, 5432 available on localhost (engine, dashboard, postgres). If any of these are in use, stop the conflicting service before continuing.

7a: Install the sentinel bundle (skip if you installed prime)

Inside Claude Code:

/plugin install sentinel@arthai

This adds the OTEL tracing hook and the /otel-setup skill to your project.

If you already have sentinel installed, update it (or rely on auto-update from Step 5):

/plugin marketplace update arthai
/plugin uninstall sentinel@arthai
/plugin install sentinel@arthai
/reload-plugins

7b: Restart Claude Code

Close and reopen Claude Code (or start a new session). When the session starts, you'll see this message:

OTEL_SETUP_REQUIRED: Observability is installed but not configured.
Run /otel-setup now.

7c: Run `/otel-setup`

No toolkit? Use the Arth CLI instead. Everything in Step 7 is the toolkit path. If you're not running the Claude Code toolkit, install the Arth CLI (one line, below) and run arth otel-setup — it stands up the identical stack (db + dashboard on :3100 + engine on :4319) and writes the same global native-OTEL env, no toolkit required. See Installing the Arth CLI just below, then continue to Step 8 / arth cloud-setup. If you later install the toolkit, /otel-setup detects this stack and only adds toolkit attribution.

Installing the Arth CLI (no toolkit)

The Arth CLI ships through the private Arth distribution repo — install is gated by the same GitHub access you already have (the repo invite + your ARTH_GITHUB_TOKEN); no public package. Requires Node.js 22+.

Need a token? ARTH_GITHUB_TOKEN is a GitHub fine-grained PAT with Contents: Read on ArthTech-AI/arthai-marketplace — the same one the orchestrator uses. Create it the way described in the Cloud Orchestrator prerequisites (github.com → Settings → Developer settings → Fine-grained tokens → Resource owner ArthTech-AI, repo arthai-marketplace, Contents: Read-only). Already use the gh CLI and it can read that repo? The installer will use gh auth token automatically — no need to set the variable.

# 1) Make sure your GitHub token (read access to ArthTech-AI/arthai-marketplace) is set:
export ARTH_GITHUB_TOKEN=github_pat_...     # or have the `gh` CLI logged in

# 2) Install the Arth CLI (downloads the self-contained binary onto your PATH).
#    Private repo → use the GitHub contents API with the raw accept header:
curl -fsSL -H "Authorization: Bearer $ARTH_GITHUB_TOKEN" \
  -H "Accept: application/vnd.github.raw" \
  https://api.github.com/repos/ArthTech-AI/arthai-marketplace/contents/cli/install.sh | sh

# 3) Verify it's on your PATH:
arth --help

Without read access to the repo the download 404s — that's the access gate. The installer puts arth in ~/.local/bin — if arth --help says "command not found", add that to your PATH: export PATH="$HOME/.local/bin:$PATH" (the installer prints this hint too). It also warns (but doesn't block) if you haven't activated a license yet: you only need the Arth license later, for arth cloud-setup → the experimental Cloud Orchestrator. Then:

arth otel-setup     # telemetry stack — no license needed
arth cloud-setup    # Explain this session + the experimental Cloud Orchestrator
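
If `arth` is not found after install, the PATH hint mentioned above can be applied idempotently, so re-sourcing your shell profile does not add duplicate entries. This is a minimal sketch; the helper name `path_contains` is hypothetical.

```shell
# Succeeds if directory $1 is one of the colon-separated entries in $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Add ~/.local/bin (where the installer puts arth) only if it is missing.
if ! path_contains "$HOME/.local/bin" "$PATH"; then
  PATH="$HOME/.local/bin:$PATH"
  export PATH
fi
```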

Updating the CLI later: re-run the same install one-liner — it overwrites arth with the latest published build and re-verifies the checksum.

Type:

/otel-setup

The skill asks how you want to set up. Pick "Local" (option 2).

The skill then does everything for you automatically:

Checks Docker is running on your machine
Creates a configuration file at ~/.arthai/docker-compose.yml
Downloads the Arth Intelligence Docker image (first time takes 1-2 minutes)
Starts the engine on :4319 (receives traces) + dashboard on :3100 (shows traces) + database (stores traces) + Watchtower (auto-updates the engine image daily)
Waits until everything is healthy
Writes environment variables — including CLAUDE_CODE_ENABLE_TELEMETRY=1 — to your global ~/.claude/settings.json by default (so every project on this machine emits telemetry; nothing is committed to your repos). If you pick the "this project only" scope instead, they go to that repo's .claude/settings.local.json.
Creates a marker file so it doesn't ask you again

It asks two quick questions along the way — env-var scope (global vs this-project-only) and whether to keep session auto-tagging on — then runs unattended until done.

7d: Restart Claude Code (mandatory)

Close and reopen Claude Code so it picks up the new env block from ~/.claude/settings.json (or the repo's .claude/settings.local.json if you chose project-only scope). Without this restart, the env vars aren't loaded and traces won't flow.

7e: Verify it's working

You've just restarted Claude Code. The dashboard exists but is empty because no session has run yet. Walk through these five steps in order:

Open the dashboard in your browser. Go to http://localhost:3100. You should see the Arth Intelligence Hub — a project/session list with an Experiments page in the sidebar. The Hub will probably be empty at this point — that's expected, you haven't run anything yet.

If the page doesn't load at all, the Docker stack may not be up. Run:

   docker ps

You should see three containers — arthai-intelligence, arthai-db, arthai-watchtower. If any are missing:

   docker compose -f ~/.arthai/docker-compose.yml up -d

Generate some activity in Claude Code. Switch back to Claude Code and run any prompt, even something trivial:
```
what's in package.json?
```
Or:
```
/onboard
```
The toolkit's OTEL hook emits trace spans for every prompt, tool call, agent spawn, and stop event. Native OTEL emits cost and token data alongside.

Refresh the dashboard. Your session should appear in the Sessions list with a recent timestamp. If nothing appears after ~10 seconds, the engine may not be receiving traces. Check engine health:
```
curl -s http://localhost:4319/api/health | jq .
```
Expect a JSON response containing "ok":true. If it fails, check logs:
```
docker logs --tail 50 arthai-intelligence
```
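
If you want the health check to be scriptable (for example in a shell alias), a minimal sketch is to test the response body for the `"ok":true` marker. The helper name `health_is_ok` is hypothetical, and the match is a plain text check rather than a JSON parse.

```shell
# Succeeds if the JSON read on stdin contains "ok": true. This is a plain
# text match that tolerates whitespace; it is not a full JSON parser.
health_is_ok() {
  tr -d ' \n\t' | grep -q '"ok":true'
}

# Usage (requires the engine to be running):
#   curl -s http://localhost:4319/api/health | health_is_ok && echo healthy
```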

Click into your session. You should see a waterfall of spans — your prompt, the tool calls Claude Code made, any agent spawns, etc. Each span shows duration and metadata.

Confirm cost columns are populated. Look at the cost / token columns in the trace.

If they show values (e.g. $0.0023, 1,847 tokens) → native OTEL is flowing. You're done.

If they show a dash or are empty → only the toolkit hook is on. Native OTEL needs CLAUDE_CODE_ENABLE_TELEMETRY=1. Verify:

  grep CLAUDE_CODE_ENABLE_TELEMETRY ~/.claude/settings.json
  # or, if you chose project-only scope: grep CLAUDE_CODE_ENABLE_TELEMETRY .claude/settings.local.json
  # should print: "CLAUDE_CODE_ENABLE_TELEMETRY": "1"

If missing, re-run /otel-setup, pick Local again, then restart Claude Code.

If all 5 steps work — observability is working end-to-end. Future Claude Code sessions automatically send traces to http://localhost:3100. You don't need to do anything else.

7f: Capture a baseline — observer-only mode (optional)

Sometimes you want to know: what does Claude Code do on its own, vs. what changes when the toolkit is active? Maybe you're evaluating whether to keep the toolkit on for a particular workflow, or you want to debug a behavior and need to isolate "is this the toolkit or is this Claude itself?"

Observer-only mode is for that. It keeps the OTEL hook emitting telemetry (so you still see the run in the dashboard), but suppresses every toolkit-specific side effect:

export OTEL_OBSERVE_ONLY=true

Then launch Claude Code as you normally would. That session's spans land in the dashboard exactly like a regular run, but:

No skill.current.json written (the file that tracks active slash commands for span attribution)
No agent.<id>.json written (the file that brackets subagent spans)
No "run /otel-setup" nag if observability isn't configured yet
Every span carries an extra resource attribute: arth.observe_only=true

To clear it, unset OTEL_OBSERVE_ONLY (or just open a new shell). It only applies to sessions started while the env var is set.

Typical A/B workflow:

Run a normal Claude Code session — do something representative (e.g., ask /onboard to brief you, then ask a follow-up).
Open a new terminal, export OTEL_OBSERVE_ONLY=true, and launch Claude Code again. Run the same prompts.
In the dashboard, find the two sessions side-by-side. The toolkit-on run has skill attribution (skill.name = "onboard" on the tool spans); the observer run does not. Compare the trace shapes, durations, and span counts.

That diff is the toolkit's contribution to your workflow.

Precedence — which env wins:

| Set | Behavior |
| --- | --- |
| Nothing | Default: toolkit on, telemetry on |
| `OTEL_OBSERVE_ONLY=true` | Telemetry on, toolkit side effects off (this section) |
| `OTEL_DISABLED=true` | Telemetry off, toolkit off (overrides observer mode) |
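
The precedence in the table can be written as a short decision function. This is only a sketch of the stated rules: the function name and the three mode labels are hypothetical, and it assumes that only the exact value `true` switches a mode on.

```shell
# Resolves the effective mode from the two env values, following the table:
# OTEL_DISABLED=true wins over OTEL_OBSERVE_ONLY=true.
# Args: value of OTEL_DISABLED, value of OTEL_OBSERVE_ONLY (either may be empty).
resolve_mode() {
  if [ "$1" = "true" ]; then
    echo "disabled"
  elif [ "$2" = "true" ]; then
    echo "observe-only"
  else
    echo "default"
  fi
}

# Usage:
#   resolve_mode "${OTEL_DISABLED:-}" "${OTEL_OBSERVE_ONLY:-}"
```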

Filtering observer runs in the dashboard UI ships in a future release. Until then, the arth.observe_only attribute is carried on every span of an observer run — the runs appear in the dashboard alongside normal ones.

7f.1: Compare baseline vs toolkit in the arth dashboard

Once you've captured a baseline run AND a regular toolkit run, the arth dashboard's /experiments page renders them side-by-side across cost, tokens, calls, cache hit rate, lines edited, and active time.

Auto-tagging (default ON for both modes):

Every session you run automatically gets an arth.experiment label so it shows up in /experiments dropdowns without you having to set anything before each launch. The label format makes baseline vs toolkit easy to scan:

| Mode | Label format | Generated by |
| --- | --- | --- |
| no-toolkit baseline (just `claude`, no plugin) | `auto-baseline-<git-branch>-<first-prompt-slug>-<unix>` | Engine's session-watcher (reads CC's session JSONL) |
| toolkit-on session (`prime@arthai` installed) | `auto-toolkit-<git-branch>-<unix>` | Toolkit's OTEL hook (`hooks/otel-telemetry.sh`) |

Example after running the same task twice:

auto-baseline-main-debug-login-failure-1715890123
auto-toolkit-main-1715891456

Pick one as left, the other as right, click Compare.
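
If you export or script against these labels, the prefix and the trailing timestamp are the two parts that can be read reliably. The sketch below assumes only the label formats shown above. Both helper names are hypothetical, and any label without an `auto-` prefix is treated as a custom one.

```shell
# Prints "baseline", "toolkit", or "custom" for an arth.experiment label.
label_mode() {
  case "$1" in
    auto-baseline-*) echo baseline ;;
    auto-toolkit-*) echo toolkit ;;
    *) echo custom ;;
  esac
}

# Prints the trailing Unix timestamp of an auto-generated label.
label_timestamp() {
  printf '%s\n' "${1##*-}"
}
```

Because branch names and prompt slugs can themselves contain dashes, the middle of a label is ambiguous; only the prefix and the last dash-separated field are parsed here.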

Quick walkthrough — zero-config path:

Run the baseline first (no env vars, no aliases): just claude in your project, with no arthai plugin installed.
Run the toolkit second (still no env vars): just claude again, this time with the plugin installed. The toolkit auto-installs the hook.
Open the dashboard at http://localhost:3100/experiments. Both runs are already in the dropdowns.
Pick left = baseline, right = toolkit, click Compare.

Custom labels (override auto-tag):

If you want a specific human-readable label instead of the auto one, set arth.experiment before launching — your value wins:

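# Append to any existing attributes; ${VAR:+...} avoids a leading comma when none are set
export OTEL_RESOURCE_ATTRIBUTES="${OTEL_RESOURCE_ATTRIBUTES:+$OTEL_RESOURCE_ATTRIBUTES,}arth.experiment=prepme-credit-bug-baseline"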
claude

Turn auto-tagging OFF:

Set ARTH_AUTO_EXPERIMENT_DISABLED=1 — same env var honored by both modes:

~/.arthai/docker-compose.yml environment: block → disables the engine watcher's auto-tag (no-toolkit / orphaned-toolkit sessions)
<project>/.claude/settings.local.json env block → disables the toolkit hook's auto-tag (toolkit sessions)

/otel-setup asks you about this at install time — you can flip it then or at any point later.

Mid-session annotations: type /marker "spike here" inside any session. An amber ◆ glyph appears on the dashboard's DAG timeline within ~5s. From the dashboard, you can also click Drop marker on any session detail page.

Export filtered slices: run /arth logs export --since 1h from inside Claude Code, or use the dashboard sidebar's "Download diagnostic bundle". You can filter by experiment, marker, session ID, and time range; all filters AND-compose.

The Arth Intelligence guide has the condensed version of this comparison flow.

7g: What survives a reboot

After you reboot your Mac (or restart Docker Desktop), here's what comes back automatically and what doesn't:

Layer	Survives reboot?	Why
Env vars in `.claude/settings.local.json`	✅ Yes	File on disk — Claude Code reads it on every session start
`~/.arthai/docker-compose.yml`	✅ Yes	File on disk
`arthai_data` volume (your traces, scores, patterns)	✅ Yes	Docker named volume — persistent across container restarts and reboots
Engine container (`arthai-intelligence`)	✅ Yes — for current compose templates	Compose template sets `restart: unless-stopped`. If `docker inspect` shows `RestartPolicy: no`, your install predates this — one-time migration below.
Postgres container (`arthai-db`)	✅ Yes — same caveat	Same — depends on compose template having `restart: unless-stopped`.
Watchtower auto-updater	✅ Yes	Already had `restart: unless-stopped` in older compose files.
Docker Desktop itself	⚠️ Depends on YOU	Docker Desktop has a per-user "Start Docker Desktop when you log in" toggle (Settings → General). If it's off, nothing comes back until you launch Docker Desktop manually. We can't set this for you — it's an OS-level user preference.

Quick verify after a reboot:

docker ps --filter 'name=arthai'
# Should show 3 running containers: arthai-intelligence, arthai-db, arthai-watchtower

If any are missing, start them:

docker compose -f ~/.arthai/docker-compose.yml up -d
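To turn the post-reboot check into something scriptable, the sketch below reads container names on stdin and prints whichever of the three expected containers is absent. `missing_containers` is a hypothetical helper name; the container names are the ones listed above.

```shell
# missing_containers
# Reads running container names (one per line) on stdin and prints each
# expected Arth container that is not in the list.
missing_containers() {
  running=$(cat)
  for name in arthai-intelligence arthai-db arthai-watchtower; do
    # -x: match the whole line, -F: treat the name as a fixed string
    printf '%s\n' "$running" | grep -qxF "$name" || echo "$name"
  done
}
```

One way to feed it is `docker ps --filter 'name=arthai' --format '{{.Names}}' | missing_containers`. Empty output means all three containers are running.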

Migration for older installs (containers showing RestartPolicy: no):

If docker inspect shows your containers have RestartPolicy: no, run this one-liner — no data loss, no re-setup:

docker update --restart unless-stopped arthai-db arthai-intelligence

Or re-run /otel-setup and pick Local — the new compose template will overwrite ~/.arthai/docker-compose.yml with the right policy.
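To check whether a container needs the migration before running it, you can read the policy name from docker inspect and test it. A minimal sketch, assuming Docker reports the default policy as `no`; `needs_restart_migration` is a hypothetical helper name.

```shell
# needs_restart_migration POLICY
# Succeeds (exit 0) when the restart policy name means the container will NOT
# come back by itself after a reboot.
needs_restart_migration() {
  case "$1" in
    ""|no) return 0 ;;
    *)     return 1 ;;
  esac
}

# Example use (commented out so the sketch has no side effects):
# policy=$(docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' arthai-db)
# needs_restart_migration "$policy" && docker update --restart unless-stopped arthai-db
```

The check is deliberately conservative: any policy other than an empty value or `no` is treated as already migrated.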

Opting out of auto-restart:

If you'd rather start Arth Intelligence manually each session (e.g., to save resources when not coding):

docker update --restart no arthai-db arthai-intelligence arthai-watchtower

You'll need to docker compose -f ~/.arthai/docker-compose.yml up -d whenever you want the dashboard back.

7h: Updating Arth Intelligence

There are three things that can update independently. Each has its own update path. None of them touch your trace data — your sessions, scores, and patterns live in the arthai_data Docker volume and are preserved across all updates.

Layer	What updates it	How often
The container image (`arthai/intelligence`)	Watchtower sidecar — pulls + restarts the container	Daily, automatic
The skill on disk (`/otel-setup`)	Standard plugin update (auto-update, or reinstall the bundle)	When you update plugins
Your local compose file (`~/.arthai/docker-compose.yml`)	Re-running `/otel-setup` — overwrites with the latest template	Only when you run the skill again

The container image — two ways

Automatic (default). A watchtower sidecar shipped in the compose template checks once a day, pulls the latest arthai/intelligence image, and restarts only that container. You don't need to do anything. To verify it's running:

docker ps --filter name=arthai-watchtower

Manual — force an update right now. Run the hosted update script:

curl -fsSL https://arthtech-ai.github.io/__ARTH_MARKETPLACE__/scripts/update.sh | sh

Or paste the two commands directly if you'd rather not pipe to shell:

docker compose -f ~/.arthai/docker-compose.yml pull
docker compose -f ~/.arthai/docker-compose.yml up -d

Both do the same thing — pull the latest image and recreate the container against it. The named volume arthai_data is left untouched.

To opt out of auto-updates:

docker stop arthai-watchtower
docker rm arthai-watchtower

You'll then need to update manually using the script above whenever you want a new version.

The skill — when we ship a new `/otel-setup`

/plugin marketplace update arthai
/plugin uninstall sentinel@arthai
/plugin install sentinel@arthai
/reload-plugins

If that doesn't pick up the change, fall back to the marketplace remove + re-add flow shown in the FAQ.

The compose file — only matters if the template changes

If a sentinel release changes the compose template (e.g. adds a new service), you'll need to re-run /otel-setup to regenerate ~/.arthai/docker-compose.yml with the new content. Re-running the skill is safe — it overwrites the compose file but never touches the arthai_data volume.

The one thing that destroys data

docker compose -f ~/.arthai/docker-compose.yml down -v   # ← DON'T DO THIS

The -v (--volumes) flag drops the arthai_data volume and erases every session, score, and pattern. Plain docker compose down (without -v) stops the containers but keeps the data — that's safe and reversible. Only use down -v if you intentionally want to wipe everything and start fresh.
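If you worry about typing -v by habit, one option is a small wrapper that refuses the flag unless you confirm. This is a sketch, not something the toolkit ships; `has_volumes_flag`, `arth_compose`, and the `ARTH_CONFIRM_WIPE` variable are hypothetical names.

```shell
# has_volumes_flag ARGS...
# Succeeds (exit 0) if any argument is exactly -v or --volumes.
has_volumes_flag() {
  for arg in "$@"; do
    case "$arg" in
      -v|--volumes) return 0 ;;
    esac
  done
  return 1
}

# arth_compose ARGS...
# Hypothetical wrapper around docker compose for the Arth stack. It refuses
# to run with -v/--volumes unless ARTH_CONFIRM_WIPE=yes is set.
arth_compose() {
  if has_volumes_flag "$@" && [ "${ARTH_CONFIRM_WIPE:-}" != "yes" ]; then
    echo "refusing: -v would delete the arthai_data volume" >&2
    return 1
  fi
  docker compose -f "$HOME/.arthai/docker-compose.yml" "$@"
}
```

With this in your shell profile, `arth_compose down` behaves as usual, while `arth_compose down -v` fails until you run it as `ARTH_CONFIRM_WIPE=yes arth_compose down -v`. The check only looks for the two exact spellings of the flag, so treat it as a guard against habit rather than a complete safeguard.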

What you see in the dashboard

Session waterfall — colored swimlanes showing agents, tools, tasks, permissions
Agent cost tiers — which agents ran at Opus (60x), Sonnet (10x), Haiku (1x)
Tool call duration — how long each tool call took
Workflow phases — spec generation, scope debate, implementation, QA
Session summaries — duration, agent count, task count, failures

Troubleshooting

Problem	Fix
`/otel-setup` says "Docker is not running"	Open Docker Desktop, wait for it to start, then re-run `/otel-setup`
Dashboard at `localhost:3100` shows nothing	Traces stream live — run any prompt in Claude Code, then refresh the dashboard page.
Dashboard doesn't load at all	Check Docker is running: `docker ps` should show `arthai-intelligence` and `arthai-db` containers. If not, run: `docker compose -f ~/.arthai/docker-compose.yml up -d`
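Traces stop appearing after a restart	Check that the Docker containers are still running (`docker ps`) and that the OTEL env block is still in your Claude Code settings file, then start a new Claude Code session. If you exported the variables from your shell profile instead, run `source ~/.zshrc` to reload them.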
Want to stop the dashboard	Run: `docker compose -f ~/.arthai/docker-compose.yml down`
Want to restart the dashboard	Run: `docker compose -f ~/.arthai/docker-compose.yml up -d`
Want to update to the latest version	See 7h: Updating Arth Intelligence — auto-updates daily, force now with `update.sh`
Want to remove everything	Run: `docker compose -f ~/.arthai/docker-compose.yml down -v` (this deletes all trace data)

7i: Enable the dashboard's AI features (optional)

/otel-setup (above) is telemetry-only. On top of it there are two separate optional features — don't conflate them:

Not sure which keys you need? See Configuration → Keys at a glance for a one-table summary. Short version: telemetry needs none, Explain on a local model (Ollama / LM Studio + qwen) needs none, a cloud Explain model needs that provider's key, and only the experimental orchestrator needs a Claude key + GitHub token + license.

A. "Explain this session" — a grounded AI summary. NOT experimental. You can turn this on without the experimental orchestrator. It's offered at the end of /otel-setup (or arth otel-setup), and also by /cloud-setup. Pick any provider — including free local models:

Provider	What you provide
Anthropic / OpenAI / Gemini / Bedrock	that provider's API key
Ollama (local, free)	host (default `http://host.docker.internal:11434`)
LM Studio (local, free)	host (default `http://host.docker.internal:1234`) — the loaded model (e.g. qwen) is auto-detected

New user: just answer "yes" when /otel-setup offers Explain and pick a provider. Existing user (set up before this): re-run /otel-setup (or /cloud-setup) and opt into Explain — your key is added to ~/.arthai/.env and survives image updates. Local-model step-by-step: Configuration → Local models.

B. Cloud Orchestrator — experimental, Claude-only. Point Arth at a Git repo and watch it calibrate or plan a feature live. Off until you opt in via /cloud-setup (toolkit) or arth cloud-setup (CLI):

/cloud-setup        # in Claude Code (toolkit)
arth cloud-setup    # from a terminal (Arth CLI — no toolkit needed)

The sandbox runs Claude only (for now), so you'll need an ANTHROPIC_API_KEY even if you set Explain to a non-Claude provider (both keys are kept). You also need your license (auto-read from ~/.arthai/license if you ran npx arthai-activate) and a GitHub fine-grained token with read access to ArthTech-AI/arthai-marketplace. See the Cloud Orchestrator guide for how to create the token and for the full walkthrough.

Step 8: Calibrate your project

On first use in a project, run:

/calibrate

This scans your codebase and configures the toolkit to match your project's patterns, conventions, and tech stack. It also builds a knowledge graph — a ranked index of your project's conventions, domain rules, and patterns that workflows like /fix query automatically to get the most relevant context for each task. The graph auto-rebuilds whenever your knowledge base changes.

Want it cheaper? Calibrate's sub-agents are already Haiku/Sonnet — the only Opus cost is the Claude Code session driving it (your "model" setting, not the toolkit). Run /model sonnet before /calibrate for a no-Opus pass. Details: /calibrate → Cost & running it cheaply.

Step 9: Start a new Claude Code session

Restart your Claude Code session so the knowledge graph gets built and the OTEL env block from Step 7 is picked up. Then:

/onboard                    # prioritized briefing on what to work on
/planning my-feature        # start building with the toolkit

You're ready.

Recommended first session

After installing and calibrating:

/onboard                    # get a briefing on your project
/planning my-first-feature  # try the planning workflow
/implement my-first-feature # spawn the team that builds it
/qa                         # commit-mode QA on the diff
/pr                         # create the PR

If you're not building something new, two good explorations:

/tech-debt                  # survey, prioritize, and propose plans for tech debt
/perf <scope>               # cross-functional performance pass

Common skills — cheat sheet

The full list of every skill in your installed bundles is at skills-reference.md. The most common ones grouped by what you're doing:

You want to...	Use
Onboard / decide what to work on	`/onboard`, `/welcome`, `/wizard`
Plan and build a feature	`/planning` (includes design spec HTML by default), `/implement`, `/qa`, `/pr`
Fix a bug formally	`/fix <description|#issue>`
Ship code	`/precheck`, `/qa`, `/revert-check`, `/pr` (or `/ship` for the one-shot)
Review a PR	`/review-pr <#N>`
Audit code health	`/tech-debt`, `/perf`
Generate or audit docs	`/docs <audit|write|check>`
Repair a broken pipeline	`/incident`, `/ci-fix`, `/sre`
Restart local servers	`/restart [service]`
Deploy	`/deploy <local|staging|...>`, `/deploy-ios`
Schedule recurring agents	`/schedule-routine`, `/autopilot`
Manage GitHub issues	`/issue <title>`, `/issue list`, `/issue close #N`
Share a plan or strategy	`/share <plan> --format md|slack|twitter`
Generate from templates	`/templates <type> <topic>`

Bundle-specific skills (consulting, design, etc.) live in their respective bundles — install the bundle to surface them. See the plugin catalog.

Using Cowork Dispatch (tweet → pipeline)

Partner installs only. This feature requires access to the claude-agents source repo, which standard plugin customers don't have. Skip this section unless Arth AI has given you source access.

Cowork is Anthropic's mobile companion app for Claude. The prime bundle includes a Cowork Dispatch skill: paste a tweet URL in Cowork and it automatically queues /monitor-tweet on your desktop Claude Code.

Additional requirement: the Cowork skill dispatches to ~/.claude-agents on your Mac — a source-repo clone. You need both:

# 1. Plugin install (above) — surfaces the skill in Cowork
# 2. Clone the toolkit to ~/.claude-agents — provides /monitor-tweet on desktop
git clone git@github.com:ArthTech-AI/claude-agents.git ~/.claude-agents
~/.claude-agents/install.sh --key ARTH-XXXX-XXXX-XXXX-XXXX ~/.claude-agents

Without the clone, the Cowork skill fires but the desktop pipeline has no /monitor-tweet to run.

Knowledge graph

When you run /calibrate, the toolkit builds a knowledge graph from your project's knowledge base (.claude/knowledge/). This is a ranked index of your conventions, domain rules, patterns, and vocabulary that agents query automatically.

How it works:

/calibrate scans your codebase and populates .claude/knowledge/shared/ with conventions, domain rules, patterns, and vocabulary
The knowledge graph indexes these files into a fast-lookup graph (.claude/knowledge/graph/)
When any skill runs (/fix, /planning, /implement, /qa, etc.), it queries the graph to pull in only the most relevant context — instead of loading every knowledge file in full
The graph auto-rebuilds whenever your knowledge base changes

What this means for you:

Agents make better decisions because they have ranked, relevant project context
Fixes match your coding conventions automatically
QA checks understand your domain rules
No action needed from you — it runs behind the scenes after /calibrate

Next steps

Tutorial: ship your first PR — the 10-minute golden path
Tutorial: calibrate a project · plan and implement a feature
What is arthai? — positioning, who it's for, how the pieces fit
Arth Intelligence — the observability dashboard explained
Configuration — customize the toolkit for your workflow
Plugin catalog — see all available bundles
FAQ — common questions and answers

Uninstall

Fully remove the arthai toolkit (and, optionally, Arth Intelligence) and get back to a clean Claude Code with no hook errors. Works for the plugin install path (@arthai). Steps run in Claude Code, then your terminal.

Step 1 — Remove the plugins (in Claude Code)

Run for each bundle you installed (prime, or forge/scalpel/sentinel/…):

/plugin uninstall prime@arthai        # repeat per bundle
/plugin marketplace remove arthai

This removes the skills, agents, and hooks that live inside the plugin directory.

Step 2 — Fully quit and reopen Claude Code

This is the step that prevents hook errors. It drops the plugin's hooks from the running session. Don't just /clear — quit the app/CLI completely, do Step 3 while it's closed, then reopen.

Step 3 — Remove the leftovers (in your terminal)

Plugin uninstall does not touch your project's CLAUDE.md, the toolkit's per-project state files, or global config. Run from your project root (macOS: use sed -i '' instead of sed -i):

# Global: plugin cache
rm -rf ~/.claude/plugins/cache/*arthai*

# Keep your Arth license (~/.arthai/license) so you don't need a new key to return.
# Remove only the observability config from ~/.arthai (license stays):
rm -f ~/.arthai/docker-compose.yml ~/.arthai/docker-compose.override.yml
rm -f ~/.arthai/.env ~/.arthai/otel-configured

# Per project: remove the managed blocks the toolkit injected
sed -i '/>>> claude-agents toolkit/,/<<< claude-agents toolkit/d' CLAUDE.md
sed -i '/>>> claude-agents managed/,/<<< claude-agents managed/d' .gitignore

# Per project: remove toolkit STATE files (config/routing/workflow caches) —
# NOT your generated content (preserved below)
rm -f .claude/.claude-agents.conf .claude/registry.json .claude/.triage-full-emitted \
      .claude/verification-pending .claude/.workflow-state.json .claude/.autopilot-state.json \
      .claude/.skill-usage.json .claude/.project-state-cache.json .claude/.toolkit-setup-done \
      .claude/.toolkit-last-seen-sha .claude/.cc-version .claude/.license-validated \
      .claude/.precheck-passed .claude/.escalation-state.json .claude/.fix-scope-lock.json \
      .claude/.context-block.cache .claude/.arth-otel.env
rm -rf .claude/.goals
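Because sed -i behaves differently on macOS and GNU/Linux, a portable alternative is to write through a temp file. The sketch below applies the same range delete as the sed commands above; `strip_block` is a hypothetical helper name, and it assumes the marker text contains no `/` characters.

```shell
# strip_block FILE START END
# Deletes every line from a line matching START through the next line matching
# END (inclusive). Writes through a temp file, so no sed -i flag is needed.
strip_block() {
  file=$1 start=$2 end=$3
  tmp=$(mktemp) || return 1
  if sed "/$start/,/$end/d" "$file" > "$tmp"; then
    # cat into the original keeps its permissions and inode
    cat "$tmp" > "$file"
  fi
  rm -f "$tmp"
}
```

For example: `strip_block CLAUDE.md '>>> claude-agents toolkit' '<<< claude-agents toolkit'`, and the same call with the `claude-agents managed` markers for .gitignore.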

If .claude/settings.json still references toolkit hooks afterward (only if the dev/symlink installer was ever used), delete the hooks entries whose command path contains claude-agents, arthai, or .claude/hooks/, then restart.

Optional — remove the license key too

The steps above keep your license at ~/.arthai/license so you can reinstall later without requesting a new key. Only remove it if you're done with arthai for good (you'd need a new key to come back):

rm -rf ~/.arthai      # ⚠ also deletes your license key

Step 4 — Remove Arth Intelligence (only if you enabled observability)

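Stop the local Docker stack (your trace data is in the arthai_data volume; plain down keeps it, down -v erases it). These commands need ~/.arthai/docker-compose.yml, so run them before the rm -f in Step 3 that deletes that file: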

docker compose -f ~/.arthai/docker-compose.yml down      # stop containers, keep data
# docker compose -f ~/.arthai/docker-compose.yml down -v  # ⚠ also erases all trace data

Network arthai_default — Resource is still in use? Harmless. The arthai containers are removed; the network just isn't deleted because another of your containers is still attached to it. /otel-setup reuses the same network on the next start — nothing to fix. (Don't force-disconnect your own containers from it.)

Then turn off the toolkit's telemetry env. By default /otel-setup wrote the six OTEL_* / CLAUDE_CODE_ENABLE_TELEMETRY keys to your global ~/.claude/settings.json (or a repo's .claude/settings.local.json if you chose project scope). For a complete removal, delete just those six keys — the Arth Intelligence guide lists them and shows how to remove only those (your other settings stay). Prefer not to edit a settings file at all? export OTEL_DISABLED=true turns telemetry off without changing anything. Either way, also clear the cached endpoint file:

rm -f .claude/.arth-otel.env

Upgrading, not leaving? You don't need to remove the env block at all — keep it, reinstall, and run /otel-setup → Reconfigure → Local. It refreshes the six keys itself (merging without overwriting your other settings), pulls the latest image, and recreates the stack — a full OTEL install with nothing hand-edited. Your trace history is untouched either way (it lives in the arthai_data volume).

Step 4b — Remove Cloud Orchestrator leftovers

If you enabled the AI features, their settings live in ~/.arthai/.env and ~/.arthai/docker-compose.override.yml: the Explain provider key/host (incl. a local OLLAMA_HOST / LMSTUDIO_HOST) set via /otel-setup or /cloud-setup, plus any orchestrator secrets/flags. Both files are already removed by the rm -f commands in Step 3. The Cloud Orchestrator also launches disposable sandbox containers and pulls a sandbox image. Neither is managed by the compose file above, so down doesn't remove them. Clean up any leftovers:

# Remove any orphaned/stuck per-run sandbox containers
docker ps -aq --filter "name=arth-cloud-" | xargs -r docker rm -f

# Optional: drop the sandbox image too
docker rmi arthai/cloud-sandbox:latest 2>/dev/null || true

(Findings data lives in the arthai_data volume, so it is erased only if you ran down -v in Step 4; no separate step is needed.)

Step 5 — Reopen Claude Code and verify

/help        → toolkit slash commands (/pr, /qa, /onboard, /skills…) are gone
/plugin      → arthai is no longer listed

No hook errors on startup means it's clean — because you quit fully in Step 2, Claude Code rebuilds the hook set from scratch with no toolkit entries.

What's preserved on purpose

Uninstall never deletes content the toolkit generated for you — it's yours: .claude/knowledge/, .claude/plans/, .claude/specs/, .claude/qa-knowledge/, .claude/wikis/, .claude/project-profile.md. To remove that too:

rm -rf .claude/knowledge .claude/plans .claude/specs .claude/qa-knowledge \
       .claude/wikis .claude/project-profile.md
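Before deleting, it can help to see which of these paths actually exist in a given project. A minimal sketch; `list_generated_content` is a hypothetical helper name, and the path list is the one given above.

```shell
# list_generated_content [PROJECT_ROOT]
# Prints each preserved toolkit-generated path that exists under the given
# project root (defaults to the current directory).
list_generated_content() {
  root=${1:-.}
  for p in knowledge plans specs qa-knowledge wikis project-profile.md; do
    [ -e "$root/.claude/$p" ] && echo ".claude/$p"
  done
  return 0
}
```

Run `list_generated_content` from the project root and review the output before running the rm -rf above.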

Reinstall

To come back later: Getting Started — /plugin marketplace add ArthTech-AI/arthai-marketplace then /plugin install prime@arthai.

For a clean upgrade (full reinstall): you don't have to edit any settings file. Uninstall the plugins (Steps 1–2), stop the stack with docker compose down (Step 4 — run it before rm -rf ~/.arthai, since deleting the folder first removes the compose file you need), reinstall, then run /otel-setup → Reconfigure → Local for a full, idempotent OTEL install. Everything except the per-project .claude/ cleanup is machine-wide — plugins, the license, the Docker stack, and the global OTEL env apply to every repo, so you reinstall once; only the .claude/ cleanup and /calibrate are per project.

Getting Started — install
Arth Intelligence — full observability teardown
FAQ — common install/uninstall questions

Arth Intelligence — Observability for Claude Code

⚠️ Experimental — limited preview. Observability is in active development. Expect rough edges, breaking changes between releases, and gaps in coverage. Feedback welcome at productive@getarth.ai.

Arth Intelligence is a local dashboard that shows you what Claude Code actually did — every session, prompt, tool call, agent spawn, skill invocation, and dollar spent. It runs entirely on your machine as a Docker container. Nothing leaves your laptop.

This page covers the default experience: Observability only. Out of the box the dashboard is a single observability surface (Hub / Sessions / DAG / Story / Experiments / Cost) — exactly what's described below. There is an optional, experimental Cloud Orchestrator add-on (point Arth at a repo to calibrate or plan it live, plus a visual Knowledge view) that is off by default and completely hidden unless you opt in. If you turn it on, the left nav regroups into three lenses and "Hub" is renamed "Observability" — see Cloud Orchestrator. Not interested? Do nothing; everything on this page is all you'll see.

Why you'd want it

See the work. A session is no longer a black box — click into a waterfall of spans showing every tool call and agent spawn with durations.
See the cost. Per-call cost in USD, input/output/cache tokens, and which model tier each agent ran at (Opus 60x / Sonnet 10x / Haiku 1x).
Prove the toolkit's value. Run the same task with and without the toolkit and compare cost, tokens, and outcomes side-by-side on the /experiments page.
Debug workflows. When a skill misbehaves, the trace shows exactly which phase, which agent, and which tool call went wrong.

How it works — two telemetry streams

Stream	Source	Carries
Trace spans	toolkit `otel-telemetry` hook	session, prompt, tool calls, agent spawns, skill invocations, stop events
Cost + token metrics	Claude Code native OTEL (gated by `CLAUDE_CODE_ENABLE_TELEMETRY=1`)	per-call cost USD, input/output/cache tokens, model

Both streams flow into the same local engine. If native OTEL is off, traces still flow but the dashboard's cost and token columns stay empty — that's the most common "why is my dashboard half-broken?" question.

flowchart LR
  subgraph Your machine
    CC[Claude Code session]
    HK[toolkit otel-telemetry hook]
    subgraph Docker: Arth Intelligence
      EN["Engine :4319 OTLP ingest"]
      DB[(Postgres arthai_data volume)]
      DA["Dashboard :3100 Hub · Session DAG · Experiments"]
    end
  end
  CC -- every prompt, tool call, agent spawn, skill, stop --> HK
  HK -- trace spans --> EN
  CC -- "native OTEL (env-gated) cost USD + tokens + model" --> EN
  EN --> DB --> DA

Setup

Prerequisites: Docker Desktop running; ports 4319 (engine), 3100 (dashboard) free; the sentinel plugin installed (or prime, which includes it). Optional: an LLM API key (Anthropic / OpenAI / Gemini / Bedrock) or a free local Ollama / LM Studio (LM Studio auto-detects the loaded model, e.g. qwen) — only if you want the AI "Explain this session" summaries (/otel-setup will offer to set this up; skip it and everything else works).

/plugin install sentinel@arthai    # skip if you installed prime
/otel-setup                                # pick "Local"

/otel-setup does everything: verifies Docker, writes ~/.arthai/docker-compose.yml, starts the stack (engine + dashboard + postgres + Watchtower auto-updater), and writes the env vars to your global ~/.claude/settings.json (so every project on this machine emits telemetry — nothing is committed to your repos):

CLAUDE_CODE_ENABLE_TELEMETRY=1
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4319
OTEL_EXPORTER_OTLP_PROTOCOL=http/json
OTEL_METRICS_EXPORTER=otlp
OTEL_LOGS_EXPORTER=otlp
OTEL_TRACES_EXPORTER=otlp

Then restart Claude Code — without the restart, the env vars aren't loaded and traces won't flow.

Verify it's working

Open http://localhost:3100. You should see the dashboard's Hub — your projects and sessions (empty until you run something). If it doesn't load: docker ps should show arthai-intelligence, arthai-db, and arthai-watchtower; if missing, docker compose -f ~/.arthai/docker-compose.yml up -d.
Generate activity. In Claude Code, run any prompt — even "what's in package.json?".
Refresh the dashboard. Your session appears in the Sessions list. If nothing shows after ~10s: curl -s http://localhost:4319/api/health | jq .
Click into the session. You see the span waterfall — prompt, tool calls, agent spawns, durations.
Check the cost columns. Values like $0.0023 mean native OTEL is flowing — you're done. Empty columns mean CLAUDE_CODE_ENABLE_TELEMETRY=1 isn't loaded: grep CLAUDE_CODE_ENABLE_TELEMETRY ~/.claude/settings.json, re-run /otel-setup if missing, restart Claude Code.

Connecting your other projects

By default /otel-setup writes the OTEL env block to your global ~/.claude/settings.json, which Claude Code reads for every folder on this machine — so all your projects (and any you create later) report to the same localhost:4319 engine and appear in the same dashboard automatically. There's nothing to repeat per repo.

Check it's set:

grep CLAUDE_CODE_ENABLE_TELEMETRY ~/.claude/settings.json

(If you deliberately chose the This project only scope during /otel-setup, the block lives in that repo's .claude/settings.local.json instead and only that repo is connected.)
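The grep above checks one key. To check all six at once, a sketch like the following prints any key name that does not appear in a settings file. `missing_otel_keys` is a hypothetical helper name; it only tests that each key name is present in the file and does not validate the values or the JSON structure.

```shell
# missing_otel_keys FILE
# Prints each of the six telemetry keys whose name does not appear in FILE.
missing_otel_keys() {
  file=$1
  for key in CLAUDE_CODE_ENABLE_TELEMETRY OTEL_EXPORTER_OTLP_ENDPOINT \
             OTEL_EXPORTER_OTLP_PROTOCOL OTEL_METRICS_EXPORTER \
             OTEL_LOGS_EXPORTER OTEL_TRACES_EXPORTER; do
    grep -qF "$key" "$file" 2>/dev/null || echo "$key"
  done
}
```

Usage: `missing_otel_keys ~/.claude/settings.json` (or a repo's `.claude/settings.local.json` for project scope). Empty output means all six key names are present.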

Verify a specific repo end-to-end: start a Claude Code session in it, run any prompt, then look for a new session in the dashboard — every session carries the project name and path (and an auto-experiment label with the git branch), so each repo is identifiable in the Sessions list. No new session after ~10s means that repo's env block is missing or the session wasn't restarted after setup.

What the dashboard shows

Hub — every project and session in one tree: segments, span counts, duration, cost, and last activity, updating live while a session runs. (With the Cloud Orchestrator add-on enabled, this lens is renamed Observability and joined by Orchestrator and Knowledge lenses — otherwise it stays "Hub".)

Screenshot: Arth Intelligence Hub (projects, sessions, spans, duration, and cost per session)

Session detail — click any session for four views (Story / DAG / Spans / Trace): the narrative of what happened, a DAG timeline of agents and tools, the raw span list, and the trace waterfall. Each session also shows:

Insights strip — automatic good / watch / concern findings (e.g. elevated error rates)
Cost breakdown — three axes: by model, by skill, and by owner (toolkit vs Claude Code vs 3rd-party plugins vs direct prompting), per model tier
Explain this session (optional, not experimental) — a grounded, plain-language AI summary of what happened. It calls an LLM directly (separate, API-billed — not your Claude Code subscription; ~$0.001–0.002/session on a small/flash model, or free with local Ollama / LM Studio) and is opt-in: enable it from /otel-setup (or arth otel-setup) — it's offered inline — or with /cloud-setup. Any provider works (Anthropic / OpenAI / Gemini / Bedrock / Ollama / LM Studio — the last auto-detects the loaded model, e.g. qwen). With no key configured, the dashboard shows a small "needs an LLM key" prompt where the summary would go (it doesn't silently disappear). To turn it on — including on an existing install that predates this feature — re-run /otel-setup (or /cloud-setup) and provide a key; it's saved to ~/.arthai/.env and survives image updates (pulls/Watchtower never remove it). See Configuration → Explain this session.
Workflow phases — spec generation, scope debate, implementation, QA checkpoints
Markers — drop /marker "spike here" mid-session; an amber ◆ appears on the DAG timeline

Screenshot: Session detail (span timeline with durations and cost)

Experiments — pick any two experiment labels and compare them side-by-side on six metrics (cost, tokens, API calls, cache hit rate, lines edited, active time) plus workflow attribution. Two caveats: cost/token/cache metrics need native OTEL enabled (sessions without it show 0), and lines-edited only populates for toolkit-instrumented sessions.

Screenshot: Experiments (baseline vs toolkit side-by-side comparison)

Project view — a Today summary with a velocity score (composite of outcome rate, toolkit coverage, and cost efficiency) and data-backed coaching insights (e.g. "start with /planning — your last 4 sessions averaged 3 pivots without a plan").

Diagnostic bundle — one click in the sidebar (or /arth logs export) zips engine logs and a redacted span slice for support.

Compare: baseline vs. toolkit

Want to know what the toolkit actually changes? Every session is auto-tagged for the /experiments page — no setup:

Run a session without the toolkit (claude in a project with no plugin) → tagged auto-baseline-<branch>-<prompt-slug>-<unix>.
Run a session with the toolkit → tagged auto-toolkit-<branch>-<unix>.
Open http://localhost:3100/experiments, pick one on each side, click Compare. Cost, tokens, calls, cache hit rate, lines edited, and active time fill in side-by-side.
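Once you have two numbers from the Compare view, the relative difference is simple arithmetic. The sketch below assumes you copy the values by hand; `pct_change` is a hypothetical helper name, and the figures in the usage note are made up for illustration.

```shell
# pct_change BASELINE TOOLKIT
# Prints the toolkit value's change relative to the baseline as a percentage
# with one decimal place. Negative means the toolkit run was lower.
pct_change() {
  awk -v b="$1" -v t="$2" 'BEGIN {
    if (b == 0) { print "n/a"; exit }   # avoid dividing by zero
    printf "%.1f\n", (t - b) / b * 100
  }'
}
```

For example, with a hypothetical baseline cost of 0.50 and toolkit cost of 0.40, `pct_change 0.50 0.40` prints `-20.0`, that is, a 20% reduction.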

For observer-only mode (OTEL_OBSERVE_ONLY=true — telemetry on, toolkit side effects off), custom experiment labels, and the full A/B walkthrough, see Getting Started Step 7.

Privacy — what leaves your machine

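Nothing, by default. The engine, dashboard, and database all run in local Docker containers. Trace data lives in the arthai_data Docker volume on your machine. The env vars live in your global ~/.claude/settings.json (or a repo's git-ignored .claude/settings.local.json if you chose project scope), so nothing is committed to your repos. The only outbound traffic is Watchtower pulling image updates from Docker Hub, plus the LLM calls if you opt into Explain this session with a cloud provider.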

Operations

Task	How
Disable telemetry	`export OTEL_DISABLED=true` (or remove the env block from `~/.claude/settings.json` — or the repo's `.claude/settings.local.json` if you used project-only scope)
Stop the stack	`docker compose -f ~/.arthai/docker-compose.yml down` (data preserved)
Start it again	`docker compose -f ~/.arthai/docker-compose.yml up -d`
Update now	`docker compose -f ~/.arthai/docker-compose.yml pull && docker compose -f ~/.arthai/docker-compose.yml up -d`
Check your version	`docker compose -f ~/.arthai/docker-compose.yml images` shows the running `arthai/intelligence` image; compare its ID to the `:latest` digest on Docker Hub
Auto-updates	Watchtower checks daily and recreates only the engine container — trace data is preserved
After a reboot	Containers auto-restart (`restart: unless-stopped`); verify with `docker ps --filter 'name=arthai'`
Wipe everything	`docker compose -f ~/.arthai/docker-compose.yml down -v` — ⚠️ `-v` deletes all trace data permanently

Details on reboot durability and the three update layers: Getting Started 7g–7h.

Am I on the latest version?

Watchtower pulls the latest arthai/intelligence image daily, so you're normally current automatically. To check manually, compare the image your container is running against Docker Hub:

docker compose -f ~/.arthai/docker-compose.yml images arthai-intelligence

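Compare the running image against the :latest tag on Docker Hub. The IMAGE ID column is a local identifier and may not match the digest Docker Hub displays, so compare digests instead: docker image inspect --format '{{json .RepoDigests}}' arthai/intelligence:latest prints the digest of the image you pulled. If it differs from the :latest digest on Docker Hub, run Update now above (or wait for Watchtower).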

Uninstall Arth Intelligence

Removing observability doesn't touch the toolkit — skills and agents keep working without it.

# 1. Stop and remove the containers (keep the trace data)
docker compose -f ~/.arthai/docker-compose.yml down

#    …or wipe the trace data too (permanent):
docker compose -f ~/.arthai/docker-compose.yml down -v

# 2. Remove the config and the "already configured" marker
rm -f ~/.arthai/docker-compose.yml ~/.arthai/otel-configured

Remove the OTEL env block (the six OTEL_* / CLAUDE_CODE_ENABLE_TELEMETRY keys) from ~/.claude/settings.json — or from a project's .claude/settings.local.json if you used project-only scope there — and delete any cached endpoint file: rm -f .claude/.arth-otel.env (a stale copy can silently misroute telemetry if you ever reinstall). Quick alternative while deciding: export OTEL_DISABLED=true turns everything off without removing anything.

Restart Claude Code.

To uninstall the whole toolkit instead, see the README Uninstall section.

Troubleshooting

Problem	Fix
`/otel-setup` says "Docker is not running"	Open Docker Desktop, wait for it to start, re-run `/otel-setup`
Dashboard loads but shows nothing	Run a prompt in Claude Code first, then refresh; check `curl -s http://localhost:4319/api/health`
Dashboard doesn't load at all	`docker ps` → if containers missing: `docker compose -f ~/.arthai/docker-compose.yml up -d`
Cost / token columns empty	Native OTEL is off. Verify `CLAUDE_CODE_ENABLE_TELEMETRY=1` is set in the file `/otel-setup` wrote (the project's `.claude/settings.local.json`, or `~/.claude/settings.json`), then restart Claude Code
Sessions show as "Unattributed"	Toolkit hook isn't emitting — check the sentinel plugin is installed and `/otel-setup` completed
Traces stop after a reboot	`docker ps --filter 'name=arthai'` — restart the stack if containers are down

More Q&A: FAQ — Observability.

Getting Started — Step 7 — the full install walkthrough (7a–7h)
What is arthai? — how observability fits the toolkit
Configuration — OTEL env var reference

Configuration

Auto-configuration

The fastest way to configure is:

/calibrate

This scans your codebase and sets up agents, skills, and hooks to match your project's patterns. It's safe to run multiple times.

Manual configuration

Project-specific agents

Add custom agents as .md files in your project's .claude/agents/ directory. These run alongside the plugin agents.

Project-specific skills

Add custom skills as directories in .claude/skills/. Each skill needs a SKILL.md file with frontmatter.

Overriding plugin defaults

Project-local files take precedence over plugin-provided ones of the same name:

Create .claude/agents/<name>.md (or .claude/skills/<name>/SKILL.md) in your project
Give it the same `name:` as the plugin agent/skill you want to replace

Your override is a regular project file — future plugin updates never touch it.
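
The precedence rule can be pictured as a dictionary merge keyed by name. This is a minimal sketch of the rule as described above, not the loader Claude Code actually uses; `resolve_definitions` and the example paths are hypothetical.

```python
def resolve_definitions(plugin_defs, project_defs):
    """Merge two {name: path} maps; a project entry replaces a plugin entry of the same name."""
    resolved = dict(plugin_defs)   # start from what the plugin provides
    resolved.update(project_defs)  # project-local files win on a name clash
    return resolved


if __name__ == "__main__":
    # Example paths are placeholders for illustration only.
    plugin = {"qa": "<plugin>/agents/qa.md", "architect": "<plugin>/agents/architect.md"}
    project = {"qa": ".claude/agents/qa.md"}
    print(resolve_definitions(plugin, project))
```

Names that exist only in the plugin are untouched, which is why an override never hides unrelated agents or skills.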

Bundle selection guide

You need...	Install
Full dev workflow (planning, implementing, QA, PRs)	`forge`
Bug fixing only	`scalpel`
Project setup and onboarding	`spark`
SRE and incident response	`sentinel`
Deep QA testing	`prism`
Safety guardrails + compliance extensions	`shield`
Design workflows	`canvas`
Product management	`compass`
Consulting toolkit	`counsel`
Autonomous mode	`cruise` (requires forge + scalpel + sentinel)
Everything	`prime`

Multiple bundles

You can install multiple bundles. They compose without conflicts:

/plugin install forge@arthai
/plugin install sentinel@arthai
/plugin install prism@arthai

Observability (OTEL)

The sentinel bundle includes the otel-telemetry hook and /otel-setup skill for tracing every Claude Code session.

Setup

/otel-setup

The skill prompts for your preferred setup. Pick "Local" — it runs everything on your machine.

Prerequisite: Docker Desktop installed and running.

The skill pulls arthai/intelligence from Docker Hub, starts the engine + dashboard + Postgres, and configures everything automatically.

The skill writes these env vars to .claude/settings.local.json (project-local, git-ignored) by default, or to your shell profile if you pick the global scope:

Variable	Required?	Purpose
`CLAUDE_CODE_ENABLE_TELEMETRY=1`	Required for cost data	Enables Claude Code's native OTEL emitter — only path that produces cost USD, input/output/cache tokens, and model. Without this, dashboard cost columns stay empty.
`OTEL_EXPORTER_OTLP_ENDPOINT`	Required	Where spans and metrics are sent (e.g., `http://localhost:4319` for the local Docker collector).
`OTEL_EXPORTER_OTLP_PROTOCOL=http/json`	Required	Native OTEL defaults to gRPC/protobuf, which most simple HTTP collectors reject silently. http/json is the format Arth Intelligence and most plain HTTP collectors accept.
`OTEL_EXPORTER_OTLP_HEADERS`	Optional	Auth headers (e.g., `Authorization=Bearer <key>`) — needed for cloud-hosted collectors only.
`OTEL_METRICS_EXPORTER=otlp`	Required	Route native metrics (cost / tokens) to OTLP.
`OTEL_LOGS_EXPORTER=otlp`	Required	Route native logs to OTLP.
`OTEL_TRACES_EXPORTER=otlp`	Required	Route native traces to OTLP.
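
To confirm that a settings file carries every required variable from the table, a small check like the following can help. It is a sketch under one assumption: that the variables sit under a top-level `env` object in the settings JSON. The helper names are hypothetical and not part of the toolkit.

```python
import json
from pathlib import Path

# The six required variables from the table above (the headers variable is optional).
REQUIRED_KEYS = (
    "CLAUDE_CODE_ENABLE_TELEMETRY",
    "OTEL_EXPORTER_OTLP_ENDPOINT",
    "OTEL_EXPORTER_OTLP_PROTOCOL",
    "OTEL_METRICS_EXPORTER",
    "OTEL_LOGS_EXPORTER",
    "OTEL_TRACES_EXPORTER",
)


def missing_otel_keys(settings):
    """Return the required keys that are absent or empty in settings['env']."""
    env = settings.get("env") or {}  # assumption: env vars live under "env"
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def missing_otel_keys_in_file(path):
    """Load a settings JSON file and report its missing required keys."""
    return missing_otel_keys(json.loads(Path(path).read_text(encoding="utf-8")))
```

For example, `missing_otel_keys_in_file(".claude/settings.local.json")` returning an empty list means all six keys are present; a list containing `CLAUDE_CODE_ENABLE_TELEMETRY` would explain empty cost columns.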

What gets traced

The hook captures 22 Claude Code event types:

Session lifecycle: start, end, stop failure
User prompts: with skill detection (freeform vs /skill-name)
Agent lifecycle: start/stop with cost tier (opus-60x, sonnet-10x, haiku-1x) and duration
Tool calls: pre/post with input summary, output summary, and wall-clock duration
Tasks: created/completed with subject
Errors: tool failures, permission denials with error messages
Context: compaction events, instructions loaded, cwd changes
Environment: worktree create/remove, notifications, teammate idle

Critical spans (session.end, agent.stop) are sent synchronously with a 1s timeout for reliable delivery. All other spans are fire-and-forget (<10ms overhead).
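
The split between synchronous and fire-and-forget delivery can be sketched as follows. This illustrates the idea only and is not the hook's actual code: `emit_span` and the injected `send` transport are hypothetical, and the non-critical span name used in the example is a placeholder.

```python
import threading

# Span names the text above calls critical.
CRITICAL_SPANS = {"session.end", "agent.stop"}


def emit_span(name, payload, send, timeout_s=1.0):
    """Deliver one span through `send(payload, timeout_s)`, a caller-supplied transport."""
    if name in CRITICAL_SPANS:
        # Block until the transport returns or its own timeout expires.
        send(payload, timeout_s)
        return "sync"
    # Hand off to a daemon thread so the caller returns immediately.
    threading.Thread(target=send, args=(payload, timeout_s), daemon=True).start()
    return "async"
```

Daemon threads are abandoned when the process exits, so a span in flight at shutdown may be lost. That is the trade-off that makes end-of-session spans worth sending synchronously.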

Auto-trigger

If the hook is installed but OTEL is not configured, it automatically prompts you to run /otel-setup on your next session start. This only fires once — after setup, a marker file (~/.arthai/otel-configured) prevents re-prompting.

Disabling

export OTEL_DISABLED=true

"Explain this session" — optional LLM key

The Intelligence dashboard's grounded "Explain this session" summary calls an LLM directly (NOT your Claude Code subscription — separate, API-billed). It's optional: with no key the dashboard shows a "needs an LLM key" prompt where the summary would go (it doesn't silently vanish).

Where you configure it (it's NOT experimental): you can enable Explain in either setup command — you do not need to turn on the experimental Cloud Orchestrator:

With the toolkit: /otel-setup offers it at the end, or run /cloud-setup (Step 1).
Without the toolkit (CLI): arth otel-setup offers it at the end, or run arth cloud-setup. (First install the Arth CLI — gated by your GitHub repo access, no public package.)

Either way your choice lands in ~/.arthai/.env and survives image updates (pulls and Watchtower never remove it). Use any one provider — a small / flash-tier model is recommended (it only summarizes facts):

Provider	Env var	Recommended (small) model
Anthropic	`ANTHROPIC_API_KEY`	`claude-haiku-4-5`
OpenAI	`OPENAI_API_KEY` (+ `OPENAI_BASE_URL` for gateways)	`gpt-4o-mini`
Gemini	`GEMINI_API_KEY`	`gemini-2.0-flash`
Ollama (local, free)	`OLLAMA_HOST`	`llama3.1` (8B)
LM Studio (local, free)	`LMSTUDIO_HOST`	auto-detected loaded model (e.g. qwen)
Bedrock (API key)	`AWS_BEARER_TOKEN_BEDROCK` (+ `AWS_REGION`)	`anthropic.claude-3-5-haiku-...`

Override the model with ARTH_EXPLAIN_MODEL and force a provider with ARTH_EXPLAIN_PROVIDER (accepts lmstudio/ollama). Cost: ~$0.001–0.002 per session on the cloud models (cached per session) → a few dollars/month even at heavy use, or $0 with local Ollama / LM Studio. Don't use a frontier model here — flash/mini tier is the right size.
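
As a rough check on the "few dollars/month" figure, the arithmetic can be written out. The 50 sessions per day below is an assumed stand-in for "heavy use", not a number from the toolkit.

```python
def monthly_explain_cost(sessions_per_day, cost_per_session_usd, days=30):
    """Estimated monthly spend if every session is explained once (summaries are cached per session)."""
    return sessions_per_day * days * cost_per_session_usd


if __name__ == "__main__":
    low = monthly_explain_cost(50, 0.001)
    high = monthly_explain_cost(50, 0.002)
    print(f"${low:.2f} to ${high:.2f} per month")  # $1.50 to $3.00 per month
```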

Keys at a glance — which do you actually need?

Most of the dashboard needs no key. What you need depends on the two opt-in AI features — and a fully local, zero-key setup is a first-class path:

What you want	Key / credential needed	How to get it
Telemetry only (sessions, cost, DAG) — `/otel-setup`	None	—
Explain on a local model (Ollama / LM Studio + qwen)	None ($0, nothing leaves your machine)	Install Ollama or LM Studio, load a model, start its server — see below
Explain on a cloud model	that one provider's API key (Anthropic / OpenAI / Gemini / Bedrock)	from that provider
Cloud Orchestrator (experimental, Claude-only) — `/cloud-setup`	`ANTHROPIC_API_KEY` + a GitHub fine-grained token + your Arth license	Cloud Orchestrator → Prerequisites

The local zero-key path (Explain via Ollama / LM Studio, orchestrator off) is the cheapest and most private setup: no provider account, no API bill. The only access you need is what the toolkit already requires: your Arth license (getting started) and repo access to ArthTech-AI/arthai-marketplace. Set it up in /otel-setup (inline) or /cloud-setup, then decline the orchestrator when asked.

Local models step-by-step (Ollama or LM Studio — free)

Both run a local server that the dashboard reaches over the OpenAI-compatible API. The dashboard runs inside Docker, so you must point it at host.docker.internal, not localhost.

LM Studio + qwen (or any model):

In LM Studio, download/load the model you want (e.g. a Qwen3 model).
Open the Developer / Local Server tab → Start Server → enable "Serve on Local Network" (so the Docker container can reach it). Default port is 1234.
Set the env var (or just pick "LM Studio" in /otel-setup / /cloud-setup):
```
echo 'LMSTUDIO_HOST=http://host.docker.internal:1234' >> ~/.arthai/.env
```
The loaded model is auto-detected via /v1/models, so you don't need to type its id. To pin a specific one: echo 'ARTH_EXPLAIN_MODEL=<id from /v1/models>' >> ~/.arthai/.env.
Recreate the dashboard:
```
docker compose -f ~/.arthai/docker-compose.yml up -d
```
(Add -f ~/.arthai/docker-compose.override.yml too if you've enabled the orchestrator.)

Ollama:

ollama pull llama3.1 (or your model).
echo 'OLLAMA_HOST=http://host.docker.internal:11434' >> ~/.arthai/.env
Recreate the dashboard (same command as above).

Linux note: if host.docker.internal doesn't resolve, the container needs --add-host=host.docker.internal:host-gateway (Docker Desktop on macOS/Windows adds it automatically), or use the host's LAN IP.
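
A common slip is pasting a `localhost` URL into ~/.arthai/.env. The sketch below rewrites a loopback URL to the `host.docker.internal` form the container needs; `to_docker_host` is a hypothetical helper for illustration, not something the toolkit ships.

```python
from urllib.parse import urlsplit, urlunsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def to_docker_host(url):
    """Rewrite a loopback URL so a container can reach a server running on the host."""
    parts = urlsplit(url)
    if parts.hostname not in LOOPBACK_HOSTS:
        return url  # already a LAN IP or other hostname; leave it alone
    netloc = "host.docker.internal"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    print(to_docker_host("http://localhost:1234"))  # http://host.docker.internal:1234
```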

Note — the Cloud Orchestrator is Claude-only (for now). Explain can use any provider above, but the experimental orchestrator's sandbox runs Claude and needs its own ANTHROPIC_API_KEY. So if you set Explain to a non-Claude provider and turn on the orchestrator, you'll be asked for a Claude key as well — both are kept in ~/.arthai/.env.

Updating plugins

Enable auto-updates once: run /plugin, open the Marketplaces tab, select arthai, choose Enable auto-update. Manual update:

/plugin marketplace update arthai
/plugin uninstall <your-bundle>@arthai   # then reinstall, e.g. prime
/plugin install <your-bundle>@arthai
/reload-plugins

After an update that adds a brand-new command, fully quit and reopen Claude Code. /reload-plugins refreshes existing commands but a running session won't surface a newly added one (e.g. /cloud-setup) until a full restart rebuilds the command set. If an update still doesn't take, use the full reset flow in the FAQ.

Context budget

The toolkit injects guidance into your session — a routing table that picks the right agent/skill for each request, plus a one-line session-start status. Unlike toolkits that load every persona on every message, this one is deliberately lean: the full routing table is injected once per session, then each later turn gets only a compact reminder.

Measured on a reference session:

Injection	Tokens	Frequency
Session-start status	~30	once
Full routing table	~1,700	once (first message)
Compact reminder	~30	each turn after
Startup total	~1,750	once
20-turn session	~2,300	—

These numbers are not hand-typed: run scripts/token-budget.sh (or with --json) in the source repo for the live figure, which the script re-measures by actually running the hooks. Context bloat is a common complaint about agent toolkits; keeping steady-state cost near ~30 tokens/turn is a deliberate design choice.
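
The 20-turn figure follows from the per-injection numbers in the table. The sketch below reproduces that arithmetic from the table's approximate values; it does not call the measurement script, and the constant names are illustrative.

```python
# Approximate token counts taken from the table above.
SESSION_START_STATUS = 30
FULL_ROUTING_TABLE = 1_700
COMPACT_REMINDER = 30


def injected_tokens(turns):
    """Total injected tokens for a session with `turns` user messages (turns >= 1)."""
    if turns < 1:
        raise ValueError("a session has at least one turn")
    # Status and full table are injected once; each later turn adds one reminder.
    return SESSION_START_STATUS + FULL_ROUTING_TABLE + COMPACT_REMINDER * (turns - 1)


if __name__ == "__main__":
    print(injected_tokens(1))   # 1730, which the table rounds to ~1,750
    print(injected_tokens(20))  # 2300
```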

Quality Assurance Guide

The arthai QA framework uses a four-layer test strategy with 9 specialized agents.

Four-Layer Test Strategy

Every QA run uses four complementary layers:

Baseline Tests — existing test suites (regression anchor, same every run)
Generated Scenarios — fresh every run, thinks like real users based on the diff
Property-Based Tests — infer invariants from code changes, test with random/edge-case inputs
Coverage Audit — reviews if existing tests still match the codebase

Modes

Mode	Command	What it does	Time
Commit	`/qa` or `/qa commit`	Targeted checks on changed files, 2-4 agents	~1-3 min
Full	`/qa full`	All checks across full codebase, all agents	~10-20 min
Staging	`/qa staging`	Health + smoke + E2E against deployed staging	a few min
Production	`/qa prod`	Read-only health + smoke (NO mutations)	a few min
E2E Gen	`/qa e2e-gen`	Generate Playwright tests for changed components (opt-in)	~3-8 min
Visual	`/qa visual`	Computer-use visual regression at desktop + mobile (opt-in)	~5-15 min

QA Agents

Agent	Model	Role
qa-baseline-updater	sonnet	QA baseline updater — manages test snapshots and golden files for API response v
qa-challenger	sonnet	Adversarial QA agent that red-teams test plans. Reads the diff, knowledge base,
qa-domain	sonnet	Domain logic quality evaluator. Validates business logic integrity — state machi
qa-e2e-gen	sonnet	Generates exploratory Playwright tests from git diffs. Maps changed components t
qa-e2e	sonnet	E2E test specialist — Playwright browser tests for user workflows, axe-core a11y
qa-ios	sonnet	iOS simulator-based visual QA via xcrun simctl + computer-use MCP. Boots iPhone
qa-test-promoter	haiku	Converts generated test scenarios that caught real bugs into permanent baseline
qa-visual	sonnet	Visual regression QA using computer-use MCP (screenshots) and claude-in-chrome (
qa	sonnet	QA orchestrator — testing across backend, frontend, and E2E layers

Typical Workflow

During development:

/qa                     # quick check on changed files
/qa e2e-gen             # generate E2E tests for UI changes (opt-in)
/qa visual              # visual regression check (opt-in, needs dev server)

Before shipping:

/qa full                # comprehensive check across all files
/precheck               # local CI in 30s
/pr                     # create PR with QA results

After deployment:

/qa staging             # validate staging deployment
/qa prod                # read-only production health check

Skill	Usage	Description
/ci-fix	`/ci-fix [ci|staging|prod] [branch]`	Auto-remediate CI, staging, and production failures. 3-attem
/fix	`/fix <description|#issue> <--severity critical|high|medium|low> <--hotfix|--lite|--lite-strict|--verified|--full|--swarm>`	Formal bug fix pipeline: root cause analysis, scope lock, b
/qa	`/qa (commit), /qa full (comprehensive), /qa staging (deployed), /qa prod (read-only), /qa e2e-gen (opt-in), /qa visual (opt-in). Flags: <--commit-strict> <--workflow|--classic> <--invoked-by VALUE>`	Run QA checks.
/qa-incident	`/qa-incident <description>`	Manually create a QA incident from a known issue.
/qa-learn	`/qa-learn [prune]`	Review QA knowledge base stats, prune stale entries, show le

QA Knowledge Base

The QA system learns from past runs:

qa-knowledge/bug-patterns.md — patterns that caught real bugs
qa-knowledge/coverage-gaps.md — persistent test gaps
qa-knowledge/flaky-tests.md — tests that pass/fail inconsistently
qa-knowledge/incidents/ — past incidents in affected files

Use /qa-learn to review stats and prune stale entries.

Installing QA

The prism bundle includes the full QA suite:

/plugin install prism@arthai

Or get everything with prime:

/plugin install prime@arthai

Auto-generated on 2026-07-05

Monitor-Powered Watchers

Get woken up on CI failures, deploy crashes, and runtime error spikes — without polling for them yourself. The agent then walks you through the fix; it does not apply fixes without your say-so.

How it works

Claude Code's Monitor tool registers background watchers that fire on external events (CI failure, deploy crash, error threshold breach). Zero token cost while idle. Wakes only when something happens, shows you what fired, and suggests the right next command (/ci-fix, /restart, /sre, etc.) — you decide whether to run it.

Setup

Step 1 — Update the toolkit

In Claude Code (skip if auto-update is enabled):

/plugin marketplace update arthai
/plugin uninstall <your-bundle>@arthai   # then reinstall, e.g. prime
/plugin install <your-bundle>@arthai
/reload-plugins

Then fully quit and reopen Claude Code. A running session loads its command list at startup, so a newly added command won't appear from /reload-plugins alone — only a full restart (quit the app/CLI completely, not /clear) rebuilds the command set.

Step 2 — Go to your project directory

cd /path/to/your/project

Step 3 — Start a watcher with `/monitor`

/monitor doesn't need any project detection step — just tell it what to watch:

/monitor logs <service>          # watch a service's log output for error patterns
/monitor pr <owner/repo>         # watch a GitHub repo for PR state changes
/monitor deploy <service>        # watch deployment health
/monitor ci                      # watch CI run results

Each subcommand writes a watcher script to .claude/monitors/ and registers it with Claude Code's Monitor tool. There is no separate config-generation step — /monitor <subcommand> does the setup directly, and confirms with a summary of what it's watching and how to stop it (/monitor stop [<watcher-id>]).

Step 4 — Done

From this point, the watcher runs in the background at zero token cost until it sees a match.

Event	What happens
A log line matches the error pattern	The agent wakes, shows you the error context, and suggests `/ci-fix` (test failure), `/restart` (crash), or `/sre` (infra error). It does not auto-fix — you confirm before anything runs.
A watched PR changes state	The agent wakes and reports the change.
A watched deploy's health check fails	The agent wakes and reports the failure.
A CI run finishes	The agent wakes and reports the result.

Monitor is intentionally hands-on-the-wheel: it replaces polling loops with an event-driven wake-up, but every remediation step still needs your confirmation.

Nothing breaks

/ci-fix, /sre, /fix, /qa, and /pr work exactly as before — a Monitor watcher just adds an event-driven wake-up on top. If you never start a watcher, behavior is unchanged.

Stopping a watcher

/monitor stop <watcher-id>   # stop one watcher
/monitor stop                # stop all active watchers

/monitor with no arguments shows usage and lists currently active watchers (read from .claude/monitors/.active).

Which bundle includes `/monitor`

Bundle	Includes `/monitor`
prime	Yes
forge	No
sentinel	No
scalpel	No

If your bundle doesn't include /monitor, switch to prime (Step 1 above) to get it.

Supported watch sources

Subcommand	Watches
`/monitor logs <service>`	Docker container logs or a log file, for a configurable error pattern (`--pattern <regex>`)
`/monitor pr <owner/repo>`	GitHub PR state changes (`--state <open|closed|merged>`), via the `gh` CLI
`/monitor deploy <service>`	Deployment health, via a health-check endpoint (`--url <health-endpoint>`)
`/monitor ci`	CI run results (`--branch <branch>`, `--repo <owner/repo>`), via the `gh` CLI

There's no webhook or platform-specific config to wire up — each subcommand writes a small bash watcher script and polls or tails the relevant source directly.
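
The toolkit's watchers are bash scripts. Purely to illustrate the matching step behind `/monitor logs --pattern <regex>`, here is a Python sketch of scanning log lines for the first hit. `first_error` and its default pattern are assumptions for the example, not the watcher's real defaults.

```python
import re


def first_error(lines, pattern=r"ERROR|FATAL"):
    """Return (line_number, text) for the first line matching `pattern`, or None."""
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            return number, line.rstrip("\n")
    return None


if __name__ == "__main__":
    sample = ["service started\n", "ERROR db connection refused\n", "retrying\n"]
    print(first_error(sample))  # (2, 'ERROR db connection refused')
```

A real watcher would feed this from a tailed file or container log stream and wake the agent on the first non-None result.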

Workflow Comparison — Which Skill Should I Run?

The toolkit ships several orchestration workflows. They overlap in places and diverge in others. This guide answers two questions:

Given my situation, which one should I run?
What does each one actually do, and what does it produce?

Looking for the catalog of every skill? See skills-reference.

TL;DR Decision Tree

What do you have?
│
├── A clear goal but no clear plan? ─────────────────► /goal <objective>
│       e.g. "Cut homepage LCP under 2s"
│       e.g. "Make this endpoint return paginated results"
│
├── A backlog of issues to drain? ───────────────────► /autopilot
│       e.g. "Work through these 8 issues while I'm away"
│
├── A spec / well-defined feature to ship? ──────────► /planning → /implementation-plan → /implement
│       e.g. "Refactor the auth module per this PRD"
│       e.g. "Add user avatars (acceptance criteria below)"
│
├── A specific bug to fix (with reproduction)? ──────► /fix
│       e.g. "Fix #141 short session credit refund"
│
├── CI is broken? ───────────────────────────────────► /ci-fix
│       e.g. "build is red on main"
│
├── Need to deploy something? ──────────────────────► /deploy
│       e.g. "ship this to staging"
│
├── A tweet I want to evaluate as a feature? ────────► /monitor-tweet
│       e.g. "@anthropic just announced X — relevant?"
│
└── Already mid-session and lost track? ────────────► /continue
        re-reads state, picks up where you left off

The distinction that trips people up most: /goal vs /autopilot vs /implement.

You have...	Run
One objective, fuzzy path	`/goal`
Multiple issues, ranked queue	`/autopilot`
One feature, written-down plan	`/implement` (after `/planning` → `/implementation-plan`)
One bug, clear repro	`/fix`

The Workflows

`/goal` — Speed-First Single Objective

When to use it: You know where you want to land but not how to get there. Exploratory, single objective. Fast.

How it works:

/goal <objective>
   │
   ▼
SCOUT  ── reads .claude/knowledge/shared/* + project-profile.md + prior goals FIRST,
   │      then targeted codebase scan to fill gaps
   ▼
CLARIFY  ── 3-5 context-aware questions (skips what the KB already answered)
   │      ◄── user answers (or "go" for defaults)
   ▼
CONFIRM PLAN  ── present clarified plan, wait for y/n/d
   │      ◄── user says "y"
   ▼
LOOP: pick action ──► execute ──► VERIFY (mandatory: lint + types + tests)
   │                                       ──► capture evidence (with verified flag)
   │                                       ──► self-evaluate done_when
   │  (auto-continue between turns via Stop hook)              │
   ▼                                                            ▼
all subtasks done + all evidence verified + requirements satisfied
   │
   ▼
ready for /pr (HARD STOP) ──► after /pr: append to goals-history.md

Example:

/goal Cut homepage LCP below 2s on mobile

→ Scout reads app/page.tsx, identifies hero image and analytics blocker, proposes 4 subtasks each with done_when clauses, kicks off the first action. After 5–7 turns of scout → edit → test → measure, stops at PR creation.

State: .claude/.goals/current.json (one active goal at a time)
Stops at: PR creation (you run /pr yourself)
Auto-continues between: every turn, via goal-auto-continue Stop hook
Lifecycle: /goal pause, /goal resume, /goal clear, /goal status
Produces: branch, commits, evidence log, PR-ready summary

Best for: "Make X happen" objectives where the what is clear but the how is exploratory.

`/autopilot` — Rigor-First Backlog Loop

When to use it: You have a queue of issues and want them worked through with risk gating, evidence capture, and a PR per item.

How it works:

/autopilot
   │
   ▼
ASSESS ─► CLASSIFY ─► VERIFY ─► PLAN ─► IMPLEMENT ─► QA ─► SELF-REVIEW ─► PR
   │       (P0–P5)    (risk    (repro?)        ▲                            │
   │                   0–12)                   │                            │
   │                                           │                            ▼
   ▼                                           │                       AWAITING_MERGE
priority_queue                                 │                            │
                                               └─ auto-continue between ────┘
                                                  phases via Stop hook

Example:

/autopilot --urgent-first

→ Ranks open issues P0–P5, picks the most urgent, scores blast-radius/reversibility/confidence/domain-sensitivity (0–12), refuses if score >= 11, escalates if >= 9, otherwise verifies repro, plans, implements with the right team (backend/frontend/QA agents as needed), runs tests, scope-guards, creates PR. Stops for merge approval. After merge, picks the next item.
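
The risk thresholds in the example reduce to a three-way gate. The sketch below encodes only the cut-offs stated above (refuse at 11 or more, escalate at 9 or more); `gate` is a hypothetical name, and how the four dimensions combine into the 0–12 total is not modeled.

```python
REFUSE_AT = 11    # score >= 11: autopilot refuses the item
ESCALATE_AT = 9   # score >= 9: autopilot escalates to a human


def gate(risk_score):
    """Map a 0-12 risk score to 'refuse', 'escalate', or 'proceed'."""
    if not 0 <= risk_score <= 12:
        raise ValueError("risk score must be between 0 and 12")
    if risk_score >= REFUSE_AT:
        return "refuse"
    if risk_score >= ESCALATE_AT:
        return "escalate"
    return "proceed"


if __name__ == "__main__":
    for score in (8, 9, 11):
        print(score, gate(score))
```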

State: .claude/.workflow-state.json
Stops at: PR creation per item (mandatory human gate); blocks on risk escalation, scope drift, QA failures after 2 attempts, or per-item budget breach (>6 agents).
Auto-continues between: assess → classify → verify → plan → implement → qa → self-review → pr (via autopilot-auto-continue Stop hook). Stays silent at human-gate phases (awaiting_merge, blocked, paused).
Evidence: evidence[] array: git-diffs, test results, lint, types, PR link, review status. PR body is assembled directly from evidence (no recall).
Completion criteria: completion_criteria[], where each criterion has a done_when clause the model self-evaluates against captured evidence.
Produces: PR per item, evidence trail, session summary

Best for: Backlog burndown when you want a consistent process per item, and the items are well-shaped enough to not need bespoke planning.

`/planning` — PRD Generation (Phase 1 of 2)

When to use it: You have a feature to build and want a written PRD (user stories, journey, edge cases, success criteria) before any architecture decisions are made.

How it works: spawn a product-manager (Sonnet) to write the PRD, with a brief feasibility-only note from the architect (no API design, no task breakdown, no debate). Design Thinker feeds UX context into the PM by default (skip with --no-design); GTM Expert can contribute positioning (--gtm).

Produces: .claude/specs/<feature>.md — the PRD, plus a design spec HTML by default.

Best for: Getting the user stories and scope reviewed before spending a debate cycle on architecture. Follow with /implementation-plan once the PRD looks right.

`/implementation-plan` — Architecture & Design (Phase 2 of 2)

When to use it: You have a reviewed PRD (from /planning) and want the full architecture debate before code is touched.

How it works: reads .claude/specs/<feature>.md → spawns architect (Opus) + product-manager (Opus) + Devil's Advocate → debates tradeoffs → finalizes plan. --design adds Design Thinker + Design Critic; --gtm adds a GTM Expert. --fast/--lite/--lite-strict control debate depth (see Arguments & flags).

Produces: .claude/plans/<feature>.md with scope, milestones, files, risks.

Best for: Non-trivial features where the plan itself is the deliverable.

`/implement` — Spec-Driven Team Build

When to use it: You have a plan (from /planning → /implementation-plan, or hand-written at .claude/plans/<feature>.md) and want a team to build it.

How it works: read the plan → spawn parallel agent team (backend + frontend + QA + red team) → each agent owns its layer → QA validates → red team challenges → finalize → ready for /qa commit and /pr.

Produces: code, tests, ready for PR.

Best for: Multi-layer features once a plan exists. Heavier than /goal — appropriate when you want explicit per-agent ownership and a paper trail of debate.

`/fix` — Formal Bug-Fix Pipeline

When to use it: A specific bug with a reproduction or issue number.

How it works: RCA → scope lock → behavior contract → fix → differential test → regression proof.

Produces: minimal-scope fix, regression test, PR.

Best for: Targeted bugs where you want guard-rails preventing scope creep.

`/ci-fix` — CI Failure Remediation

When to use it: CI is red and you want it fixed without manual debugging.

How it works: fetch failing run → diagnose → fix → re-run (3 attempts max). Exhausted? → Discord alert + escalate.
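
The bounded retry can be sketched as a short loop. This is an illustration of the three-attempt cap only; `remediate` and its two callbacks are hypothetical, and the Discord alert is left to whatever handles the "escalated" result.

```python
def remediate(diagnose_and_fix, rerun_ci, max_attempts=3):
    """Try fix-then-rerun up to `max_attempts` times; report whether CI went green."""
    for attempt in range(1, max_attempts + 1):
        diagnose_and_fix(attempt)   # caller-supplied: inspect the failure and commit a fix
        if rerun_ci():              # caller-supplied: True when the re-run passes
            return {"status": "green", "attempts": attempt}
    # All attempts used without a passing run: hand off to a human.
    return {"status": "escalated", "attempts": max_attempts}
```

Capping the attempts keeps a stubborn failure from consuming an unbounded number of fix cycles.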

Produces: fix commits, green CI (or escalation).

Best for: Build-on-fire situations.

`/deploy` — Deployment Pipeline

When to use it: Ready to ship to local, staging, or preview.

How it works: read /calibrate deployment knowledge → run platform-specific deploy → post-deploy health check → report.

Produces: deployed environment, health-check evidence.

Best for: Routine deploys. Refuses production — those go through your team's review process.

`/monitor-tweet` — Tweet-to-PR Pipeline (partner installs only — requires source-repo access)

When to use it: You saw a tweet about a feature/idea and want to evaluate whether the toolkit should adopt it.

How it works: TRIAGE (extract idea, research feasibility, audit toolkit + arth) → present findings + BOTH repo options → user approves target + direction → build → review → /pr (auto).

Produces: PR (toolkit or arth) implementing the tweet's idea.

Best for: Triaging external feature ideas without manually researching every one.

Quick Reference: Example Scenarios

Situation	Run
"Work through these 8 issues while I'm away"	`/autopilot`
"Make this API endpoint return paginated results"	`/goal`
"Refactor the auth module, here's the spec"	`/planning` → `/implementation-plan` → `/implement`
"CI is broken on main"	`/ci-fix`
"Deploy to staging"	`/deploy staging <service>`
"Write the PRD before building"	`/planning`
"Debate the architecture before building"	`/implementation-plan` (after `/planning`)
"Fix bug #141"	`/fix #141`
"Saw an interesting tweet"	`/monitor-tweet "<tweet text>"`
"Cut LCP under 2s"	`/goal Cut LCP under 2s on mobile`
"Get the build green"	`/ci-fix`
"Pick up where I left off"	`/continue`
"What should I work on?"	`/onboard`

Auto-Continue & State

Both /goal and /autopilot use Stop hooks that nudge the model to auto-continue between phases without the user having to type "continue":

goal-auto-continue (Stop) — fires when .claude/.goals/current.json shows an active goal with auto_continue: true and unfinished subtasks.
autopilot-auto-continue (Stop) — fires when .claude/.workflow-state.json shows mode: autopilot and a non-human-gate phase.

Both hooks stay silent at human-gate moments (PR review, blocked, paused, awaiting merge, done). Both have loop-guards: after a fixed number of consecutive Stop events with no progress (12 for /goal, 15 for /autopilot), the hook bails and waits for the user.
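
A loop-guard of this kind is essentially a counter that resets on progress. The sketch below uses the two limits quoted above; the class, the way "progress" is detected, and the choice to bail on exactly the Nth stalled Stop event are all assumptions made for illustration.

```python
# Consecutive no-progress Stop events tolerated per workflow (from the text above).
LIMITS = {"goal": 12, "autopilot": 15}


class LoopGuard:
    """Decide whether a Stop hook should keep nudging the model."""

    def __init__(self, workflow):
        self.limit = LIMITS[workflow]
        self.stalled = 0

    def on_stop(self, made_progress):
        """Return True to auto-continue, False to bail and wait for the user."""
        if made_progress:
            self.stalled = 0  # any progress resets the streak
            return True
        self.stalled += 1
        return self.stalled < self.limit
```

With `LoopGuard("goal")`, eleven stalled Stop events in a row still continue and the twelfth bails; a single productive turn in between resets the count.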

The triage-router (UserPromptSubmit) detects active state and routes follow-up messages back to the active workflow. If the user types an unrelated slash command, the active workflow auto-pauses and the new request runs normally.

State files:

Workflow	State file
`/goal`	`.claude/.goals/current.json` (+ `archive/`)
`/autopilot`	`.claude/.workflow-state.json`
`/planning`	`.claude/specs/<feature>.md`
`/implementation-plan`	`.claude/plans/<feature>.md`
`/implement`	`.claude/.implement-state.json`
`/fix`	`.claude/.fix-scope-lock.json`, `.fix-behavior-contract.md`

Choosing Between `/goal` and `/autopilot`

Both are autonomous loops. Both stop at PR creation. Both auto-continue. The difference is the shape of the work:

Dimension	`/goal`	`/autopilot`
Number of items	1 freeform objective	N issues (P0–P5 ranked queue)
Pace	Action → action → action → PR	Item → PR → wait → next item
Risk gating	Implicit (escalate when needed)	Explicit CLASSIFY phase per item
Default model	Inline + Sonnet only when needed	Sonnet team per item
Best for	"Find the path to X"	"Drain my queue"
Spawn budget	6 agents/goal	6 agents/item
Stops at	PR for the goal (once)	PR per item (every time)

If your work is "I have a destination, figure out the path" — that's /goal.

If your work is "I have a stack of well-scoped tickets, work them" — that's /autopilot.

If your work is "I have a written spec, build it" — that's /implement.

Bypass Permissions

Permission prompts are Claude Code's safety net: before writing a file or running a shell command, Claude asks you to approve. That's useful in interactive sessions — but it stalls subagents, pipelines, and Cowork Dispatch tasks that have no human in the loop to click Allow.

Bypass permissions mode disables the prompts entirely for a session. Every file edit, bash command, and tool call proceeds without asking.

When you need it

Session type	Without bypass	With bypass
Interactive (you're watching)	Fine — prompts are quick to approve	Optional
Subagents spawned by `/implement`, `/qa`, etc.	Stalls — subagent can't approve its own calls	Runs unblocked
Cowork Dispatch `start_code_task`	Stalls — no UI to approve	Runs unblocked
CI or scheduled agents	Stalls — no human present	Runs unblocked

If your sessions hang waiting for approval, or your pipelines time out mid-task, bypass permissions is what you need.

How it's configured in this repo

This repo's .claude/settings.json includes:

{
  "permissions": {
    "defaultMode": "bypassPermissions"
  },
  "skipDangerousModePermissionPrompt": true
}

defaultMode: bypassPermissions — all Claude Code sessions that load this project's settings start in bypass mode automatically. This applies to the main session, all subagents, and any start_code_task session pointed at this directory.

skipDangerousModePermissionPrompt — suppresses the one-time "are you sure you want to enter dangerous mode?" dialog. Without this, the first session still stalls on that confirmation.

The permissions block is preserved by install.sh's hook-merge logic, so it survives future toolkit upgrades.

How to enable it in your project

Option 1 — Project-level (recommended for automation)

Add to your project's .claude/settings.json:

{
  "permissions": {
    "defaultMode": "bypassPermissions"
  },
  "skipDangerousModePermissionPrompt": true
}

Commit this file. Every session against that repo — yours, your teammates', CI agents — inherits it automatically.

If your .claude/ directory is gitignored (common when using the toolkit), force-add the file:

git add -f .claude/settings.json
git commit -m "chore: enable bypass permissions for automated sessions"
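
If the project already has a .claude/settings.json with other keys (hooks, allow-lists), the two bypass keys need to be merged in rather than pasted over the file. A minimal sketch of such a merge, assuming the file is valid JSON; the helper name `enable_bypass` is hypothetical and is not part of the toolkit or of Claude Code:

```python
import json
from pathlib import Path


def enable_bypass(settings_path: Path) -> dict:
    """Merge the two bypass keys into settings.json, keeping every other key."""
    if settings_path.exists():
        settings = json.loads(settings_path.read_text())
    else:
        settings = {}
    # Preserve anything already under "permissions" (for example allow/deny lists).
    permissions = settings.setdefault("permissions", {})
    permissions["defaultMode"] = "bypassPermissions"
    settings["skipDangerousModePermissionPrompt"] = True
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```

Usage would be `enable_bypass(Path(".claude/settings.json"))`, run from the project root.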

Option 2 — CLI flag (per-session)

For a single session without touching settings:

claude --dangerously-skip-permissions

Or for a non-interactive run:

claude -p "your prompt here" --dangerously-skip-permissions

Option 3 — Global user settings

Add to ~/.claude/settings.json to enable bypass for every project on your machine:

{
  "permissions": {
    "defaultMode": "bypassPermissions"
  },
  "skipDangerousModePermissionPrompt": true
}

Use this if you want bypass everywhere, not just in one repo.

Security considerations

Bypass permissions means Claude Code will run any bash command or write any file without asking. That's a real reduction in human oversight. Before enabling it:

Use it when:

You trust the agent and the repo it's operating on
You're running automated pipelines where no human can approve mid-session
The repo is under version control and you can review + revert changes afterward

Don't use it when:

Working in a repo that contains credentials, production configs, or sensitive data
You're trying a new agent or prompt you haven't tested
The session has access to external systems (email, databases, payment APIs) where an accidental action can't be undone

Mitigations if you use bypass broadly:

Keep .env and secrets files in .gitignore so an accidental write doesn't commit them
Use the pre-bash-guard hook (installed with the shield bundle, included in prime) — it blocks genuinely destructive commands like rm -rf / even in bypass mode. Check it's present if you rely on it as a bypass-mode safety net
Run in a git repo so every file change is recoverable via git restore
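
To make the idea of a command guard concrete, here is a deliberately narrow sketch of the kind of check such a hook could perform. It is not the pre-bash-guard implementation, whose rules are not shown here; the function name and the single rule (a recursive, forced `rm` aimed at the filesystem root) are assumptions for illustration. It does not handle `sudo`, pipelines, or chained commands.

```python
import shlex


def is_root_wipe(command: str) -> bool:
    """Flag a plain `rm` call that recursively force-deletes "/" or "/*"."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unparseable quoting: fail closed
    if not tokens or tokens[0] != "rm":
        return False
    args = tokens[1:]
    short_flags = "".join(a[1:] for a in args if a.startswith("-") and not a.startswith("--"))
    long_flags = {a for a in args if a.startswith("--")}
    recursive = "r" in short_flags or "R" in short_flags or "--recursive" in long_flags
    force = "f" in short_flags or "--force" in long_flags
    targets = [a for a in args if not a.startswith("-")]
    return recursive and force and any(t in ("/", "/*") for t in targets)
```

A real guard needs a much broader rule set; the point is only that the check runs on the command text before execution, independent of the permission mode.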

How subagents inherit bypass

When a parent Claude Code session spawns a subagent (via the Agent tool), the subagent runs within the same session context and inherits the project settings.json. You don't need to pass any special flags — if the project has defaultMode: bypassPermissions, every subagent in that session runs with bypass too.

For Cowork Dispatch start_code_task sessions, Claude Code starts a fresh session pointed at the project directory. It loads settings.json from that directory on startup, so the same inheritance applies.

FAQ

General

Q: Do I need a license key? Yes. Without a valid key, the plugin installs but skills and hooks are non-functional. Email productive@getarth.ai to get a key.

Q: How does license validation work? On every prompt you send in Claude Code, a hook validates your key. The first time, it calls the license server (~200ms). After that, it uses a 24-hour cache (~5ms).

Q: Can I use the plugins offline? Yes, after the first validation. The 24-hour cache means you can work offline as long as you validated within the last day.

Q: What happens if my key is revoked? Skills stop working within 24 hours (when the cache expires). Contact productive@getarth.ai for a new key.
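
The cache behavior in the answers above (server call first, then a 24-hour cache, offline use inside that window, revocation taking effect when the cache expires) reduces to one comparison. A minimal sketch; the function name and return labels are hypothetical, and how the toolkit stores the timestamp is not specified here:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CACHE_TTL = timedelta(hours=24)  # from the text: 24-hour cache


def validation_path(last_validated: Optional[datetime], now: datetime) -> str:
    """Return "cache" if the last successful validation is under 24 hours old, else "server"."""
    if last_validated is not None and now - last_validated < CACHE_TTL:
        return "cache"
    return "server"
```

For example, `validation_path(None, datetime.now(timezone.utc))` takes the server path, which is why the very first prompt cannot be validated offline.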

Installation

Q: Skills aren't showing after install. Run /reload-plugins and restart Claude Code.

Q: "LICENSE REQUIRED" on every prompt. Run npx arthai-activate ARTH-XXXX-XXXX-XXXX-XXXX in your terminal (not Claude Code).

Q: How do I update the plugins (skills, agents, hooks) to the latest version? Easiest: turn on auto-updates once — run /plugin, open the Marketplaces tab, select arthai, choose Enable auto-update. Manual update:

/plugin marketplace update arthai      # refresh the catalog
/plugin uninstall <your-bundle>@arthai # then reinstall, e.g. prime
/plugin install <your-bundle>@arthai
/reload-plugins

What changed: see the CHANGELOG.

If the update doesn't seem to take, fall back to the full reset — remove the marketplace, clear the cache, re-add, and reinstall:

/plugin marketplace remove arthai
rm -rf ~/.claude/plugins/cache/*arthai*
/plugin marketplace add ArthTech-AI/arthai-marketplace
/plugin install <your-bundle>@arthai
/reload-plugins

Q: How do I update Arth Intelligence (the local Docker container)? Two ways, both safe — your trace data is preserved either way.

Automatic. If you installed via /otel-setup, a watchtower sidecar checks once a day and pulls the latest arthai/intelligence image. You don't need to do anything.

Manual (force update now). Run:

curl -fsSL https://arthtech-ai.github.io/__ARTH_MARKETPLACE__/scripts/update.sh | sh

Or paste these two commands if you prefer not to pipe to shell:

docker compose -f ~/.arthai/docker-compose.yml pull
docker compose -f ~/.arthai/docker-compose.yml up -d

Never run docker compose down -v unless you intentionally want to wipe all trace data (e.g. the uninstall path) — the -v flag drops the data volume and erases every session, score, and pattern. Plain down (without -v) is safe; it just stops the containers.

See Step 7h: Updating Arth Intelligence for the full picture (skill updates, compose-file updates, opt-out of auto-updates).

Usage

Q: Which bundle should I start with? Start with prime — the everything bundle (it's what the Quick Start installs). If you want a smaller footprint, forge covers the core development workflow (planning, implementing, QA, PRs); add more bundles as needed.

Q: Can I install multiple bundles? Yes. Bundles compose without conflicts.

Q: What does /calibrate do? It scans your codebase and configures the toolkit to match your project's tech stack, patterns, and conventions. Safe to run multiple times.

Q: Can I override plugin behavior? Yes. Project-local files in .claude/agents/ and .claude/skills/ take precedence over plugin-provided ones of the same name. See Configuration for details.

Troubleshooting

Q: CLAUDE.md not created after install. Restart Claude Code. The setup hook fires on session start, not on install.

Q: Old version seems stuck. Clear the plugin cache with rm -rf ~/.claude/plugins/cache/*arthai*, then reinstall.

Q: A skill is erroring out. Check that your license is valid (cat ~/.arthai/license), then try /reload-plugins and restart Claude Code.

Observability

Full guide — what Arth Intelligence is, setup, verification, dashboard tour, privacy: arth-intelligence.md

Q: Why are the cost / token columns empty in the dashboard? Cost and token data only flow when Claude Code's native OTEL is enabled. The toolkit hook emits structural spans (sessions, prompts, tool calls, agent spawns) but does not emit cost metrics — that's the native emitter's job. Verify the env var is set:

grep CLAUDE_CODE_ENABLE_TELEMETRY .claude/settings.local.json

If missing, run /otel-setup and pick "Local" — it writes CLAUDE_CODE_ENABLE_TELEMETRY=1 along with the OTLP endpoint and protocol (http/json) into .claude/settings.local.json. Restart Claude Code, run any prompt, and the cost columns populate from that point forward.

If the var is set but cost is still empty, also check OTEL_EXPORTER_OTLP_PROTOCOL=http/json is present — without it, Claude Code defaults to gRPC/protobuf and the engine silently drops the metrics.
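
Both checks above can be done in one pass. A minimal sketch, assuming the variables sit under an `env` key in .claude/settings.local.json (the usual place for environment variables in Claude Code settings files, but check your own file); the function name is hypothetical:

```python
import json
from pathlib import Path

# The two settings the answer above says must be present for cost data to flow.
REQUIRED_ENV = {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/json",
}


def missing_otel_settings(settings_path: Path) -> list:
    """List required variables that are absent or hold a different value."""
    if not settings_path.exists():
        return sorted(REQUIRED_ENV)
    env = json.loads(settings_path.read_text()).get("env", {})  # assumed layout
    return sorted(k for k, v in REQUIRED_ENV.items() if str(env.get(k)) != v)
```

An empty list from `missing_otel_settings(Path(".claude/settings.local.json"))` means both settings are in place; anything else names what to fix by re-running /otel-setup.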

Q: My dashboard shows everything as "Unattributed" (no Toolkit / Claude / custom-agent split). Your OTEL emitter is pointing at the wrong engine or protocol, so the owner labels never reach the dashboard, and on older installs this failed silently. The usual cause is a stale .claude/.arth-otel.env (the file the emission hook reads) left over from an earlier setup, holding an old port or http/protobuf instead of http/json.

Fix — three steps, ~1 minute:

Update the toolkit to the latest plugin version.
Update Arth Intelligence: it auto-updates via watchtower, or run docker compose -f ~/.arthai/docker-compose.yml pull && docker compose -f ~/.arthai/docker-compose.yml up -d (the turn_cost_facts table is created automatically on startup).
Run /otel-setup once, then restart your Claude Code session. /otel-setup now detects and repairs a stale .arth-otel.env, reconciles it with settings.local.json, validates the endpoint is reachable on http/json, and runs a smoke test. You'll also get a one-line warning at session start if the endpoint is unreachable, so this can't silently bite you again.

Attribution is forward-only: only sessions run after you correct the config are attributed by owner. Sessions captured earlier stay "Unattributed" — that's by design, not a bug.

Q: Where does the dashboard live? After /otel-setup finishes, it's at http://localhost:3100. Engine health is at http://localhost:4319/api/health.

Q: What's the difference between the toolkit's OTEL hook and Claude Code's native OTEL? Both stream into the same local engine and complement each other:

Stream	Source	Carries
Trace spans	toolkit `otel-telemetry` hook	session, prompt, tool calls, agent spawns, skill invocations, stop events
Cost + token metrics	Claude Code native OTEL (env-gated)	per-call cost USD, input/output/cache tokens, model name

You want both on for a complete dashboard. /otel-setup enables both in one step.

Q: Can I disable observability? Yes. export OTEL_DISABLED=true makes the toolkit hook a no-op. Native OTEL is gated separately by CLAUDE_CODE_ENABLE_TELEMETRY — unset it (or set to 0) to stop native cost/token streams.
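
Because the two streams are gated by different variables, it can help to see the combinations spelled out. A small sketch that maps an environment to the streams it would leave active, using only the values named above (`OTEL_DISABLED=true`, `CLAUDE_CODE_ENABLE_TELEMETRY` set to 1, unset, or 0); the function name and the returned keys are hypothetical, and other spellings of the values are not covered:

```python
def active_streams(env: dict) -> dict:
    """Report which telemetry streams an environment leaves switched on."""
    toolkit_on = env.get("OTEL_DISABLED") != "true"            # "true" makes the hook a no-op
    native_on = env.get("CLAUDE_CODE_ENABLE_TELEMETRY") == "1"  # unset or "0" turns it off
    return {"toolkit_spans": toolkit_on, "native_cost_tokens": native_on}
```

Calling `active_streams(dict(os.environ))` after `import os` shows what the current shell would produce.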

Q: Why is my dashboard empty after a reboot? Most likely your engine + DB containers aren't running. Check:

docker ps --filter 'name=arthai'

You should see three containers — arthai-intelligence, arthai-db, arthai-watchtower. If any are missing, your compose file probably has an old restart policy (only watchtower had restart: unless-stopped before the reboot-durability fix). Two ways to fix:

# Fastest — just update the running containers in place
docker update --restart unless-stopped arthai-db arthai-intelligence

# Or re-run /otel-setup → Local — overwrites ~/.arthai/docker-compose.yml with the
# new template that sets restart: unless-stopped on every service

Either way, no data loss — the arthai_data Docker volume is preserved across all container changes.
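
For a scripted version of the container check, the sketch below compares the names printed by `docker ps --format '{{.Names}}'` against the three expected containers. The function name is hypothetical; the container names are the ones listed above.

```python
EXPECTED = ("arthai-intelligence", "arthai-db", "arthai-watchtower")


def missing_containers(docker_ps_names: str) -> list:
    """Given one container name per line, return the expected ones that are not running."""
    running = {line.strip() for line in docker_ps_names.splitlines() if line.strip()}
    return [name for name in EXPECTED if name not in running]
```

If only watchtower came back after a reboot, the result names the engine and the database, which is the old-restart-policy case described above.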

If Docker Desktop itself didn't start, that's a per-user OS toggle: open Docker Desktop → Settings → General → "Start Docker Desktop when you log in." We can't set this for you.

Q: What survives a reboot vs what doesn't? See getting-started.md → Step 7g: What survives a reboot for the full table. Short version: env vars + compose file + your trace data all persist; what was running depends on whether your containers have restart: unless-stopped (current /otel-setup templates set this; verify with docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' arthai-intelligence).

/autopilot

Autonomous work loop — picks up issues, implements, QAs, PRs. Stops for merge approval.

Synopsis

/autopilot [issue_number] [--dry-run] [--urgent-first|--least-first]

When to use it

You have a backlog of GitHub issues and want them worked methodically: assess → classify risk → verify → plan → implement → QA → self-review → PR
You want a junior-engineer workflow with guardrails — risk scoring before any code, escalation when unsure, and a mandatory human gate at every PR
Not for: one open-ended objective where the path is unknown — that's /goal, the speed-first single-objective loop; /autopilot is rigor-first and queue-driven, with full P0–P5 ranking and per-item risk classification
Not for: a single bug you already understand — /fix is faster

Quickstart

/autopilot --urgent-first

What you'll see: the open issues and PRs ranked P0–P5, then the loop starts on the most urgent item. It self-paces through implementation and QA, opens a PR, and stops — it never merges. After you merge, it picks up the next item automatically.

Examples

/autopilot                  # rank the backlog, ask which direction to work
/autopilot 141              # work just issue #141
/autopilot --urgent-first   # P0 → P5: critical items first (recommended default)
/autopilot --least-first    # P5 → P0: warm up on small items
/autopilot --dry-run        # show the ranked queue and plan without doing anything

Arguments & flags

Flag	Values	Default	What it does
`issue_number`	—	none	Skip ranking, work this issue only
`--dry-run`	—	off	Assess and classify only; report what it would do — good for trust-building
`--urgent-first`	—	—	Work the queue P0 → P5
`--least-first`	—	—	Work the queue P5 → P0

What it does

Assess — scans open issues, your PRs, and review comments; ranks everything P0 (critical) to P5 (nice-to-have) using labels and impact rules. If no direction flag was given, it asks you — user-confirmation checkpoint. Vague issues without acceptance criteria are flagged for your triage, never auto-picked.
Classify risk — scores blast radius, reversibility, confidence, and domain sensitivity (0–12). Low risk proceeds autonomously; 6–8 proceeds with notice; 9–10 stops and asks — user-confirmation checkpoint; 11–12 is refused. Migrations and auth/payment code always escalate.
Verify — reproduces the bug before touching code; stale already-fixed issues get closed instead of "fixed" again.
Plan — defines expected files/layers and objective completion criteria (each with a verifiable done_when clause).
Implement — simple fixes inline; multi-file work via backend/frontend agents plus a QA agent. Every commit and test run is captured in an evidence log.
QA — runs tests, lint, type checks (2 fix attempts max, then it escalates).
Self-review — a separate gate after QA passes: checks scope growth (>1.5x planned stops the loop), hardcoded values, error handling, security, test coverage, and unintended changes; re-assesses risk now that code exists. Unrelated bugs found along the way are filed as issues, not fixed inline.
PR — mandatory stop — opens a PR whose body is assembled from the evidence log, then stops and waits for your merge approval — user-confirmation checkpoint. It never merges. While waiting, it pre-assesses the next item.
Post-merge — checks CI on main (auto-reverts and reopens the issue if its merge broke CI), records learnings, then loops to the next item. Once this completes, a completion-verifier agent checks the item's evidence against its completion criteria (see below).
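
The risk bands from the Classify step and the scope check from Self-review can be sketched as two small functions. This is an illustration, not autopilot's code: it assumes "low risk" means a score of 0 to 5, that "always escalate" means stop and ask, and that scope is counted in files; the domain tags and return labels are hypothetical.

```python
# Illustrative sketch only. Bands and the 1.5x limit are from the text;
# tags, labels, and the 0-5 "low" band are assumptions.
ALWAYS_ESCALATE = frozenset({"migration", "auth", "payment"})


def risk_action(score: int, domains: frozenset = frozenset()) -> str:
    """Map a 0-12 risk score (plus sensitive domains) to what the loop does next."""
    if not 0 <= score <= 12:
        raise ValueError("risk score must be between 0 and 12")
    if score >= 11:
        return "refuse"
    if score >= 9 or domains & ALWAYS_ESCALATE:
        return "ask"  # user-confirmation checkpoint
    if score >= 6:
        return "proceed_with_notice"
    return "proceed"


def scope_exceeded(planned_files: int, actual_files: int) -> bool:
    """True when the change has grown past 1.5x the planned size."""
    return actual_files * 2 > planned_files * 3  # integer form of actual > 1.5 * planned
```

With these definitions, a plan of 3 files that grows to 6 trips the scope check, matching the `SCOPE: Started at 3 files, now at 6` message in the troubleshooting table.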

Agents spawned

Agent	Model tier	Role
backend / frontend	sonnet	Implementation, per the plan's layers (simple fixes run inline with no spawn)
qa	sonnet	Reviews and validates alongside implementation agents

architect (opus) is not spawned directly by autopilot — for risk-6+ features, Phase 4 (PLAN) delegates to the /planning skill, which spawns architect itself.

Post-merge verification: after a work item's full lifecycle completes, autopilot spawns completion-verifier (sonnet) to check the item's evidence against its completion criteria. It reports PASS, GAPS FOUND, or INCONCLUSIVE — gaps don't block, you decide whether to rerun.

Output & artifacts

One PR per work item, with risk assessment, scope check, evidence, and QA results in the body
.claude/.workflow-state.json — durable loop state: queue, current item, evidence, session summary (resume anytime with /autopilot)
Issues filed for problems discovered along the way; learnings appended to .claude/knowledge/skills/autopilot.md

Troubleshooting

Problem	Fix
`BLOCKED: Can't resolve test failure after 2 attempts`	The loop escalates instead of thrashing — investigate and give guidance
`SCOPE: Started at 3 files, now at 6`	Scope guard fired — approve the growth or trim the change
`COST: This item has used 6 agent spawns`	Per-item budget hit — continue, skip, or split the issue into smaller ones
Issue skipped as unrankable	It has no acceptance criteria — add them to the issue and re-run
Loop stopped and you want it back	`/autopilot` resumes from `.claude/.workflow-state.json`; saying "stop" or any unrelated request pauses it

Related

/goal — single-objective, speed-first counterpart; both stop at PR creation
/fix — the formal bug pipeline autopilot's rigor is modeled on
/pr — manual PR workflow for work you drive yourself
/tech-debt — autopilot auto-runs an audit every 3 completed items

/brainstorm

Grounded thinking partner — turn a vague thought, question, or "help me understand this" into clarity, anchored in this repo's calibrated knowledge base (populated by running /calibrate first — the skill still works on uncalibrated projects, but answers lean more generic).

Synopsis

/brainstorm [thought | question]

Alternate invocations: /help me think, /wondering, /what if, /should we even, /not sure — all trigger the same skill.

When to use it

You don't know what you want yet — a half-formed idea, a "should we even do this?", a "what if we…"
You want a grounded explanation of how something in this repo works, not generic advice
You're weighing a direction and want your hidden assumptions surfaced before you commit
Not for: committing to build a specific feature — that's /planning; fixing a known bug — that's /fix; knowing you're building X and wanting the design nailed down before coding — that's superpowers:brainstorming (it ends by writing a design doc). /brainstorm is for when you don't know what you want yet — it's upstream of intent; it routes you to those other skills once your thinking sharpens.

Quickstart

/brainstorm I keep thinking we should refactor the payments module

What you'll see: one clarifying question, a reflection grounded in your project's calibrated knowledge (cited from project-profile.md, conventions.md, domain.md, etc.), an explicit "in this repo X / generically Y" contrast, one or two hidden assumptions called out, and a closing choice — dig deeper, converge to a summary, or hand off to the right skill. It never writes code or files on its own.

For example, on a direct question, one line of the reply looks like:

In this repo: we have inbox-queue (Redis, at-least-once) and system-bus (Kafka, exactly-once).
Generically: the tradeoff is operational simplicity vs delivery guarantees and replay-ability.

When your intent sharpens, /brainstorm offers (never auto-invokes) the matching downstream skill:

Your signal	Suggested handoff
"I want to build / spec a feature"	`/planning <name>`
"The spec exists, ready to build"	`/implement <name>`
"Just want to drive one concrete objective to done, fast"	`/goal <objective>`
"Something's broken / a bug"	`/fix <description>` (or `/incident` if severity unclear)
"I need to find code / where is X"	`/explore <query>`
"Broad orientation to this repo"	`/onboard` or `/scan`
"Competitors / market signal"	`/market-research`
"Product / feature framing"	`/opportunity-map`
"Tech debt suspicion"	`/tech-debt <scope>`
"Performance concern"	`/perf <scope>`
"Should we even do this?"	Stay in `/brainstorm` — that's this skill's job

Examples

/brainstorm what's the difference between our two queue systems?   # grounded explanation, repo-vs-generic
/brainstorm what if we let users tag projects with custom labels?  # half-formed idea → probes intent, may steer to /planning
/brainstorm should we even build a mobile app?                     # existential — stays in /brainstorm, that's the job
/brainstorm                                                        # no args → prompts "What's on your mind?"

What it does

Loads the knowledge base first (a hard contract) — reads .claude/project-profile.md, .claude/knowledge/shared/*.md, .claude/knowledge/external/sources.md, and CLAUDE.md before answering, so every response is grounded in (or explicitly contrasted against) project reality. The skill works on any project, but its answers are repo-specific only if the project has been calibrated via /calibrate; otherwise it says so up front and leans generic. A size guard lazy-loads deep content for very large KBs.
Parses your input — categorizes it (question, vague thought, half-formed idea, comparison, existential, bug symptom) to shape the turn.
Answers in a fixed turn template — one clarifying question (max), grounded reflection with citations, explicit repo-vs-generic labeling, 1–2 surfaced assumptions, and a closing choice.
Watches the handoff radar — when your intent crystallizes, it offers the right downstream skill (/planning, /implement, /fix, /explore, /tech-debt, /perf, …) but never auto-invokes it — a user-confirmation checkpoint.
Converges on request — produces a compact, paste-ready "Where we landed" summary, then offers to write a durable insight back to the knowledge base (conventions / domain / patterns / vocabulary). Never auto-writes; skipped entirely for pure lookups.

If the project hasn't been calibrated (no project-profile.md), it says so up front and notes its answers will lean generic — run /calibrate first for repo-grounded responses.
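
The calibration check described here comes down to whether the profile file exists. A minimal sketch, assuming the test is simply the presence of .claude/project-profile.md under the project root; the function name and the two return labels are hypothetical:

```python
from pathlib import Path


def grounding_mode(project_root: Path) -> str:
    """Return "grounded" when a calibration profile exists, else "generic"."""
    profile = project_root / ".claude" / "project-profile.md"
    return "grounded" if profile.is_file() else "generic"
```

Running `grounding_mode(Path("."))` in a repo before and after /calibrate shows which kind of answers to expect.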

Output & artifacts

No files written by default — /brainstorm is a thinking mode, single-agent, no fan-out.
On convergence: a paste-ready summary block in the conversation.
Optional, only if you accept the offer: a few-line dated entry appended to .claude/knowledge/shared/<area>.md.
On an accepted handoff: control passes to the chosen skill, which owns its own outputs.

Cost

Negligible (~2 units) — a single agent, no fan-out, one set of parallel file reads on entry (bounded by a size guard so large knowledge bases lazy-load instead of front-loading), then turn-based responses. An accepted handoff to another skill bears that skill's own cost, and an accepted knowledge-base write-back at convergence is a single small append. /brainstorm is deliberately cheap so it can be the friction-free front door.

Troubleshooting

Problem	Fix
Answers feel generic, not repo-specific	The project isn't calibrated — run /calibrate so the knowledge base exists.
`/brainstorm: thought or question required as positional arg in non-interactive mode. Exiting.`	You ran it with no argument under `CI=true`/`CLAUDE_AUTOPILOT=1`. Pass a thought: `/brainstorm <your thought>`.
It keeps asking questions instead of acting	By design — it's a thinking partner, not an executor. Accept a handoff (e.g. `/fix`, `/planning`) when you're ready to act.
Confused with `superpowers:brainstorming`	Use `/brainstorm` when you're not sure what you want; use `superpowers:brainstorming` when you're set on building X and want a design doc first.

Related

/onboard — when you want a prioritized briefing on what to work on, not open-ended thinking
/planning — the usual handoff once a /brainstorm idea crystallizes into a feature
/calibrate — populates the knowledge base that /brainstorm grounds its answers in
Workflow comparison — where /brainstorm sits among the toolkit's entry points

/calibrate

Deep-learn a project and configure the full toolkit — scans code patterns, recommends MCP servers/agents/skills/workflows, installs everything. The single entry point for project adaptation.

Synopsis

/calibrate [full|rescan|recommend|status|best-practices]

When to use it

First time the toolkit is installed on a project — run plain /calibrate
After the project evolves significantly (new framework, new service, >20% more files, or >2 weeks since last calibration) — /calibrate rescan. Or just run /calibrate with no arguments — it auto-detects and rescans if already calibrated.
To see suggestions without installing anything — /calibrate recommend
Not for: a quick CLAUDE.md refresh — that's /scan; or updating the toolkit itself — see the FAQ update flow
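
The two numeric rescan signals above (more than 20% file growth, more than 2 weeks since the last calibration) are easy to check mechanically. A sketch under the assumption that you track the file count at calibration time yourself; the function name is hypothetical, and the qualitative signals (new framework, new service) are not covered:

```python
def rescan_recommended(files_at_calibration: int, files_now: int, days_since: int) -> bool:
    """True if the project grew by more than 20% or the profile is over 2 weeks old."""
    # Integer form of files_now > 1.2 * files_at_calibration, avoiding float rounding.
    grew = files_at_calibration > 0 and files_now * 5 > files_at_calibration * 6
    stale = days_since > 14
    return grew or stale
```

For example, a project calibrated at 100 files qualifies at 121 files, or at any size once 15 days have passed.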

Prerequisite: CLAUDE.md must exist and be complete (no unfilled placeholders). If it's missing or incomplete, /calibrate automatically runs /scan first.

Quickstart

/calibrate

What you'll see: a mode line explaining which mode was picked and why, then the calibration phases run, ending in a recommendations report (MCP servers, toolkit categories, workflows, custom agents/skills) and an install prompt. On completion it writes .claude/project-profile.md and advises starting a fresh session.

Examples

/calibrate                                  # full calibration first time; auto-rescan if already calibrated
/calibrate full                             # force a full calibration even if already calibrated
/calibrate rescan                           # diff against the existing profile, show what changed
/calibrate rescan --skip-debt               # rescan without the conditional tech-debt audit
/calibrate recommend                        # read-only — show recommendations, install nothing
/calibrate status                           # current calibration state, no sub-agents
/calibrate best-practices --source local    # extract best practices from your codebase
/calibrate best-practices --source industry # generate industry best practices for your stack

What it does

Phase 0 — mode detection (inline): checks for .claude/project-profile.md. No profile → full calibration; profile present → auto-rescan. Shows a mode line (with a cost ladder of the alternatives) so you can switch. Also verifies CLAUDE.md has the toolkit managed block and runs /scan first if CLAUDE.md is missing or still has unfilled placeholders.
Phase 1 — scan: a scanner agent deep-reads the project — platform, architecture patterns, coding conventions, testing patterns, domain model, integrations, environments.
Phase 2 — evaluate: an evaluator agent scores every toolkit agent, skill, and hook against the scan and identifies gaps and recommended workflows.
Phase 3 — profile: a profiler agent writes .claude/project-profile.md (diff mode on rescan — only changes, manual sections preserved).
Phase 4 — recommend (user-confirmation checkpoint): presents the report, then asks via AskUserQuestion: Install all / Pick items (a multi-select follow-up) / Skip (profile only). This prompt always fires — installs require your approval.
Phase 5 — install (parallel): installer applies the approved items; a knowledge agent seeds .claude/knowledge/; a best-practices agent writes .claude/knowledge/shared/best-practices.md.
Phase 5b — build knowledge graph (inline): after the three Phase 5 agents return, the orchestrator runs kg-ingest.sh --rebuild to index the knowledge base into a queryable graph. This is what lets downstream skills (/planning, /implement, /fix, /qa, /pr, /perf, /ci-fix, and others) pull ranked, relevant context instead of reading the whole knowledge base every time — it's why calibration is foundational to the rest of the toolkit. Non-blocking: if the knowledge-graph skill isn't installed, calibration still succeeds and those skills fall back to full KB reads.
Phase 6 — verify (inline): confirms MCP servers in settings, categories installed, profile and knowledge base written, then prints the completion report.
Phase 7 — completion verification: an independent verifier agent reports PASS / GAPS FOUND / INCONCLUSIVE — gaps don't block; you decide whether to rerun.

Rescan additionally integrates older calibrate-created agents/skills into workflows (asks yes/no/pick — a user-confirmation checkpoint) and may run a conditional tech-debt audit based on drift signals (--force-debt / --skip-debt to override). If a phase fails, dependent phases are skipped and the profile is marked partial — calibration still completes with what it has.
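
The Phase 0 mode choice, together with the two argument rules from the troubleshooting table (recommend needs an existing profile, best-practices needs `--source`), can be sketched as one resolver. This is a hedged illustration, not the skill's code; the function name and error messages are hypothetical.

```python
from typing import Optional

MODES = {"full", "rescan", "recommend", "status", "best-practices"}


def resolve_mode(arg: Optional[str], profile_exists: bool, source: Optional[str] = None) -> str:
    """Pick the calibration mode from the argument and whether a profile exists."""
    if arg is None:
        # No argument: auto-detect from the presence of project-profile.md.
        return "rescan" if profile_exists else "full"
    if arg not in MODES:
        raise ValueError(f"unknown mode: {arg}")
    if arg == "recommend" and not profile_exists:
        raise ValueError("recommend needs an existing project-profile.md")
    if arg == "best-practices" and source not in ("local", "industry"):
        raise ValueError("best-practices needs --source local or --source industry")
    return arg
```

So plain `/calibrate` resolves to full on a fresh project and to rescan afterwards, while `/calibrate full` forces a full run either way.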

Agents spawned

Agent	Model tier	Role
calibrate-scanner	haiku	Deep project scan
calibrate-evaluator	haiku	Scores toolkit fit, finds gaps
calibrate-profiler	haiku	Writes `.claude/project-profile.md`
calibrate-installer	sonnet	Installs approved items
calibrate-knowledge	haiku	Seeds the knowledge base
calibrate-best-practices	sonnet	Writes best-practices doc
completion-verifier	sonnet	Independent completion check

(Tiers are the defaults — your model-policy.yml governs the actual choice.)

Cost & running it cheaply

The sub-agents above are Haiku/Sonnet only — none run at Opus. Their tiers are controlled by model-policy.yml, not your session model — switching /model has no effect on sub-agent spawns.

The only place Opus can enter is the Claude Code session that drives the skill — the orchestrator's own inline work (Phase 0 mode detection, Phase 4 recommendations, Phase 6/7 verification). That session runs at your Claude Code model — set in ~/.claude/settings.json (the "model" key) or switched live with /model. It is not controlled by the toolkit, so if your session is on Opus, the orchestrator's inline steps run on Opus even though every spawned agent stays Haiku/Sonnet.

To minimize Opus use: check model-policy.yml first. If it keeps every agent at Haiku or Sonnet (the default), sub-agents stay on those tiers regardless of your session model, and there's nothing more to gain there. If your session itself is on Opus and you want to cut cost on the orchestrator's inline work too, switch before running:

/model sonnet     # orchestrator's inline work → Sonnet; sub-agents unaffected either way
/calibrate
/model opus       # switch back afterwards if you like

Model tier isn't the dominant cost, though — the scan volume is (the Haiku scanner reading many files). The single biggest saving is simply not running calibrate on an Opus session. For the wider cost picture see Configuration → Context budget, and watch a run live — cost by model/skill/owner — in Arth Intelligence.

Output & artifacts

.claude/project-profile.md — architecture, conventions, domain model
.claude/knowledge/ — seeded knowledge base (incl. shared/best-practices.md)
.claude/settings.json — recommended MCP servers added (never removed)
.claude/agents/, .claude/skills/ — custom agents/skills created as regular files
A final report plus a context tip: calibration is the heaviest context operation — start a fresh session after it finishes. The phases fill your session with intermediates (scanner output, KB reads, evaluator matrices) that have no further runtime value once the files are written — everything of value is already saved in .claude/project-profile.md and .claude/knowledge/, so clearing context loses nothing

Troubleshooting

Problem	Fix
`/calibrate recommend` errors about a missing profile	Recommend mode requires an existing `project-profile.md` — run `/calibrate` first
New MCP servers don't appear	Restart Claude Code after calibration adds MCP servers
`/calibrate best-practices` errors about `--source`	The flag is required — pass `--source local` or `--source industry`
Report says `calibration_status: partial`	A phase failed and dependents were skipped — re-run `/calibrate full`

/scan — lighter CLAUDE.md population; calibrate runs it when needed
/onboard — session briefing; hints at calibrate when no profile exists
/setup — bootstraps a project from scratch; calibrate is the natural follow-up

/ci-fix

Auto-remediate CI, staging, and production failures. 3-attempt retry with investigation. Discord alert on exhaustion.

Synopsis

/ci-fix [ci|staging|prod] [branch]

When to use it

CI went red on your branch — lint, type, test, build, migration, or dependency failures
A staging deploy failed and you want it diagnosed and patched
A production deploy failed — prod mode investigates read-only first and never touches the prod database
Not for: a logic bug in your code with green CI — that's /fix; or pre-push validation — that's /precheck

Quickstart

/ci-fix

What you'll see: the failed run's logs pulled and classified, a narrowly-scoped fix verified locally, then a commit + push and a wait for the new CI result. If the first attempt doesn't go green, it retries with a different strategy — up to 3 attempts.

Examples

/ci-fix                        # fix CI failures on the current branch
/ci-fix ci feature/my-branch   # fix CI on a specific branch
/ci-fix staging                # diagnose + fix a staging deploy failure
/ci-fix prod                   # production deploy failure — read-only investigation first

Arguments & flags

Argument	Values	Default	What it does
mode	`ci` \| `staging` \| `prod`	`ci`	Which failure surface to remediate
branch	any branch name	current branch	Which branch's CI runs to inspect

What it does

Reads the full CI/CD landscape — CLAUDE.md test commands and the project profile, then checks the knowledge base and knowledge graph using keywords from the error logs (if kg-query.sh is available, it runs first and surfaces past CI failure patterns, known flaky tests, and team conventions before investigating from scratch). It then detects the actual CI platform via config files (.github/workflows/, .gitlab-ci.yml, Jenkinsfile, .circleci/config.yml, and others) and uses platform-specific commands to pull logs, check status, and retry — for non-GitHub CI, review your CLAUDE.md test/CI commands, since most guidance below assumes GitHub Actions conventions. It also checks for environment mismatches between CI and local: runtime versions (python/node/go), running services (postgres, redis), env vars, and secrets — if tests pass locally but fail in CI, this is the first place to look.
Checks known flaky patterns — before investigating, it queries past CI incidents and the bug-pattern database in the knowledge base. If a match is found, it reports the known issue instead of redoing the investigation; if it's a new pattern, it gets added to the knowledge base for future /ci-fix runs.
Attempt loop (max 3) — each attempt: pull the failed logs, classify the failure (lint / types / tests / build / migration / dependency), apply a fix scoped to only the failing files, verify locally before pushing, then commit, push, and wait for the new CI result. Each retry uses a different strategy — attempt 2 reads more context, attempt 3 does a deep investigation against the last green commit.
Mode-specific guardrails — staging mode reads deploy logs and health endpoints, patching code or env vars; prod mode is read-only first, never modifies the production database, and considers a git revert on attempt 3.
On exhaustion — after 3 failed attempts it posts a Discord alert to #deployments (when discord-ops is configured), writes a QA incident file, and hands back to you for manual review.
Completion verification — a completion-verifier agent runs a final cross-check, displaying PASS (fully fixed), GAPS FOUND (fix incomplete), or INCONCLUSIVE (unclear if fixed). GAPS FOUND does not block anything — you decide whether to rerun /ci-fix.
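The attempt loop described above can be pictured roughly as follows. This is a minimal sketch, not the skill's actual implementation: the strategy labels, the `Attempt` record, and the `run_attempt` callback are hypothetical stand-ins for the log pulling, classification, scoped fix, push, and CI wait that each attempt performs.

```python
from dataclasses import dataclass
from typing import Callable, List

MAX_ATTEMPTS = 3

# Hypothetical labels for the per-attempt strategies described in the list above.
STRATEGIES = ("scoped-fix", "more-context", "diff-against-last-green")


@dataclass
class Attempt:
    number: int
    strategy: str
    went_green: bool


def remediate(run_attempt: Callable[[int, str], bool]) -> List[Attempt]:
    """Try up to MAX_ATTEMPTS strategies, stopping at the first green CI run.

    run_attempt(number, strategy) is an assumed callback that applies a fix,
    pushes it, waits for CI, and returns True when the run goes green.
    """
    history: List[Attempt] = []
    for number, strategy in enumerate(STRATEGIES[:MAX_ATTEMPTS], start=1):
        green = run_attempt(number, strategy)
        history.append(Attempt(number, strategy, green))
        if green:
            break
    return history


def exhausted(history: List[Attempt]) -> bool:
    """True when every allowed attempt ran and none went green."""
    return len(history) == MAX_ATTEMPTS and not any(a.went_green for a in history)
```

In this sketch, `exhausted(...)` returning True corresponds to the point where the skill alerts and hands back for manual review.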

On the claude-agents toolkit repo specifically, CI failures often come from repo-specific conventions rather than real bugs: bracket characters in a SKILL.md description:/arguments: field (fix: use <text> not [text]), missing required frontmatter fields, a new portable.manifest entry not yet mapped to a category in install.sh:get_category_items(), or a stale agent test fixture (fix: cp agents/<name>.md tests/fixtures/claude-setups/poweruser/.claude/agents/). /ci-fix auto-fixes all of these when it recognizes them.

Hard rules: never run a repo-wide lint auto-fix; never add # noqa or # type: ignore suppressions; never delete or skip tests; never modify dependency files unless the failure is actually a dependency issue; and never run direct deploy commands such as railway up or an equivalent platform command. Fixes always go through git: push and let the platform auto-deploy.

Agents spawned

Agent	Model tier	Role
troubleshooter	sonnet	Only on stuck escalation — most attempts run inline
completion-verifier	sonnet	Final completeness cross-check

Output & artifacts

Fix commits pushed to the branch (fix: <category> — <what was fixed>)
.claude/monitors/.ci-fix-state.json — per-branch attempt counter that persists across sessions (Monitor-triggered runs). Monitor-triggered invocations check this file first: if attempts >= 3 for the branch, it skips remediation and sends a Discord alert instead of retrying; on a green CI result the branch's entry is removed (reset). This prevents runaway agent spawns on a persistently failing branch.
Knowledge-base entry in .claude/knowledge/skills/ci-fix.md after every fix; on exhaustion, an incident file in .claude/qa-knowledge/incidents/
Discord alert in #deployments when all 3 attempts fail (if configured)
CI improvement recommendations (caching, parallelism, timeouts, flaky-test retries) after the fix
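The cross-session guard in .ci-fix-state.json can be sketched as below. The on-disk schema is not documented here, so the layout used (a JSON object keyed by branch name, each holding an `attempts` count) is an assumption for illustration only, as are the function names.

```python
import json
from pathlib import Path

MAX_ATTEMPTS = 3


def load_state(path: Path) -> dict:
    """Return the saved per-branch state, or an empty dict if no file exists."""
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def should_remediate(state: dict, branch: str) -> bool:
    """False once a branch has used all its attempts (alert instead of retrying)."""
    return state.get(branch, {}).get("attempts", 0) < MAX_ATTEMPTS


def record_attempt(state: dict, branch: str) -> None:
    """Increment the attempt counter for a branch (assumed schema)."""
    entry = state.setdefault(branch, {})
    entry["attempts"] = entry.get("attempts", 0) + 1


def record_green(state: dict, branch: str) -> None:
    """A green CI result removes the branch entry, which resets the counter."""
    state.pop(branch, None)


def save_state(path: Path, state: dict) -> None:
    """Persist the state so the counter survives across sessions."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2))
```

The point of keeping the counter on disk rather than in the session is that a Monitor-triggered run starts with no memory of earlier attempts, so the file is the only thing that stops a fourth try.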

Troubleshooting

Problem	Fix
`Auto-fix exhausted after 3 attempts`	Manual review required — the QA incident file contains the diagnosis trail; start a fresh session to investigate
Monitor-triggered run exits immediately with an alert	The cross-session guard found `attempts >= 3` for this branch in `.ci-fix-state.json` — fix manually, then a green run resets the counter
Fix requires architectural changes	/ci-fix reports and stops by design — it never attempts large refactors
Same failure keeps recurring across branches	It may be a flaky test — check the report's known-flaky match and the CI recommendations for retry configuration
Attempt 2 looks like it's about to fail the same way as attempt 1	Rewind (Esc Esc) to before attempt 1 and re-prompt with a different strategy — don't let a failed attempt's context carry into the next try
Following work after a multi-attempt fix behaves oddly or seems confused	After a fix that took 2-3 attempts (or hit exhaustion), run `/clear` before starting the next task — accumulated failed-attempt logs and dead-end diagnoses actively mislead subsequent reasoning

/precheck — catch these failures locally before CI ever sees them
/fix — when the CI failure traces to a real code bug worth the formal pipeline
/incident — triage entry point that routes to /ci-fix automatically
/sre — deploy health checks and operations beyond CI
Monitors — event-driven auto-remediation: wake /ci-fix on CI failure with zero idle cost

/client-discovery

Client discovery and AI readiness assessment.

Synopsis

/client-discovery <client-name> [-- brief]

When to use it

First substantive step with a new client — gathers business context, tech landscape, and pain points
Producing an AI maturity score and readiness level to anchor the rest of the engagement
Not for: deep competitive or industry research — that's /market-research; or running the whole engagement lifecycle — that's /consulting

Quickstart

/client-discovery acme-corp

What you'll see: a structured intake interview (13 questions in three groups), followed by a current-state analysis, AI maturity scorecard, competitive research, and a set of discovery files written to the client's directory.

Prerequisites: WebSearch access must be enabled in your Claude Code settings — step 4 (competitive landscape research) runs ~8 WebSearch queries and will fail if WebSearch is not permitted. Run the command from the project directory where consulting-toolkit/ exists (typically the claude-agents root).

Examples

/client-discovery acme-corp                                   # full interactive intake
/client-discovery acme-corp -- 50-person fintech, Postgres + AWS, wants churn prediction
                                                              # brief pre-answers questions; only gaps are asked

What it does

Intake interview — user-confirmation checkpoint: asks 13 questions via AskUserQuestion in three groups — Business Context (5), Current Technology (4), Pain Points & Goals (4). If you passed a -- brief, questions it already answers are skipped and the extracted answers confirmed with you. Declined questions are recorded as "Not disclosed".
Current state analysis — business model summary, a Mermaid tech-stack diagram, a Data Readiness Score (1-5 across Volume, Quality, Accessibility, Governance, Integration), and a team capability matrix.
AI maturity assessment — scores Strategy, Data, Technology, People, and Process 1-5 against a rubric; composite score (5-25) maps to a readiness level (Not Ready, Early Stage, Developing, Advancing, AI-Native Ready), presented as an ASCII scorecard.
Competitive landscape research — runs ~8 WebSearch API queries (industry AI trends, competitor initiatives, case studies with ROI) and compiles a competitive intelligence brief (max 500 words). Requires WebSearch to be enabled — see Prerequisites above.
Writes the discovery files — profile plus four discovery documents (see below). If the client directory already exists, it warns you and asks whether to overwrite or create a versioned copy.

The whole process takes roughly 15-20 minutes with a cooperative client.
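As a rough illustration of the maturity scoring in step 3, the sketch below sums the five dimension scores and maps the composite to a readiness level. The dimension and level names come from the description above; the cut-off values between levels are an assumption (evenly spaced bands), since the skill's rubric is not reproduced here.

```python
DIMENSIONS = ("Strategy", "Data", "Technology", "People", "Process")

# ASSUMPTION: illustrative upper bounds per level; the real rubric may differ.
LEVELS = (
    (8, "Not Ready"),
    (12, "Early Stage"),
    (16, "Developing"),
    (20, "Advancing"),
    (25, "AI-Native Ready"),
)


def composite_score(scores: dict) -> int:
    """Sum the five 1-5 dimension scores into a 5-25 composite."""
    missing = [name for name in DIMENSIONS if name not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for name in DIMENSIONS:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} must be between 1 and 5")
    return sum(scores[name] for name in DIMENSIONS)


def readiness_level(total: int) -> str:
    """Map a composite score to a readiness level using the assumed bands."""
    if not 5 <= total <= 25:
        raise ValueError("composite score must be between 5 and 25")
    for upper, label in LEVELS:
        if total <= upper:
            return label
    raise AssertionError("unreachable: LEVELS covers the full 5-25 range")
```

For example, scores of 3, 2, 4, 3, and 2 give a composite of 14, which falls in the "Developing" band under these assumed cut-offs.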

Output & artifacts

All files land in consulting-toolkit/clients/<client-name>/:

profile.md — company overview, business model summary, maturity score and readiness level
discovery/intake.md — all 13 questions with verbatim answers
discovery/current-state.md — business narrative, tech-stack diagram, data readiness scores, capability matrix
discovery/maturity-assessment.md — dimension scores with justification, composite score, scorecard, recommended focus areas
discovery/stakeholder-map.md — decision makers, influencers, champions, end users, blockers (templated with placeholders if not provided during intake)

Troubleshooting

Problem	Fix
Client can't answer a question	Say so — it records "Not disclosed" and moves on; scores are justified from what was provided
Client directory already exists	The skill asks before touching it — choose overwrite or a versioned copy
Stakeholder map is mostly placeholders	Expected when stakeholder info wasn't shared at intake — fill it in during follow-up conversations
Downstream skills say discovery data is missing	Confirm the discovery files above were written for the exact client name you're passing them
Step 4 fails with a permission error	Ensure WebSearch is enabled/permitted in your Claude Code settings

/consulting — the engagement orchestrator; runs this as the discovery phase
/opportunity-map — the next step: turns discovery data into scored initiatives
/market-research — deeper, focused research beyond the discovery-phase scan

/cloud-setup

Configure the Arth dashboard's UI/cloud features — the "Explain this session" LLM key and the experimental Cloud Orchestrator (calibrate/plan a repo). Runs on top of a telemetry-only /otel-setup install.

Synopsis

/cloud-setup

When to use it

You ran /otel-setup (telemetry-only) and now want the dashboard's "Explain this session" AI summary
You want to turn on the experimental Cloud Orchestrator (point Arth at a Git repo to calibrate or plan a feature live)
You're adding an LLM key, GitHub token, or license to an existing Arth Intelligence install
Not for: the OTEL telemetry stack itself (cost/token/session data) — that's /otel-setup, which stays telemetry-only

Quickstart

/cloud-setup

What you'll see: a precheck that your local Arth Intelligence stack is healthy, then two optional questions — enable Explain (pick a provider/key) and enable the Cloud Orchestrator (collect the required secrets, write the override, pull the sandbox). You can enable just one, both, or skip both entirely. Telemetry is never touched.

CLI parity

The same interactive flow ships in the Arth CLI for terminals without the toolkit:

arth cloud-setup

Install the Arth CLI first — it ships through the private Arth repo (gated by your GitHub access, no public package).

Examples

/cloud-setup        # the only form — the skill takes no arguments

Typical runs:

Just want Explain → opt into Explain, pick a provider/key, decline the orchestrator → dashboard summaries turn on
Want the full cloud experience → opt into Explain and the orchestrator → it writes the override, pulls the sandbox, and Runs / ☁ Calibrate a repo appear at http://localhost:3100
No toolkit installed → run arth cloud-setup in a terminal for the identical flow

What it does

Precheck — Arth Intelligence must be running. Confirms Docker is up, ~/.arthai/docker-compose.yml exists (from /otel-setup → Local), and the engine is healthy on :4319. If not, it stops with the exact fix (run /otel-setup first, or start the stack).
Asks: enable "Explain this session"? (FIRST, and NOT experimental). A user-confirmation checkpoint. Optional and separate, API-billed (not your Claude Code subscription) — ~$0.001–0.002 per session on a small/flash model, or free with local Ollama / LM Studio. Pick a provider in plain language (Anthropic / OpenAI / Gemini / Ollama / LM Studio / Bedrock); it confirms your choice (e.g. "Anthropic, claude-haiku-4-5, ~$0.001/session") and estimated per-session cost, then saves the key/host to ~/.arthai/.env and recreates the dashboard to pick it up. LM Studio auto-detects the loaded model (e.g. qwen) — no model id needed. Skip it and the dashboard shows a small "needs an LLM key" prompt where the summary would go. (/otel-setup also offers to turn on Explain inline at the end of its own flow, right after it verifies telemetry — so you don't have to run this skill just for Explain.)
Asks: enable the Cloud Orchestrator? (experimental, off by default, Claude-only). A user-confirmation checkpoint. If you accept, it collects the required secrets and writes an additive ~/.arthai/docker-compose.override.yml (orchestrate flags + sandbox image + Docker-socket access — the base compose is untouched and reversible), pulls arthai/cloud-sandbox:latest, and recreates the dashboard with cloud enabled. The sandbox runs Claude only (for now), so it needs an ANTHROPIC_API_KEY — if your Explain choice in step 2 was Claude it's reused; if it was a non-Claude provider (or you skipped Explain) you're asked for a Claude key now (both keys are kept). It also needs ARTH_GITHUB_TOKEN + ARTH_LICENSE_KEY. Decline (default) and nothing changes.
Summarizes what's now on (Explain / Orchestrator) and confirms telemetry is unaffected.
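The precheck in step 1 amounts to three ordered checks, each with its own fix. A minimal sketch, assuming the Docker and engine probes are supplied by the caller (the function and parameter names are hypothetical, and the Docker message wording is invented for illustration):

```python
from pathlib import Path
from typing import Callable, Optional


def precheck(
    docker_running: Callable[[], bool],
    engine_healthy: Callable[[], bool],
    compose_file: Path,
) -> Optional[str]:
    """Return None when the stack is ready, else the first fix to apply.

    The probes are injected so the ordering logic can be exercised without
    Docker; a real check would query Docker and the engine on :4319.
    """
    if not docker_running():
        return "Start Docker, then re-run /cloud-setup"
    if not compose_file.exists():
        return "Arth Intelligence isn't installed yet: run /otel-setup (pick Local) first"
    if not engine_healthy():
        return f"engine isn't healthy on :4319: docker compose -f {compose_file} up -d"
    return None
```

The order matters: there is no point probing the engine if the compose file was never written, and no point looking for the compose file's stack if Docker itself is down.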

Output & artifacts

~/.arthai/.env — the LLM key (Explain) and any orchestrator secrets, expanded by docker-compose. Survives image updates (pulls/Watchtower never remove it).
~/.arthai/docker-compose.override.yml — orchestrator flags + sandbox image + socket mount (only if you enabled the orchestrator). Delete it to turn the orchestrator off.
A recreated arthai-intelligence container picking up the new config. Dashboard at http://localhost:3100.

Troubleshooting

Problem	Fix
`Arth Intelligence isn't installed yet`	Run `/otel-setup` (pick Local) first, then re-run `/cloud-setup`
`engine isn't healthy on :4319`	Start the stack: `docker compose -f ~/.arthai/docker-compose.yml up -d`, then re-run
"Explain this session" still shows the key prompt	Re-run `/cloud-setup`, opt into Explain, and provide a key (or set up local Ollama for free)
No Runs / Calibrate a repo in the dashboard	The orchestrator flags aren't set — re-run `/cloud-setup` and accept the Cloud Orchestrator step
Sandbox fails to start on a run	Confirm the Docker network: `docker network ls | grep arthai` should show `arthai_default`; adjust `ARTH_CLOUD_DOCKER_ARGS` in the override if it differs

To disable the orchestrator: delete ~/.arthai/docker-compose.override.yml and run docker compose -f ~/.arthai/docker-compose.yml up -d.

/otel-setup — the telemetry stack this builds on
Cloud Orchestrator guide — full walkthrough of calibrate/plan a repo
Arth Intelligence guide — what the dashboard shows
Configuration → Explain this session

/consulting

Run a full AI consulting engagement.

Synopsis

/consulting [client-name] [--phase discovery|assess|propose|design|deliver|track] [-- brief]

Prerequisites

Each phase delegates to a specialist skill — the phase will fail if its skill isn't installed:

Phase	Delegated skill
`discovery`	/client-discovery
`assess`	/opportunity-map
`propose`	/pitch-generator, /roi-calculator
`design`	/solution-architect
`deliver`	/deliverable-builder
`track`	/engagement-tracker

Run /skills to confirm all seven are installed before starting a full engagement: a missing sub-skill surfaces as a phase failure, not a clear "skill not found" message.

When to use it

Starting a new client engagement — it creates the client workspace and walks the full lifecycle
Resuming an engagement mid-stream — it reads the saved state and picks up at the current phase
Getting a status report or a dashboard of all active engagements
Not for: running a single activity in isolation — call the phase skill directly, e.g. /client-discovery or /roi-calculator

Quickstart

/consulting acme-corp -- AI readiness assessment for a mid-market logistics firm

What you'll see: a new client workspace at consulting-toolkit/clients/acme-corp/ with engagement.json and a profile.md stub, then the discovery phase kicks off via /client-discovery.

Examples

/consulting acme-corp                          # resumes if engagement.json exists, else initializes at discovery
/consulting acme-corp --phase propose          # jump straight to the proposal phase
/consulting acme-corp -- brief context here    # new engagement with a free-text brief
/consulting                                    # no client name — dashboard of all active engagements

Note: /consulting <client-name> with no --phase flag doesn't always "resume" — it resumes only if consulting-toolkit/clients/<client-name>/engagement.json already exists. On a brand-new client name it initializes a fresh engagement at the discovery phase instead.

Arguments & flags

Flag	Values	Default	What it does
`[client-name]`	—	(omit for dashboard)	Client identifier used for directory naming and file lookups
`--phase`	`discovery` `assess` `propose` `design` `deliver` `track`	resume from saved state	Jump to a specific engagement phase
`-- <brief>`	free text	none	Engagement context or specific request

What it does

Loads or initializes engagement state — reads consulting-toolkit/clients/<client-name>/engagement.json if it exists and resumes from the current phase; otherwise creates the client directory, engagement.json (phase statuses, decisions, risks, value tracking), and a profile.md stub.
Routes to the active phase — each phase delegates to a specialist skill:

discovery → /client-discovery — interviews, tech assessment, stakeholder mapping
assess → /opportunity-map — maturity scoring, priority matrix
propose → /pitch-generator + /roi-calculator — proposal and ROI models
design → /solution-architect — architecture, build-vs-buy, sprint plan
deliver → /deliverable-builder — final deliverables package, handoff docs
track → /engagement-tracker — status reports, value tracking

Phase transition protocol — when a phase completes, it verifies completion criteria (deliverables produced, decisions logged, risks captured), updates engagement.json, and generates a phase transition summary. User-confirmation checkpoint: it always confirms with you before advancing to the next phase — it never auto-advances.
Status report on demand — at any point you can ask for an engagement status report (phase progress table, value summary, open risks, recent decisions, next actions) generated from engagement.json.
Multi-engagement dashboard — invoked with no client name, it lists all active engagements with phase, status, and projected value.
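The load-or-initialize and confirm-before-advancing behavior can be sketched as follows. Only the `current_phase` key and the phase names are taken from this page; the rest of the engagement.json layout shown here, and the function names, are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Optional

PHASES = ("discovery", "assess", "propose", "design", "deliver", "track")


def load_or_init(client_dir: Path, phase: Optional[str] = None) -> dict:
    """Resume from engagement.json if present, else start at discovery.

    An explicit phase (the --phase flag) overrides the saved current_phase.
    """
    if phase is not None and phase not in PHASES:
        raise ValueError(f"unknown phase: {phase}")
    state_file = client_dir / "engagement.json"
    if state_file.exists():
        state = json.loads(state_file.read_text())
    else:
        client_dir.mkdir(parents=True, exist_ok=True)
        # ASSUMPTION: hypothetical schema beyond current_phase.
        state = {
            "current_phase": PHASES[0],
            "phases": {name: "pending" for name in PHASES},
            "decisions": [],
            "risks": [],
        }
        state_file.write_text(json.dumps(state, indent=2))
    if phase is not None:
        state["current_phase"] = phase
    return state


def advance(state: dict, confirmed: bool) -> dict:
    """Move to the next phase only after explicit user confirmation."""
    if not confirmed:
        return state
    current = state["current_phase"]
    state["phases"][current] = "complete"
    index = PHASES.index(current)
    if index + 1 < len(PHASES):
        state["current_phase"] = PHASES[index + 1]
    return state
```

Because all state is reloaded from the file, a fresh session started with the same client name picks up where the previous one stopped.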

After each phase it prints a context-hygiene tip: start a fresh session per phase (/clear, then /consulting <client> --phase <next>) — engagement state lives in engagement.json, so nothing is lost.

Cost and time: a full 6-phase engagement runs ~200-300 units of model spend and spawns 7+ sub-skill invocations sequentially (one or more per phase), each of which may spawn its own agents. Expect this to be a multi-session effort — budget a session per phase rather than expecting one pass to finish everything.

Output & artifacts

consulting-toolkit/clients/<client-name>/engagement.json — engagement state: phases, deliverables, decisions, risks, value captured (written by /consulting itself)
consulting-toolkit/clients/<client-name>/profile.md — client profile (filled in across phases, written by /consulting itself)
Per-phase deliverables written by the delegated skills (discovery notes, opportunity matrix, ROI models, proposals, architecture docs, tracking dashboards) under clients/<client-name>/ — note this is a different root than the consulting-toolkit/clients/ path above, since each delegated skill (/client-discovery, /opportunity-map, /pitch-generator, /solution-architect, /deliverable-builder, /engagement-tracker) manages its own output location. Always pass the exact same <client-name> to every phase skill, or later phases won't find earlier phases' files.
Phase transition summaries and status reports in the conversation

Troubleshooting

Problem	Fix
Engagement resumes at the wrong phase	The saved `current_phase` in `engagement.json` drives routing — pass `--phase <name>` to override
Phase skill reports missing discovery/assessment data	Phases build on each other — run the earlier phase first (e.g. `/consulting <client> --phase discovery`)
Analysis feels muddied after several phases in one session	Start a fresh session per phase: `/clear`, then `/consulting <client> --phase <next>` — state reloads from `engagement.json`
Two clients' context bleeding together	Always `/clear` between clients — client context must never cross engagements

/client-discovery — discovery phase: intake interview and AI readiness assessment
/market-research — industry, competitor, and trend research
/opportunity-map — assess phase: score and prioritize AI initiatives
/roi-calculator — propose phase: financial models with sensitivity analysis
/pitch-generator — propose phase: proposals, exec summaries, deck outlines
/solution-architect — design phase: architecture and implementation plans
/deliverable-builder — deliver phase: client-ready deliverables
/engagement-tracker — track phase: progress, milestones, RAG status
/magazine-generator — polished HTML magazine for product storytelling
/templates — generic structured deliverables outside an engagement

/deliverable-builder

Build client-ready consulting deliverables.

Synopsis

/deliverable-builder <client-name> [--type final-report|status-update|board-deck|implementation-guide|training-plan|change-management|data-strategy]

When to use it

Assembling polished documents from engagement data — final reports, board decks, technical guides
Producing periodic status updates or plans (training, change management, data strategy) during delivery
Not for: pre-sale documents like proposals and pitch decks — that's /pitch-generator; or live progress dashboards — that's /engagement-tracker

Prerequisites

A client profile at clients/<client-name>/profile.json
Engagement data under clients/<client-name>/ (discovery, assessment, architecture, tracking subdirectories) — typically produced by running /client-discovery, /opportunity-map, /solution-architect, and /engagement-tracker for the client first
Without this data, generated deliverables will have thin or empty sections

Quickstart

/deliverable-builder acme-corp --type final-report

What you'll see: a 12-section engagement report (executive summary through appendices) assembled from the client's discovery, assessment, architecture, and tracking data, written to clients/acme-corp/deliverables/final-report.md with word count and section list reported.

Examples

/deliverable-builder acme-corp                              # no type — prompts you to choose one
/deliverable-builder acme-corp --type final-report          # 15-30 page comprehensive engagement report
/deliverable-builder acme-corp --type board-deck            # 10-slide executive/board presentation outline
/deliverable-builder acme-corp --type status-update         # 2-5 page periodic progress report
/deliverable-builder acme-corp --type implementation-guide  # 10-20 page technical how-to for IT teams

Arguments & flags

Flag	Values	Default	What it does
`--type`	`final-report` `status-update` `board-deck` `implementation-guide` `training-plan` `change-management` `data-strategy`	prompts you	Which deliverable to build

What it does

Loads engagement context — reads all available data under clients/<client-name>/ (profile, discovery, assessment, architecture, tracking, prior deliverables) and extracts company name, engagement dates, primary contact, and initiative names.
Type selection — user-confirmation checkpoint: if --type was not given, it asks you to choose from the 7 deliverable types.
Generates the deliverable from the standard template for that type, pulling content from prior skill outputs:

final-report — 12 sections for the executive sponsor: summary, methodology, current state, market analysis, recommendations, architecture, roadmap, ROI, risks, next steps, appendices
status-update — RAG status by workstream, accomplishments, risks, decisions needed, next-period plan, key metrics
board-deck — 10-slide outline ending in a specific ask and budget approval request
implementation-guide — prerequisites checklists, step-by-step setup, configuration reference, testing, runbook, rollback procedures
training-plan — skills assessment, three-track curriculum (all staff / power users / technical team), timeline, resources, success metrics
change-management — stakeholder analysis, phased communication plan, resistance management, adoption metrics, support structure
data-strategy — data landscape and maturity, governance framework, quality standards, migration plan, AI-specific data requirements

Quality gates — structure (version header, date, executive summary, TOC), content (citations, valid Mermaid, specific action items, realistic numbers), formatting (no leftover placeholders), and completeness checks before finalizing.
Versioned output — if the file already exists, the version number is incremented (1.0 → 1.1) and a change log entry added rather than silently overwriting.
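The version bump in the last step could look something like the sketch below. It assumes a `major.minor` version string, that a first-time deliverable starts at 1.0, and that the minor part is a plain integer (so 1.9 becomes 1.10); none of those details are confirmed beyond the 1.0 → 1.1 example above, and the function names are hypothetical.

```python
import re
from typing import Optional


def bump_version(version: str) -> str:
    """Increment the minor part of a 'major.minor' version string."""
    match = re.fullmatch(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"expected 'major.minor', got {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return f"{major}.{minor + 1}"


def next_version(existing: Optional[str]) -> str:
    """Version for the file about to be written.

    existing is the version found in the current file, or None if the
    deliverable has not been generated before (assumed to start at 1.0).
    """
    return "1.0" if existing is None else bump_version(existing)
```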

Output & artifacts

clients/<client-name>/deliverables/<type>.md — the generated deliverable
A confirmation in the conversation: file path, word count, sections generated, and any quality-gate items needing manual review

Troubleshooting

Problem	Fix
Deliverable has thin or empty sections	The source data is missing — run the upstream skills (discovery, opportunity-map, solution-architect, engagement-tracker) for that client first
Want to regenerate without losing the previous version	Just re-run — the skill bumps the version and adds a change-log entry instead of overwriting
Quality gate flags items for manual review	Address the listed items (usually unsourced numbers or unfilled brackets) before sending to the client

/consulting — the engagement orchestrator; runs this as the deliver phase
/pitch-generator — pre-sale counterpart: proposals and pitch documents
/engagement-tracker — source of the tracking data behind status updates
/solution-architect — source of the architecture content in reports and guides
/templates — generic structured documents outside a client engagement

/deploy-ios

Build and deploy iOS apps to Simulator, TestFlight, or App Store. Auto-detects Xcode project config, resolves signing, and validates builds.

Synopsis

/deploy-ios [simulator|testflight|appstore] [--device name] [--scheme scheme]

--clean is also accepted to force a clean build.

When to use it

Running your iOS/iPadOS app in the Simulator to check a change
Archiving and uploading a build to TestFlight for beta testers, or to App Store Connect for review
Not for: Expo/EAS-managed internal-testing uploads driven by your calibrated deploy targets — that's /deploy testflight-internal; non-iOS projects (it stops if no Xcode project is found)

Prerequisites

TestFlight/App Store uploads need an App Store Connect API key. Set three environment variables before running the skill: APP_STORE_CONNECT_API_KEY_ID, APP_STORE_CONNECT_API_ISSUER_ID, and APP_STORE_CONNECT_API_KEY_PATH (path to your .p8 key file). Generate the key from your Apple Developer account under App Store Connect → Users and Access → Integrations. Without these, the skill falls back to producing a local IPA for you to upload manually.
Manual code signing (instead of the default Xcode-managed automatic signing) needs PROVISIONING_PROFILE_SPECIFIER and CODE_SIGN_IDENTITY set as environment variables before running the skill, e.g. export PROVISIONING_PROFILE_SPECIFIER=<profile-name> and export CODE_SIGN_IDENTITY=<identity-name>.
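A quick way to see which path a run will take is to check these variables up front. The sketch below uses the variable names listed above; the function names and the returned labels are hypothetical, and treating a partially set signing pair as "automatic" is an assumption.

```python
from typing import Iterable, List, Mapping

UPLOAD_VARS = (
    "APP_STORE_CONNECT_API_KEY_ID",
    "APP_STORE_CONNECT_API_ISSUER_ID",
    "APP_STORE_CONNECT_API_KEY_PATH",
)
MANUAL_SIGNING_VARS = ("PROVISIONING_PROFILE_SPECIFIER", "CODE_SIGN_IDENTITY")


def missing_vars(names: Iterable[str], env: Mapping[str, str]) -> List[str]:
    """Names that are unset or empty in the given environment mapping."""
    return [name for name in names if not env.get(name)]


def upload_mode(env: Mapping[str, str]) -> str:
    """'upload' when all three API key variables are set, else 'local-ipa'."""
    return "upload" if not missing_vars(UPLOAD_VARS, env) else "local-ipa"


def signing_mode(env: Mapping[str, str]) -> str:
    """'manual' only when both signing variables are set, else 'automatic'."""
    return "manual" if not missing_vars(MANUAL_SIGNING_VARS, env) else "automatic"
```

Passing `os.environ` as `env` checks the current shell; `missing_vars(UPLOAD_VARS, os.environ)` then lists exactly what still needs exporting before a TestFlight or App Store run.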

Quickstart

/deploy-ios

What you'll see: the detected workspace/project and scheme, a simulator build (the default target), then the app installed and launched in the iOS Simulator with a summary of device, OS version, and bundle ID.

Examples

/deploy-ios                                    # build + launch in the default simulator
/deploy-ios simulator --device "iPhone 16 Pro" # pick a specific simulator device
/deploy-ios testflight --scheme MyApp          # archive + upload to TestFlight
/deploy-ios appstore                           # archive + upload for App Store review
/deploy-ios simulator --clean                  # clean build when caches are suspect

Arguments & flags

Flag	Values	Default	What it does
target	`simulator`, `testflight`, `appstore`	`simulator`	Where the build goes
`--device`	simulator device name	most recent iPhone simulator available; falls back to any available iOS simulator if no iPhone found	Which simulator to boot and install on
`--scheme`	Xcode scheme name	asks if multiple	Which scheme to build
`--clean`	—	off	Cleans before building
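The default for --device can be approximated from the JSON that `xcrun simctl list devices available --json` prints. The sketch below works on the already-parsed JSON. It reads "most recent" as the newest iOS runtime version and takes the first iPhone listed under it; both readings are assumptions, and `pick_simulator` is a hypothetical name rather than part of the skill.

```python
import re
from typing import Optional, Tuple


def _ios_version(runtime_id: str) -> Optional[Tuple[int, ...]]:
    """Parse '...SimRuntime.iOS-17-5' into (17, 5); None for non-iOS runtimes."""
    match = re.search(r"\.iOS-(\d+(?:-\d+)*)$", runtime_id)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("-"))


def pick_simulator(listing: dict) -> Optional[str]:
    """Prefer an iPhone on the newest iOS runtime; else any iOS simulator."""
    candidates = []  # (runtime version, device name)
    for runtime_id, devices in listing.get("devices", {}).items():
        version = _ios_version(runtime_id)
        if version is None:
            continue  # skip watchOS, tvOS, and other runtimes
        for device in devices:
            if device.get("isAvailable", False):
                candidates.append((version, device["name"]))
    iphones = [c for c in candidates if c[1].startswith("iPhone")]
    pool = iphones or candidates
    if not pool:
        return None
    # max() keeps the first entry among ties, so listing order breaks ties.
    return max(pool, key=lambda c: c[0])[1]
```

To try it against a real machine, feed it `json.loads(...)` of the simctl output; a `None` result corresponds to the `Simulator not found` case.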

What it does

Detects the project — finds the .xcworkspace (or .xcodeproj), lists schemes, detects the package manager (SPM/CocoaPods/Carthage), warns if Pods are stale, and reads build settings for bundle ID and deployment target. If multiple schemes exist and you didn't pass --scheme, it asks which to use — a user-confirmation checkpoint. If there's no Xcode project, it stops.
Resolves the target — simulator (default), TestFlight, or App Store.
Builds — simulator builds use a Debug configuration against your chosen device; TestFlight/App Store builds clean-archive in Release with automatic code signing (manual signing supported if you provide the profile and identity — see Prerequisites above).
Deploys — simulator: installs and launches the app, opening Simulator.app if needed. TestFlight/App Store: exports the archive and uploads directly to App Store Connect using your App Store Connect API key (see Prerequisites); if no key is configured, it produces a local IPA and tells you to upload via Transporter or the App Store Connect web UI.
Validates — simulator: confirms the app installed and launched without crashing, reports device/OS/bundle/build time. Uploads: confirms success and reminds you TestFlight processing takes 15–30 minutes.

After two failed build attempts it stops, shows the full error log, and asks for guidance.

Fastlane integration

If your project has a fastlane/ directory, the skill uses your existing Fastlane setup instead of raw xcodebuild: fastlane run build_app for simulator builds, fastlane pilot upload for TestFlight, and fastlane release for the App Store if that lane exists. Because Fastlane is checked first, make sure your lanes support the target you're deploying to (simulator, testflight, appstore).
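A minimal sketch of that selection, assuming the only signal is whether a fastlane/ directory exists. The function name `fastlane_command` is hypothetical, and the sketch does not model the "if that lane exists" check for the App Store lane.

```python
from pathlib import Path
from typing import Optional

# Commands named in the paragraph above, keyed by deploy target.
FASTLANE_COMMANDS = {
    "simulator": ["fastlane", "run", "build_app"],
    "testflight": ["fastlane", "pilot", "upload"],
    "appstore": ["fastlane", "release"],
}


def fastlane_command(project_dir: Path, target: str) -> Optional[list[str]]:
    """Return the Fastlane command for a target, or None to fall back to xcodebuild."""
    if not (project_dir / "fastlane").is_dir():
        return None  # no fastlane/ directory: use raw xcodebuild
    return FASTLANE_COMMANDS.get(target)
```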

Output & artifacts

Simulator: the app running in the Simulator, plus a build summary in the conversation
TestFlight/App Store: an archive under build/<scheme>.xcarchive, an exported IPA, and an upload to App Store Connect (with bundle version and build number reported)
No files committed and no remote infrastructure touched

Troubleshooting

Problem	Fix
`No signing certificate`	Sign in to your Apple Developer account: Xcode → Settings → Accounts
`No profiles found`	Enable Automatic Signing in Xcode, or create the provisioning profile in the Apple Developer portal
`Simulator not found`	Run `xcrun simctl list devices` and pass a valid name via `--device`
`xcodebuild: error: ... not found` (scheme)	Run `xcodebuild -list` and pass the correct `--scheme`
Stale/missing Pods	Run `pod install` in the project directory, then retry
`SPM resolution failed`	Run `xcodebuild -resolvePackageDependencies`, then retry
Upload times out	Retry the upload and check your network connection
Build mysteriously failing	Retry with `--clean` — and after 2 failures the skill stops with the full log rather than guessing

Related

/deploy — environment-level deploys, including EAS/fastlane internal-testing tracks from your calibrated config
/fix — for code-level build failures the toolchain can't resolve
/sre — health and operations for your backend services

/deploy

Deploy to local, staging, preview, and mobile testing environments. Reads deployment knowledge from /calibrate. Refuses production — those go through your team's review process.

Synopsis

/deploy [<local|staging|preview|testflight-internal|google-play-internal>] [<service>] [--dry-run]

Production is refused — always. /deploy production (or any environment marked production) stops hard, explains why, and offers the safe path instead: PR → CI → review → your pipeline deploys on merge. There is no override.

When to use it

Getting your changes onto staging or a preview URL so the team can test them
Starting/restarting your full local stack (/deploy local)
Pushing an iOS or Android build to internal testing (TestFlight / Play internal track)
Not for: production releases — create a PR with /pr and let your review process ship it; pure local server restarts — /restart restarts already-running services, while /deploy local starts the full stack and runs migrations

Choosing an environment: use local for your own development, staging to share work with your team, preview for a temporary URL tied to a branch, and testflight-internal/google-play-internal to push a build to mobile internal testers.

Quickstart

/deploy staging

What you'll see: pre-deploy checks (clean tree, right branch, CI green, platform CLI authenticated), then the platform-appropriate deploy, a health check, and a plain-English summary with the staging URL.

Examples

/deploy staging                    # deploy current branch to staging
/deploy local                      # start/restart the local stack, run migrations, verify health
/deploy preview                    # temporary preview URL (Vercel/Netlify/Cloudflare style)
/deploy testflight-internal        # iOS build to TestFlight internal testers only
/deploy staging backend --dry-run  # show what would happen for one service, deploy nothing

Service names are project-specific (e.g. backend, frontend, api) — use the names from your CLAUDE.md ## Local Dev Services table.

Arguments & flags

Argument or flag	Values	Default	What it does
environment	`local`, `staging`, `preview`, `testflight-internal`, `google-play-internal`	menu shown	Target environment; omit it to pick from a menu of your calibrated environments
service	a service name	all/default	Narrows the deploy for multi-service projects
`--dry-run`	—	off	Shows what would happen without deploying

What it does

Reads the deployment knowledge base that /calibrate populated (environments, platforms, CI pipeline). If the project isn't calibrated, it stops and tells you to run /calibrate first.
Resolves the target — if you didn't name an environment, it shows a menu of your calibrated environments and waits for your choice (a user-confirmation checkpoint).
Production safety gate — refuses production/prod/live unconditionally and offers staging, /pr, or /sre ci instead.
CI pipeline check — if your CI already deploys this environment, it offers options (trigger the pipeline, watch it, bypass CI for staging with confirmation, or explain the pipeline) — a user-confirmation checkpoint rather than deploying around your pipeline.
Pre-deploy checks — clean working tree, correct branch, latest CI run passing, platform CLI installed/authenticated, project linked. Failures stop the deploy with a concrete fix.
Deploys — spawns an SRE agent with the platform-specific sequence (Railway, Vercel, Fly.io, Netlify, Cloudflare Pages, GitHub Pages, Render, or EAS/fastlane for mobile internal tracks). local starts services, runs migrations, and verifies health.
Verifies and reports — health checks with retries (or build number + processing ETA for mobile), logs the deploy to the knowledge base, and presents next steps in plain English.
Offers auto-deploy — after a successful staging deploy, asks if you want a cloud Routine that deploys staging automatically on PR merge — a user-confirmation checkpoint (declining changes nothing).
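The production safety gate in the steps above amounts to an unconditional check before anything else runs. A minimal sketch, assuming names are compared case-insensitively and that calibration can flag an environment as production; `is_refused` and the `marked_production` parameter are hypothetical.

```python
# Environment names the gate refuses, as listed in the step above.
REFUSED_NAMES = {"production", "prod", "live"}


def is_refused(environment: str, marked_production: bool = False) -> bool:
    """Return True when a deploy to this environment must be refused.

    marked_production stands in for an environment that calibration
    has flagged as production under some other name (an assumption).
    """
    return marked_production or environment.strip().lower() in REFUSED_NAMES
```

There is deliberately no override parameter, mirroring the "no override" rule.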

Agents spawned

Agent	Model tier	Role
sre	sonnet	Runs the platform-specific deploy sequence and health verification (one per deploy)

Output & artifacts

The deployed environment's URL with health status (or TestFlight/Play build number and processing ETA)
A deploy log entry appended to .claude/knowledge/shared/deployment.md after every deploy
Optionally: a Monitor watcher for CI-triggered deploys, and an auto-deploy Routine if you opt in

Troubleshooting

Problem	Fix
`Run /calibrate first`	`/deploy` won't guess your infrastructure — run /calibrate once (2–3 min), then retry
`I won't deploy to production`	Working as designed — use `/deploy staging` to test, then /pr for the production path
`You have uncommitted changes`	Commit or stash, then retry — deploys only run from a clean tree
`CI is failing on this commit`	Run /ci-fix, then retry the deploy
Platform CLI not authenticated	Follow the printed auth one-liner (e.g. `railway login`), then retry
Deploy succeeded but health check failed	`/deploy` escalates to `/sre debug` automatically; three consecutive failures route to /incident

Related

/calibrate — required first run; builds the deployment knowledge /deploy reads
/pr — the path to production: PR, review, merge, pipeline deploys
/sre — deploy-check verifies deployments; ci watches the pipeline
/deploy-ios — full Xcode-level control for iOS builds (simulator/TestFlight/App Store)

/docs

Audit and generate documentation for skills, agents, and hooks — including full per-skill customer doc pages.

Synopsis

/docs <audit|write|check> [skill-name]

When to use it

You've added a project-specific skill, agent, or hook and want its documentation written to the same standard as the toolkit's own pages
You want a coverage report: which skills have full doc pages, which have quality drift (stale one-liners, broken links, missing sections)
In CI or a pre-PR hook, to gate new skills on meeting the documentation bar
Not for: writing your application's docs (READMEs, API docs for your product code) — /docs documents the toolkit surface (skills/agents/hooks), not your app

Quickstart

/docs audit

What you'll see: a report listing which skills meet the full documentation bar, which are missing pages, and which have quality issues (e.g., a synopsis that drifted from the skill's actual usage string), with a next action per item.

Examples

/docs audit              # coverage + quality report, no writes
/docs write my-skill     # generate the full doc page for one skill
/docs write              # sweep mode — fix README/GETTING-STARTED mention gaps and list skills needing pages (no page generation)
/docs check              # silent pass/fail gate — exit 1 if a newly added skill lacks docs

Arguments & flags

Argument	Values	Default	What it does
mode	`audit`, `write`, `check`	`audit`	Report gaps / generate docs / CI gate
`[skill-name]`	any skill (optional)	—	With `write`: generate that skill's full page. Without a skill name, `write` runs sweep mode instead — it fixes README/GETTING-STARTED mention gaps and lists which skills still need pages; it does not generate any page.

What it does

Detects scope — in check mode, only skills/agents/hooks added in the current branch (requires the project to be git-initialized with a main branch to diff against); in audit/write mode, every user-invocable skill.
Checks each item against the documentation bar — a doc page exists with all required sections: Synopsis, When to use it, Quickstart, Examples, What it does, Output, Troubleshooting, and Related (defined in customer-docs/skills-page-required-sections.txt); the page's one-liner matches the skill's frontmatter description (the description text before its "Usage:" clause — anything after "Usage:" is stripped before comparing); the synopsis matches its usage string; the skill is reachable from at least one task surface (routing table, workflow guide); relative links resolve; the name is mentioned in the README and getting-started guide.
Audit mode — prints the report: items at full bar, items missing pages (the backfill queue), and quality issues per item. No writes.
Write mode with a skill name — requires customer-docs/docs/skills/_TEMPLATE.md to exist. Reads the skill's SKILL.md in full and generates the page from the template: phases become the "What it does" list, confirmation gates are marked explicitly, and artifacts and troubleshooting come from the skill body. Inferred prose is marked `<!-- draft: review -->` for human review. It also proposes the task-surface placement: the exact routing-table row or workflow entry where the new skill should appear. You then run ./generate-docs.sh && ./generate-site.sh yourself to regenerate the reference tables and render the page on the site (verify zero WARN lines); the skill does not run the generators for you.
Write mode without a name — fixes README/getting-started mention gaps and lists which skills still need pages; it deliberately does not mass-generate pages, since each page needs review.
Check mode — exits 0 silently when complete, exits 1 with the gap list when a new skill is below the bar (used by the pre-PR docs hook and CI).
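Two of the documentation-bar checks above are simple enough to sketch: the required-sections check and the one-liner comparison that strips the "Usage:" clause. This is an illustration under assumptions, not the skill's code: the function names are hypothetical, and a heading is assumed to be any line that starts with the section name (so "Output & artifacts" satisfies "Output").

```python
# Required sections as listed in the documentation bar above.
REQUIRED_SECTIONS = [
    "Synopsis", "When to use it", "Quickstart", "Examples",
    "What it does", "Output", "Troubleshooting", "Related",
]


def missing_sections(page_text: str) -> list[str]:
    """Return required sections with no line starting with their name."""
    lines = [line.strip() for line in page_text.splitlines()]
    return [
        section for section in REQUIRED_SECTIONS
        if not any(line.startswith(section) for line in lines)
    ]


def description_one_liner(description: str) -> str:
    """Return the frontmatter description text before its 'Usage:' clause."""
    head, _, _ = description.partition("Usage:")
    return head.strip()


def one_liner_matches(page_one_liner: str, frontmatter_description: str) -> bool:
    """Compare a page's one-liner with the stripped frontmatter description."""
    return page_one_liner.strip() == description_one_liner(frontmatter_description)
```

A check-mode wrapper would exit 1 when either function reports a gap for a newly added skill and exit 0 otherwise.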

Output & artifacts

Audit/check report in the conversation (check mode also sets the exit code for CI)
Write mode: a new per-skill doc page with all required sections, plus a proposed task-surface placement. Run ./generate-docs.sh && ./generate-site.sh afterward to regenerate the reference tables/site so the page is linked — the skill itself does not run these generators.
README and getting-started mention entries where missing

Troubleshooting

Problem	Fix
A reference-table row is wrong	Fix the source (the skill's frontmatter description), not the generated file — then re-run the generators
Generated page sections feel generic	That's what the `<!-- draft: review -->` markers are for — review and edit; a page that parrots its one-liner into every section fails audit
`check` fails your PR for a new skill	Run `/docs write <name>`, review the page, and add the skill to a task surface
Page exists but audit still flags it	Likely drift: the one-liner or synopsis no longer matches the skill's frontmatter — update whichever is stale
`write <name>` fails to generate a page	Confirm `customer-docs/docs/skills/_TEMPLATE.md` exists — write mode requires it
New page isn't showing up on the site/reference tables	Write mode doesn't run the generators for you — run `./generate-docs.sh && ./generate-site.sh` and check for zero WARN lines

Related

/calibrate — creates the project-specific skills and agents you'll want documented
/scan — populates CLAUDE.md project context; /docs covers the toolkit surface
/pr — the pre-PR docs check runs as part of shipping a new skill

/engagement-tracker

Track consulting engagement progress and milestones.

Synopsis

/engagement-tracker <client-name> [--action <status|update|milestone>] [--date YYYY-MM-DD]

When to use it

Checking where an engagement stands — progress bars, RAG status, blockers, next actions
Logging a dated progress update after a working session or client call
Marking a milestone complete and rolling phase progress forward
Not for: producing the client-facing status report document — that's /deliverable-builder with --type status-update

What is RAG? RAG (Red/Amber/Green status indicator) tracks engagement health across five dimensions: schedule, scope, resources, quality, and stakeholder. Each dimension rates as Red (blocked/at risk), Amber (issues in progress), or Green (on track). The overall engagement RAG is the worst status across all dimensions.
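The "worst status wins" rule can be written down in a few lines. A minimal sketch, assuming ratings are stored as uppercase strings keyed by dimension name; the `overall_rag` function and that data shape are hypothetical, not the tracker's actual schema.

```python
# Severity order: a higher number is a worse status.
SEVERITY = {"GREEN": 0, "AMBER": 1, "RED": 2}
DIMENSIONS = ("schedule", "scope", "resources", "quality", "stakeholder")


def overall_rag(ratings: dict[str, str]) -> str:
    """Return the worst rating across the five dimensions.

    A dimension with no rating counts as GREEN here (an assumption).
    """
    return max(
        (ratings.get(dimension, "GREEN") for dimension in DIMENSIONS),
        key=SEVERITY.__getitem__,
    )
```

For example, four green dimensions and one amber dimension give an overall Amber.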

Quickstart

/engagement-tracker acme-corp

What you'll see: an ASCII dashboard — per-phase progress bars and milestone checklists, overall progress, key metrics (days elapsed, on-track yes/no), a RAG summary across schedule/scope/resources/quality/stakeholder, blockers, and next actions — also written to tracking/status.md.

Examples

/engagement-tracker acme-corp                       # default action: status dashboard
/engagement-tracker acme-corp --action status       # same, explicit
/engagement-tracker acme-corp --action update       # add a dated progress log entry
/engagement-tracker acme-corp --action milestone    # mark a milestone complete (e.g. "D3")

Arguments & flags

Flag	Values	Default	What it does
`--action`	`status`, `update`, `milestone`	`status`	Dashboard, dated log entry, or milestone completion
`--date`	`YYYY-MM-DD`	today	Backdate milestone completion to a past date (milestone action only)

What it does

Loads engagement state — reads clients/<client-name>/engagement.json. On first run it creates the file with four tracked phases (Discovery & Assessment, Analysis & Strategy, Solution Design, Deliverable Creation), each with predefined milestones (locked phase structure; you cannot create custom phases or milestones, but you can edit milestone names in engagement.json directly if needed). User-confirmation checkpoint: when creating fresh, it prompts you for the start date and target end date.
status — renders the full ASCII dashboard: progress bars per phase, milestone checklists with completion dates, overall progress, key metrics computed from the dates (time used %, on-track), the five-dimension RAG summary (overall RAG = worst dimension), blockers and risks, next actions, and recent updates. Written to tracking/status.md and shown in the terminal. Important: RAG ratings default to GREEN — you must update them with justifications during status updates to avoid all-green dashboards.
update — user-confirmation checkpoint: prompts for the update details (or accepts them in the invocation), then appends a dated entry — activities completed, decisions, risks/issues, next steps — to tracking/updates.md, updates engagement.json (next actions, blockers, RAG ratings with justifications, phase progress), and regenerates the dashboard.
milestone — displays the current milestone list with IDs, prompts you for the milestone ID to mark complete (or accepts it as a parameter), updates the milestone and recalculated phase progress in engagement.json, appends a completion entry to tracking/milestones.md, shows a confirmation box, and regenerates the dashboard. When all milestones in a phase complete, the phase automatically closes and advances to the next phase — you will see a notification confirming which phase is now active.

Updates and milestone entries are always appended — previous entries are never overwritten.
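The milestone action's bookkeeping can be sketched as below. This is illustrative only: the `Milestone` and `Phase` classes, the milestone IDs in the usage note, and the progress formula (completed milestones divided by total) are assumptions, not the documented engagement.json schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Milestone:
    id: str
    completed_on: Optional[str] = None  # ISO date string once complete


@dataclass
class Phase:
    name: str
    milestones: list[Milestone] = field(default_factory=list)

    def progress(self) -> float:
        """Fraction of milestones completed (assumed formula)."""
        if not self.milestones:
            return 0.0
        done = sum(1 for m in self.milestones if m.completed_on)
        return done / len(self.milestones)

    def is_closed(self) -> bool:
        """A phase closes once every one of its milestones is complete."""
        return bool(self.milestones) and all(m.completed_on for m in self.milestones)


def complete_milestone(phases: list[Phase], milestone_id: str, completed_on: str) -> Optional[str]:
    """Mark a milestone complete and return the name of the now-active phase.

    Returns None when every phase is closed. Raises KeyError for an unknown ID.
    """
    for phase in phases:
        for milestone in phase.milestones:
            if milestone.id == milestone_id:
                milestone.completed_on = completed_on
                # The active phase is the first one that is not yet closed.
                return next((p.name for p in phases if not p.is_closed()), None)
    raise KeyError(milestone_id)
```

Passing a past date as `completed_on` corresponds to backdating with --date.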

Output & artifacts

All under clients/<client-name>/:

engagement.json — engagement state: phases, milestones, RAG, blockers, next actions, update log
tracking/status.md — the ASCII dashboard (regenerated after every action)
tracking/updates.md — append-only dated progress log
tracking/milestones.md — append-only milestone completion records

Troubleshooting

Problem	Fix
Dashboard looks stale	Re-run `--action status` — the tracker also regenerates automatically when `tracking/status.md` is older than the latest update
Marked the wrong milestone	Milestone status lives in `engagement.json` — ask to set that milestone back to pending and the progress recalculates
Completed a milestone last week, not today	Use `--date YYYY-MM-DD` to backdate the completion
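Everything shows GREEN by default	Expected on a new engagement: ratings default to GREEN. Run `--action update` to set each dimension's rating with a justification, and challenge any unexplained green; the quality checklist requires justification, not defaults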

Related

/consulting — the engagement orchestrator; runs this as the track phase
/deliverable-builder — turns this tracking data into a client-facing status update
/client-discovery — kicks off the first tracked phase of the engagement

/explore

Fast codebase exploration using Haiku. Use for finding files, searching code, understanding structure.

Synopsis

/explore <query>

When to use it

"Where is X?" questions — finding files, functions, classes, or patterns
Understanding directory structure or how a subsystem is organized, cheaply
Scouting before expensive work — many toolkit skills run the same explore-light agent internally for exactly this reason
Not for: deep architectural reasoning or debugging novel problems — exploration findings feed those tasks, they don't replace them; and not for making changes — /explore only reads

Quickstart

/explore how does authentication work

What you'll see: the explore-light agent (running on Haiku, ~60x cheaper than Opus) scans the codebase and returns a concise summary with the relevant file paths — without burning your main session's context on the search itself.

Examples

/explore find all API route handlers          # pattern search across the codebase
/explore how does authentication work         # understand a subsystem
/explore where is the database schema defined # locate a specific definition

What it does

Parses your query from the skill arguments.
Spawns the explore-light agent (Haiku) with your query as its prompt — the agent does the file finding, grepping, and structure summarizing in its own context.
Returns the findings — file paths and a brief explanation — back into your session.

Agents spawned

Agent	Model tier	Role
explore-light	haiku	Performs the exploration and returns findings

Output & artifacts

Findings in the conversation: file paths, brief descriptions, and structural summaries
No files written, no code modified — read-only by design

Troubleshooting

Problem	Fix
Findings are too shallow for the question	/explore is optimized for speed and cost — for deep analysis, use the findings as input to a follow-up task in your main session
Query returned the wrong area of the codebase	Be more specific: name the feature, directory, or symbol ("auth middleware in the API layer" beats "auth")
You needed an edit, not a search	/explore never modifies code — ask for the change directly after locating the file

Related

/onboard — session briefing that includes project orientation
/scan — persists project structure into CLAUDE.md, rather than answering one-off queries
/calibrate — deep project learning; /explore is the lightweight everyday counterpart

/extensions disable

Disable a previously enabled compliance extension pack.

Synopsis

/extensions disable <pack_id> [reason]

When to use it

A pack no longer applies to this project (out of scope, requirements changed)
Temporarily pausing a pack you plan to re-enable under a different scope
Not for: skipping a single rule while keeping the rest of the pack active — use /extensions waive instead. Use disable when the whole pack no longer applies; use waive when only one rule in it doesn't.

pack_id must be the full domain/name form shown by /extensions list (e.g. compliance/hipaa) — a bare name like hipaa won't match.

Quickstart

/extensions disable compliance/hipaa "Out of scope for this project"

What you'll see:

Archived evidence to .claude/extensions/compliance-hipaa/archive/2026-07-05T00-00-00Z/   (only if a report existed)
Disabled compliance/hipaa (reason: Out of scope for this project).
To re-enable: /extensions enable compliance/hipaa

The pack's rules stop contributing to skill prompts, its current evidence report is archived (if one existed), and the audit trail is preserved.

Examples

/extensions disable compliance/hipaa "Out of scope for this project"   # disable with inline reason
/extensions disable security/baseline                                  # no reason given — you'll be asked for one

What it does

Collects a reason — if you didn't supply one, prompts you with a dropdown menu of common reasons (e.g. "Out of scope for this project"), plus a free-text and a cancel option. This is a user-confirmation checkpoint: a reason is required for the audit trail, and you can cancel out of it.
Validates the pack is currently enabled — disabling a pack that isn't enabled (or is already disabled) exits cleanly (idempotent).
Marks the entry disabled in .claude/extensions/.enabled.json with timestamp and reason — the original entry is updated, never deleted, so a later re-enable keeps the full history.
Archives the evidence report (if one exists) to .claude/extensions/<pack-kebab>/archive/<timestamp>/.
Confirms, printing Disabled {pack_id} (reason: {reason}). followed by To re-enable: /extensions enable {pack_id} — re-enabling regenerates evidence from scratch.
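The archive location in the steps above follows from the pack ID and a timestamp. A minimal sketch, assuming the `<pack-kebab>` form joins domain and name with a hyphen and the timestamp is UTC with hyphens in place of colons, as in the Quickstart output; the function names are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def pack_kebab(pack_id: str) -> str:
    """Turn 'compliance/hipaa' into 'compliance-hipaa'."""
    domain, sep, name = pack_id.partition("/")
    if not sep or not domain or not name or "/" in name:
        raise ValueError(f"expected <domain>/<name>, got {pack_id!r}")
    return f"{domain}-{name}"


def archive_dir(pack_id: str, when: datetime) -> PurePosixPath:
    """Return the evidence archive directory for a disable at time `when`.

    `when` should be timezone-aware; it is converted to UTC first.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    return PurePosixPath(".claude/extensions") / pack_kebab(pack_id) / "archive" / stamp
```

A bare name such as hipaa raises ValueError here, matching the rule that pack_id must be the full domain/name form.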

Output & artifacts

Updated .claude/extensions/.enabled.json — entry marked with disabled_at and disabled_reason
.claude/extensions/<pack-kebab>/archive/<timestamp>/evidence-report.md — archived evidence (when one existed)

Troubleshooting

Problem	Fix
`Pack {pack_id} is not currently enabled`	Nothing to disable — check /extensions status for what's active
`Reason required`	Re-invoke with a quoted reason: `/extensions disable <pack> "why"`
`jq required for /extensions disable`	Install `jq` — this command manipulates JSON state
`Could not acquire lock`	Another `/extensions` command is running; if not, remove `.claude/extensions/.lock`

Related

/extensions enable — re-enable later; the prior disable is preserved in the audit trail
/extensions list — all packs on disk, enabled or not
/extensions status — operational detail for enabled packs
/extensions waive — narrower alternative: waive one rule, keep the pack

/extensions enable

Enable a compliance extension pack for this project.

Synopsis

/extensions enable <domain>/<name>

The pack_id argument must be in <domain>/<name> format (e.g., compliance/hipaa, security/baseline).

When to use it

Opting a project into a compliance or quality pack (e.g. compliance/hipaa, security/baseline) so its rules apply to toolkit workflows
Re-enabling a pack you previously disabled — both entries (the original disable and the new re-enable) are kept in the audit trail with timestamps, so you can see when it was disabled and when it was re-enabled
Not for: seeing what's available — that's /extensions list; or skipping one rule in an enabled pack — that's /extensions waive

Quickstart

/extensions enable compliance/hipaa

What you'll see: Enabled compliance/hipaa v1.0.0 (restrictiveness 5, source original). Evidence will be generated on next /pr run. — plus a conflict count if the pack's rules overlap with already-enabled packs.

Examples

/extensions enable compliance/hipaa      # opt in to a pack (format: <domain>/<name>)
/extensions enable security/baseline     # enable a second pack — conflicts auto-resolve

What it does

Validates the pack exists on disk (both its opt-in and rules files) and has well-formed frontmatter.
Checks current state — already-active packs exit cleanly (idempotent); a previously disabled pack is detected as a re-enable and the prior audit trail is preserved.
Scans for rule conflicts with already-enabled packs. Overlapping rules auto-resolve in favor of the higher restrictiveness (alphabetical pack ID on a tie) — each resolution is logged, with no interactive prompt.
Records the enablement in .claude/extensions/.enabled.json with the pack ID, version, timestamp, your git email, and a source field (always "manual" for this command). Re-enables also add a re_enabled_at timestamp.
Confirms — evidence for the pack is generated on your next /pr run.
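The conflict rule in the steps above can be sketched as a single comparison. Assumptions are labeled: the `PackRule` type is hypothetical, and "alphabetical pack ID on a tie" is read here as "the alphabetically first pack ID wins", which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackRule:
    pack_id: str          # e.g. "compliance/hipaa"
    restrictiveness: int  # higher means stricter
    rule_id: str          # the overlapping rule ID


def resolve_conflict(a: PackRule, b: PackRule) -> PackRule:
    """Pick the winning rule for two packs that define the same rule ID.

    Higher restrictiveness wins. On a tie, the alphabetically first
    pack ID wins (an assumed reading of the tie-break).
    """
    return min((a, b), key=lambda rule: (-rule.restrictiveness, rule.pack_id))
```

Each resolution would then be appended as one line to the pack's conflict-log.md.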

Output & artifacts

.claude/extensions/.enabled.json — the project's enabled-pack and waiver registry. A re-enable adds "re_enabled_at": "<iso8601>" to the new entry while leaving the prior disabled entry untouched. Example entry:
```
{
 "pack_id": "compliance/hipaa",
 "version": "1.0.0",
 "enabled_at": "2026-07-05T12:00:00Z",
 "enabled_by_email": "user@example.com",
 "source": "manual"
}
```
.claude/extensions/<pack-kebab>/conflict-log.md — one line per auto-resolved rule conflict
Evidence report generated on the next /pr run

Troubleshooting

Problem	Fix
`Pack {pack_id} not found`	Check the ID format is `<domain>/<name>` and run /extensions list to see available packs
`Pack {pack_id} has malformed frontmatter`	The pack's opt-in file is missing required keys (version/restrictiveness) — fix or re-install the pack
`jq required for /extensions enable`	Install `jq` — this command manipulates JSON state
`Could not acquire lock`	Another `/extensions` command is running; if not, remove `.claude/extensions/.lock`
`Pack ... is already enabled`	Nothing to do — the command is idempotent

Related

/extensions list — see all packs and their status
/extensions status — enabled packs with evidence freshness
/extensions disable — opt back out
/extensions waive — waive a single rule instead of the whole pack
/pr — generates the evidence report for enabled packs

/extensions list

List compliance extension packs and their status.

Synopsis

/extensions list

When to use it

Discovering which compliance/quality packs are installed and which are enabled in this project
Looking up a pack_id before running /extensions enable
Not for: evidence freshness and waiver detail on enabled packs — that's /extensions status

Quickstart

/extensions list

What you'll see: a table of every pack on disk with its status, version, restrictiveness, source, and enable date — followed by a totals footer like 4 packs available · 2 enabled · 0 waivers active.

Examples

/extensions list    # the one and only form — no arguments

What it does

Resolves the extension root — project extensions/, the plugin install (if present), or the compiled bundle install at .claude/extensions-installed/, in that order.
Enumerates packs on disk (organized as <domain>/<name>) and reads each pack's version, source, and restrictiveness.
Cross-references the project's .claude/extensions/.enabled.json to mark each pack enabled or disabled. If the file doesn't exist, all packs show as disabled — that just means nothing is opted in yet.
Prints the table and a one-line footer with totals (packs available, enabled, active waivers).
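The first two steps above (root resolution in a fixed order, then enumeration of `<domain>/<name>` directories) can be sketched as follows. The function names are hypothetical, and the plugin install location is passed in as a parameter because its path is not stated here.

```python
from pathlib import Path
from typing import Optional


def resolve_extension_root(project: Path, plugin_root: Optional[Path] = None) -> Optional[Path]:
    """Return the first existing extension root, in the documented order."""
    candidates = [project / "extensions"]
    if plugin_root is not None:
        candidates.append(plugin_root)  # plugin install, if present
    candidates.append(project / ".claude" / "extensions-installed")
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None


def list_pack_ids(root: Path) -> list[str]:
    """Enumerate packs stored as <root>/<domain>/<name>/ directories."""
    return sorted(
        f"{path.parent.name}/{path.name}"
        for path in root.glob("*/*")
        if path.is_dir()
    )
```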

Output & artifacts

A status table in the conversation — nothing is written to disk
Example row: compliance/hipaa disabled 1.0.0 5 original —

Troubleshooting

Problem	Fix
`No extension packs installed`	Add packs under `extensions/<domain>/<name>/`, or install a bundle that ships packs
All packs show `disabled` even though some should be enabled	Install `jq` — without it, the tool cannot read `.enabled.json` and assumes no packs are enabled
A pack shows `?` for version/restrictiveness	That pack's opt-in frontmatter is malformed or missing required fields (`version`, `source`, `restrictiveness`) — the listing continues, but fix the pack before enabling it

Related

/extensions enable — opt in to a pack from this list
/extensions status — enabled packs only, with evidence freshness and waivers
/extensions disable — opt back out
/extensions waive — waive a single rule with a rationale

/extensions status

Print enabled extension packs with evidence freshness and waiver counts.

Synopsis

/extensions status

When to use it

Checking compliance health before a release: is every enabled pack's evidence current?
Auditing how many waivers are active and which packs had rule conflicts
Not for: browsing all packs on disk including disabled ones — that's /extensions list

Quickstart

/extensions status

What you'll see: a health line (✓ All evidence current. or a ⚠ warning), then a per-pack table with version, enable date, evidence report age, waiver count, and conflict-log entries. Conflicts are rule clashes between two enabled packs that were auto-resolved by /extensions enable (the higher-restrictiveness rule wins) — the count here just reflects how many such resolutions are on record for that pack; see its conflict-log.md if you want the detail.

Examples

/extensions status    # the one and only form — no arguments

What it does

Reads the project's .claude/extensions/.enabled.json; if nothing is enabled, points you to /extensions list and /extensions enable.
For each active pack, gathers operational detail: when it was enabled, when its evidence report was last regenerated (or (never)), how many waivers target it, and whether a conflict log exists.
Prints a health line first — flags packs whose evidence is missing or older than the latest commit on the current branch (⚠ N packs have stale or missing evidence — run /pr to regenerate.), otherwise confirms all evidence is current. Evidence reports are the compliance-rule-check output for a pack, generated when you run /pr; if a report is missing or older than your latest commit, that pack's compliance status may not reflect your current code — that's what "stale" means here.
Prints the status table with one row per enabled pack.
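The freshness rule above reduces to comparing two timestamps. A minimal sketch, assuming the caller already has the report's last-regenerated time and the latest commit time; the function names and the "unchecked" label (for the case with no git commit to compare against) are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def evidence_status(report_time: Optional[datetime], latest_commit_time: Optional[datetime]) -> str:
    """Classify one pack's evidence as missing, stale, current, or unchecked."""
    if report_time is None:
        return "missing"    # shown as (never) in the table
    if latest_commit_time is None:
        return "unchecked"  # not in a git repo: no commit to compare against
    return "stale" if report_time < latest_commit_time else "current"


def count_needing_refresh(statuses: list[str]) -> int:
    """Number of packs the health line would flag (stale or missing)."""
    return sum(status in ("missing", "stale") for status in statuses)
```

A count of zero corresponds to the "All evidence current" health line; anything higher corresponds to the warning that points you to /pr.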

Output & artifacts

A health line and status table in the conversation — nothing is written to disk
Stale evidence is regenerated by your next /pr run

Troubleshooting

Problem	Fix
`No extensions enabled in this project`	Run /extensions list to see available packs and /extensions enable to opt in
Evidence shows `(never)` or ⚠ stale	Run /pr — evidence reports are generated as part of the PR workflow
`jq required for /extensions status`	Install `jq` — this command reads JSON state
Freshness check skipped	You're not in a git repo (no commit to compare against) — rows still print without the stale comparison
CONFLICTS column shows entries	Informational only — no action required. It means two enabled packs had an overlapping rule ID and /extensions enable auto-resolved it by restrictiveness. Check the pack's `conflict-log.md` if you want to see which rule won and why

Related

/extensions list — all packs on disk, including disabled ones
/extensions enable / /extensions disable — change what's active
/extensions waive — the waivers counted in this view
/pr — regenerates the evidence reports this command checks

/extensions waive

Waive a specific extension rule with a written rationale.

Synopsis

/extensions waive <domain>/<name>/<RULE-ID> "reason"

When to use it

An enabled pack's rule genuinely doesn't apply to this project, and you want the code reviewer to stop flagging it as blocking
Overriding the most-restrictive auto-resolution when two packs conflicted and the winning rule is wrong for your context
Not for: turning off an entire pack — that's /extensions disable

Prerequisite: the pack must already be enabled — run /extensions enable <pack_id> first if it isn't.

Quickstart

/extensions waive compliance/hipaa/HIPAA-312b "No PHI is stored in this service"

What you'll see:

Waived compliance/hipaa/HIPAA-312b — "No PHI is stored in this service."
This rule will appear as 'waived' in the next evidence report.
To revoke: edit .claude/extensions/.enabled.json and remove the entry from .waivers[].

The rule shows as waived (with your rationale) in the next evidence report instead of being enforced as blocking.

Examples

/extensions waive compliance/hipaa/HIPAA-312b "No PHI is stored in this service"   # waive one rule with a rationale
/extensions waive security/baseline/SEC-04 "Covered by org-level WAF policy"       # format: <domain>/<name>/<RULE-ID>

What it does

Parses the target — the first two path segments are the pack ID, the remainder is the rule ID. A quoted reason is required.
Validates that the pack is currently enabled and that the rule ID actually exists in the pack's rule file.
Checks for duplicates — waiving an already-waived rule exits cleanly, showing the existing reason (idempotent).
Appends the waiver to .claude/extensions/.enabled.json with the rule, reason, timestamp, and your git email.
Confirms — the rule renders as waived with your rationale in the next evidence report, and the code reviewer no longer treats it as blocking for this project.

To revoke a waiver, edit .claude/extensions/.enabled.json and remove the entry from .waivers[].
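
The parse, duplicate-check, and append steps above can be sketched in Python. This is a minimal illustration under assumptions, not the skill's own code: the function names and the entry fields (`pack`, `rule`, `reason`, `timestamp`, `email`) are hypothetical, and only the `waivers` array comes from the text. The enabled-pack and rule-exists validations are left out.

```python
from __future__ import annotations

from datetime import datetime, timezone


def parse_target(target: str) -> tuple[str, str]:
    """Split '<domain>/<name>/<RULE-ID>' into (pack_id, rule_id)."""
    # The first two segments form the pack ID; the remainder is the rule ID.
    parts = target.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError("Bad format: expected <domain>/<name>/<RULE-ID>")
    domain, name, rule_id = parts
    return f"{domain}/{name}", rule_id


def add_waiver(state: dict, target: str, reason: str, email: str) -> dict:
    """Append a waiver to state['waivers'], or return the existing one."""
    if not reason.strip():
        raise ValueError("A quoted reason is required")
    pack_id, rule_id = parse_target(target)
    waivers = state.setdefault("waivers", [])
    for entry in waivers:
        if entry["pack"] == pack_id and entry["rule"] == rule_id:
            return entry  # idempotent: the original reason is kept
    entry = {
        "pack": pack_id,          # hypothetical field name
        "rule": rule_id,          # hypothetical field name
        "reason": reason,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "email": email,           # the text says this is your git email
    }
    waivers.append(entry)
    return entry
```

Calling `add_waiver` twice for the same rule leaves a single entry, which mirrors the idempotent behavior described above.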

Output & artifacts

A waiver entry in .claude/extensions/.enabled.json (.waivers[])
The rule appears as waived with your reason in the pack's evidence report on the next /pr run

Troubleshooting

Problem	Fix
`Cannot waive a rule in a pack that isn't enabled`	Run /extensions enable for the pack first
`Rule {rule_id} not found in {pack_id}`	Check the rule ID against the pack's rules file — the error message shows its path
`Bad format`	Use `<domain>/<name>/<RULE-ID>` plus a quoted reason as the second argument
`Waiver ... already exists`	Idempotent: the existing waiver and its reason are shown. To change the reason, remove the entry from `.waivers[]` in `.claude/extensions/.enabled.json`, then waive the rule again
`jq required`	Install `jq` — this command manipulates JSON state

/extensions enable — the pack must be enabled before its rules can be waived
/extensions disable — the whole-pack alternative
/extensions status — shows active waiver counts per pack
/extensions list — pack inventory and totals
/pr — generates the evidence report where waivers appear

/fix

Formal bug fix pipeline — root cause analysis, scope lock, behavior contract, differential testing, regression proof.

Synopsis

/fix <description|#issue> [--severity critical|high|medium|low] [--hotfix|--lite|--lite-strict|--verified|--full|--swarm]

When to use it

A real bug: wrong output, exception, regression, "this used to work" — anything where the cause needs proving before the fix
Production fires — --hotfix compresses the pipeline without skipping correctness checks
Not for: building features — that's /planning then /implement; or a red CI pipeline — that's /ci-fix

Note: /fix can also be auto-triggered by a Monitor webhook when production error events cross a threshold — see Monitor Integration in the skill for the .claude/monitors/ config that wires an error tracker (Sentry, Datadog, Rollbar) to the pipeline.

Quickstart

/fix POST /auth/refresh returns 500

What you'll see: an investigation that converges on a root cause (with file:line evidence), a scope-locked fix with a regression test proven to catch the bug, a differential test showing nothing else changed (any unrelated test regression blocks the PR), and a PR with the full evidence trail.

Examples

/fix #123                                      # load the bug from a GitHub issue
/fix #123 --severity critical                  # explicit severity (otherwise auto-assessed)
/fix #123 --hotfix                             # production fire — expedited single-trace path
/fix --severity high token refresh fails silently

Arguments & flags

Flag	Values	Default	What it does
`--severity`	`critical` \| `high` \| `medium` \| `low`	auto-assessed	Drives QA depth and merge urgency
`--hotfix`	—	off	Expedited path for production fires: single backward trace, compressed checks, deferred deep investigation post-merge
`--lite` / `--verified` / `--full` (alias `--swarm`) / `--lite-strict`	—	auto-detected	Advanced: force the investigation depth (lite → verified → full ladder) instead of letting the pipeline pick it from the bug's signals; `--verified` (analyst + challenger) is what auto-detection picks when no signal fires

The six investigation-depth flags (--hotfix, --lite, --lite-strict, --verified, --full, and its alias --swarm) are mutually exclusive, so pass at most one. Leave them all off to let auto-detection pick a depth from the bug's signals.
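
As a rough sketch of how the mutual-exclusion rule and the auto-detection defaults fit together, the following Python assumes the signals described under "What it does" (a smoking-gun traceback, or a sensitive area such as auth, payments, migrations, or security). The function name, its parameters, and the choice to let a sensitive area win over a smoking-gun traceback are assumptions for illustration only.

```python
DEPTH_FLAGS = {"--hotfix", "--lite", "--lite-strict", "--verified", "--full", "--swarm"}
SENSITIVE_AREAS = {"auth", "payments", "migrations", "security"}


def choose_depth(flags, touched_areas, has_smoking_gun_traceback):
    """Return the investigation depth for a /fix run (illustrative only)."""
    chosen = [flag for flag in flags if flag in DEPTH_FLAGS]
    if len(chosen) > 1:
        raise ValueError(f"depth flags are mutually exclusive: {chosen}")
    if chosen:
        # --swarm is an alias of --full; other flags map to their own name.
        return "full" if chosen[0] == "--swarm" else chosen[0].lstrip("-")
    # No flag passed: fall back to auto-detection from the bug's signals.
    if SENSITIVE_AREAS & set(touched_areas):
        return "full"      # assumption: sensitive areas take priority
    if has_smoking_gun_traceback:
        return "lite"
    return "verified"      # default when no signal fires
```

For example, `choose_depth([], ["auth"], True)` returns `"full"` under the stated priority assumption, while passing both `--lite` and `--full` raises an error.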

What it does

User-confirmation checkpoints (marked below) are interactive prompts where /fix pauses and asks you to decide something — which hypothesis to pursue, whether to accept a symptom-side fix, what QA level to run. In CI, autopilot, or Monitor-triggered runs, each checkpoint uses a safe default instead of blocking.

Loads bug context — from the issue or description, syncs with the default branch (refuses to run with uncommitted changes), runs an advisory revert-check so a buffered revert can't masquerade as the bug, and checks the knowledge base for similar past incidents. If invoked with no description, it asks for one first (user-confirmation checkpoint).
Root cause investigation — competing investigator agents trace the bug independently, then cross-challenge each other until they converge on a causation chain (symptom → root) with cited evidence. Depth is auto-selected: lite for smoking-gun tracebacks, full swarm for auth/payments/migrations/security, verified (analyst + challenger) as the default when neither signal fires. If the swarm can't converge, you choose which hypothesis to pursue (user-confirmation checkpoint). If a lite investigation finishes with low confidence, disagreeing agents, or both agents stopping at the symptom with no root-cause walk-back, it auto-escalates to a full investigation — pass --lite-strict to disable this and accept the lite verdict as-is.
Scope lock — builds the dependency graph from the root cause and writes a Fix Zone (may modify), Watch Zone (caution), and Frozen Zone (blocked). The fix cannot touch files outside the zone.
Behavior contract — tables what MUST change and what MUST NOT change, with a baseline test-suite snapshot.
Implements the fix — a scoped agent fixes at the root cause (not the symptom) and writes a regression test, which is mutation-checked: it must FAIL with the fix reverted and PASS with it applied.
Review swarm — a fix-reviewer (7-point correctness checklist plus a root-cause pre-gate) and a qa-attacker (3-5 attack scenarios) review in parallel and cross-examine each other. The pre-gate verifies the diff actually lands at or upstream of the root cause, not just at the symptom, and returns PASS / FAIL / SYMPTOM-ONLY-JUSTIFIED. A symptom-only fix (FAIL) is BLOCKED and sent back to implementation; a justified symptom-side fix (SYMPTOM-ONLY-JUSTIFIED) is surfaced to you for confirmation (user-confirmation checkpoint) in interactive mode, or logged as a warning only in CI/autopilot.
Differential testing — re-runs the full suite and diffs against the baseline. Any unrelated PASS→FAIL blocks the PR.
Post-fix handoff — asks what QA level to run before PR: commit / full / skip, with full recommended for critical/high severity (user-confirmation checkpoint), runs /qa, restarts local servers, and asks you to test the fix manually before proceeding (user-confirmation checkpoint). Then creates the PR via /pr with the bug-fix evidence template, updates the knowledge base, runs a final completion verification, and prints a post-fix checklist (incident record, bug-pattern, knowledge-graph write, verifier) before announcing the PR URL. In unattended runs (CI, autopilot, Monitor-triggered) it never prompts: the QA level is derived from severity and manual testing is deferred with a logged notice.
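
The mutation check on the regression test and the differential-testing gate from the steps above can be illustrated with a small Python sketch. The result format (a mapping of test name to `"PASS"` or `"FAIL"`) and the `expected_changes` parameter, standing in for the behavior contract's "MUST change" list, are assumptions; the pipeline's real data structures are not documented here.

```python
def regression_test_is_valid(result_with_fix_reverted, result_with_fix_applied):
    """A regression test counts only if it fails without the fix and passes with it."""
    return result_with_fix_reverted == "FAIL" and result_with_fix_applied == "PASS"


def unexpected_regressions(baseline, after, expected_changes=frozenset()):
    """List tests that went PASS -> FAIL and are not covered by the contract."""
    return sorted(
        name
        for name, status in baseline.items()
        if status == "PASS"
        and after.get(name) == "FAIL"
        and name not in expected_changes
    )


baseline = {"test_login": "PASS", "test_refresh": "FAIL", "test_logout": "PASS"}
after = {"test_login": "PASS", "test_refresh": "PASS", "test_logout": "FAIL"}

# Any entry in this list would block the PR in the pipeline described above.
blocking = unexpected_regressions(baseline, after)
```

Here `blocking` is `["test_logout"]`: the fix repaired `test_refresh` but broke an unrelated test, which is exactly the case the differential step is meant to catch.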

Agents spawned

Agent	Model tier	Role
backward-tracer / forward-tracer	sonnet	Independent root-cause investigation strategies
pattern-matcher	haiku	Searches the knowledge base for matching past incidents
implementation agent (backend / frontend)	sonnet	Applies the fix inside the Fix Zone
fix-reviewer	sonnet	Correctness checklist + root-cause pre-gate
qa-attacker	sonnet	Adversarial attack scenarios against the fix
completion-verifier	sonnet	Final completeness cross-check

Note: in calibrated projects, these agent assignments and model tiers can be overridden — check .claude/registry.json for active agent substitutions and model-policy.yml (or .claude/model-policy.yml) for model-tier overrides.

Output & artifacts

.claude/.fix-scope-lock.json — the enforced Fix/Watch/Frozen zones. Written during the scope-lock step and read by the implementation agent, which is blocked from editing anything outside the Fix Zone. Don't delete or hand-edit it mid-fix. If the scope is too narrow, re-run /fix with a clearer description instead.
.claude/.fix-behavior-contract.md — what changes vs. what's preserved
A regression test in your test suite, proven to catch this specific bug
A PR with root cause, scope, behavior contract, and test evidence; incident record in .claude/qa-knowledge/incidents/ and a bug-patterns.md entry
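
To make the three zones concrete, here is a hedged Python sketch of how an edit could be checked against a scope lock. The layout of `.claude/.fix-scope-lock.json` is not documented in this section, so the keys (`fix_zone`, `watch_zone`, `frozen_zone`) and the use of glob patterns are purely hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical scope-lock content; the real file format may differ.
lock = {
    "fix_zone": ["src/auth/refresh.py"],
    "watch_zone": ["src/auth/*"],
    "frozen_zone": ["src/payments/*"],
}


def classify(path, scope_lock):
    """Return the first zone whose patterns match the path, else 'outside'."""
    for zone in ("fix_zone", "watch_zone", "frozen_zone"):
        if any(fnmatch(path, pattern) for pattern in scope_lock.get(zone, [])):
            return zone
    return "outside"


def may_edit(path, scope_lock):
    """Only files inside the Fix Zone may be modified."""
    return classify(path, scope_lock) == "fix_zone"
```

With this sample lock, `src/auth/refresh.py` is editable, `src/auth/login.py` falls in the Watch Zone, and anything unlisted is treated as outside the zone and therefore not editable.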

Troubleshooting

Problem	Fix
`Uncommitted changes present — commit, stash, or discard before /fix`	The pipeline refuses to start on a dirty tree — clean it up first
`Rebase conflicts against origin/main`	The base shifted under you — run `git rebase --abort` and sync manually before re-running
Investigation swarm could not converge	You'll be shown all competing hypotheses with evidence — pick one or ask it to investigate all
Review verdict: BLOCK ("fix patches symptom only")	The diff doesn't address the root cause — the pipeline returns to implementation rather than shipping a band-aid
Regression test passes both with and without the fix	The test is rejected as useless and rewritten — this is by design

/qa — the QA pass /fix runs before creating the PR
/ci-fix — for CI/deploy failures rather than code bugs
/incident — triage first when you don't yet know if it's a bug, infra, or CI problem
/review-pr — bug-fix-aware PR review after the fix lands

/goal

Speed-first, goal-oriented loop — set an objective; the agent picks the next concrete action, captures evidence, and stops at the PR. Differs from /autopilot (the rigor-first multi-issue queue).

Synopsis

/goal <objective> | pause | resume | clear | status

When to use it

The destination is clear but the path is not — "cut the homepage LCP under 2s", "get CI under 5 minutes"
One freeform objective you want driven autonomously with evidence at every step
Not for: working through a backlog of issues — that's /autopilot, the rigor-first multi-issue queue with P0–P5 ranking and risk classification; /goal is single-objective and speed-first
Not for: a well-defined feature (/planning → /implement) or a specific bug (/fix)

Quickstart

/goal Cut homepage LCP below 2s on mobile

What you'll see: a knowledge-base read and quick scout, a short block of clarifying questions, then a plan (requirements + subtasks) for you to confirm. After that the loop self-paces — one concrete action per turn, each verified and logged as evidence — and stops when the goal is ready for PR.

Examples

/goal Cut homepage LCP below 2s on mobile   # set an objective and start the loop
/goal status                                # current goal, subtasks done, evidence, next action
/goal pause                                 # pause — state preserved
/goal resume                                # pick up where it left off
/goal clear                                 # discard the active goal (archived, not deleted)

Arguments & flags

Argument	What it does
`<objective>`	Sets a freeform goal and starts the loop (one active goal at a time)
`--prototype`	Speed mode: happy-path build only, targeted tests for the change, discovered edge cases deferred to issues, agent work trusted with a spot-check
`--full`	Production mode (auto-detected from words like "production", "customer", "end to end"): nothing deferred without explicit sign-off, unit + integration + E2E tests required and green before every merge, skipped/weakened tests banned, every delegated deliverable independently re-verified rather than trusted
`--merge-on-green`	Pre-authorizes the per-PR gate: after each PR the loop watches CI and merges on green instead of stopping
`pause` / `resume`	Pause or resume the active goal; state survives sessions
`clear`	Archive the active goal
`status`	Compact status block (also the default when run with no args and a goal is active)

What it does

Scout — first checks for an already-active goal (user-confirmation checkpoint): if one exists (status clarifying/proposed/active/paused/awaiting_pr), it asks "replace / keep / cancel" and waits for you. Otherwise it reads the project knowledge base (profile, conventions, domain rules, past goals, known footguns); this read is mandatory and calibrates the whole goal. A cheap explore-light scan then fills only the gaps.
Clarify — user-confirmation checkpoint: presents what it already knows, 3–5 context-aware questions with concrete options and defaults, and its assumptions. Answer them, or say "go" to accept all defaults.
Confirm plan — user-confirmation checkpoint: shows the approach, requirements (each with an evidence type), and subtasks (each with a done_when clause). Nothing executes until you approve; a dry-run option executes without writes.
Loop — each turn picks ONE concrete action (read, edit, test run, or a Sonnet agent spawn for non-trivial code), executes it, and appends evidence. A Stop hook auto-continues the loop so you don't have to type "continue". For large goals the loop becomes an orchestrator: it decomposes the plan into file-disjoint lanes, runs each in its own git worktree via parallel Sonnet subagents (dynamic workflows when available), routes subtasks to installed toolkit skills first, monitors agents with a takeover clock (stalled or over-time lanes get one stand-down order, then the orchestrator finishes them itself), and independently re-runs each agent's claimed verification before accepting it.
Verify — non-negotiable — verification is mandatory and blocking: every code change must be verified (lint, types, tests) in the same turn before the subtask can be marked done or the loop advances. If the changed code has no tests, they're written first, then run. Every evidence entry carries a verified boolean, and goal completion (the Phase 6 PR) is blocked if any evidence entry is still verified: false.
PR — hard stop — when all subtasks are done and requirements satisfied, it presents the summary with a suggested PR title and branch and stops for you. It never creates or merges the PR on its own unless you passed --merge-on-green; run /pr or tell it to.
Escalation — the loop comes back to you instead of spinning, on any of: the same failure surviving 2 distinct fix theories, the same subtask stuck 4+ turns without a verified deliverable, CI red on main after a merge (fix-forward once, then escalate on the second failure), anything destructive/irreversible beyond the repo, a data-loss discovery, an architecture fork that changes the goal's shape, or the spawn/cost budget being exceeded. Each arrives as "what broke / why / what was tried / 2–3 options with a recommendation". You also get scoreboard updates on every merge and an ETA on request or whenever the critical path changes.
Learnings — on completion, archives the goal state and appends lessons to the project knowledge base for future runs, plus a final report: requirement → evidence table, everything shipped, and explicitly-listed loose ends.
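
The completion gate described in the Verify and PR steps can be sketched as a check over the goal state. This is an illustration under assumptions: apart from the `verified` boolean, the field names (`subtasks`, `evidence`, `id`, `status`) are hypothetical and may not match what `.claude/.goals/current.json` actually stores.

```python
from __future__ import annotations


def pr_blockers(goal: dict) -> list[str]:
    """Return the reasons the goal cannot yet stop for a PR (empty means ready)."""
    blockers = []
    for subtask in goal.get("subtasks", []):
        if subtask.get("status") != "done":
            blockers.append(f"subtask not done: {subtask['id']}")
    for entry in goal.get("evidence", []):
        # Completion is blocked while any evidence entry is unverified.
        if not entry.get("verified", False):
            blockers.append(f"unverified evidence: {entry['id']}")
    return blockers


goal = {
    "subtasks": [{"id": "s1", "status": "done"}, {"id": "s2", "status": "done"}],
    "evidence": [{"id": "e1", "verified": True}, {"id": "e2", "verified": False}],
}
```

For the sample state, `pr_blockers(goal)` reports `e2` as the only blocker, so the loop would keep working rather than present the PR summary.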

Agents spawned

Agent	Model tier	Role
explore-light	haiku	Scout — targeted codebase scan after the KB read
backend / frontend	sonnet	Multi-file code work in parallel worktree lanes (budget: max 6 spawns per goal without checking in — lifted when you explicitly authorize broad parallelism)
main loop (planner/verifier)	strongest available model tier	Vision → architecture → task plan, arbitration, root-cause on stuck lanes, final goal-completeness review

QA passes aren't a spawned agent — they route to the installed /qa skill like other toolkit skills the plan needs.

Output & artifacts

.claude/.goals/current.json — full goal state: clarifications, subtasks, evidence log, turns, connections
.claude/.goals/archive/ — completed/cleared goals
Code changes on a feature branch (auto-named goal/<slug> if needed) — never pushed to main
Lessons appended to .claude/knowledge/shared/goals-history.md and related knowledge files

Troubleshooting

Problem	Fix
`Existing goal active — replace?`	One goal at a time: replace (archives the old one), keep it, or cancel
Loop paused with `blocked_reason: "stuck"`	The same subtask ran 4+ turns without progress — give guidance or `/goal clear`
Loop paused after 6 agent spawns	Spawn budget hit — approve continuing, or split the objective; it's probably too broad for /goal
It refuses to skip verification	By design — the loop won't ship unverified code. `/goal clear` if you truly want to abandon it

/autopilot — the "drain the queue" tool; /goal is the "find the path" tool
/planning + /implement — for well-defined features with known scope
/fix — formal pipeline for a specific bug
/pr — create the PR when the goal reaches its hard stop

/implement

Spin up an implementation team from a plan.

Synopsis

/implement <feature-name> [--frontend-only] [--backend-only] [--no-redteam] [--redteam-once] [--redteam-once-strict] [--redteam-strict] [--tasks] [--resume-from] [--phase] [--phases] [--all-phases] [--workflow|--classic]

When to use it

A plan exists at .claude/plans/<feature-name>.md (from /implementation-plan) and you're ready to build
You want backend, frontend, and QA agents working the plan in parallel, with an optional red team challenging the code
Not for: writing the plan — that's /implementation-plan; or one-off bug fixes — that's /fix

Running in CI/autopilot: when CLAUDE_AUTOPILOT=1 or CI=true, the mode prompt is skipped and you must pass the feature name plus an explicit red-team flag (--no-redteam, --redteam-once, or --redteam-strict) on the command line — otherwise the run exits with a missing-args error.

Quickstart

/implement dark-mode

What you'll see: a mode prompt (Auto / Guarded / Fast / Strict), then a team builds the plan's tasks, red team findings get challenged and fixed, QA runs at the level you pick, and the flow ends with an open PR.

Examples

/implement dark-mode                       # full team per the plan's layers, mode prompt
/implement fix-component --frontend-only   # frontend + qa only, no backend agent
/implement add-api --backend-only          # backend + qa only
/implement dark-mode --no-redteam          # Fast mode — build only, skip red team
/implement dark-mode --redteam-once        # Guarded — one red-team pass at the end
/implement dark-mode --redteam-strict      # block on any unresolved HIGH finding
/implement big-feature --phase 2           # multi-phase plan: run only phase 2
/implement big-feature --phases 2-4        # run a contiguous phase range
/implement big-feature --all-phases        # run all remaining phases sequentially

Arguments & flags

Flag	Values	Default	What it does
`--frontend-only`	—	off	Force frontend-only, even if the plan says both layers. Mutually exclusive with `--backend-only` — passing both errors out.
`--backend-only`	—	off	Force backend-only, even if the plan says both layers. Mutually exclusive with `--frontend-only` — passing both errors out.
`--no-redteam`	—	off	Fast mode — skip the red team entirely
`--redteam-once`	—	off	Guarded mode — one red-team pass after the build. Auto-escalates to block-on-HIGH when ≥2 HIGH findings or any HIGH touches auth/payments/migrations/crypto/pii.
`--redteam-once-strict`	—	off	Same as `--redteam-once` but disables the auto-escalation — accepts the once-pass verdict as-is even on HIGHs
`--redteam-strict`	—	off	Strict mode + block on ANY unresolved HIGH finding (1+). Default Strict only blocks when 3+ HIGHs are unresolved. Both modes always block on unresolved CRITICALs.
`--phase N`	number	—	Multi-phase plans: execute only phase N
`--phases N-M`	range	—	Multi-phase plans: execute a contiguous range
`--all-phases`	—	off	Multi-phase plans: execute all remaining pending/failed phases
`--tasks "T3,T4"`	task IDs	—	Build only the listed task IDs from the plan (unknown IDs warn; zero matches exits)
`--resume-from N`	step number	—	Resume a previous run at step N with safe defaults; an explicit red-team flag overrides resume's forced once mode
`--workflow`	—	off	Experimental. Run red-team finding generation as a 2-task dynamic Workflow with structured findings (needs Claude Code ≥ 2.1.154); any Workflow failure falls back to classic automatically. Not yet recommended for routine use.
`--classic`	—	on (default)	The current SendMessage-based red-team behavior. It stays the default until the `--workflow` path is validated.
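
The blocking thresholds in the red-team rows above can be read as two small predicates. The Python below is a sketch under assumptions: the finding shape (`severity`, `areas`, `resolved`) and both function names are hypothetical, and only the thresholds and the sensitive-area list come from the table.

```python
SENSITIVE_AREAS = {"auth", "payments", "migrations", "crypto", "pii"}


def once_pass_escalates(findings):
    """--redteam-once: escalate to block-on-HIGH on 2+ HIGHs or a sensitive HIGH."""
    highs = [f for f in findings if f["severity"] == "HIGH"]
    touches_sensitive = any(SENSITIVE_AREAS & set(f.get("areas", [])) for f in highs)
    return len(highs) >= 2 or touches_sensitive


def strict_blocks(findings, redteam_strict):
    """Strict mode: always block on CRITICAL; HIGH threshold is 1 or 3."""
    unresolved = [f for f in findings if not f.get("resolved", False)]
    if any(f["severity"] == "CRITICAL" for f in unresolved):
        return True
    high_count = sum(1 for f in unresolved if f["severity"] == "HIGH")
    threshold = 1 if redteam_strict else 3
    return high_count >= threshold
```

With `--redteam-once-strict`, the first predicate would simply not be consulted, since that flag disables the auto-escalation.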

What it does

Plan resolution — if no feature name was given, a picker lists available plans (user-confirmation checkpoint). Loads the plan and the matching PRD spec (user stories + edge cases) for traceability.
Phase detection — multi-phase plans get per-phase state tracking in .claude/plans/<name>.state.json. If the plan changed since the last run, a phase was interrupted, or all phases are complete, the skill asks how to proceed (user-confirmation checkpoint).
Mode selection — unless a red-team flag was passed, asks Auto / Guarded / Fast / Strict (user-confirmation checkpoint). Auto picks from plan size and risk keywords (auth, payments, data migration, security, PII → Strict): fewer than 3 tasks and fewer than 5 files → Fast; fewer than 3 tasks with 5+ files → Guarded; 3-9 tasks with no risk keywords → Guarded; 10+ tasks or any risk keyword present → Strict.
Context gathering — explore-light scans existing code patterns; topic wikis and the knowledge base are consulted so agents match the project instead of inventing patterns.
Spawns the team in parallel — backend and/or frontend per the plan's layers, plus QA. Backend shares the API contract with frontend before either implements; QA traces every user story and edge case to code.
Red team — red-challenger generates attack scenarios against the diff; red-reviewer checks plan compliance (no scope creep, no gaps). Developers must answer each finding FIXED / DEFENDED / ACKNOWLEDGED. Unresolved CRITICALs block — user-confirmation checkpoint: fix / override / abort. Max 2 cycles.
Phase loop (multi-phase only) — in one-phase-at-a-time mode, pauses between phases and asks whether to proceed (user-confirmation checkpoint); failed phases prompt retry / skip / stop.
Post-implementation workflow — asks what QA level to run (commit / full / staging / skip — user-confirmation checkpoint), runs /qa, restarts local servers and waits for your manual test sign-off (user-confirmation checkpoint: ready / issues), then creates the PR via /pr --skip-qa and asks what's next.
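
The Auto rules in the mode-selection step reduce to a short decision function. The sketch below assumes a naive substring match for risk keywords and hypothetical parameter names; how the skill actually counts tasks and files or detects keywords is not specified here.

```python
RISK_KEYWORDS = ("auth", "payments", "data migration", "security", "pii")


def auto_mode(task_count, file_count, plan_text):
    """Pick Fast / Guarded / Strict from plan size and risk keywords."""
    text = plan_text.lower()
    # Naive substring match (an assumption); a real check may be stricter.
    has_risk = any(keyword in text for keyword in RISK_KEYWORDS)
    if task_count >= 10 or has_risk:
        return "Strict"
    if task_count < 3 and file_count < 5:
        return "Fast"
    # Remaining cases: fewer than 3 tasks with 5+ files, or 3-9 tasks.
    return "Guarded"
```

For instance, a two-task plan touching three files with no risk keywords resolves to Fast, while the same plan mentioning auth resolves to Strict.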

With --workflow, the red-team finding generation in step 6 runs as a 2-task dynamic Workflow that returns structured findings (severity, file, line) instead of free-text reports — the challenge-defend cycle, verdict gate, escalation rules, and cycle cap are identical either way, and any Workflow failure falls back to classic automatically. Classic is the default until the workflow path is validated.

Agents spawned

Agent	Model tier	Role
explore-light	haiku	Pattern scan before coding
backend (python-backend, adaptive to any stack)	sonnet	Backend tasks
frontend	sonnet	Frontend tasks
qa	sonnet	Story/edge-case coverage review + validation
red-challenger (qa-challenger)	sonnet	Attack scenarios against the diff
red-reviewer (code-reviewer)	sonnet	Plan-compliance review
completion-verifier	sonnet	Final plan-vs-diff check

Output & artifacts

Code changes in your working tree, scoped to the plan's backend/frontend directories
.claude/plans/<feature-name>.state.json — phase status (multi-phase plans only)
An implementation report (files changed, red team summary), QA results, and a GitHub PR opened via /pr --skip-qa

Troubleshooting

Problem	Fix
`no plans found in .claude/plans/`	Run `/planning <name>` then `/implementation-plan <name>` first
`missing required args in non-interactive mode`	CI/autopilot runs need the feature name plus a red-team flag (and a phase flag for multi-phase plans)
`multi-phase plan detected but no phase arg`	Add `--phase N`, `--phases N-M`, or `--all-phases`
Red team keeps blocking on a finding you accept	At the block prompt, choose override to mark the finding ACKNOWLEDGED, or run with `--redteam-once`, where HIGH findings stay advisory unless auto-escalation fires
`State file ... is corrupted`	Accept the reset-to-pending prompt; completed work in git is untouched

/implementation-plan — produces the plan this skill consumes
/qa — the QA levels offered after the build
/pr — the PR step this skill chains into
Workflow guide

/implementation-plan

Spin up an adversarial planning team for a feature whose PRD already exists. Reads .claude/specs/<feature>.md, runs the PM + Architect (+ Devil's Advocate) debate, and writes a locked-scope plan to .claude/plans/<feature>.md.

Synopsis

/implementation-plan <feature-name> [--design] [--gtm] [--fast] [--lite] [--lite-strict] [--sync]

When to use it

After you've reviewed (and edited, if needed) the PRD that /planning produced
You want scope locked and a task breakdown debated before any code is written
Not for: drafting the PRD itself — that's /planning; or building — that's /implement

Non-interactive mode (CI/Autopilot)

If the environment has CLAUDE_AUTOPILOT=1 or CI=true set, the skill runs non-interactively and will not prompt you. In that mode:

feature-name is required
Choose the debate mode on the command line: --fast, --lite, or no mode flag for Full mode

If a required argument is missing, the skill prints an error and exits non-zero instead of asking. When you run this from a script or CI pipeline, decide the mode up front (--fast, --lite, or no flag for Full); the interactive debate-depth prompt is never shown.
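
A minimal Python sketch of the non-interactive rules above, assuming the environment is passed in as a plain mapping. The function names are hypothetical, and raising `ValueError` stands in for the skill's "print an error and exit non-zero" behavior.

```python
def is_non_interactive(env):
    """True when CLAUDE_AUTOPILOT=1 or CI=true is set."""
    return env.get("CLAUDE_AUTOPILOT") == "1" or env.get("CI") == "true"


def resolve_mode(argv):
    """Map mode flags to a debate mode; no mode flag means Full."""
    if "--fast" in argv:
        return "Fast"
    if "--lite" in argv or "--lite-strict" in argv:
        return "Lite"
    return "Full"


def check_args(argv, env):
    """Return (feature_name, mode), failing fast in non-interactive runs."""
    feature = next((arg for arg in argv if not arg.startswith("--")), None)
    if is_non_interactive(env) and feature is None:
        raise ValueError("feature-name is required in non-interactive mode")
    return feature, resolve_mode(argv)
```

In a real script you would pass `os.environ` as `env`; a plain dict is used here so the behavior is easy to exercise in isolation.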

Quickstart

/implementation-plan dark-mode

What you'll see: a debate-depth prompt (Auto / Fast / Lite / Full), then the PM + Architect (+ Devil's Advocate) debate runs, and a locked-scope plan lands at .claude/plans/dark-mode.md with must-haves, exclusions, cost estimates, and a task breakdown.

Examples

/implementation-plan dark-mode              # full mode by default — 2 debate rounds
/implementation-plan dark-mode --fast       # PM + Architect only, 1 round — quick iterations
/implementation-plan dark-mode --lite       # adds Devil's Advocate, 1 combined round
/implementation-plan checkout --design      # adds Design Thinker + Design Critic to the debate
/implementation-plan checkout --gtm         # adds GTM Expert for launch positioning
/implementation-plan dark-mode --sync       # append new PRD items to an existing plan

The task breakdown in the plan is automatically phased (organized into ### Phase N sections) when tasks have natural sequential dependencies — Lite and Full mode only. Fast mode always produces a single-phase, flat breakdown for speed.

Arguments & flags

Flag	Values	Default	What it does
`--design`	—	off	Include Design Thinker + Design Critic teammates in the debate
`--gtm`	—	off	Include a GTM Expert teammate
`--fast`	—	off	PM + Architect only, single round, no Devil's Advocate. Single-phase plan. Fastest.
`--lite`	—	off	PM + Architect + Devil's Advocate in one combined round. Auto-escalates to Full if the DA flags HIGH-risk items (disable with `--lite-strict`).
`--lite-strict`	—	off	Same as `--lite`, but never auto-escalates to Full mode, even if the DA flags HIGH-risk items.
`--sync`	—	off	Reconcile the existing plan against spec items added by `/planning --sync` — appends tasks, no debate re-run

No mode flag = Full mode (two debate rounds: scope, then feasibility).

What it does

Verifies the PRD exists at .claude/specs/<feature-name>.md — errors out otherwise (it never auto-runs /planning). If the PRD's feasibility is RED, it pauses — user-confirmation checkpoint: continue anyway, cancel, or open the PRD.
Interactive resolution — asks for debate depth (Auto recommended — picks Fast/Lite/Full from PRD signals) and optional experts via AskUserQuestion (user-confirmation checkpoint; every question includes Cancel).
Context gathering — explore-light codebase scan plus topic wikis and the project knowledge graph/base.
Spawns the debate team in parallel — PM, Architect, Devil's Advocate (unless --fast), plus design/GTM experts if flagged.
Structured debate — Round 1 (scope): PM claims must-haves traced to user stories, Architect counters on feasibility, DA attacks for scope creep and hidden assumptions, then a verdict. Round 2 (feasibility, Full mode): Architect leads with API contract, DB changes, and task estimates; PM and DA challenge. The PRD is authoritative — stories are not relitigated.
Escalation protocol — user-confirmation checkpoint: before any PRD-traced item is deferred or rejected, you're shown the team's reasoning and asked to keep it, accept the recommendation, or reduce scope. Your overrides are recorded in the plan and the DA may not re-challenge them.
Scope lock + plan write — locked must-haves are hashed (scope_hash) and the full plan (scope lock, technical approach, cost estimates, debate record, task breakdown, acceptance criteria) is written to .claude/plans/<feature-name>.md.
Completion verification + handoff — a completion-verifier agent checks the plan against the PRD, then in guided mode the skill asks before proceeding to /implement (user-confirmation checkpoint).

Agents spawned

Agent	Model tier	Role
explore-light	haiku	Codebase scan
product-manager	opus	Scope claim, owns the "what" and "why"
architect	opus	Technical approach, owns the "how"
devils-advocate (qa-challenger)	sonnet	Risk attack — skipped in `--fast`
design-thinker + design-critic	sonnet	UX brief and critique (`--design` only)
gtm-expert	sonnet	Launch positioning (`--gtm` only)
completion-verifier	sonnet	Plan-vs-PRD traceability check

Output & artifacts

.claude/plans/<feature-name>.md — the locked-scope plan, with frontmatter (debate_mode, scope_hash, spec_hash, da_confidence, layers)
Escalation log and debate record inside the plan (what was deferred, rejected, or user-overridden)
Architecture decisions written back to .claude/knowledge/shared/ and the knowledge graph

Design spec handling: If the PRD references a design spec (a companion HTML file from /planning) and it has gone stale — meaning the PRD was edited after the design spec was generated — the plan displays a warning: ⚠ Design spec is stale (prd_hash mismatch). Run /planning <feature> --design-spec-only to regenerate. This is a normal built-in check, not an error — the plan itself is still valid, but the design reference may be out of sync with the current PRD.
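
The staleness check amounts to comparing a hash of the current PRD with the hash recorded when the design spec was generated. The sketch below assumes SHA-256 over the UTF-8 text; the actual hashing scheme behind `prd_hash` is not stated in this document, and the function names are hypothetical.

```python
import hashlib


def content_hash(text):
    """Hash PRD content (SHA-256 is an assumption, not the documented scheme)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def design_spec_is_stale(current_prd_text, recorded_prd_hash):
    """Stale means the PRD changed after the design spec recorded its hash."""
    return content_hash(current_prd_text) != recorded_prd_hash


# Hash recorded at design-spec generation time (illustrative values).
recorded = content_hash("# Dark mode PRD\nStory 1: toggle in settings\n")
```

Any later edit to the PRD text changes its hash, so `design_spec_is_stale` returns `True` until the design spec is regenerated.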

Cost comparison by mode:

Mode	Approx. cost	Speed	Best for
Fast	~421x	Fastest	Quick iterations, small features, prototyping
Lite	~431x	Balanced	Medium features with some risk concerns
Full	~541x	Most thorough	Complex, high-stakes, or multi-system features

Troubleshooting

Problem	Fix
`no PRD found at .claude/specs/<name>.md`	Run `/planning <feature-name>` first, or `ls .claude/specs/` to check the name
Plan flagged `REQUIRES_HUMAN_DECISION`	Non-interactive runs can't escalate to you — open the plan and resolve the UNRESOLVED items listed with team recommendations
Lite mode unexpectedly ran a second round	Auto-escalation fired (HIGH-risk DA findings) — pass `--lite-strict` to disable it
Plan warns the design spec is stale	The PRD changed after the design spec was generated — run `/planning <feature> --design-spec-only` to regenerate

/planning — phase 1: produces the PRD this skill consumes
/implement — phase 3: builds from the plan this skill writes
Workflow guide — the full planning → implement → PR flow

Session management after planning

By the time /implementation-plan finishes, the context window is heavy with debate output, the full PRD, and codebase-scan results. Before running /implement:

Best practice: start a fresh session — /clear, then open with a one-line brief: "Implement <feature-name> — plan at .claude/plans/<feature-name>.md". This gives /implement a clean window and avoids context rot during the build.
Staying in this session? First run /compact focus on <feature-name> plan — planning done, ready to implement, so that the summary preserves what the build needs.
If a debate round produced a bad result, rewind (Esc Esc) to before that round and re-prompt with a correction rather than stacking fixes on a bad output.

/incident

Incident triage orchestrator — classifies severity, diagnoses in parallel, routes to /sre, /ci-fix, or /fix based on evidence.

Synopsis

/incident [description]

When to use it

Anything is broken and you don't know which skill to use — "the site is down", "CI is failing", "users are getting timeouts"
A production error, failed deploy, slow database, or alert firing
Not for: a bug where you've already run the 4 diagnostic checks (health, deploys, CI, logs) and ruled out alternatives — go straight to /fix; a known CI failure — /ci-fix; routine health checks — /sre

Prerequisites: a git repository (for git log), the GitHub CLI gh (for CI/deploy status), and health endpoints defined in CLAUDE.md's Environments table. Docker is optional but used for container health checks if running. Richer diagnostics — DB queries, error rates, APM traces — require MCP servers (Sentry, Datadog, Postgres, Redis, etc.), which /calibrate can install for you.

Quickstart

/incident "500 errors on the checkout page"

What you'll see: an instant severity/type classification, four parallel diagnostic checks completing in under a minute, a correlated diagnosis with confidence score, then automatic routing to the right resolution skill — ending with a verified fix and an incident report.

Examples

/incident "the site is down"      # free-text description — full triage
/incident #234                    # load the incident from a GitHub issue
/incident                         # no args — auto-detect: checks health, CI, recent deploys

What it does

Classify (instant) — pattern-matches the description into severity (CRITICAL/HIGH/MEDIUM/LOW) and type (infra, CI, code bug, performance, local ops, data, auth). CRITICAL proceeds with no questions; HIGH/MEDIUM briefly confirm the description; LOW asks whether you want full triage or a quick check — a user-confirmation checkpoint.
Parallel diagnosis (< 60s) — spawns 4 cheap agents at once: health endpoints, recent deploys + CI, error signals/logs, and a knowledge-base lookup for similar past incidents. The lookup queries the knowledge graph first for results ranked by usefulness, then falls back to (or supplements them with) a manual search over .claude/qa-knowledge/ and .claude/knowledge/ files.
Correlate — combines the four signals through a decision matrix into a root-cause hypothesis with a preliminary confidence level.
Challenge — unless the incident is CRITICAL with high confidence, one or two Haiku devil's-advocate agents adversarially test the diagnosis. 30–120 seconds of verification here prevents 30+ minutes of chasing the wrong root cause: CRITICAL (medium/low confidence) gets a 30s fast check, HIGH/MEDIUM get a 90s full challenge, LOW gets 120s. Saying "just fix it" at any point skips the challenge.
Verdict gate — computes a confidence score. High score proceeds; mid-range merges the challenger's insights; low score presents the competing hypotheses and asks you to pick a path — a user-confirmation checkpoint.
Route to resolution — invokes the right skill automatically with all gathered evidence: /sre debug (infra/perf/data), /ci-fix (CI), /fix (code bugs), /restart (local ops). Escalations between skills are automatic.
Verify, report, learn — re-runs health checks, writes an incident report, updates the knowledge base, notifies your team channel (Discord/Slack if configured), and offers a "What's next?" menu.
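The challenge timing and routing rules in the steps above can be summarized as a small lookup. This is a hedged sketch: the function names, the lowercase type keys, and the use of None for "skipped" or "no route listed" are assumptions made for illustration, and only the time budgets and skill names come from the steps themselves.

```python
from typing import Optional

# Routes named in the "Route to resolution" step. The auth type has no route
# listed there, so this sketch leaves it unmapped.
ROUTES = {
    "infra": "/sre debug",
    "performance": "/sre debug",
    "data": "/sre debug",
    "ci": "/ci-fix",
    "code bug": "/fix",
    "local ops": "/restart",
}


def challenge_budget_seconds(
    severity: str, confidence: str, user_said_just_fix_it: bool = False
) -> Optional[int]:
    """Return the challenge time budget in seconds, or None when it is skipped."""
    if user_said_just_fix_it:
        return None  # "just fix it" skips the challenge
    if severity == "CRITICAL":
        # CRITICAL with high confidence skips; otherwise a 30s fast check
        return None if confidence == "high" else 30
    if severity in ("HIGH", "MEDIUM"):
        return 90
    if severity == "LOW":
        return 120
    raise ValueError(f"unknown severity: {severity}")


def route_for(incident_type: str) -> Optional[str]:
    """Look up the resolution skill for an incident type (hypothetical helper)."""
    return ROUTES.get(incident_type.lower())


print(challenge_budget_seconds("CRITICAL", "low"))  # 30
print(route_for("CI"))                              # /ci-fix
```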

Agents spawned

Agent	Model tier	Role
explore-light	haiku	Health endpoint checks
ops (×3)	haiku	Recent deploys + CI, error signals/logs, knowledge-base lookup
qa-challenger	haiku	Devil's advocate — challenges the primary diagnosis
sre or ops	haiku	Cross-domain verification in full-challenge mode
sre / fix / ci-fix pipeline	sonnet	The routed resolution skill's agent
completion-verifier	sonnet	Post-resolution completion check — reports `PASS`, `GAPS FOUND`, or `INCONCLUSIVE` verbatim and does not block; you decide whether to rerun

Output & artifacts

Incident report at {project-root}/.claude/qa-knowledge/incidents/<date>-<slug>.md (created automatically after resolution) — timeline, root cause, prevention, challenge results
Knowledge base updates: .claude/knowledge/agents/sre.md, .claude/qa-knowledge/bug-patterns.md, .claude/knowledge/skills/incident.md (classification accuracy, recurring patterns), and matching entries written to the knowledge graph via kg-write.sh for incident summaries, bug patterns, and misclassifications (if detected)
Team notifications at each phase if Discord/Slack is configured
Whatever the routed skill produces — e.g. a fix PR from /fix, or a verified infra fix from /sre debug

Troubleshooting

Problem	Fix
`All systems healthy ... Nothing to triage` on no-args run	Auto-detect found no problems — describe the symptom explicitly: `/incident "<what you saw>"`
Diagnosis is inconclusive	`/incident` runs `/sre status` and presents findings with a suggested route — confirm or redirect it
Triage reports missing data sources (no monitoring, no DB access)	Run /calibrate to install the recommended MCP servers — future triage gets richer evidence
Resolution stalls (skill doesn't resolve in ~15 min)	The orchestrator alerts and presents options; for CRITICAL incidents unresolved in 30 min it suggests a revert

/sre — the infrastructure route; also useful standalone for health checks
/ci-fix — the CI-failure route
/fix — the code-bug route, with full verification pipeline
/restart — the local-ops route
/qa-incident — manually record a known issue without running triage

/issue

Create, close, list, and remind on GitHub issues.

Synopsis

/issue <title>, /issue close #N, /issue list, /issue status, /issue remind, /issue update #N

When to use it

Filing a quick, well-formed GitHub issue without leaving the session — it generates the task line and acceptance-criteria checkboxes for you
Closing out finished issues, checking milestone progress, or posting a status reminder
Capturing bugs discovered mid-task (other skills like /autopilot file issues through the same flow)
Not for: actually working the issues — that's /autopilot for a queue or /fix for one bug

Quickstart

/issue Fix login redirect loop on Safari

What you'll see: an issue created with a ## Task one-liner and 2–5 acceptance-criteria checkboxes (a short checklist of concrete conditions that define when the issue is "done"), attached to the most recent open milestone if one exists, and the issue URL reported back. If no assignee is given or inferable, it asks who to assign.

Examples

/issue Fix login bug --assign sam            # create + assign
/issue Set up CI --milestone "Launch Prep"   # create + attach to a milestone
/issue fix auth bug --assign sam, add dark mode --assign ana   # two issues at once
/issue close #270                            # close (also: /issue done #270)
/issue close #270 fixed by PR #456           # close with a reason — added to the close comment
/issue list                                  # open issues in the current milestone
/issue list sam                              # open issues assigned to sam only
/issue status                                # milestone progress: X/Y done, days remaining
/issue remind                                # post milestone status + open issues
/issue remind ana                            # reminder for one person's issues only
/issue update #270 waiting on API key        # add a comment to an issue
/issue update #270 --title "Urgent: login bug"  # retitle an issue

Running /issue with no title prompts you interactively for what the issue should be about.

Arguments & flags

Command	What it does
`/issue <title>`	Create a new issue (body auto-generated with acceptance criteria)
`--assign <user>`	Assign on creation; if ambiguous and omitted, it asks
`--milestone "Name"`	Attach to a milestone; defaults to the most recent open one
`/issue close #N [reason]` / `done #N`	Close one or more issues; any reason you add is included in the close comment
`/issue list [username]`	Open issues, optionally filtered by person
`/issue status`	Milestone progress summary
`/issue remind [username]`	Post a reminder of open issues
`/issue update #N <comment>`	Comment on an issue; `--title "..."` updates the title

What it does

Parses your input — title, assignee, milestone; multiple comma- or line-separated issues are created in parallel.
Generates the body — a ## Task one-liner plus a ## Acceptance Criteria checklist (always included).
Creates via gh — gh issue create, then attaches the milestone (specified, or the most recent open one). If the assignee is ambiguous, it asks you first — user-confirmation checkpoint.
Reports the URL back so you can open it immediately.
Close / list / status / remind / update subcommands map directly onto gh issue close|list|comment|edit and the milestone API, formatted as clean tables.
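To make the generated body concrete, here is a minimal sketch of the shape described above: a ## Task one-liner followed by a ## Acceptance Criteria checklist of 2 to 5 items. The function name and the exact blank-line layout are assumptions; the skill generates the wording itself, so the criteria below are placeholders.

```python
from typing import List


def build_issue_body(task: str, criteria: List[str]) -> str:
    """Render a '## Task' line plus a '## Acceptance Criteria' checklist."""
    if not 2 <= len(criteria) <= 5:
        raise ValueError("expected 2-5 acceptance criteria")
    lines = ["## Task", task, "", "## Acceptance Criteria"]
    # GitHub renders "- [ ]" as an unchecked task-list checkbox
    lines += [f"- [ ] {item}" for item in criteria]
    return "\n".join(lines)


body = build_issue_body(
    "Fix login redirect loop on Safari",
    [
        "Login on Safari lands on the dashboard without looping",
        "Existing login flow on other browsers is unchanged",
    ],
)
print(body)
```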

Output & artifacts

GitHub issues created, closed, commented on, or retitled in the current repository
Issue URLs and milestone-progress tables in the conversation
No local files written

Troubleshooting

Problem	Fix
`gh: command not found` or auth errors	Install the GitHub CLI and run `gh auth login`
Issue created without a milestone	No open milestone exists — create one in GitHub or pass `--milestone "Name"`
`/issue close all done` asks instead of closing	By design — it lists the milestone's open issues and asks which to close
Wrong repository targeted	`/issue` operates on the current directory's repo — `cd` to the right project first

/autopilot — works the issue queue this skill creates
/fix — formal pipeline for a single bug issue
/tech-debt — its findings can be filed as issues via /issue

/kb-diagram

Render the project knowledge base as an architecture diagram (Mermaid or DOT/SVG) from graph.jsonl, with optional push to connected diagramming MCPs.

Synopsis

/kb-diagram [--format=mermaid|dot] [--output=PATH] [--no-ingest] [--push <adapter>|auto] [--help]

Also triggers on the phrases "architecture diagram", "kb architecture", "knowledge diagram", or "diagram knowledge" if you'd rather describe the task than type the command.

When to use it

You want a visual map of what's in .claude/knowledge/shared/ — how conventions, domain rules, and patterns connect
After a /calibrate run, to review the knowledge base at a glance
Pushing the KB diagram into a connected diagramming tool (e.g. Pencil) via MCP
Not for: diagramming topic wikis — those live under .claude/wikis/ (wiki-knowledge-base); or diagramming your application code

Quickstart

/kb-diagram

What you'll see: the knowledge graph is (re)ingested from your KB files, then a Mermaid diagram is written to .claude/knowledge/diagrams/kb-architecture.mmd, clustered by node type with a legend.

Examples

/kb-diagram                      # Mermaid output (default) → .mmd
/kb-diagram --format=dot         # DOT source → .dot; SVG too if `dot` is installed
/kb-diagram --output=my-kb       # override the output file basename
/kb-diagram --no-ingest          # skip re-indexing, use existing graph.jsonl
/kb-diagram --push auto          # push to the first connected diagramming MCP

Arguments & flags

Flag	Values	Default	What it does
`--format`	`mermaid`, `dot`	`mermaid`	Output format; `dot` also renders SVG when Graphviz is on PATH
`--output`	path	`kb-architecture`	Override the output file basename
`--no-ingest`	—	off	Skip the knowledge-graph re-index, render from the existing `graph.jsonl`
`--push`	`<adapter>`, `auto`	off	Push the diagram to a connected diagramming MCP; `auto` detects the first one
`--help`	—	—	Show usage

What it does

Ingest — indexes your .claude/knowledge/shared/*.md files into the knowledge graph (graph.jsonl), unless --no-ingest is set.
Render — generates a Mermaid (or DOT) diagram of the graph, clustered by node type to stay readable for typical KBs (< 25 nodes). Missing KB files are marked ✗ in the legend rather than failing the run.
Push (optional) — with --push, sends the diagram to a connected diagramming MCP server; --push auto reports what's detected.
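The render step can be sketched as grouping nodes by type into Mermaid subgraphs. Note that the record layout used here (kind, id, type, label, source, target) is a hypothetical stand-in, since the actual graph.jsonl schema is not described in this section; only the idea of clustering by node type comes from the text above.

```python
import json
from collections import defaultdict


def render_mermaid(jsonl_text: str) -> str:
    """Build a Mermaid flowchart with one subgraph per node type."""
    nodes_by_type = defaultdict(list)
    edges = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record["kind"] == "node":
            nodes_by_type[record["type"]].append(record)
        elif record["kind"] == "edge":
            edges.append(record)

    out = ["flowchart LR"]
    for node_type in sorted(nodes_by_type):
        out.append(f"  subgraph {node_type}")
        for node in nodes_by_type[node_type]:
            out.append(f'    {node["id"]}["{node["label"]}"]')
        out.append("  end")
    for edge in edges:
        out.append(f'  {edge["source"]} --> {edge["target"]}')
    return "\n".join(out)


# Hypothetical sample records, one JSON object per line
sample_graph = "\n".join(
    json.dumps(record)
    for record in [
        {"kind": "node", "id": "naming", "type": "convention", "label": "Naming rules"},
        {"kind": "node", "id": "billing", "type": "domain", "label": "Billing rules"},
        {"kind": "edge", "source": "billing", "target": "naming"},
    ]
)
diagram = render_mermaid(sample_graph)
print(diagram)
```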

A PostToolUse hook also re-renders the diagram automatically in the background whenever you save a .claude/knowledge/shared/*.md file — no watch flag needed.

Output & artifacts

Path	When
`.claude/knowledge/diagrams/kb-architecture.mmd`	`--format=mermaid` (default)
`.claude/knowledge/diagrams/kb-architecture.dot`	`--format=dot`
`.claude/knowledge/diagrams/kb-architecture.svg`	`--format=dot` and Graphviz `dot` on PATH

Troubleshooting

Problem	Fix
`KB directory .claude/knowledge/shared/ not found` (exit 2)	Run `/calibrate` first to create the project knowledge base
`Knowledge graph has 0 nodes` (exit 3)	Add `##` / `###` headings to your KB files — headings become graph nodes
Required dependency missing (exit 4)	Install `jq` and/or `python3`
`--push` finds no adapter	The target MCP server must be connected in `~/.claude.json` or project `.mcp.json`; run `--push auto` to see what's detected
No SVG produced with `--format=dot`	Install Graphviz so `dot` is on PATH; the `.dot` source is still written

Note: --filter and --depth flags for very large KBs (> 40 nodes) are not yet implemented.

/calibrate — builds the knowledge base this skill diagrams
/wiki-knowledge-base — topic wikis (research KBs) rather than the project KB

/magazine-generator

Generate a single-file HTML horizontal-swipe magazine from product metadata and brand guidelines.

Synopsis

/magazine-generator [product-name]

When to use it

Creating a visually striking product story to share with a client, prospect, or board — no designer needed
Packaging an engagement outcome or a client's new product into a swipeable, self-contained showcase
Not for: structured written deliverables like reports and decks — that's /deliverable-builder; or formal proposals — that's /pitch-generator

Quickstart

/magazine-generator acme-pro

What you'll see: a short brand questionnaire (tagline, colors, fonts, product copy), then a self-contained output/magazine-acme-pro.html — a 10-page horizontal-swipe magazine you can open in any browser, email, or host as a static asset (Google Fonts requires internet access to load; offline viewing falls back to system fonts).

Examples

/magazine-generator                 # collect all inputs interactively
/magazine-generator acme-pro        # use "acme-pro" as the product name, collect the rest
/magazine-generator acme-pro --fast # skip optional inputs, use sensible defaults

Arguments & flags

Flag	Values	Default	What it does
`[product-name]`	—	asked interactively	Product name; also drives the output filename slug
`--fast`	—	off	Skips optional inputs and uses sensible defaults

What it does

Reads project context (prep work, before you're asked anything) — checks .claude/knowledge/shared/domain.md and CLAUDE.md if they exist, for product description, audience, differentiators, and any documented color/font preferences. Anything found here pre-fills the questions in the next step — you won't be asked for it again.
Collects brand inputs — user-confirmation checkpoint: asks for tagline, primary/accent/dark-panel colors, heading and body fonts (Google Fonts), and product copy via a single AskUserQuestion prompt. Anything already known from step 1 or the argument is pre-filled. For colors/fonts, type "defaults" for a clean dark-teal + Inter combination, or paste/enter your own. Copy can be pasted, given as a file path, or synthesized with "use project context".
Generates the magazine — a single self-contained HTML file with 10 fixed pages, each covering one story beat: Cover, Problem, Solution, Features, Audience, Competition, Origin Story, Launch/Traction, Tech, Back Cover. Scroll-snap horizontal swipe, dot navigation, keyboard arrows, and touch gestures are built in. All CSS and JS are inline; the only external dependency is Google Fonts.
Quality gate — 12 checks before writing: exactly 10 pages, scroll-snap present, at least 3 dark panels, an oversized stat element, the Google Fonts link, navigation dots, keyboard handler, touch handler, no external JS libraries, all 10 page IDs present, no leaked [PLACEHOLDER] text, and no secrets in the output.
Writes the file and offers next steps — adjust colors, rewrite a page, change fonts, or /share the file.
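A few of the quality-gate checks can be illustrated with simple string and regex tests. This sketch covers only a subset of the 12 checks and assumes each page is a section element with an id of the form page-1 through page-10; that markup convention and the function name are assumptions, not the generator's documented output.

```python
import re
from typing import Dict


def quality_gate(html: str) -> Dict[str, bool]:
    """Run a subset of the pre-write checks against generated HTML."""
    page_ids = re.findall(r'<section[^>]*\bid="(page-\d+)"', html)
    expected_ids = {f"page-{i}" for i in range(1, 11)}
    return {
        "exactly_10_pages": len(page_ids) == 10,
        "all_page_ids_present": set(page_ids) == expected_ids,
        "scroll_snap_present": "scroll-snap-type" in html,
        "no_external_js": re.search(r"<script[^>]*\bsrc=", html) is None,
        "no_placeholder_text": "[PLACEHOLDER]" not in html,
    }


pages = "".join(f'<section class="page" id="page-{i}"></section>' for i in range(1, 11))
good_html = f"<style>.deck{{scroll-snap-type:x mandatory}}</style>{pages}"
bad_html = (
    good_html.replace("</section>", "[PLACEHOLDER]</section>", 1)
    + '<script src="https://cdn.example.com/lib.js"></script>'
)

good_result = quality_gate(good_html)
bad_result = quality_gate(bad_html)
print(all(good_result.values()))  # True
print(bad_result["no_placeholder_text"], bad_result["no_external_js"])  # False False
```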

Output & artifacts

output/magazine-<product-name-slug>.html — the complete magazine (slug = product name lowercased, spaces → hyphens); the output/ directory is created if needed
Safe to commit to a repository as a static asset; works opened directly from the filesystem (no server)
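The slug rule above (product name lowercased, spaces to hyphens) is small enough to show directly. This is a sketch; the function name is hypothetical, and trimming surrounding whitespace is an added assumption.

```python
def magazine_output_path(product_name: str) -> str:
    """Derive the output path from the product name per the slug rule above."""
    slug = product_name.strip().lower().replace(" ", "-")
    return f"output/magazine-{slug}.html"


print(magazine_output_path("Acme Pro"))  # output/magazine-acme-pro.html
print(magazine_output_path("acme-pro"))  # output/magazine-acme-pro.html
```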

Troubleshooting

Problem	Fix
Fonts don't load when viewing	Google Fonts needs internet access — for offline viewing, expect fallback fonts
Swipe feels off in Firefox	Known Firefox quirk with smooth scroll + snap — the built-in JS navigation (arrows/dots) handles it correctly; Chrome and Safari are the primary targets
Mobile users don't find the navigation	Horizontal swipe is non-standard on mobile — ask to add visible left/right arrow buttons for small viewports
No traction numbers to show	Leave them out — the launch page falls back to "Private Beta" or "Coming [Quarter Year]" copy instead of invented stats
Stakeholder raises accessibility concerns	The fixed-page horizontal-scroll format deliberately trades off WCAG 2.1 vertical-scroll compliance for the magazine aesthetic — flag this before publishing

/consulting — the client-engagement suite this complements for product storytelling
/pitch-generator — formal proposals and pitch documents instead of a visual magazine
/deliverable-builder — structured written deliverables from engagement data
/share — format the generated magazine for sharing

/marker

Drop a timeline marker into the current session for later A/B comparison.

Synopsis

/marker "<label>"

When to use it

Annotating meaningful moments during an A/B experiment — "starting refactor", "bug repro confirmed", "baseline recorded" — so the Arth dashboard can display them in the compare view
Bookmarking a point in a long session you'll want to find later on the timeline
Not for: configuring telemetry itself — that's /otel-setup

Quickstart

/marker "starting refactor"

What you'll see: marker recorded: starting refactor — the marker is attached to the current session's timeline. If the Arth engine isn't reachable, you'll see marker queued (engine unreachable): starting refactor instead and the marker is saved locally.

Examples

/marker "starting refactor"      # annotate the start of an experiment arm
/marker "baseline recorded"      # mark the comparison point
/marker "bug repro confirmed"    # pin the moment a hypothesis was verified

What it does

Reads the current Claude Code session ID to identify which session the marker belongs to.
Captures a nanosecond-precision timestamp.
Sends {label, timestamp_ns, session_id} to the Arth engine's markers API.
If the engine is unreachable, appends the marker to a local fallback file (/tmp/claude-agents-otel/<session>/markers.jsonl) instead — markers are never lost, and the skill never blocks your session.
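The send-or-queue behaviour in the steps above can be sketched as follows. The skill itself is described as inline shell, so this Python version is only an illustration: record_marker, the send callable (standing in for the call to the markers API), and the temporary directory used in place of /tmp/claude-agents-otel are all assumptions.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Callable


def record_marker(
    label: str, session_id: str, send: Callable[[dict], None], fallback_dir: str
) -> str:
    """Send a marker, or append it to a local JSONL file if sending fails."""
    marker = {
        "label": label,
        "timestamp_ns": time.time_ns(),  # nanosecond-precision timestamp
        "session_id": session_id,
    }
    try:
        send(marker)
        return f"marker recorded: {label}"
    except OSError:
        # Engine unreachable: append to <fallback_dir>/<session>/markers.jsonl
        path = Path(fallback_dir) / session_id / "markers.jsonl"
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(marker) + "\n")
        return f"marker queued (engine unreachable): {label}"


def unreachable(marker: dict) -> None:
    """Hypothetical sender that simulates a down engine."""
    raise ConnectionError("engine down")


with tempfile.TemporaryDirectory() as tmp:
    first = record_marker("baseline recorded", "sess-1", unreachable, tmp)
    second = record_marker("baseline recorded", "sess-1", unreachable, tmp)
    queued = (Path(tmp) / "sess-1" / "markers.jsonl").read_text().splitlines()

print(first)
print(len(queued))  # 2: the same label twice yields two distinct markers
```

Appending rather than overwriting is what lets repeated labels coexist in the fallback file.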

Each invocation appends a new marker — running the same label twice records two distinct markers, so you can mark repeated checkpoints (e.g. each retry of an experiment) without overwriting earlier ones. The whole operation is inline shell with no agent spawns, so it adds effectively zero cost or latency to your session.

Output & artifacts

A timeline marker on your session in the Arth dashboard — see Arth Intelligence for how markers appear in the session compare view
Fallback file /tmp/claude-agents-otel/<session>/markers.jsonl when the engine is unreachable (append-only)

Troubleshooting

Problem	Fix
`no active session — run inside Claude Code`	The skill needs a live session ID; invoke `/marker` from within a Claude Code session
`usage: /marker "<label>"`	A label is required — quote it if it contains spaces
`marker queued (engine unreachable): <label>`	The Arth engine isn't responding; the marker was saved locally. Check the engine is running (default `http://localhost:3333`, override with `ARTH_BASE_URL`)

Arth Intelligence — the dashboard where markers show up in A/B compare views
/otel-setup — configure the observability pipeline that feeds the dashboard

/market-research

Research industry AI adoption, competitors, and trends.

Synopsis

/market-research <client-name> [--focus industry|competitors|trends]

Prerequisites

A client profile at clients/<client-name>/profile.json (run /client-discovery first if it doesn't exist yet)
WebSearch tool access — this skill runs live research queries and has no offline mode

When to use it

Building the evidence base for a proposal — adoption rates, ROI benchmarks, named case studies
Profiling what a client's competitors are doing with AI and where the gaps are
Scanning emerging AI technology and regulatory shifts relevant to a client's vertical
Not for: the initial client intake and maturity scoring — that's /client-discovery

Quickstart

/market-research acme-corp --focus industry

What you'll see: a series of WebSearch queries, then a sourced research report — adoption landscape, case studies, ROI benchmarks, and a Porter's Five Forces AI analysis — written to the client's discovery directory.

Examples

/market-research acme-corp                     # no focus — runs all three modes sequentially (slower, ~15-25 units)
/market-research acme-corp --focus industry    # adoption rates, use cases, ROI benchmarks (fastest, single mode)
/market-research acme-corp --focus competitors # competitor AI capabilities and gaps
/market-research acme-corp --focus trends      # emerging tech, regulation, market shifts

Arguments & flags

Flag	Values	Default	What it does
`--focus`	`industry` `competitors` `trends`	all three	Limits the research to one mode

What it does

Loads the client profile — reads clients/<client-name>/profile.json for industry, company name, known competitors, geography, and AI maturity. If the profile is missing, it asks you to run /client-discovery <client-name> first.
Executes research queries — 8 WebSearch queries per focus mode, tailored to the client's industry, competitors, and geography. Collects adoption percentages, named case studies with quantified outcomes, ROI benchmarks, and implementation barriers — all with source attribution.
Porter's Five Forces AI analysis — builds an AI-specific competitive forces canvas (new entrants, vendor power, customer power, substitutes, rivalry), each force rated HIGH/MEDIUM/LOW with justification and a strategic implication for the client.
Synthesizes findings — each insight gets supporting evidence with citation, relevance to the client, a recommended action, and a confidence rating (HIGH/MEDIUM/LOW). Cross-references identify convergent signals, contradictions, white spaces, and urgent threats.
Quality checklist — before output: every claim cited, ROI numbers sourced and dated, competitor data within 12 months, no speculative claims presented as facts.

Output & artifacts

Written to clients/<client-name>/discovery/ (relative to your Arth client workspace directory), one file per focus mode:

discovery/market-research.md — industry focus: adoption landscape, case studies, ROI benchmarks, Five Forces canvas, strategic implications
discovery/competitive-landscape.md — competitors focus: capability matrix, per-competitor profiles, gap analysis, opportunities and threats
discovery/industry-trends.md — trends focus: near/mid/long-term trend timeline, regulatory landscape, technology watch list

All reports end with a numbered source list.

Troubleshooting

Problem	Fix
`Run /client-discovery <client-name> first`	The client profile is missing or lacks `industry`/`company_name` — complete discovery before researching
Competitor mode produces thin profiles	The profile has no known competitors listed — edit `clients/<client-name>/profile.json` to add a `competitors` array, then run the skill again
Findings feel generic	Check the confidence ratings — LOW-confidence single-source insights need validation before they go in a proposal

/consulting — the engagement orchestrator; research feeds the assess and propose phases
/client-discovery — produces the client profile this skill reads
/opportunity-map — uses research findings when scoring initiatives
/pitch-generator — cites this research in proposals

/memory-setup

Configure arth-memory — the org decision/memory layer for AI agents. Choose Cloud (talk directly to your org's deployed server — zero local infra, best for trying it / team testing), Local (Docker store on this machine), or Global (offline replica of a central AWS/GCP server), then register the MCP server. Opt-in.

Test the org memory in ~1 minute (Cloud)

The fastest way to evaluate arth-memory — no Docker, no Ollama, nothing to run locally:

/memory-setup        # choose "Cloud" → paste your org's server URL (e.g. the ALB) → done

Then open a new session and ask an agent to do a task in a repo your org has seeded — it pulls the org's decisions/constraints via compile_context automatically. Or browse the dashboard at <server-url>/dashboard. Behind the scenes this registers the arth-memory-mcp-remote bridge, which calls the deployed REST API over HTTP.

To share with teammates: send them your toolkit branch + the server URL. They run /memory-setup → Cloud → paste the URL. That's it.

Synopsis

/memory-setup

When to use it

First-time setup: you want your coding agents to recall your team's Decisions, Constraints, and Facts (the "why") and have rule violations flagged at coding time
Switching a machine from a standalone Local store to a shared Global (central) one, or vice versa
Re-pointing at a different central server, or verifying an existing config still works
Not for: running /calibrate itself — /memory-setup just configures where knowledge is stored and synced; /calibrate fills it

Quickstart

/memory-setup

Pick Cloud to test your org's deployed server instantly (no local infra), Local to run an offline Docker store, or Global to replicate the central server offline. Then run /calibrate.

Examples

/memory-setup        # the only form — the skill is interactive

Typical runs:

First time, solo/offline → choose Local → a Docker FalkorDB store starts, the MCP server is registered, the store is created at ~/.arthai/memory
Joining an org with a central server → choose Global → provide the server URL + your reader token → an offline replica syncs down (gated at 1 GB)
Already configured → re-run → it offers to reconfigure, or just re-verifies your existing setup

What it does

/memory-setup is fully opt-in and additive — if you never run it, the toolkit behaves exactly as it does today (no memory MCP server, no container).

Installs the arth-memory CLI if it isn't already on your PATH.
Asks how to connect — a user-confirmation checkpoint:

Cloud — talk directly to your org's deployed server over HTTP (no Docker/Ollama/store). You provide its URL (+ optional reader token). Fastest to try / test.
Local — a Docker FalkorDB store on this machine + native Ollama. Fully offline, ~$0.
Global — an offline replica synced from your org's central AWS/GCP server (you provide its URL + a reader token).

Asks where the local knowledge store lives (default ~/.arthai/memory). Knowledge spans more than one repo: the KB/KG for every repo you work on lives in this one store, and each repo points at it via gitignored symlinks (.claude/knowledge → store). Knowledge is never committed per-repo.
Registers the arth-memory MCP server in .claude/settings.local.json (so agents get compile_context, constraint_check, write).
Runs an initial 1 GB-gated sync: under 1 GB → downloads everything into the store; larger orgs defer to selective sync (a later milestone).
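The 1 GB gate in the last step amounts to a single size comparison. In this sketch the constant, the function name, and the returned labels are hypothetical, and reading "1 GB" as 10^9 bytes is an assumption.

```python
SYNC_GATE_BYTES = 10**9  # "1 GB" read as 10^9 bytes (an assumption)


def initial_sync_plan(remote_size_bytes: int) -> str:
    """Decide between a full download and deferring to selective sync."""
    if remote_size_bytes < SYNC_GATE_BYTES:
        return "full"                # under the gate: download everything
    return "deferred-selective"      # larger orgs wait for selective sync


print(initial_sync_plan(250 * 10**6))  # full
print(initial_sync_plan(3 * 10**9))    # deferred-selective
```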

Output & artifacts

~/.arthai/memory-configured — marker (cloud | local | remote)
~/.arthai/config.json — { mode, memory_dir? , api_url? } (read by the toolkit lib + context hook)
A running store: none for Cloud (direct API); arth-memory-falkordb Docker container (Local) or arth-memory-replica (Global)
<repo>/.claude/settings.local.json — the arth-memory MCP server block (git-ignored; reader token only)
<repo>/.claude/{knowledge,project-profile.md,okf} — gitignored symlinks into the store

Troubleshooting

arth-memory CLI not found after install — open a new shell so PATH updates, then re-run.
Local: Docker not running — start Docker Desktop and re-run; the store container needs it.
Local: recall is keyword-only — native Ollama isn't reachable; install it and pull nomic-embed-text, then re-sync.
Global: 401 — your reader token is wrong or expired; get a fresh one from your admin.
Global: total_live: 0 — the central store itself is empty; an admin needs to seed it (org build / first calibrate).
Nothing changed in a repo — /memory-setup configures the machine; run /calibrate in the repo to populate/hydrate.

/calibrate — establishes the store, emits an OKF bundle, and (when memory is configured) pushes generated knowledge up / hydrates it down
/implement, /planning, /qa — automatically read the memory store as context (injected by the toolkit's context hook)
/otel-setup — the sibling opt-in configuration skill (observability)

/onboard

Session onboarding — project briefing, work prioritization, and guided setup.

Synopsis

/onboard

When to use it

At the start of a session, to see where things stand and what to work on
After time away — "where did I leave off?" — to recover branch, plan, and PR context
On a brand-new (empty) project, to get pointed at the right first steps
Not for: deep project learning and toolkit configuration — that's /calibrate; or populating CLAUDE.md — that's /scan

Quickstart

/onboard

What you'll see: any toolkit updates since your last session, then a tiered briefing — 🔴 Fix first, 🟡 Waiting on you, 🔵 Continue, ⚪ Available — with PR/issue numbers, uncommitted files, plan progress (e.g., "Step 3/5" or "Phase 2"), and a menu of available workflows. It presents and waits; it never auto-starts work.

Examples

/onboard            # the only form — behavior adapts to your project's state

What it does

Detects project state — greenfield (no code, no CLAUDE.md, fresh git history) vs brownfield (existing code).
Greenfield flow — asks one question via AskUserQuestion ("What are you building?") — a user-confirmation checkpoint — then suggests a tech stack and asks you to confirm or adjust it (user-confirmation checkpoint). It then runs /scan to scaffold CLAUDE.md, and creates a project plan via /planning if the feature is non-trivial — or scaffolds directly if it's simple enough.
Brownfield: toolkit update check (always first) — compares the toolkit's current version against what this project last saw and shows new commits, new skills, and updated agents. If the project has never been calibrated, it shows a /calibrate hint.
Deep context gathering (no questions asked) — in parallel: git state (branch, status, recent commits, stashes), GitHub state via gh (your open PRs, review requests, assigned issues, recent merges, unassigned issues), active plan files, and environment health from the CLAUDE.md ## Environments table (Docker containers, health endpoints — production is checked read-only). Also checks whether the tech-debt baseline (written by /tech-debt) is stale: if it's more than 14 days old and there have been 5+ commits since, a hint appears in the ⚪ Available tier suggesting you run /tech-debt to refresh it; if no baseline exists yet, it suggests creating one.
Prioritizes — weights findings by the nature of your project (production app vs early-stage vs library) and by labels (bug/critical/security → fix first; CHANGES_REQUESTED or stale PRs → waiting on you).
Presents the briefing — only the tiers that have items, with actionable detail (who's waiting, how long, which files, which plan step) plus the workflow menu (/planning, /fix, /qa, /pr, …).
Routes on your choice — picks the right skill or fix command for whatever you select. If you type something specific at any point, onboarding gets out of the way.
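
The tech-debt baseline rule from the context-gathering step can be sketched as below. This is a minimal illustration, assuming the baseline timestamp and the commit count since then are already known. The helper name `baseline_hint` and its messages are hypothetical, not the toolkit's actual code.

```python
from datetime import datetime, timedelta
from typing import Optional

# Thresholds from the description above: the baseline is stale when it is
# more than 14 days old AND 5 or more commits have landed since it was written.
MAX_AGE = timedelta(days=14)
MIN_COMMITS = 5


def baseline_hint(baseline_time: Optional[datetime],
                  commits_since: int,
                  now: datetime) -> Optional[str]:
    """Return the hint for the Available tier, or None if no hint is needed.

    Hypothetical helper: the name and the messages are illustrative only.
    """
    if baseline_time is None:
        return "No tech-debt baseline yet: run /tech-debt to create one"
    is_old = (now - baseline_time) > MAX_AGE
    if is_old and commits_since >= MIN_COMMITS:
        return "Tech-debt baseline is stale: run /tech-debt to refresh it"
    return None


now = datetime(2025, 1, 31)
print(baseline_hint(datetime(2025, 1, 1), 7, now))  # old and busy: refresh hint
print(baseline_hint(datetime(2025, 1, 1), 2, now))  # old but quiet: no hint
print(baseline_hint(None, 0, now))                  # no baseline: create hint
```

Both conditions must hold for the refresh hint, so an old baseline on a quiet repository stays silent.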

Output & artifacts

The briefing itself, in the conversation — no code or config changes
.claude/.toolkit-last-seen-sha — updated so toolkit changes are only announced once
Greenfield path may chain into /scan (writes CLAUDE.md) or /planning

Troubleshooting

Problem	Fix
No PR/issue section in the briefing	`gh` CLI isn't installed or authenticated — onboard degrades gracefully and shows git-only state
"Everything looks clean" and nothing else	That's the real state: main branch, no uncommitted work, no open PRs/issues — just tell it what to build
Briefing shows only 3 items per tier	With 10+ findings it truncates to the top 3 per tier — ask to see the full list
Environment checks missing	CLAUDE.md has no `## Environments` table yet — onboard still checks Docker containers and `.env.local` as a fallback, but run /calibrate or /scan `environments` to document your full environment config for a richer check next time

/calibrate — first-time deep project learning (onboard hints at it when missing)
/scan — populates CLAUDE.md; onboard runs it for greenfield projects
/welcome — the non-engineer equivalent (intent menu instead of a briefing)
/planning — where feature work usually goes next

/opportunity-map

Map AI opportunities by ROI and effort, recommend target orgs.

Synopsis

/opportunity-map <client-name>

When to use it

After discovery, to turn pain points and readiness data into a prioritized list of AI initiatives
Building the priority matrix and phased roadmap that anchors the proposal
Not for: dollar-level financial modeling of an initiative — that's /roi-calculator; or gathering the underlying client data — that's /client-discovery

Quickstart

/opportunity-map acme-corp

What you'll see: 8-15 candidate AI initiatives generated from the client's pain points, each scored and ranked, a 2x2 impact/effort matrix, a three-wave roadmap, and department-level recommendations written to the client's assessment directory.

Examples

/opportunity-map acme-corp     # the single invocation form — reads discovery, writes assessment

What it does

Loads discovery data — reads clients/<client-name>/discovery/ and profile.md; extracts pain points, goals, tech stack, readiness scores, maturity level, budget, and timeline. If discovery/intake.md, current-state.md, or maturity-assessment.md are missing, it stops and tells you to run /client-discovery first.
Generates AI initiatives — 8-15 candidates using Jobs-to-Be-Done framing: at least 2 per pain point, a mix of Automation / Insight / Generation / Prediction types, at least 2 quick wins and 1 moonshot, calibrated to the client's maturity level.
Scores every initiative — weighted formula across 6 factors: Revenue Impact (25%), Cost Savings (20%), Feasibility (20%), Time to Value (15%), Strategic Alignment (10%), Risk inverse (10%). Scores classify as Must-Do / Should-Do / Could-Do / Defer, each with a written justification.
Priority matrix — an ASCII 2x2 impact/effort matrix (Quick Wins, Strategic Bets, Fill-Ins, Money Pits) plus a ranked priority table.
Three-wave roadmap — Wave 1 Quick Wins (Month 1-2), Wave 2 Foundation Building (Month 3-4), Wave 3 Strategic Bets (Month 5-6), with deliverables, dependencies, resources, and success metrics per wave.
Capability mapping — maps the toolkit's actual agent ecosystem to each client department: which agents and skills deploy where, a concrete in-practice scenario, and a week-one quick win per department, plus cross-department synergies.
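
The weighted formula in the scoring step can be illustrated with a short sketch. The weights are the ones listed above. The 0-10 rating scale and the way risk is inverted (`SCALE_MAX - risk`) are assumptions made for this example, since the scale is not stated here, and the function name `weighted_score` is hypothetical.

```python
# Weights from the scoring step above; they sum to 1.0.
WEIGHTS = {
    "revenue_impact": 0.25,
    "cost_savings": 0.20,
    "feasibility": 0.20,
    "time_to_value": 0.15,
    "strategic_alignment": 0.10,
    "risk": 0.10,  # applied to the inverted risk rating
}

SCALE_MAX = 10  # assumption: every factor is rated 0-10


def weighted_score(ratings: dict) -> float:
    """Combine six factor ratings into one score on the same 0-10 scale.

    Risk is inverted so that a riskier initiative scores lower
    (assumed inversion: SCALE_MAX - risk).
    """
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing factor ratings: {sorted(missing)}")
    total = 0.0
    for factor, weight in WEIGHTS.items():
        value = ratings[factor]
        if factor == "risk":
            value = SCALE_MAX - value
        total += weight * value
    return round(total, 2)


# Illustrative ratings only, not data from a real engagement.
example = {
    "revenue_impact": 8,
    "cost_savings": 6,
    "feasibility": 7,
    "time_to_value": 9,
    "strategic_alignment": 5,
    "risk": 3,
}
print(weighted_score(example))
```

The cut-offs that turn a score into Must-Do, Should-Do, Could-Do, or Defer are not given in this section, so the sketch stops at the numeric score.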

Output & artifacts

Written to clients/<client-name>/:

assessment/opportunity-matrix.md — initiative catalog, factor-by-factor scoring, ASCII priority matrix, ranked table, justifications
assessment/roadmap.md — three-wave roadmap, timeline visualization, dependencies, budget summary, KPIs, risk register per wave
assessment/department-recommendations.md — per-department agent/skill briefs, synergies, change-management and governance recommendations

Troubleshooting

Problem	Fix
`Discovery data not found for <client-name>`	Run `/client-discovery <client-name>` first — this skill requires intake, current-state, and maturity files
Initiatives feel too ambitious for the client	Expected guardrail: maturity below 10 biases toward simpler initiatives; budget under $50K restricts to quick wins on existing tools
An initiative needs data the client doesn't have	The skill flags these — they need a data acquisition plan before they're roadmap-ready
Scores look arbitrary	Every score must trace to discovery evidence — if one doesn't, challenge it and regenerate

/consulting — the engagement orchestrator; runs this as the assess phase
/client-discovery — produces the discovery data this skill requires
/roi-calculator — next step: financial models for the top-scored initiatives
/pitch-generator — folds the matrix and roadmap into the proposal

/otel-setup

Configure OTEL observability — starts the local Arth Intelligence Docker stack (or points at a remote endpoint) and writes the env vars.

Synopsis

/otel-setup

When to use it

First-time setup: you want your Claude Code sessions to appear on the Arth Intelligence dashboard with cost and token data
Repairing a broken or partial OTEL config (the classic symptom: sessions show up but cost columns are empty)
Re-pointing telemetry at a different endpoint, or verifying an existing config still works
Not for: understanding what the dashboard shows or how the data flows — that's the Arth Intelligence guide

Quickstart

/otel-setup

What you'll see: a check of your current config, then two questions — where to send traces (Cloud / Local / Custom), and how much telemetry to collect (Native-only vs Full, see below) — after which the skill writes the env vars (globally by default, so every project on the machine emits telemetry), starts the local Docker stack if you chose Local, and verifies end-to-end by sending a test span. Restart your Claude Code session and your activity appears on the dashboard.

No toolkit? The same setup ships in the Arth CLI as arth otel-setup. Install the Arth CLI first — it's gated by your GitHub repo access (no public package).

Examples

/otel-setup        # the only form — the skill takes no arguments

Typical runs:

First time → choose Local → Docker stack starts, env vars written, test span verified
Cost columns empty on the dashboard → re-run → it detects the partial config and offers Repair in place
Already configured → re-run → it offers to reconfigure, or just re-verifies your existing setup

What it does

Detects current state — reads .claude/settings.local.json and classifies it: not configured (full setup), fully configured (offers to reconfigure or just re-verify), or partially configured — the silent-failure case, where it shows exactly which keys are missing and offers repair-in-place, full reconfigure, or cancel — a user-confirmation checkpoint.
Asks question 1 — where should traces go? A user-confirmation checkpoint with three options:

Cloud — Arth Intelligence (traces go to ingest.getarth.ai; the dashboard itself lives at app.getarth.ai, where you also get your Arth project key)
Local — a local Docker stack: it checks Docker is running, writes ~/.arthai/docker-compose.yml (Postgres + the Arth Intelligence engine + a Watchtower auto-updater, all with restart: unless-stopped), starts it, and waits for the engine health check
Custom — your own OTLP endpoint (Honeycomb, Datadog, Jaeger, …); it asks for the URL and any auth headers

Asks question 2 — how much telemetry? A user-confirmation checkpoint between two depths:

Native-only (default) — just Claude Code's own OTEL: cost by model, cost by owner, tokens. Written to the global ~/.claude/settings.json so every project on the machine emits it, including non-toolkit repos and sessions launched outside a shell.
Full — everything in Native-only, plus the toolkit's otel-telemetry hook: skill/agent attribution (skill names, lines edited), experiment auto-tags, a richer agent DAG. Choose this only if you're running the toolkit in this repo and want that extra detail.

Decides scope — the native block is written to the global ~/.claude/settings.json by default (every project on the machine), so this normally isn't asked. It only prompts for an override if you want this one repo pointed somewhere different, via <repo>/.claude/settings.local.json.
Offers "Explain this session" inline (optional, non-experimental). After the stack is healthy, /otel-setup (and arth otel-setup) asks whether to enable the dashboard's "Explain this session" AI summary — pick any provider (Anthropic, OpenAI, Gemini, Bedrock), including the free local options Ollama and LM Studio (LM Studio auto-detects the loaded model, e.g. qwen). It's optional; telemetry works without it. The experimental Cloud Orchestrator (calibrate/plan a repo) is configured separately — run /cloud-setup (or arth cloud-setup); it's Claude-only and needs a GitHub token + license.
Writes the env vars — merges the six required OTEL keys into the chosen target without overwriting your other settings, then asks one yes/no follow-up about auto-tagging sessions for arth's /experiments page (default: on).
Verifies end-to-end — re-reads the config and asserts all six keys landed, checks for a stale .claude/.arth-otel.env that could silently divert spans (offering one-click reconciliation), sends a test span to the endpoint the hook will actually use, and checks your existing containers will auto-restart after a reboot (offering a one-line migration if not).
Finishes — writes the ~/.arthai/otel-configured marker so you aren't re-prompted, and tells you to restart your Claude Code session so traces start flowing.
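
The detect, merge, and verify steps above can be sketched as follows. The six keys and their Local-option values are the ones listed in the Observability section. Storing them under a top-level `env` object in the settings JSON is an assumption made for this sketch, and the function names `classify` and `merge_keys` are hypothetical.

```python
import json

# The six required keys, with the values used for the Local option.
REQUIRED = {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4319",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/json",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_TRACES_EXPORTER": "otlp",
}


def classify(settings: dict) -> tuple:
    """Return (state, missing_keys) for a parsed settings file.

    Assumption: the env vars live under the top-level "env" object.
    """
    env = settings.get("env", {})
    missing = sorted(key for key in REQUIRED if key not in env)
    if len(missing) == len(REQUIRED):
        return "not configured", missing
    if missing:
        return "partially configured", missing
    return "fully configured", missing


def merge_keys(settings: dict) -> dict:
    """Return a copy with the six keys set, leaving every other setting alone."""
    merged = dict(settings)
    merged["env"] = {**settings.get("env", {}), **REQUIRED}
    return merged


# A partial config: telemetry is enabled but the exporter keys are missing.
# "model" and "MY_VAR" are placeholder settings that must survive the merge.
partial = {"model": "example", "env": {"CLAUDE_CODE_ENABLE_TELEMETRY": "1", "MY_VAR": "x"}}
state, missing = classify(partial)
print(state, missing)

repaired = merge_keys(partial)
print(classify(repaired)[0])
print(json.dumps(repaired, indent=2))
```

Classifying again after the merge mirrors the verification step: the repair counts as done only when no required key is missing.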

Output & artifacts

OTEL env vars in ~/.claude/settings.json (global, recommended and the default) or <repo>/.claude/settings.local.json (project-only override)
Local option: ~/.arthai/docker-compose.yml plus running containers — engine on port 4319, dashboard at http://localhost:3100, data persisted in the arthai_data Docker volume
Reconciled .claude/.arth-otel.env if a divergent one existed
Marker file ~/.arthai/otel-configured
For what the dashboard then shows you — sessions, cost, experiments — see the Arth Intelligence guide
After setup, confirm you're on the latest image — compare docker compose -f ~/.arthai/docker-compose.yml images against Docker Hub's :latest. See Arth Intelligence → Am I on the latest version?

Troubleshooting

Problem	Fix
`/otel-setup requires jq`	`brew install jq` (macOS) or `apt-get install jq` (Linux), then re-run
`Docker is not running`	Start Docker Desktop and re-run `/otel-setup`
`Arth Intelligence failed to start`	`docker compose -f ~/.arthai/docker-compose.yml logs` to see why
Sessions appear but cost columns are empty	The partial-config silent failure — re-run `/otel-setup` and choose Repair in place; verification confirms all six keys
`415 Unsupported Media Type` during verify	Protocol mismatch — the repair path rewrites the protocol key; also check `.claude/.arth-otel.env` for a stale value
`DIVERGENCE DETECTED` between config sources	Accept the offered reconciliation — it rewrites `.arth-otel.env` to match `settings.local.json`
Endpoint unreachable in the smoke test	Local: start the engine (`docker compose -f ~/.arthai/docker-compose.yml up -d`); remote: check connectivity and the endpoint URL
Dashboard dark after a Mac reboot	Legacy compose file without restart policies — run the printed `docker update --restart unless-stopped ...` one-liner
"Explain this session" shows "needs an LLM key — run `/cloud-setup`"	No LLM key configured. Run `/cloud-setup` (or `arth cloud-setup`), opt into Explain, and provide a key — it's saved to `~/.arthai/.env` and survives image updates. Free option: local Ollama.

Never run docker compose -f ~/.arthai/docker-compose.yml down -v: the -v flag removes the stack's Docker volumes, including arthai_data, which erases all session data. Plain down is safe; updates are automatic via Watchtower.

Arth Intelligence guide — the conceptual story: what flows to the dashboard and how to read it
Monitors — event-driven watchers that build on the same observability stack
/calibrate — full toolkit configuration for a project

/perf

Run a performance optimization pass with a cross-functional team.

Synopsis

/perf [scope] [--backend-only] [--frontend-only] [--audit-only] [--deep]

When to use it

An endpoint, page, or feature is slow and you want a structured audit-then-fix pass
Before a launch — full audit plus load-test validation with --deep
A cheap read-only check on a module before reviewing a PR (--audit-only)
Not for: frontend-only quick audits of a single route — /lighthouse is lighter; or general code cruft — that's /tech-debt

Quickstart

/perf

What you'll see: a quick codebase scan, a mode prompt (audit / targeted / deep), then a prioritized performance audit — findings classified CRITICAL/HIGH/MEDIUM/LOW with file:line citations — followed by fixes and a validation report if you picked a fixing mode.

Examples

/perf                              # full codebase audit, mode chosen interactively
/perf GET /api/products --backend-only   # one endpoint: queries, caching, serialization
/perf --frontend-only              # bundle size, code splitting, images, Web Vitals
/perf src/api/routes/ --audit-only # read-only report, no code changes — fast and cheap
/perf --deep                       # full team + load tests/benchmarks, most thorough

Arguments & flags

Flag	Values	Default	What it does
`scope`	file, directory, feature, endpoint, `all`	`all`	What to audit/optimize
`--backend-only`	—	off	Limit to backend optimizations
`--frontend-only`	—	off	Limit to frontend optimizations
`--audit-only`	—	off	Report findings only — read-only, no code changes
`--deep`	—	off	Deep profiling: benchmarks and load tests if configured

What it does

Gathers context — reads CLAUDE.md, the project profile, and past performance findings from the knowledge base so the audit is calibrated to your stack and stage.
Quick scan — a cheap explore-light pass flags N+1 queries, unbounded queries, missing indexes, heavy bundles, missing caching/compression, and similar hotspots.
Mode selection — user-confirmation checkpoint: asks you to pick Audit (~15x cost), Targeted (~30x), Deep (~50x), or auto. Skipped if you passed --audit-only or --deep.
Audit phase — performance lead profiles the critical path and produces a severity-tiered, file-cited report; the architect assesses scalability bottlenecks with stage-appropriate recommendations (no distributed-systems advice for early-stage projects).
Optimize phase (Targeted/Deep only) — backend and frontend agents implement the CRITICAL/HIGH fixes from the audit, running tests, linters, and type checks after each change.
Validate phase — QA runs the full test suite (plus load tests and bundle-size comparison when applicable) and reports any regressions.
Report + continuation — aggregates everything into a final report, writes findings to the knowledge base, then presents a menu: [1] fix remaining items, [2] run /qa commit, [3] create PR (/pr), [4] deeper analysis (/perf {scope} --deep), [5] done for now. The pass isn't "finished" until you pick one — the report alone doesn't close it out. (Skipped if autopilot is active.)
Completion verification — after you respond to that menu, a completion-verifier agent runs a separate spec/plan-compliance check (only meaningful if a plan/spec file exists for the scope) and prints PASS, GAPS FOUND, or INCONCLUSIVE. This does not re-measure performance — that validation already happened in the QA phase above.
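
How the flags shortcut the mode prompt, and which phases then run, can be sketched as below. This is one reading of the steps above, not the skill's code: the names are hypothetical, and the precedence when both flags are passed is an assumption because that case is not documented here.

```python
from typing import List, Optional

FIXING_MODES = {"targeted", "deep"}


def mode_from_flags(flags: List[str]) -> Optional[str]:
    """Return the mode a flag forces, or None when the skill has to ask.

    Assumption: if both flags are passed, --audit-only wins because it is
    the read-only choice.
    """
    if "--audit-only" in flags:
        return "audit"
    if "--deep" in flags:
        return "deep"
    return None


def phases(mode: str) -> List[str]:
    """List the phases that follow mode selection for a given mode."""
    plan = ["audit"]
    if mode in FIXING_MODES:
        # Fixes and their validation happen only in Targeted and Deep modes.
        plan += ["optimize", "validate"]
    plan.append("report")
    return plan


for flags in (["--audit-only"], ["--deep"], ["--backend-only"]):
    mode = mode_from_flags(flags)
    if mode is None:
        print(flags, "-> ask the user (audit / targeted / deep)")
    else:
        print(flags, "->", mode, phases(mode))
```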

Agents spawned

Agent	Model tier	Role
explore-light	haiku	Cheap hotspot scan before the team spins up
performance	sonnet	Leads the audit, prioritizes findings
architect	sonnet	Scalability assessment, stage-appropriate recommendations
backend	sonnet	Implements backend fixes (Targeted/Deep)
frontend	sonnet	Implements frontend fixes (Targeted/Deep)
qa	sonnet	Regression validation after optimization

Models shown are defaults, loaded from model-policy.yml. A project with a custom model-policy override will use different models for some or all of these roles.

After the continuation menu, a separate completion-verifier agent (sonnet) runs a spec/plan-compliance check — it's not part of the main team and doesn't validate performance itself.

Output & artifacts

Final performance report in the conversation: audit, scalability assessment, optimizations applied, validation results, remaining recommendations
Code changes in your working tree (Targeted/Deep modes) — nothing in --audit-only
Findings appended to .claude/knowledge/agents/performance.md; decisions and patterns to .claude/knowledge/shared/decisions.md and patterns.md

Troubleshooting

Problem	Fix
Run is expensive/slow for a quick question	Use `--audit-only` or narrow the scope to a file or endpoint
Optimizations broke a test	The QA validation phase reports it — fix before shipping, or revert the specific change
No load-test comparison in the report	Load tests only run in Deep mode, and only if a load-test tool (k6, locust, etc.) is already set up in the project — `/perf` does not install or configure one for you
Findings ignore project-specific known-slow patterns	Run /calibrate first so the team inherits project knowledge

Context management

/perf can accumulate a lot of context — profiling output, benchmark data, and optimization proposals from multiple agents — especially after a --deep or full-codebase run. After the report, the skill prints a tip block to help you decide what to do next:

Changes implemented, heading to PR? Run /compact with a summary of what was optimized so the what-changed context survives for the PR body.
Audit only (--audit-only), no changes made? Context pressure is low — either continue straight to planning fixes, or /clear and start a new session with a short prompt describing the audit's key findings.
Planning multiple optimization passes? Use a new session per pass. Carrying over context from pass 1 can bias judgment in pass 2 — you may see old bottlenecks as already fixed when they aren't.

/tech-debt — structural cruft audit; /perf is speed, /tech-debt is hygiene
/qa — validate all changes before shipping
/pr — ship the optimizations
Workflow guide

/pitch-generator

Generate consulting proposals and pitch documents.

Synopsis

/pitch-generator <client-name> [--format proposal|exec-summary|deck-outline]

When to use it

Converting discovery and opportunity data into a budget-ready proposal
Producing a one-page executive brief for a C-suite decision
Outlining a live pitch deck slide by slide, with speaker notes
Typically invoked as part of the /consulting workflow (propose phase), but can be run standalone once discovery is complete
Not for: generating the underlying ROI numbers — that's /roi-calculator; or post-sale deliverables like final reports and board decks — that's /deliverable-builder

Quickstart

/pitch-generator acme-corp

What you'll see: a full 5-8 page proposal — cover page, executive summary, current state, opportunities, phased approach, architecture and roadmap diagrams, investment and ROI, team model, next steps — saved to the client's proposals directory.

Examples

/pitch-generator acme-corp                            # default: full proposal (5-8 pages)
/pitch-generator acme-corp --format exec-summary      # one-page brief for C-suite/board
/pitch-generator acme-corp --format deck-outline      # 9-slide pitch outline with speaker notes

Arguments & flags

Flag	Values	Default	What it does
`--format`	`proposal` `exec-summary` `deck-outline`	`proposal`	Output format: full proposal for budget approval, 1-page executive summary, or 9-slide deck outline

What it does

Loads client data — reads from clients/<client-name>/profile.md, discovery/current-state.md, discovery/maturity-assessment.md, assessment/opportunity-matrix.md, and assessment/roadmap.md, plus ROI data if /roi-calculator has been run. Missing any of these files triggers a redirect to run the prerequisite skill first.
Generates the chosen format:

proposal — 10 sections from cover page to next steps, including the maturity scorecard, the priority matrix, Mermaid architecture and Gantt roadmap diagrams, an investment table with "cost of doing nothing", and the engagement team model.
exec-summary — problem, opportunity, approach, expected ROI, investment, timeline, and next step on a single printed page (~500 words max).
deck-outline — exactly 9 slides with speaker notes, mapping specific Arth AI agents (architect, code-reviewer, frontend, python-backend, etc.) and their capabilities to the client's departments and a three-phase engagement (Deploy & Discover, Build & Innovate, GTM & Scale).

Quality gate — 10 checks before output: every claim traces to client materials or a cited benchmark (unsourced numbers are removed or marked [TO BE VALIDATED]), numbers consistent across sections, every discovery pain point addressed, no generic placeholder language, length limits enforced, clear next steps with specific actions included, all ROI figures reasonable and defensible, measured consultant tone, client-provided information attributed, and unknowns framed as collaborative discovery questions.
Saves and reports — writes the file and shows the commands to generate the other formats.

Output & artifacts

Written to clients/<client-name>/proposals/ (created if missing):

Format	File
`proposal`	`proposals/full-proposal.md`
`exec-summary`	`proposals/executive-summary.md`
`deck-outline`	`proposals/deck-outline.md`

Sections with insufficient data carry [DATA NEEDED: ...] markers, and the firm name appears as a [Your Firm Name] placeholder for you to fill in.
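
Before a generated file goes to a client, a quick scan for leftover markers can help. The sketch below is a hypothetical helper, not part of the skill: it searches text for the `[DATA NEEDED: ...]`, `[TO BE VALIDATED]`, and `[Your Firm Name]` markers described in this section and reports the line each one appears on.

```python
import re

# Matches the three marker forms described in this section.
MARKER = re.compile(r"\[(?:DATA NEEDED:[^\]]*|TO BE VALIDATED|Your Firm Name)\]")


def find_markers(text: str) -> list:
    """Return (line_number, marker) pairs for every unresolved marker."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in MARKER.finditer(line):
            hits.append((number, match.group(0)))
    return hits


# Made-up sample text, for illustration only.
sample = """# Proposal prepared by [Your Firm Name]
Current spend: [DATA NEEDED: annual support cost]
Projected savings [TO BE VALIDATED]
Timeline: 6 months
"""
for number, marker in find_markers(sample):
    print(f"line {number}: {marker}")
```

An empty result means no placeholder is left. It does not prove the remaining figures are sourced.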

Troubleshooting

Problem	Fix
`Discovery data not found for <client-name>`	Run `/client-discovery <client-name>` first
`Opportunity data not found for <client-name>`	Run `/opportunity-map <client-name>` first
ROI figures are ranges rather than firm numbers	Run /roi-calculator first — without it the proposal uses conservative industry-benchmark ranges
`[DATA NEEDED]` or `[TO BE VALIDATED]` markers in the output	Deliberate honesty markers — gather that data from the client before sending, never fill them with guesses

/consulting — the engagement orchestrator; runs this in the propose phase
/roi-calculator — produces the financial numbers the proposal cites
/opportunity-map — produces the priority matrix and roadmap the proposal presents
/deliverable-builder — post-win deliverables: final reports, board decks, guides
/share — clean up and format a document for sharing outside the engagement

/planning

Generate a PRD (product requirements document) for a feature — user stories, journey, edge cases, success criteria — plus an async tech feasibility note from the architect. Includes a design spec HTML by default (use --no-design to skip).

Synopsis

/planning <feature-name> [--no-design] [--gtm] [--sync] [--design-spec-only] [--check] [-- <brief>]

When to use it

Starting a new feature — before any architecture or task breakdown happens
You want user stories, edge cases, and a feasibility read reviewed before committing to a build approach
Not for: the implementation plan itself — that's phase 2, /implementation-plan; or small bug fixes — use /fix

Quickstart

/planning dark-mode -- Users want a dark theme that follows system preference and persists per account

What you'll see: a PRD written to .claude/specs/dark-mode.md, a design spec at .claude/specs/dark-mode-design.html, and the architect's feasibility verdict (GREEN / YELLOW / RED) — then a prompt to review the PRD before running /implementation-plan.

Examples

/planning oauth-login -- Add Google OAuth sign-in        # most common — name + inline brief
/planning oauth-login                                    # no brief — skill asks for one interactively
/planning admin-report --no-design -- CSV export...      # PRD only, skip UX brief + design spec HTML
/planning checkout --gtm -- One-click checkout...        # add GTM positioning input to the PRD
/planning oauth-login --design-spec-only                 # backfill a design spec for a PRD made with --no-design
/planning oauth-login --check                             # read-only: does the design spec still match the PRD?

Arguments & flags

Flag	Values	Default	What it does
`--no-design`	—	off	Skip the UX brief and design spec HTML — PRD only. Cannot be combined with `--sync`, `--design-spec-only`, or `--check`.
`--gtm`	—	off	Adds a GTM Expert positioning note that shapes the PRD's target user and success metrics
`--sync`	—	off	Currently stubbed: prints a not-yet-implemented notice (design-artifact sync is planned for v1.1) and exits without changes. Use `--design-spec-only` or `--check` below instead.
`--design-spec-only`	—	off	Skips PRD regeneration and operates on an existing PRD: spawns the design-spec-writer agent, writes the design spec sibling file, and patches the PRD frontmatter. Use this to backfill a design spec for a PRD that was generated with `--no-design`. Idempotent — safe to re-run.
`--check`	—	off	Read-only drift check: compares the design spec's recorded PRD hash against the current PRD body and exits 0 (match) or 1 (mismatch, with guidance). Makes no file changes. Useful for CI / pre-commit / drift-hook automation.
`-- <brief>`	free text	—	Inline feature brief (2–3 sentences). Everything after `--` is the brief.

--sync, --design-spec-only, and --check are pairwise exclusive — pass at most one, and all three require an existing PRD (pass feature-name as the first positional argument).

Mode flags like --fast/--lite/--full are not used in PRD generation — the skill prints a hint and continues with the normal PRD flow, ignoring the flag. Those flags belong to /implementation-plan, where they control the architecture debate.
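
The flag-combination rules above can be expressed as a small validator. This is a sketch under stated assumptions: the function name is hypothetical, and only the `--no-design cannot be used with ...` wording follows the error text shown in Troubleshooting; the other message is invented for the example.

```python
# Flags that operate on an existing PRD; at most one may be passed.
EXISTING_PRD_FLAGS = {"--sync", "--design-spec-only", "--check"}


def validate_flags(flags: set) -> list:
    """Return a list of error messages; an empty list means the flags are valid."""
    errors = []
    chosen = sorted(flags & EXISTING_PRD_FLAGS)
    if len(chosen) > 1:
        errors.append("pass at most one of: " + ", ".join(chosen))
    if "--no-design" in flags:
        for flag in chosen:
            errors.append(f"--no-design cannot be used with {flag}")
    return errors


print(validate_flags({"--gtm"}))                  # valid
print(validate_flags({"--no-design", "--check"}))  # one conflict
print(validate_flags({"--sync", "--check"}))       # exclusive pair
```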

What it does

Interactive resolution — if the feature name or brief wasn't passed inline, asks for them via AskUserQuestion (user-confirmation checkpoint; every question includes Cancel). Names are silently normalized to kebab-case. When running in the Arth cloud sandbox, you also get one extra round of scope-clarifying questions (MVP vs. full, target user/surface, etc.) before the PRD is written; this happens only in cloud runs, never in a local terminal or CI.
Context gathering — an explore-light agent scans the codebase for related routes, components, models, and tests; topic wikis and the project knowledge graph/base are consulted.
Input briefs (parallel) — a Design Thinker writes a short UX brief (skipped with --no-design) and, with --gtm, a GTM Expert writes a positioning note. These feed into the PM — they are not separate PRD sections.
PRD writing (parallel) — the Product Manager writes the PRD (user stories US-N with priorities and acceptance, user journey, edge cases EC-N, success criteria); the Architect writes a tech feasibility note (GREEN / YELLOW / RED) flagging hard constraints and high-risk areas. No debate happens here.
Design spec generation — a design-spec-writer agent produces the design spec HTML (user journeys, key screens, interaction principles, accessibility, design system), hash-linked to the PRD for drift detection. Skipped with --no-design.
Present and hand off — prints the PRD path and feasibility verdict, then stops. User-confirmation checkpoint: /implementation-plan is never auto-invoked — even in autopilot — because PRD review is the whole point of the two-phase split.

Agents spawned

Agent	Model tier	Role
explore-light	haiku	Codebase scan
design-thinker	sonnet	UX brief (default; skipped with `--no-design`)
gtm-expert	sonnet	Positioning note (`--gtm` only)
product-manager	sonnet	Writes the PRD
architect	sonnet	Tech feasibility note
design-spec-writer	sonnet	Design spec HTML (default; skipped with `--no-design`)

Output & artifacts

.claude/specs/<feature-name>.md — the PRD, with frontmatter (phase: prd, story/edge-case counts, feasibility verdict, spec_hash for drift detection)
.claude/specs/<feature-name>-design.html — design spec sibling (unless --no-design)
A decision entry in the project knowledge graph
Next step printed: review the PRD, then run /implementation-plan <feature-name>
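
The hash-based drift detection behind --check can be sketched as follows. The hash algorithm and the exact definition of the PRD body are not stated in this section, so the sketch assumes SHA-256 over everything after a leading `---` frontmatter block. All function names are hypothetical.

```python
import hashlib


def prd_body(prd_text: str) -> str:
    """Strip a leading '---' frontmatter block, if present, and return the body."""
    lines = prd_text.splitlines(keepends=True)
    if lines and lines[0].strip() == "---":
        for index in range(1, len(lines)):
            if lines[index].strip() == "---":
                return "".join(lines[index + 1:])
    return prd_text


def body_hash(prd_text: str) -> str:
    """Hash only the body, so frontmatter edits do not count as drift."""
    return hashlib.sha256(prd_body(prd_text).encode("utf-8")).hexdigest()


def check_drift(prd_text: str, recorded_hash: str) -> int:
    """Return 0 when the recorded hash matches the current PRD body, else 1."""
    return 0 if body_hash(prd_text) == recorded_hash else 1


prd = "---\nphase: prd\n---\n# Dark mode\nUS-1: follow the system preference\n"
recorded = body_hash(prd)  # what the design spec would have stored
edited = prd + "US-2: persist per account\n"

print(check_drift(prd, recorded))     # 0: PRD unchanged
print(check_drift(edited, recorded))  # 1: PRD edited after the spec was written
```

Returning 0 or 1 matches the exit codes described for --check, which is what makes the check usable from CI or a pre-commit hook.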

Troubleshooting

Problem	Fix
`missing required args in non-interactive mode`	In CI/autopilot the skill won't prompt — pass the feature name and `-- <brief>` inline
`--no-design cannot be used with --{flag}`	Pick one — `--no-design` skips the design artifacts, while `--sync`, `--design-spec-only`, and `--check` all operate on them. To add a design spec to a PRD that was generated with `--no-design`, run `--design-spec-only` on its own
`--sync is not yet implemented (planned for v1.1)`	Expected — `--sync` is a stub today. Use `--design-spec-only` to regenerate the design spec, or `--check` to read-only-detect drift
Feasibility comes back RED or YELLOW	Read the printed hard constraints — revising the PRD now is far cheaper than after the architecture debate
Design spec contains TODO markers	The writer's output was incomplete on both attempts — re-run with `/planning <feature> --design-spec-only` as the warning suggests

/implementation-plan — phase 2: turns the reviewed PRD into a locked-scope plan
/implement — phase 3: builds from the plan
Workflow guide — where planning fits in the full feature workflow

/pr

Run QA, create a GitHub PR, and manage post-merge workflow. Never push directly to main.

Synopsis

/pr [--skip-qa]

When to use it

Your change is done and you want the safe path to a PR: QA gate, commit, issue link, push, PR
After a merge, to run the cleanup workflow (close issue, delete branch, back to main)
Not for: a quick commit+push+PR with no QA — that's /ship; fixing a CI run that already failed — that's /ci-fix

Prerequisites: an authenticated platform CLI — GitHub CLI (gh, run gh auth login if needed), GitLab CLI (glab), Azure CLI (az), or Bitbucket API credentials — matching your git remote origin. /pr auto-detects the platform from .git/config.

Quickstart

/pr

What you'll see: a mode line, a revert-safety check, /qa in commit mode, a commit, a tracking issue found or created, your branch rebased on the default branch and pushed, and a PR URL with Summary / QA Results / Test plan sections. For large PRs (>500 changed lines or >15 files), a one-line tech-debt suggestion is appended. If compliance extensions are enabled, a compliance matrix is added too.

Supports GitHub, GitLab, Bitbucket, and Azure DevOps — platform is auto-detected from .git/config. Examples below show GitHub (gh) commands; GitLab uses glab mr create, Bitbucket uses its REST API, Azure DevOps uses az repos pr create.
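
Platform detection from the origin URL might look like the sketch below. It is an illustration, not the skill's logic: the host-to-platform table is an assumption that covers only the public hosted services, and a self-hosted instance would need its own entry. The commands in the table are the ones named above, plus `gh pr create` for GitHub.

```python
from urllib.parse import urlparse

# Assumed mapping from remote host to (platform, how the PR is created).
HOSTS = {
    "github.com": ("GitHub", "gh pr create"),
    "gitlab.com": ("GitLab", "glab mr create"),
    "bitbucket.org": ("Bitbucket", "REST API"),
    "dev.azure.com": ("Azure DevOps", "az repos pr create"),
}


def remote_host(url: str) -> str:
    """Extract the host from an HTTPS/SSH URL or an SCP-style git@host:path remote."""
    if "://" in url:
        return (urlparse(url).hostname or "").lower()
    host_part = url.split(":", 1)[0]        # "git@github.com"
    return host_part.split("@")[-1].lower()  # "github.com"


def detect_platform(url: str):
    """Return (platform, command) for a known host, or None if unrecognized."""
    host = remote_host(url)
    for known, platform in HOSTS.items():
        # Suffix match so that hosts like ssh.dev.azure.com are recognized too.
        if host == known or host.endswith("." + known):
            return platform
    return None


for url in ("git@github.com:acme/app.git",
            "https://gitlab.com/acme/app.git",
            "https://example.internal/acme/app.git"):
    print(url, "->", detect_platform(url))
```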

Examples

/pr             # full workflow — QA + commit + push + PR
/pr --skip-qa   # QA already ran upstream (/fix, /implement) — quick lint + type sanity check only

Arguments & flags

Flag	Values	Default	What it does
`--skip-qa`	—	off	Skips the full QA gate (used when `/fix` or `/implement` already ran QA); runs only a quick lint + type-check sanity pass

What it does

Convention discovery — reads CLAUDE.md, the project profile, and git history to match your team's commit style, PR template, and branch protection rules; also detects the source-control platform (GitHub/GitLab/Bitbucket/Azure DevOps) from .git/config.
Tech-debt nudge — for large diffs (>500 changed lines or >15 files), flags that a /tech-debt pass is worth running after merge. Skipped silently otherwise, or if disabled in .claude/tech-debt-config.json.
Revert check — runs /revert-check in advisory mode. If it suspects your working tree accidentally undoes recently-merged PRs, it surfaces the warning and asks whether to proceed (user-confirmation checkpoint).
Precheck (toolkit projects only) — if this is the claude-agents toolkit itself (detected via tests/run.sh), runs /precheck first and stops on failure. Skipped for non-toolkit projects.
QA gate — runs /qa in commit mode (or just the quick sanity check with --skip-qa). Failures stop the workflow — no PR is created on a failing QA run.
Stage and commit — stages relevant files (never .env, credentials, or large binaries) and commits in the project's detected style. Verifies the branch is complete (type check / lint) before pushing.
Issue linkage — finds an existing tracking issue for the branch or creates one, so the PR auto-closes it on merge via Closes #N.
Compliance evidence (if extensions enabled) — when .claude/extensions/.enabled.json lists enabled packs, generates a per-pack evidence report and adds a compliance matrix to the PR body. Skipped entirely (zero overhead) if no extensions are enabled.
Push and create the PR — rebases on the latest default branch (fails loudly on conflicts, never auto-resolves), pushes, and opens the PR with Summary, QA Results, and Test plan sections. Refuses to run on main — branch first.
Completion verification — a completion-verifier agent confirms the PR matches the plan/spec. It is skipped with --skip-qa, because that flag signals QA already ran upstream via /fix or /implement; the verifier stays skipped even if you pass the flag standalone with no prior QA.
Post-merge workflow — when the merge is detected (a Monitor watcher fires automatically if configured, otherwise tell it "merged"): verifies the issue closed, deletes remote and local branches, checks out main, pulls, and runs /onboard for what's next.
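The large-PR threshold used by the tech-debt nudge can be expressed as a small predicate. This is a sketch only: the constant and function names are hypothetical, and only the two numeric thresholds come from the text above.

```python
# Thresholds quoted in the step list; the names are illustrative assumptions.
LINE_THRESHOLD = 500
FILE_THRESHOLD = 15


def is_large_pr(changed_lines: int, changed_files: int) -> bool:
    """Return True when either size threshold is strictly exceeded."""
    return changed_lines > LINE_THRESHOLD or changed_files > FILE_THRESHOLD
```

Both comparisons are strict, so a PR with exactly 500 changed lines across exactly 15 files would not trigger the nudge under this reading.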

Agents spawned

Agent	Model tier	Role
completion-verifier	sonnet	PR-vs-plan check (skipped with `--skip-qa`)

(The QA gate itself runs via the /qa skill, which spawns its own checkers.)

Output & artifacts

A commit on your feature branch and a pushed branch on origin
A GitHub issue (found or created) with a Branch: <name> marker, auto-closed on merge
A GitHub PR (URL returned) with Summary / QA Results / Test plan and Closes #N
After merge: branches cleaned up, main checked out and pulled
PR pattern notes appended to .claude/knowledge/skills/pr.md
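To make the PR body layout concrete, the sketch below assembles the three named sections and the Closes #N trailer. The Markdown heading level, the checkbox format for the test plan, and the function name build_pr_body are assumptions for illustration; only the section names and the trailer come from the list above.

```python
def build_pr_body(summary: str, qa_results: str, test_plan: list, issue_number: int) -> str:
    """Assemble a PR body with Summary, QA Results, Test plan, and a closing trailer."""
    lines = [
        "## Summary",
        summary,
        "",
        "## QA Results",
        qa_results,
        "",
        "## Test plan",
        # One unchecked checkbox per manual verification step (assumed format).
        *[f"- [ ] {step}" for step in test_plan],
        "",
        # The trailer is what lets the platform auto-close the tracking issue on merge.
        f"Closes #{issue_number}",
    ]
    return "\n".join(lines)
```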

Troubleshooting

Problem	Fix
Refuses to run on `main`	Create a feature branch first — `/pr` never pushes to main
`✗ Rebase conflicts against origin/main`	`/pr` stops rather than auto-resolving. Open the conflicting files, fix the markers, `git add` them, then `git rebase --continue` and re-run `/pr` — or bail out entirely with `git rebase --abort`
QA gate fails and PR is not created	Fix the reported failures (route to `/fix` if needed) and re-run — `/pr` won't ship failing code
Revert check flags files you changed on purpose	Confirm at the prompt that the changes are intentional and the workflow continues
Post-merge cleanup didn't run	No Monitor watcher is configured to detect the merge. Say "merged" to trigger the post-merge steps yourself
Unexpected `## Compliance` section in the PR body	Your project has compliance extensions enabled (`.claude/extensions/.enabled.json`) — see /extensions status for what's tracked and why
`/precheck` ran when I didn't ask for it	Expected on the claude-agents toolkit repo itself — `/pr` runs it automatically before QA to catch failures locally in ~30s

/precheck — the fast local test gate /pr runs on toolkit projects
/ship — the no-QA shortcut when you just want commit + push + PR
/qa — the QA framework behind the gate
/implement — chains into /pr --skip-qa automatically

/precheck

Run local tests before pushing. Catches CI failures locally in ~30 seconds instead of a 4-minute CI round-trip.

Synopsis

/precheck [--skip-revert-check]

When to use it

Every time before pushing a branch or creating a PR — /pr expects a passing precheck
After finishing a change, to know in seconds whether CI would fail
Not for: full-codebase validation — that's /qa full; or fixing a CI run that already failed — that's /ci-fix

Quickstart

/precheck

What you'll see: a revert-safety check, then the test suites relevant to your changed files, ending with ✓ Precheck passed (N tests, Xs) — Ready to push. On success it also records a pass marker, then automatically commits, pushes, opens a PR, squash-merges it, and deletes the branch — no confirmation prompt. If you want to review the PR on GitHub before it merges, stop the skill after the pass marker is written, or use /pr for a create-only flow.

Examples

/precheck                       # standard run — pick suites from the diff, test, then ship
/precheck --skip-revert-check   # you intentionally deleted recently-merged code (refactor)

Arguments & flags

Flag	Values	Default	What it does
`--skip-revert-check`	—	off	Skips the accidental-revert detection when a deletion is intentional

What it does

Revert check — runs /revert-check in strict mode to catch stale-buffer or bad-stash-pop changes that would silently undo recently-merged PRs. A failure here stops everything and shows the suspected files.
Detects what changed — diffs your branch against main to find the touched files.
Selects the relevant test suites — only the tests that cover what you changed; falls back to the full mechanical run when changes span multiple areas. Refuses to run on main (asks you to branch first).
Runs the tests and reports pass/fail per suite.
On pass: writes the .claude/.precheck-passed marker (the toolkit's routing checks this before allowing /pr), then proceeds to the full ship sequence — commit, push, PR, squash-merge, and branch cleanup (git checkout main && git pull && git branch -d {branch}) — without re-asking.
On fail: removes any stale pass marker and lists the failing tests. It will not push with failures.
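The pass/fail handling of the marker file can be sketched as follows. The marker path comes from the steps above; writing an ISO timestamp as the file's content and the function name record_precheck_result are assumptions for this example.

```python
from datetime import datetime, timezone
from pathlib import Path

# Marker path as named in the steps above, relative to the project root.
MARKER = Path(".claude") / ".precheck-passed"


def record_precheck_result(project_root: Path, passed: bool) -> bool:
    """Write a timestamped marker on pass; remove any stale marker on fail."""
    marker = project_root / MARKER
    if passed:
        marker.parent.mkdir(parents=True, exist_ok=True)
        # Assumed content: a UTC ISO-8601 timestamp of the passing run.
        marker.write_text(datetime.now(timezone.utc).isoformat() + "\n")
        return True
    # missing_ok avoids an error when no earlier pass was recorded.
    marker.unlink(missing_ok=True)
    return False
```

Removing the marker on failure is what keeps a later /pr from trusting an outdated pass.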

Output & artifacts

.claude/.precheck-passed — timestamped pass marker consumed by /pr
Test run output in the conversation; on the ship path, a merged PR with the branch deleted and main checked out locally

Troubleshooting

Problem	Fix
`Switch to a feature branch first`	Precheck refuses to run on `main` — `git checkout -b my-change` and re-run
Revert check flags files you deleted on purpose	Re-run with `/precheck --skip-revert-check`
Tests pass locally but `/pr` still complains	The pass marker is stale (new commits since the run) — re-run `/precheck`

/qa — deeper, multi-agent quality checks before shipping
/pr — use this instead of /precheck when you want to create (or update) a PR without also triggering the auto-merge; the toolkit's routing still checks for a passing precheck marker before allowing /pr
/revert-check — the safety check precheck runs first (standalone usage)
Workflow guide: Ship code

/qa-incident

Manually create a QA incident from a known issue.

Synopsis

/qa-incident [description]

When to use it

A bug reached production (or a user) without being caught — log it so future QA runs target that area
You know about an issue you aren't fixing right now, but want the QA system to remember it
Not for: actually fixing the bug — that's /fix, which writes its own incident record automatically; or reviewing the knowledge base — that's /qa-learn

Quickstart

/qa-incident admin page crashes when user has no sessions

What you'll see: a few clarifying questions (severity, affected files, root cause if known), then a confirmation: "Incident logged. Next /qa run will generate a regression test targeting this issue."

Examples

/qa-incident admin page crashes when user has no sessions
/qa-incident credit balance goes negative under concurrent checkout
/qa-incident                # no description — you'll be prompted for one

What it does

Parses the description from your argument; if you provided none, it asks you to describe the issue first.
Asks clarifying questions — severity (high / medium / low), affected files if known, and root cause if known (user-confirmation checkpoint).
Creates the incident file at .claude/qa-knowledge/incidents/{date}-{slug}.md with status: uncovered, the root cause (or "To be investigated"), how QA missed it, and a regression-test recommendation. The knowledge-base directory is created automatically if it doesn't exist yet.
Confirms — the next /qa run reads uncovered incidents and generates a regression test targeting the issue.
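The file naming in the third step can be sketched like this. The directory and the {date}-{slug}.md pattern come from the text; the ISO date format, the six-word slug limit, the key: value layout of the file, and all function names are assumptions for illustration.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

INCIDENT_DIR = Path(".claude/qa-knowledge/incidents")


def slugify(description: str, max_words: int = 6) -> str:
    """Lowercase the description and join its first few words with hyphens."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    return "-".join(words[:max_words])


def incident_path(description: str, on: date) -> Path:
    """Build the {date}-{slug}.md path for a manually logged incident."""
    return INCIDENT_DIR / f"{on.isoformat()}-{slugify(description)}.md"


def render_incident(description: str, severity: str, root_cause: Optional[str] = None) -> str:
    """Render an assumed key: value record; manual incidents start as uncovered."""
    return "\n".join([
        "status: uncovered",
        f"severity: {severity}",
        f"description: {description}",
        f"root_cause: {root_cause or 'To be investigated'}",
    ]) + "\n"
```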

Auto-logging from escalation (not manual)

There is also a separate, fully automatic path that you never invoke yourself. When another workflow resolves an error after the escalation guard tripped (3+ consecutive failures), the resolution is auto-logged as a distinct incident at .claude/qa-knowledge/incidents/{date}-escalation-{slug}.md, with status: covered and type: escalation-resolution (vs. status: uncovered for the manually created incidents above). It records what was tried and failed, the actual root cause, the fix applied, and search keywords, so future sessions find the fix instead of repeating the same debugging journey. This is why you may see both uncovered and covered incidents in the directory: uncovered means "log it, /qa will write a test"; covered means "already resolved, kept as a searchable record."
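One way to picture the escalation guard is as a counter of consecutive failures that arms itself at three and triggers auto-logging on the next success. This is a sketch under that reading; the class name EscalationGuard and its interface are hypothetical.

```python
class EscalationGuard:
    """Tracks consecutive failures and reports when a resolution should be auto-logged."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0
        self.tripped = False

    def record(self, succeeded: bool) -> bool:
        """Record one attempt; return True only when a success follows a tripped guard."""
        if succeeded:
            should_log = self.tripped
            # Any success resets the streak and disarms the guard.
            self.consecutive_failures = 0
            self.tripped = False
            return should_log
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold:
            self.tripped = True
        return False
```

A success after only one or two failures returns False, so ordinary retries would not produce escalation records.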

Output & artifacts

.claude/qa-knowledge/incidents/{date}-{slug}.md — the incident record (status: uncovered until a regression test covers it)
.claude/qa-knowledge/incidents/{date}-escalation-{slug}.md — auto-logged escalation-resolution records (status: covered), created only by the auto-logging path above, not by manual /qa-incident runs
.claude/qa-knowledge/bug-patterns.md — updated only by the auto-logging path, when an escalation resolution represents a new recurring pattern
Picked up by /qa on later runs — it reads uncovered incidents and generates a regression test targeting each one

Troubleshooting

Problem	Fix
Invoked with no description	You'll be prompted: describe the issue in a sentence (e.g. "admin page crashes when user has no sessions")
`.claude/qa-knowledge/` doesn't exist	Created automatically — no setup needed
Incident logged but `/qa` didn't test it	Make sure the next QA run touches the affected area — incidents inform scenario generation for relevant changes

/qa — consumes incidents to generate targeted regression scenarios
/qa-learn — review incident counts, coverage status, and prune stale entries
/fix — the formal fix pipeline, which logs its own incident records on completion
QA Guide — how the QA knowledge base feeds the self-improving loop

/qa-learn

Review QA knowledge base stats, prune stale entries, show learning metrics.

Synopsis

/qa-learn [prune]

When to use it

Periodically, to see what the self-improving QA system has accumulated — incidents, promoted tests, bug patterns, coverage gaps
When the knowledge base feels noisy or stale — prune clears out entries that no longer earn their keep
Not for: adding a new incident — that's /qa-incident; or running tests — that's /qa

Quickstart

/qa-learn

What you'll see: a structured table of knowledge-base metrics — incident counts (covered vs. uncovered), promoted test files, bug patterns, coverage gaps, and the adversarial challenger's hit rate (attacks that found real issues / total attacks).

Examples

/qa-learn          # show stats — read-only
/qa-learn prune    # remove stale entries and report what was pruned

What it does

Default (stats):

Counts incidents in .claude/qa-knowledge/incidents/, split into covered (regression test exists) vs. uncovered.
Lists promoted tests from .claude/qa-knowledge/promoted-tests/ — generated scenarios that earned a permanent place in the suite.
Counts bug patterns (bug-patterns.md) and coverage gaps (coverage-gaps.md).
Computes the challenger hit rate from challenger-log.md — how often adversarial QA attacks actually found issues.
Presents everything as one table so you can see whether the QA system is learning.
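The two headline numbers in the stats table can be sketched as below. The sketch assumes each incident file contains a line of the form status: covered or status: uncovered, and that hit and attack counts have already been extracted from challenger-log.md (whose format is not described here). Function names are illustrative.

```python
import re
from typing import Optional

# Assumed record format: one "status: covered|uncovered" line per incident file.
STATUS_LINE = re.compile(r"^status:\s*(covered|uncovered)\s*$", re.MULTILINE)


def count_incidents(incident_texts: list) -> dict:
    """Split incidents into covered vs. uncovered; files without a status are ignored."""
    counts = {"covered": 0, "uncovered": 0}
    for text in incident_texts:
        match = STATUS_LINE.search(text)
        if match:
            counts[match.group(1)] += 1
    return counts


def challenger_hit_rate(hits: int, total_attacks: int) -> Optional[float]:
    """Attacks that found real issues divided by total attacks; None when no attacks ran."""
    if total_attacks == 0:
        return None
    return hits / total_attacks
```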

prune mode (modifies the knowledge base — it reports exactly what was removed):

Archives or deletes covered incidents older than 30 days.
Removes provenance for promoted tests whose source test file no longer exists.
Removes coverage gaps that now have test coverage.
Truncates challenger log entries older than 90 days.
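The two age-based prune rules can be sketched as simple date comparisons. The 30-day and 90-day limits come from the list above; reading "older than" as strictly greater than the limit, and the function names, are assumptions.

```python
from datetime import date

INCIDENT_MAX_AGE_DAYS = 30        # covered incidents
CHALLENGER_LOG_MAX_AGE_DAYS = 90  # challenger log entries


def age_in_days(entry_date: date, today: date) -> int:
    """Whole days elapsed between the entry date and today."""
    return (today - entry_date).days


def should_prune_incident(status: str, logged_on: date, today: date) -> bool:
    """Only covered incidents are eligible; uncovered ones are kept regardless of age."""
    return status == "covered" and age_in_days(logged_on, today) > INCIDENT_MAX_AGE_DAYS


def should_truncate_log_entry(logged_on: date, today: date) -> bool:
    """Challenger log entries are dropped once they pass the 90-day limit."""
    return age_in_days(logged_on, today) > CHALLENGER_LOG_MAX_AGE_DAYS
```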

Output & artifacts

Stats mode: a metrics table in the conversation — nothing written
Prune mode: stale entries removed from .claude/qa-knowledge/, with a report of what was pruned

Troubleshooting

Problem	Fix
`QA knowledge base not initialized`	Run `/qa commit` or `/qa full` first — the knowledge base accumulates from QA runs
All counts are zero	The KB exists but nothing has been learned yet — normal for a fresh project; counts grow as QA runs catch bugs and gaps
Pruned something you wanted to keep	Prune only touches covered/stale entries by rule — re-log anything still relevant with /qa-incident

/qa — the runs that populate this knowledge base
/qa-incident — manually add an incident to the knowledge base
QA Guide — how the self-improving QA loop works end to end

/qa

Run QA checks.

Synopsis

/qa [commit|full|staging|prod|e2e-gen|visual|ios] [--commit-strict] [--workflow|--classic] [--invoked-by <caller>]

Flags:

--commit-strict — commit mode only: trust your judgement and skip the automatic qa-domain escalation when the completion-verifier flags a domain-logic gap
--workflow / --classic — full mode only: force how the surface fan-out is spawned (dynamic Workflow vs the classic parallel agents). Classic is the default; the workflow path falls back to classic on any error
--invoked-by <implement|fix|pr> — set automatically when another skill runs /qa for you; suppresses /qa's own "Run /review-pr now?" prompt since the calling skill asks its own next-step question. Standalone runs are unchanged

When to use it

After finishing a change, before /pr — the default commit mode checks exactly what you changed
Before merging a large branch — full mode runs every surface (web, iOS, domain, adversarial) in parallel
After a deploy — staging / prod modes hit the deployed environment's health and smoke checks
Not for: a fast pre-push gate — that's /precheck; or fixing a CI run that already failed — that's /ci-fix

For mode-selection strategy and the four-layer test philosophy, see the QA Guide. This page covers invocation and what each run does.

Quickstart

/qa

What you'll see: a mode line confirming commit mode, then 2-4 QA agents running targeted checks on your changed files (5-8 generated scenarios, ~1-3 min), ending with a structured QA report and a suggested next step (/review-pr then /pr).

Examples

/qa              # commit mode — targeted checks on the last commit's diff (~1-3 min)
/qa full         # comprehensive multi-surface QA, all agents in parallel (~10-20 min)
/qa staging      # health + smoke + E2E against deployed staging
/qa prod         # READ-ONLY health + smoke against production — no mutations
/qa e2e-gen      # generate + run exploratory Playwright tests for changed components (~3-8 min)
/qa visual       # screenshot routes at desktop/tablet/mobile and evaluate (~5-15 min)
/qa ios          # iOS Simulator visual QA — macOS with Xcode + built .app only (~5-10 min)
/qa --commit-strict   # commit mode without the automatic qa-domain escalation pass
/qa full --workflow   # run the full-mode surface fan-out as a dynamic Workflow (classic is the default)

If you run /qa with no argument and nothing has changed since the last commit, it shows a one-line picker instead of running — user-confirmation checkpoint (Enter for a full audit, ? for all modes, c to cancel).

What it does

Commit and full modes follow the same skeleton:

Reads project config — CLAUDE.md test commands, QA knowledge base (past bug patterns, incidents), and .claude/qa-config.json if present.
Analyzes the diff — HEAD~1 in commit mode, main...HEAD in full mode. Commit mode also runs an advisory /revert-check and annotates the report if anything looks like an accidental revert.
Maps changes to existing tests — finds which changed files have tests, which are stale, and which have none (flagged as coverage gaps).
Spawns QA agents in parallel — selected by where your changes live (see table below). Full mode runs the same core lint/test/schema/architectural-review agents (backend-qa, frontend-qa, contract-qa, code-review, pbt-qa) on the full-codebase diff, plus a dedicated surface roster: visual, E2E, iOS (when detected), domain, and adversarial-challenger agents.
Generates fresh test scenarios — thinks like a real user about what your change could break, plus property-based tests for pure functions (100 iterations in commit mode, 1000 in full).
Audits coverage — flags new code without tests, stale tests, dead tests.
Produces the QA report — pass/fail per check, failures, warnings, coverage, and next steps. Full mode writes the report to a file and exits non-zero on failure.
Verifies completeness — a completion-verifier agent cross-checks the run (skipped for staging/prod and tiny diffs). If it flags a domain-logic gap in commit mode, a qa-domain agent is auto-spawned for one extra validation pass (suppress with --commit-strict).
Offers test promotion — when a generated scenario caught a real bug, it asks whether to promote the scenario into a permanent regression test via qa-test-promoter (user-confirmation checkpoint; capped at 3 per run, never auto-promoted in CI/autopilot).
Prints a post-run checklist, then suggests the next step — a one-line checklist (verifier verdict + knowledge-base writes + promotions) appears before the verdict so you can confirm nothing was silently skipped. On pass in guided mode, it asks whether to run /review-pr now or skip to /pr (user-confirmation checkpoint). That prompt is suppressed when /pr, /implement, or /fix invoked QA with --invoked-by, since the caller asks its own next-step question. On fail, it asks whether to fix and re-run.

Staging/prod modes skip the diff analysis and instead resolve deployed URLs from CLAUDE.md, hit health endpoints, and run smoke (and E2E for staging) checks. Prod is strictly read-only.
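The per-mode differences in the skeleton above (diff range and property-based test iterations) can be captured in a small lookup. The values come from the steps; the exact git invocation, the ModeSettings type, and the function name diff_command are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeSettings:
    diff_range: str       # revision range passed to git diff
    pbt_iterations: int   # property-based test iterations per pure function


MODE_SETTINGS = {
    "commit": ModeSettings(diff_range="HEAD~1", pbt_iterations=100),
    "full": ModeSettings(diff_range="main...HEAD", pbt_iterations=1000),
}


def diff_command(mode: str) -> list:
    """Build an assumed git command listing the files in scope for the given mode."""
    if mode not in MODE_SETTINGS:
        # staging and prod skip diff analysis entirely, so they have no diff range.
        raise ValueError(f"mode {mode!r} does not analyze a diff")
    return ["git", "diff", "--name-only", MODE_SETTINGS[mode].diff_range]
```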

Agents spawned

Selection depends on mode and which files changed. Commit-mode/core agents run in both commit and full mode; the surface agents are full-mode only (except when invoked directly via /qa visual, /qa e2e-gen, or /qa ios).

Commit-mode / core agents (run in commit mode, and again in full mode over the full-codebase diff):

Agent	Model tier	Role
backend-qa / frontend-qa	haiku	Lint, type check, tests, build on changed layers
contract-qa	haiku	Schema diff / migration validation
domain-qa, code-review, pbt-qa	sonnet	Business-logic checks, architectural review, property-based tests
qa-test-promoter	haiku	Converts an approved bug-catching scenario into a permanent regression test (only when you say yes)
completion-verifier	sonnet	Final completeness cross-check

Full-mode surface agents (the F3 roster — additive on top of the core agents above):

Agent	Model tier	Role
qa-visual	sonnet	Visual regression across desktop/tablet/mobile viewports
qa-e2e	sonnet	Exploratory end-to-end flow testing (Playwright)
qa-ios	sonnet	iOS Simulator visual QA (macOS + Xcode + built `.app` only)
qa-domain	sonnet	Domain/business-logic validation across the full surface
qa-challenger	sonnet	Adversarial attacks against the changes

Note: e2e-gen (/qa e2e-gen) is a separate opt-in mode that generates new exploratory Playwright tests — it is not part of full mode. qa-e2e is the full-mode surface agent that runs E2E tests. The similar names refer to different things.

Model tiers shown above are defaults from model-policy.yml and may differ if your project customizes model assignments via a model-policy.override.yml.

In full mode the surface agents (qa-visual, qa-e2e, qa-ios, qa-domain, qa-challenger) can spawn either classically or as a dynamic Workflow (--workflow); results, verdict math, and the report are identical in both paths.

Output & artifacts

Structured QA report in the conversation (all modes)
Full mode: qa-report-{timestamp}.md in the project root, plus per-agent JSON results in qa-results/
Knowledge-base updates in .claude/qa-knowledge/ (bug-patterns.md, coverage-gaps.md, flaky-tests.md) when a run discovers something new
Generated scenarios are one-off explorations — they are not persisted as test files, unless a scenario catches a real bug and you approve promotion: then qa-test-promoter writes a permanent regression test plus a provenance file in .claude/qa-knowledge/promoted-tests/

Troubleshooting

Problem	Fix
`diff is empty` picker appears	Nothing changed since the last commit — pick full audit, another mode, or cancel
`/qa: empty diff — specify mode` (CI/autopilot)	Non-interactive runs need an explicit mode argument
`Dev server not running` (visual / e2e-gen)	Start the dev server with the command from `CLAUDE.md`, then re-run
`Playwright not configured` (e2e-gen)	The project has no `playwright.config.*` — e2e-gen can't generate tests without it
`computer-use MCP not connected` (visual)	Connect the computer-use MCP and retry; full mode marks the surface SKIPPED instead
iOS surface reports SKIPPED	iOS QA requires macOS with Xcode and a built `.app` bundle

QA Guide — mode strategy, four-layer test philosophy, QA agent details
/precheck — the fast local gate before pushing
/pr — runs /qa commit mode as part of PR creation
/fix — when QA finds a real bug worth a formal pipeline
/qa-incident / /qa-learn — feed and review the QA knowledge base

/restart

Discover, restart, and validate local dev servers. Auto-detects Docker vs native, checks health, catches crash loops.

Synopsis

/restart [service] [--preflight]

Both arguments are optional — with none, it restarts all discovered services.

When to use it

Your local dev servers are wedged — port in use, server not starting, stale processes
After pulling changes that touch services, to bring everything back up in the right order
Validating your local environment is ready without touching anything (--preflight)
Not for: deployed environments (staging/production) — that's /sre; deploying changes — that's /deploy

Quickstart

/restart

What you'll see: the services discovered from CLAUDE.md (or auto-discovered from your repo), a pre-flight validation pass, restarts in dependency order, then a final status table where every service reads STABLE after an 8-second crash-loop check.

Examples

/restart                 # discover → validate → restart all services
/restart backend         # restart only the backend (plus any down dependencies)
/restart --preflight     # validate only — Docker, deps, env files, ports; restart nothing
/restart backend --preflight  # validate only the backend (and its dependencies) without restarting anything

Arguments & flags

Flag	Values	Default	What it does
`[service]`	a service name	all services	Restarts only the named service (and its dependencies if they're down)
`--preflight`	—	off	Discovery and validation only — doesn't stop or start anything

What it does

Discovers services — reads the Local Dev Services table in CLAUDE.md; if missing, scans the repo (docker-compose, package.json, Makefile, pyproject.toml, .env, monorepo configs) to find each service's type (Docker vs native), port, start command, health check, and dependencies. When discovery is ambiguous, it presents what it found and asks for confirmation — a user-confirmation checkpoint. Confirmed discoveries are written back to CLAUDE.md so future runs skip this step.
Pre-flight validation — checks:

The Docker daemon is running
Dependencies are installed (node_modules / venv)
.env files exist
Ports are free

If a port is held by an unexpected process, it warns you before killing it. With --preflight, it stops here and reports.

Restarts in dependency order — stops services in reverse order, verifies ports are freed, then starts infrastructure → backends → frontends, waiting for each layer's health check before starting the next.
Validates health twice — an initial per-service health check, then a re-check 8 seconds later to catch crash loops. If a service crashes, it reads the logs, reports the actual error with a suggested fix, and does not retry automatically.
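The dependency ordering can be sketched with a topological sort from Python's standard library. The service names and the dependency map shape are hypothetical; the sketch only illustrates starting dependencies first, stopping in reverse, and skipping dependents of a crashed service.

```python
from graphlib import TopologicalSorter


def start_order(dependencies: dict) -> list:
    """Order services so that each one starts after everything it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


def stop_order(dependencies: dict) -> list:
    """Stop in the reverse of start order, so dependents go down first."""
    return list(reversed(start_order(dependencies)))


def final_statuses(dependencies: dict, crashed: set) -> dict:
    """Mark crashed services CRASHED and anything built on a non-stable base SKIPPED."""
    statuses = {}
    for service in start_order(dependencies):
        deps = dependencies.get(service, ())
        if any(statuses[dep] != "STABLE" for dep in deps):
            statuses[service] = "SKIPPED"
        elif service in crashed:
            statuses[service] = "CRASHED"
        else:
            statuses[service] = "STABLE"
    return statuses
```

For example, with frontend depending on backend and backend depending on postgres, a postgres crash leaves both backend and frontend SKIPPED.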

Output & artifacts

A final status table in the conversation: each service with type, port, and STABLE / CRASHED / SKIPPED
A machine-readable trailer line per service — RESTART-RESULT: service=<name> status=<STABLE|CRASHED|SKIPPED> port=<port> — so downstream skills like /incident can parse the outcome
On failure: the last error from the service's logs plus a concrete fix suggestion
An updated Local Dev Services table in CLAUDE.md after a confirmed auto-discovery
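A downstream consumer could parse the RESTART-RESULT trailer lines with a regular expression along these lines. The field order follows the format shown above; the function name and the choice to keep the port as a string are assumptions.

```python
import re

# Matches the trailer format quoted above, one line per service.
TRAILER = re.compile(
    r"^RESTART-RESULT: service=(?P<service>\S+) "
    r"status=(?P<status>STABLE|CRASHED|SKIPPED) port=(?P<port>\S+)$"
)


def parse_restart_results(output: str) -> list:
    """Extract one dict per trailer line, ignoring all other output."""
    results = []
    for line in output.splitlines():
        match = TRAILER.match(line.strip())
        if match:
            results.append(match.groupdict())
    return results
```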

Troubleshooting

Problem	Fix
`Docker: NOT RUNNING`	Start Docker Desktop and re-run — `/restart` won't proceed without the daemon
`node_modules: MISSING` / `venv: MISSING`	Run `npm install` (or set up the venv) in the service directory, then retry
A service shows `CRASHED` after restart	Read the error in the report — it's pulled from the service's logs (e.g. a missing module means installing deps); fix and re-run
`WARNING: port <port> still occupied`	Another process is holding the port — the report identifies it; stop it or confirm the kill
A service shows `SKIPPED`	One of its dependencies failed — fix the dependency first; dependents are never started on a broken base. Example: if `postgres` crashes and `backend` depends on it, `backend` shows `SKIPPED` — fix and re-run `postgres` first, then `/restart backend`

/sre — health and operations for deployed environments; /sre rebuild for stale-cache monorepo rebuilds
/deploy — /deploy local starts the local stack as part of a deploy flow
/incident — routes local-ops problems here automatically

/revert-check

Detect accidental reverts of recently-merged PRs in the working tree before commit. Catches stale-buffer / bad-stash-pop scenarios that would silently undo landed bug fixes.

Synopsis

/revert-check [--strict|--advisory]

When to use it

Before committing after a git stash pop, a branch switch, or an editor session that may have restored stale buffers
As a standalone audit any time your working tree feels suspicious ("am I undoing something?")
Not for: running tests — that's /precheck, which already invokes this check as its first step

Quickstart

/revert-check

What you'll see: either ✓ No suspected reverts. Working tree changes look like genuine new work. or a table of flagged files showing the net line delta and the recently-merged PRs that touched each file, with verification commands.

Examples

/revert-check              # advisory (default) — warn but never block
/revert-check --advisory   # same as default
/revert-check --strict     # exit non-zero on any suspected revert — for pre-commit hooks / CI

Arguments & flags

Flag	Values	Default	What it does
`--advisory`	—	on	Warn only; always exits 0
`--strict`	—	off	Exit 1 when any suspected revert is found — use as a blocking gate

What it does

Identifies changed files — staged + unstaged tracked files, filtering out lockfiles and generated artifacts. Nothing changed → exits immediately.
Detects the comparison baseline — the remote default branch (origin/main or equivalent), fetched fresh.
Runs the per-file heuristic — a file is a suspected revert when all three signals fire: it was touched by a commit on the default branch in the last 30 days (recency), your working tree net-deletes lines against that baseline (direction), and the deletion is non-trivial (magnitude). The magnitude signal fires when there are more than 20 net deleted lines and deletions outnumber additions by at least 2:1.
Reports — flagged files are shown with their net delta and the recent PR numbers that touched them, plus the exact git diff / gh pr view / git restore commands to verify or discard. The human decides — the skill never modifies anything.
Exits — 0 in advisory mode (even with warnings), 1 in strict mode if anything was flagged.

The check is entirely read-only: no files written, no watchers registered, no refs touched.
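The three-signal heuristic can be sketched as a pure function over per-file line counts. The thresholds come from the steps above; the FileChange type and its field names are hypothetical, and the recency flag is assumed to have been computed separately from the default branch's history.

```python
from dataclasses import dataclass


@dataclass
class FileChange:
    path: str
    touched_recently: bool  # recency: changed on the default branch in the last 30 days
    added: int              # lines added in the working tree vs. the baseline
    deleted: int            # lines deleted in the working tree vs. the baseline


def is_suspected_revert(change: FileChange) -> bool:
    """Flag a file only when recency, direction, and magnitude all fire."""
    net_deleted = change.deleted - change.added
    direction = net_deleted > 0
    # Magnitude: more than 20 net deleted lines and at least a 2:1 deletion ratio.
    magnitude = net_deleted > 20 and change.deleted >= 2 * change.added
    return change.touched_recently and direction and magnitude
```

Requiring all three signals is what keeps small edits and balanced rewrites from being flagged.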

Output & artifacts

Console report only — a pass line or the flagged-files table with verification steps
No files are written or modified; exit code carries the verdict for hooks/CI in --strict mode

Troubleshooting

Problem	Fix
A genuine refactor or test consolidation is flagged	Expected false positive — review the diff, then proceed in advisory mode and document the intentional deletion in your commit message (e.g., "Refactor: removing dead code #123")
`skipped: no remote baseline`	No remote or default branch detectable (offline, no origin) — the check skips gracefully with exit 0
A subtle one-line revert wasn't caught	Below the heuristic's threshold by design — `/qa`'s test execution is the second line of defense
Lockfile or generated artifact gets flagged	These should be filtered automatically — if one slips through, it's safe to ignore and proceed (advisory mode never blocks you)

/precheck — runs this check in strict mode as Step 0 before local tests
/pr — runs it in advisory mode before QA
/qa — commit mode runs it in advisory mode and annotates the QA report
/ship — runs this check in advisory mode as Step 0 before pre-flight (the no-QA path's only revert defense)

/review-pr

Differentiated PR review — bug fix vs feature. Scope compliance, regression tests, behavior contract, knowledge base lookup. Uses the toolkit's code-reviewer agent + project-specific checks.

Synopsis

/review-pr [#N|branch] [--type bug|feature] [--workflow|--classic]

When to use it

Before merging any PR — bug fixes and features get different, type-appropriate checklists
Reviewing a /fix PR — it verifies the scope lock, behavior contract, and regression test that /fix produced
Reviewing a feature PR against its plan — catches scope creep and over-engineering
Not for: creating the PR itself — that's /pr; or running the test suite — that's /qa

Quickstart

/review-pr

What you'll see: the current branch's PR is auto-detected, classified as bug fix or feature, then reviewed. The result is a report with a per-check summary table (✓/⚠/✗) and a verdict — APPROVE, REQUEST CHANGES, or COMMENT — which you decide whether to post to GitHub.

Concepts

Scope zones (set by /fix and enforced here): every bug-fix PR has a Fix Zone (files the fix is allowed to touch), a Watch Zone (files that may be affected as a side effect — touching them is allowed but must be justified in the PR), and a Frozen Zone (files that must NOT change — migrations, lockfiles, unrelated modules). /review-pr checks the diff against these zones and blocks on any Frozen Zone violation.
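A zone check of this kind can be sketched with glob patterns. How /fix actually stores its scope lock is not described here, so the pattern lists, zone names as strings, and function names below are all assumptions for illustration.

```python
from fnmatch import fnmatchcase


def classify_zone(path: str, fix_zone: list, watch_zone: list, frozen_zone: list) -> str:
    """Return the first matching zone, checking Frozen first because it blocks."""
    for zone, patterns in (("frozen", frozen_zone), ("fix", fix_zone), ("watch", watch_zone)):
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return zone
    return "unlisted"


def frozen_violations(changed_files: list, fix_zone: list, watch_zone: list, frozen_zone: list) -> list:
    """List changed files that fall in the Frozen Zone; a non-empty result blocks the review."""
    return [
        path for path in changed_files
        if classify_zone(path, fix_zone, watch_zone, frozen_zone) == "frozen"
    ]
```

Watch Zone matches are returned as "watch" rather than treated as violations, since those changes are allowed when justified in the PR.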

Examples

/review-pr                    # review the current branch's PR, auto-detect type
/review-pr #88                # review a specific PR by number
/review-pr feature/auth       # review the PR for a specific branch
/review-pr #88 --type bug     # force the bug-fix checklist (scope lock, regression test, RCA)
/review-pr #88 --type feature # force the feature checklist (plan completeness, over-engineering)
/review-pr #88 --workflow     # run the review fan-out as parallel isolated agents (needs Claude Code >= 2.1.154 and workflows enabled)
/review-pr --classic          # force the sequential path (the default; always available)
/review-pr feature/auth --type feature --workflow

Arguments & flags

Flag	Values	Default	What it does
`[#N\|branch]`	—	current branch's PR	Which PR to review
`[--type]`	`bug`, `feature`	auto-detect	Forces the review mode; otherwise inferred from labels, title prefix, and `/fix` artifacts
`[--workflow]` / `[--classic]`	—	classic	Runs the review fan-out as parallel isolated agents (workflow) or the sequential path (classic, default)

--workflow prerequisites: requires Claude Code >= 2.1.154 and workflows enabled in settings. If the version is too old, or workflows are disabled and you're in a non-interactive run, the review falls back to classic with a one-line notice: you won't get an error, just the sequential path instead. In an interactive session with workflows disabled, you're asked whether to enable them.

What it does

Loads PR context — fetches the PR details, diff, and commits, then auto-detects the type (labels like bug/feat, title prefixes like fix:/feat:, or local /fix artifacts). If nothing matches, it asks you (bug vs feature, with each checklist previewed); only non-interactive runs default to feature, with a printed notice.
Gathers prior knowledge — queries the project knowledge graph (if installed) for conventions, patterns, and past decisions touching the changed files, then runs a cheap explore-light scan to map callers, dependencies, and test coverage (blast radius).
Standard review (all PRs) — spawns the code-reviewer agent with the diff, project conventions, and prior knowledge: code quality, security, test coverage, architecture, and consistency with past decisions.
Type-specific review —

Bug fix: scope compliance against the /fix scope lock (Frozen Zone violations block), root-cause documentation in the PR body, regression test presence, a mutation check (test must fail without the fix and pass with it — run in an isolated worktree using your project's test command, never your working tree; skipped with a reason if that isn't possible), similar-past-bugs lookup, breaking-change and rollback assessment.
Feature: plan completeness vs .claude/plans/<feature>.md (skipped with a note if no plan is referenced in the PR's commits), over-engineering check, test coverage for new code, domain logic validation.

Compiles the report — per-check summary table plus a verdict. It never auto-approves — findings are always presented for you to decide.
Writes back, then posts — findings go to the project knowledge base first (systemic patterns flagged — e.g., a file with repeated bug-fix PRs). Then the user-confirmation checkpoint: Approve / Request changes / Comment / Don't post — only with your approval does anything reach GitHub via gh pr review. On your own PR, Approve and Request-changes are removed (GitHub forbids self-approval) and the verdict is delivered locally. When you're reviewing the currently checked-out branch, it can also post line-anchored inline comments via the built-in /code-review --comment.
Completion verification — a verifier agent double-checks the review covered what it claimed; gaps are reported, not blocking.
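
As a rough illustration of the type auto-detection in the first step, the sketch below checks the signals the docs name (labels, title prefixes, local /fix artifacts). The precedence between signals and the function name are assumptions; the docs only list the signals and the fallback.

```python
def detect_type(labels: list[str], title: str, has_fix_artifacts: bool,
                interactive: bool) -> str:
    """Return 'bug', 'feature', or 'ask' (prompt the user)."""
    lowered = {label.lower() for label in labels}
    title_lower = title.lower()
    # Assumption: bug signals are checked before feature signals.
    if "bug" in lowered or title_lower.startswith("fix:") or has_fix_artifacts:
        return "bug"
    if "feat" in lowered or title_lower.startswith("feat:"):
        return "feature"
    # Nothing matched: ask when possible, default to feature otherwise.
    return "ask" if interactive else "feature"
```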

Agents spawned

Agent	Model tier	Role
explore-light	haiku	Maps callers, dependencies, and tests for the changed files
code-reviewer	inherit (caller's tier)	Standard review — quality, security, coverage, architecture
qa-domain	sonnet	Feature PRs — validates business-rule consistency
completion-verifier	sonnet	Verifies the review itself was complete

Output & artifacts

The review report in the conversation (summary table + verdict)
A GitHub PR review (approve / request-changes / comment) — only after you confirm
Review history appended to .claude/knowledge/skills/review-pr.md; new conventions or recurring bug patterns written to the knowledge graph

Troubleshooting

Problem	Fix
Wrong type detected (feature reviewed as bug or vice versa)	Re-run with `--type bug` or `--type feature` to force the mode
`✗ CRITICAL: No regression test found` on a bug fix	Bug fixes require a regression test — add one; the mutation check also requires it to fail without the fix
`FROZEN ZONE VIOLATION` on a bug-fix PR	The PR touches files in the Frozen Zone (see Concepts above) — revert those changes. Files in the Watch Zone are allowed but need a justification note in the PR body
Approve option missing when posting	It's your own PR — GitHub rejects self-approval (HTTP 422); the verdict is shown locally and you can post a comment instead
`⚠ Mutation check skipped`	The mutation check needs an isolated copy of your repo (a git worktree, or the `EnterWorktree` tool if available) plus a known test command — it never touches your actual working tree. It's skipped if either is missing: add the test command to CLAUDE.md's Test Commands table, or run the test yourself on the PR branch to confirm it fails without the fix and passes with it
No PR found for the current branch	Create it first with /pr, or pass the PR number explicitly
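
The mutation check's pass condition is easy to state in code. This is a sketch of the verdict logic only, with hypothetical names; running the tests in an isolated worktree is left out, and `None` stands for a run that could not be performed.

```python
from typing import Optional


def mutation_verdict(passes_with_fix: Optional[bool],
                     passes_without_fix: Optional[bool]) -> str:
    """Return 'pass', 'fail', or 'skipped' for a regression test."""
    if passes_with_fix is None or passes_without_fix is None:
        return "skipped"  # no worktree or no known test command
    # A real regression test fails without the fix and passes with it.
    if passes_with_fix and not passes_without_fix:
        return "pass"
    return "fail"
```

A test that passes both with and without the fix yields "fail": it does not actually exercise the bug.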

Cost

Roughly 18-25 Arth units per invocation: one Haiku spawn (explore-light) plus 1-2 Sonnet spawns (code review + project-specific checks). Cost runs higher when the knowledge graph is queried or the PR is large. Note: code-reviewer runs at the caller's model tier, so under an Opus-tier caller the review portion is Opus-priced instead of Sonnet-priced.

/fix — produces the scope lock and behavior contract this review verifies
/pr — creates the PR; run /review-pr after
/qa — test-suite validation, complementary to code review
Workflow guide

/roi-calculator

Calculate ROI for AI initiatives with sensitivity analysis.

Synopsis

/roi-calculator <client-name> [--initiative <name>]

When to use it

Putting defensible numbers behind a proposal — costs, benefits, payback, NPV, IRR
Stress-testing an initiative's economics before a client commits budget
Not for: identifying or prioritizing which initiatives to model — that's /opportunity-map; or writing the proposal itself — that's /pitch-generator

Quickstart

/roi-calculator acme-corp

What you'll see: a financial-inputs questionnaire, then full cost and benefit models, a 12-month cash flow projection, payback/NPV/IRR metrics, and a 3x3 sensitivity matrix — saved to the client's assessment directory with a summary of the key results.

Examples

/roi-calculator acme-corp                                    # model the top 3 initiatives by weighted score
/roi-calculator acme-corp --initiative "Intelligent Document Processing"
                                                             # model a single named initiative

Arguments & flags

Flag	Values	Default	What it does
`--initiative`	initiative name	top 3 by weighted score	Focus the model on one initiative from the opportunity matrix

What it does

Gathers financial inputs — user-confirmation checkpoint: asks 8 questions via AskUserQuestion (current process costs, FTEs and fully-loaded cost, revenue, margins, error rates, discount rate, infrastructure constraints). Unanswered questions fall back to stated defaults — e.g. $85,000/year FTE cost, 10% discount rate.
Builds the cost model — implementation (one-time), infrastructure (monthly), training (one-time + ongoing), and maintenance (annual), each as low-high ranges, rolled into a 3-year Total Cost of Ownership.
Builds the benefit model — time savings (with conservative and optimistic scenarios), error reduction, revenue impact valued at operating margin, and risk reduction.
12-month cash flow projection — month-by-month costs vs benefits with a stated ramp (0% in months 1-2 rising to 100% by month 10), showing net and cumulative cash flow.
Key financial metrics — payback period, 3-year NPV at the chosen discount rate, estimated IRR (interpolated, since no iterative solver runs), and 3-year ROI, in a summary table.
Sensitivity analysis — a 3x3 matrix of cost scenarios (best/expected/worst) against benefit scenarios (optimistic/base/pessimistic), plus a breakeven analysis and a top-5 assumptions-and-risks table.
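
To make the 3x3 matrix concrete, here is a minimal sketch that crosses three cost scenarios with three benefit scenarios. All dollar figures are invented, and using net value (benefit minus cost) as the cell value is an assumption; the skill's actual cell metric may differ.

```python
def sensitivity_matrix(costs: dict[str, float],
                       benefits: dict[str, float]) -> dict[tuple[str, str], float]:
    """Map every (cost scenario, benefit scenario) pair to a net value."""
    return {
        (cost_name, benefit_name): benefit - cost
        for cost_name, cost in costs.items()
        for benefit_name, benefit in benefits.items()
    }


# Hypothetical 3-year totals, for illustration only.
costs = {"best": 200_000, "expected": 250_000, "worst": 325_000}
benefits = {"optimistic": 600_000, "base": 450_000, "pessimistic": 300_000}
matrix = sensitivity_matrix(costs, benefits)
```

With these made-up inputs, only the worst-cost, pessimistic-benefit cell goes negative, which is the kind of corner a breakeven analysis would then examine.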

All projections are labeled as estimates, use conservative defaults, and show every calculation transparently. If IRR is negative or payback exceeds 24 months in the expected case, the initiative is flagged as high-risk.
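
The ramp, payback, NPV, and high-risk rule can be sketched as follows. Several choices here are assumptions, not the skill's documented method: the ramp is taken as linear between month 2 and month 10, discounting is monthly with a rate derived from the annual one, an initiative that never pays back is treated as high-risk, and all input figures are invented.

```python
from typing import Optional


def ramp(month: int) -> float:
    """Benefit ramp: 0% in months 1-2, 100% from month 10 (linear between, assumed)."""
    if month <= 2:
        return 0.0
    if month >= 10:
        return 1.0
    return (month - 2) / 8


def monthly_net(month: int, upfront: float, monthly_cost: float,
                monthly_benefit: float) -> float:
    """Net cash flow for one month; the one-time cost lands in month 1."""
    one_time = upfront if month == 1 else 0.0
    return monthly_benefit * ramp(month) - monthly_cost - one_time


def payback_month(flows: list[float]) -> Optional[int]:
    """First month in which cumulative cash flow is non-negative, else None."""
    cumulative = 0.0
    for month, flow in enumerate(flows, start=1):
        cumulative += flow
        if cumulative >= 0:
            return month
    return None


def npv(flows: list[float], annual_rate: float) -> float:
    """Net present value with end-of-month discounting."""
    monthly_rate = (1 + annual_rate) ** (1 / 12) - 1
    return sum(flow / (1 + monthly_rate) ** month
               for month, flow in enumerate(flows, start=1))


def is_high_risk(irr: Optional[float], payback: Optional[int]) -> bool:
    """Flag negative IRR or payback beyond 24 months (or no payback at all)."""
    return (irr is not None and irr < 0) or payback is None or payback > 24


# Hypothetical inputs over a 3-year (36-month) horizon.
flows = [monthly_net(m, upfront=120_000, monthly_cost=2_000, monthly_benefit=20_000)
         for m in range(1, 37)]
```

IRR is taken as an input here because the docs say it is estimated by interpolation rather than solved iteratively.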

Output & artifacts

Written to consulting-toolkit/clients/<client-name>/:

assessment/roi-model.md — inputs, cost model, benefit model, 12-month cash flow, payback/NPV/IRR/ROI with all formulas shown
assessment/sensitivity-analysis.md — scenario definitions, 3x3 matrix, breakeven analysis, assumptions table, and conservative/balanced/aggressive recommendations
A key-results summary in the conversation, with a pointer to /pitch-generator <client-name> --format proposal to fold the numbers into a proposal

Troubleshooting

Problem	Fix
`Required data not found for <client-name>`	Run `/client-discovery` and `/opportunity-map` first — the model needs the profile, intake, opportunity matrix, and roadmap
Client can't provide a financial input	Skip it — the skill substitutes documented industry defaults and clearly marks the assumption
Initiative flagged as high-risk	Expected when IRR is negative or payback exceeds 24 months — revisit scope or approach before proposing
Numbers look too precise	Key metrics should be presented as ranges, not points — ask for the range presentation if a point estimate slipped through

/consulting — the engagement orchestrator; runs this in the propose phase
/opportunity-map — produces the scored initiatives this skill models
/pitch-generator — uses these numbers in the proposal's Investment & ROI section
/solution-architect — complements this with a technical cost model

/router-setup

One-command Arth Router install — discovers your Claude plan, generates a real pool config, starts the service, and wires the triage hook.

Synopsis

/router-setup

When to use it

First-time setup: you want substantive prompts routed by the Arth Router (model/harness/platform decisions that are plan/credit-aware and budget-governed)
Repairing a partial install (service down, env var missing, no pool config)
Not for: day-to-day operation — that's /router

Quickstart

/router-setup

What you'll see: a state check (service running? config present? env wired?), plan discovery from your local Claude account (Max 20x / Max 5x / Pro → a capacity estimate for your 5-hour window), pool config generation, service start (Docker or a local venv — you confirm first), env wiring into ~/.claude/settings.json (you confirm first), and an end-to-end proof that a test prompt gets an ARTH ROUTER DECISION block.

Examples

/router-setup      # the only form — the skill takes no arguments

Typical runs:

First time → plan discovered → pools generated → service started → hook wired → proof shown
Service already running → reports status and skips to whatever is missing
No repo access → tells you to request access to ArthTech-AI/arth-router and stops

What it does

Detects current state — health-checks the service, checks ~/.arthai/arth-router/policy.yaml and the ARTH_ROUTER_URL env var; runs only the missing steps.
Installs the service — Docker if the daemon is running, otherwise a local venv install from the arth-router repo.
Generates YOUR pool config — reads your plan tier from ~/.claude.json and writes a subscription pool with a per-tier capacity estimate (flagged as an estimate — calibrate later with /router calibrate).
Starts the service and (optionally) the outcomes bridge — a user-confirmation checkpoint.
Wires the env — merges ARTH_ROUTER_URL / ARTH_ORG into ~/.claude/settings.json (read-merge-write) — a user-confirmation checkpoint. New sessions pick it up automatically.
Proves it end-to-end — runs the triage hook against a test prompt and shows the injected decision block.
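
The read-merge-write step for the settings file can be sketched like this. It assumes the env vars live under a top-level `env` key in settings.json, and the URL value shown in the usage note is a placeholder; neither is confirmed by this page.

```python
import json
from pathlib import Path


def merge_env(settings_path: Path, new_env: dict[str, str]) -> dict:
    """Merge env vars into a settings file without dropping existing keys."""
    # Read: start from the existing settings, or an empty object if absent.
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    # Merge: keep unrelated settings and env entries, overwrite only given keys.
    settings.setdefault("env", {}).update(new_env)
    # Write: persist the merged result.
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```

A call such as `merge_env(Path.home() / ".claude" / "settings.json", {"ARTH_ROUTER_URL": "http://localhost:8600"})` would leave every other setting untouched, which is the point of merging rather than overwriting the file.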

Requirements

Python 3.11+ (or Docker), curl, and access to ArthTech-AI/arth-router
Works with the router bundle's triage-router hook (ships the fast path)

Output

Step-by-step progress ending in a proof block: the triage hook run against a test prompt showing the injected ARTH ROUTER DECISION (model/harness/platform, task type, pace, decision_id). Also prints your discovered plan tier and the capacity estimate written to the pool config.

Troubleshooting

Symptom	Fix
No repo access	Request access to `ArthTech-AI/arth-router`, re-run.
Port 8600 in use	Another router instance is running — `/router status` first; kill it or change `ARTH_ROUTER_PORT`.
Decision block missing in sessions	Env vars load at session START — open a new session. Verify with the proof step.
Plugin update removed the fast path	The `router` bundle ships the hook; re-install/update the bundle (or re-run this skill's proof step to confirm).
Plan tier shows "unknown"	Capacity left unset — the router treats unknown capacity as on-pace and never guesses; set `capacity_usd` manually and calibrate.

/router — day-to-day operation (summary, pools, calibrate, outcomes)
/otel-setup — telemetry for the Arth Intelligence dashboard (pairs well: outcomes + scores)

/router

Operate the Arth Router — the live routing decision service: status, usage summary, pool calibration, override reporting.

Synopsis

/router <status|summary|pools|calibrate|outcome DECISION_ID>

When to use it

Checking whether the router is up and what it's routing (status, pools)
The weekly validation check-in: override rate and cost-estimate bias (summary)
Calibrating your subscription window capacity after hitting a rate limit (calibrate)
Reporting the actual cost/tokens of a routed task, or recording that you overrode a decision (outcome)
Not for: installing the router — that's /router-setup

Quickstart

/router summary

What you'll see: decisions/denials/outcomes counts, total spend, each capacity pool's state (headroom / on_pace / tight / exhausted) with its rationale, and the two kill-test numbers called out — override rate (> 0.5 means the routing rules are wrong) and estimate bias (> 10% means the cost model needs fixing).
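
The two kill-test thresholds amount to a pair of comparisons. In this sketch the function name and message wording are hypothetical, and treating the estimate bias as an absolute value (so that under- and over-estimates both count) is an assumption.

```python
def kill_test(override_rate: float, estimate_bias: float) -> list[str]:
    """Return the kill-test findings for a weekly summary (empty means healthy)."""
    findings = []
    if override_rate > 0.5:
        findings.append("override rate above 0.5: routing rules are wrong")
    if abs(estimate_bias) > 0.10:
        findings.append("estimate bias above 10%: cost model needs fixing")
    return findings
```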

Examples

/router status                    # health: version, catalog sizes, pools, auth/audit
/router summary                   # the weekly validation digest
/router pools                     # capacity pools with burn state + rationale
/router calibrate                 # guided window-capacity calibration
/router outcome rd_ab12cd34ef     # report actuals for a routed decision

What it does

Reads ARTH_ROUTER_URL from the environment (set by /router-setup).
Calls the corresponding router API (/health, /usage/summary, /policy/pools, /outcome) and presents the JSON as a readable digest.
For calibrate, walks the capacity-estimate correction: rate-limited while the pool showed well under 100% → lower capacity_usd; never limited at 100% → raise it.
If the router is unreachable, says so — the triage hook is falling back to the static routing table, which is the designed behavior, not an outage.
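
The calibrate correction boils down to comparing what the pool reported against whether a rate limit was actually hit. The sketch below uses a hypothetical `well_under` cutoff of 80% because the docs only say "well under 100%"; the names are illustrative.

```python
def calibration_hint(rate_limited: bool, pool_utilization: float,
                     well_under: float = 0.8) -> str:
    """Suggest a capacity_usd adjustment from one observation window."""
    if rate_limited and pool_utilization < well_under:
        # The limit hit earlier than the estimate predicted: estimate too high.
        return "lower capacity_usd"
    if not rate_limited and pool_utilization >= 1.0:
        # The pool showed full but no limit hit: estimate too low.
        return "raise capacity_usd"
    return "keep capacity_usd"
```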

Requirements

A running Arth Router (/router-setup) and ARTH_ROUTER_URL in the environment.

Output

A readable digest of the router's JSON. For summary: decision/denial/outcome counts, total spend, per-pool state with rationale, and the kill-test verdict lines (override rate vs 0.5, estimate bias vs the 10% SLO). For status: a one-line health summary. For outcome: confirmation with any metered_equivalent_burn_usd the router applied.

Troubleshooting

Symptom	Fix
"router unreachable"	Service down — the triage hook is falling back to static routing by design. Restart via `/router-setup` or the command in `~/.arthai/arth-router/router.log`'s directory.
`ARTH_ROUTER_URL` unset	Run `/router-setup` (writes it to `~/.claude/settings.json`); env loads in NEW sessions.
Everything routes to cheap models	Pool tight/exhausted or budget ≥ limit — check `/router pools`; that's the policy working. Raise limits in `~/.arthai/arth-router/policy.yaml` if wrong.
401/403 from the router	Auth is enabled on a shared instance — set `ARTH_ROUTER_TOKEN` or use an API key.

/router-setup — install and wire the router
/routing — the static triage rubric (the fallback path)
Arth Router repo: ArthTech-AI/arth-router (usage guide, scalability, enforcement docs)

/routing

Re-show the triage routing table and SPEED scoring rubric on demand — decide toolkit vs built-in for the task at hand.

Synopsis

/routing [task description]

When to use it

You want the full routing rubric back mid-session — the triage-router hook shows it once, then only compact reminders
You're unsure whether a task warrants a heavyweight workflow or a quick built-in edit, and want it scored
You're learning the toolkit and want to understand why a request routes where it does
Not for: listing what's installed — that's /skills, which /routing calls to enumerate live targets

Quickstart

/routing

What you'll see: the SPEED scoring rubric (the five axes used to decide toolkit-vs-built-in) followed by the live catalog of installed routing targets.

Examples

/routing                       # show the SPEED rubric + installed routing targets
/routing fix the auth bug      # score this task and recommend a route
/routing rename a variable     # low SPEED score (0-1) → recommends built-in Edit, not a workflow

Arguments & flags

Argument	Values	Default	What it does
task description	free text	none	When present, the task is scored 0–5 on SPEED and a concrete route is recommended

What it does

Prints the SPEED scoring rubric — Scope, Project-context, Expertise, Effort, Dollars (1 point each).
Runs /skills to enumerate the routing targets actually installed, so it never recommends a command you don't have.
If you passed a task description, scores it, shows the breakdown, and ends with a one-line recommendation: → route: <skill/agent/tool> (SPEED=<n>).

SPEED scoring rubric

Score the task on five axes, 1 point each:

Axis	Question	+1 when
S — Scope	How many files/steps?	3+ files or steps
P — Project	Needs CLAUDE.md / project context?	Yes
E — Expertise	Domain-specific (QA/SRE/design/architecture)?	Yes
E — Effort	Multi-turn workflow (PR/deploy/planning)?	Yes
D — Dollars	Would a cheaper model (Haiku/Sonnet) suffice?	Yes

Decision rules

0–1 → Claude Code built-in tools (Read / Grep / Glob / Edit / Bash).
2+ → a toolkit skill (multi-step workflow) or agent (domain expertise + project context).
3+ and multi-layer → /planning → /implement (team orchestration).
3+ and risky/sensitive → architect (Opus) brief → executor agent (Sonnet).
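
The rubric and decision rules above translate directly into a scoring function. This is a sketch with hypothetical field names; how the skill judges each axis from free text is not shown, and checking "risky" before "multi-layer" when both apply is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Task:
    files_or_steps: int
    needs_project_context: bool
    domain_specific: bool
    multi_turn: bool
    cheaper_model_suffices: bool
    multi_layer: bool = False
    risky: bool = False


def speed_score(task: Task) -> int:
    """One point per SPEED axis: Scope, Project, Expertise, Effort, Dollars."""
    return sum([
        task.files_or_steps >= 3,
        task.needs_project_context,
        task.domain_specific,
        task.multi_turn,
        task.cheaper_model_suffices,
    ])


def route(task: Task) -> str:
    """Apply the decision rules to a scored task."""
    score = speed_score(task)
    if score <= 1:
        return "built-in tools"
    if score >= 3 and task.risky:
        return "architect brief -> executor agent"
    if score >= 3 and task.multi_layer:
        return "/planning -> /implement"
    return "toolkit skill or agent"
```

For example, a one-file rename that a cheaper model could handle scores 1 and stays on built-in tools, matching the `/routing rename a variable` example.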

Output & artifacts

The rubric, catalog, and (if a task was given) a routing recommendation, printed into the conversation. Nothing is written to disk; no agents are spawned.

Troubleshooting

Problem	Fix
Routing targets look wrong/missing	The catalog comes from `/skills`; if a command is absent, install the bundle that provides it
Recommendation feels too heavy for a tiny change	See Decision rules — SPEED 0–1 routes to built-in tools; re-score, since a 3+ score means the task genuinely spans files/context/expertise
Table doesn't appear automatically anymore	That's expected — the hook injects it once per session to save tokens; `/routing` is the manual re-show

/skills — the installed catalog this rubric routes to
/onboard — project briefing and prioritized next steps
Workflow comparison — deeper decision tree for the autonomous/team workflows

/scan

Scan codebase and auto-populate CLAUDE.md with project context (services, ports, test commands, infrastructure).

Synopsis

/scan [section]

When to use it

After installing the toolkit, to fill CLAUDE.md's <!-- TODO --> placeholders with real project data
After codebase changes (new service, new test framework, new deploy target), to refresh a section
Not for: deep pattern learning, MCP/agent recommendations, or knowledge-base seeding — that's /calibrate. Rule of thumb: /scan fills in factual tables (ports, commands, configs) by reading files directly; /calibrate goes further, using an agent to learn coding conventions and recommend tools. New to a project? Run /onboard first — it calls /scan for you automatically.

Quickstart

/scan

What you'll see: a full scan of all five sections (Local Dev Services, Test Commands, Infrastructure, Domain, and Environments), written into CLAUDE.md as structured tables. Sections with real content (no <!-- TODO --> placeholders) are left untouched; sections that still contain any <!-- TODO --> markers are replaced with scanned data.

Examples

/scan                # full scan — all 5 sections
/scan services       # only Local Dev Services (ports, start commands)
/scan tests          # only Test Commands (test/lint/type-check per directory)
/scan infra          # only Infrastructure (deploy platforms, health endpoints)
/scan domain         # only Domain (core entities, business rules)
/scan environments   # only Environments (local/staging/production table)

What it does

Detects local dev services — reads package.json scripts, Python framework files, docker-compose.yml, and Procfile to build the services table (service, port, directory, start command).
Detects test commands — finds pytest/ruff/mypy configs and package.json test/lint/type-check scripts, and writes the Test Commands table.
Detects infrastructure — looks for platform configs: railway.json or railway.toml, vercel.json or .vercel/project.json, fly.toml, .aws directory or samconfig.toml or serverless.yml, Dockerfile or k8s/, and GitHub Actions workflows — plus health endpoints.
Detects the domain — scans model files (SQLAlchemy, Django, Prisma, TypeORM) and enum/status definitions to summarize core entities and business rules.
Detects environments — derives environments from .env.* files, CI deploy targets, and platform configs; always includes a local development row. Each row gets name, type, URL, health endpoint, deploy trigger, and branch (with <!-- TODO --> where undiscoverable).
Updates CLAUDE.md safely — replaces only sections that are missing or still contain any <!-- TODO --> placeholders (even partially-filled ones). Sections with real, user-written content and no remaining <!-- TODO --> markers are never overwritten. If no CLAUDE.md exists, one is created from the toolkit template first.

There are no confirmation prompts — the scan is deterministic file detection with no agent spawns, and it only fills empty/placeholder sections.
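
The overwrite rule can be summarized in a few lines. This sketch assumes the placeholder is the literal `<!-- TODO -->` marker named in the troubleshooting table and that a section is handed over as a string (or `None` when missing); the function name is hypothetical.

```python
from typing import Optional

TODO_MARKER = "<!-- TODO -->"


def should_replace(section_body: Optional[str]) -> bool:
    """Decide whether /scan may overwrite a CLAUDE.md section."""
    if section_body is None or not section_body.strip():
        return True  # missing or empty section: always filled
    # Any remaining marker, even in a partially filled section, allows a rewrite.
    return TODO_MARKER in section_body
```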

Output & artifacts

CLAUDE.md — created or updated in the project root with the scanned section tables
No other files touched; results are immediately readable by every agent and skill

Troubleshooting

Problem / Note	Fix
A section didn't update	It already has real content with no remaining `<!-- TODO -->` markers — scan never overwrites fully customized sections; update it by hand or delete the section and re-run
Only see the local row in Environments?	This is expected when no platform configs were detected — check for `railway.json`/`railway.toml`, `vercel.json`/`.vercel/project.json`, `fly.toml`, `.aws`/`samconfig.toml`/`serverless.yml`, or run /calibrate for full discovery via platform MCP queries
Wrong port or start command detected	Edit the table directly in CLAUDE.md — your value then sticks (scan won't touch it again, as long as no `<!-- TODO -->` markers remain in that row)
`<!-- TODO -->` markers remain after scanning	Those fields couldn't be detected from files (e.g. staging URLs) — fill them in manually

/calibrate — the deep version: patterns, recommendations, knowledge base (runs /scan first when CLAUDE.md is empty)
/onboard — runs /scan for greenfield projects
/setup — project bootstrap; its claude-agents module uses /scan to populate CLAUDE.md

/schedule-routine

Walk through setting up a Claude Code cloud-hosted scheduled routine — cron, GitHub webhook, or API trigger.

Synopsis

/schedule-routine [--list] [--delete <name>]

When to use it

Automating recurring work on Anthropic-hosted infrastructure — daily PR digests, weekly issue triage — with no server, no polling, no local process
Reacting to GitHub events (PR opened/merged/closed, issue opened, release published) with a Claude Code routine
Exposing a routine as an HTTP endpoint you can trigger from any system
Not for: local event-driven watchers in your own session — see Monitors

Quickstart

/schedule-routine

What you'll see: a 6-step guided setup — name, repository, prompt, trigger, connectors, validate & deploy. Progress is saved to .claude/.routine-draft.json, so you can stop and resume across sessions.

Examples

/schedule-routine                       # start the guided 6-step setup
/schedule-routine --list                # list existing routines (not scoped to the current repo)
/schedule-routine --delete pr-digest    # delete a named routine

Arguments & flags

Flag	Values	Default	What it does
`--list`	—	off	List existing routines (name, trigger, last run, status)
`--delete`	`<name>`	—	Delete a named routine

What it does

If a draft from a previous session exists, you're first asked whether to resume it or start fresh (user-confirmation checkpoint). Then the guided setup runs:

Name — asks for a short kebab-case name (user-confirmation checkpoint); validates the format.
Repository — detects your repos via the gh CLI and asks you to pick one (user-confirmation checkpoint); falls back to manual org/repo entry.
Prompt — shows the rules for a good routine prompt (self-contained, clear success criteria, explicit "nothing to do" case), then asks you to write it (user-confirmation checkpoint). Vague or context-dependent prompts ("as discussed", too short) are flagged for rewrite.
Trigger — asks you to choose cron schedule, GitHub event, or API call (user-confirmation checkpoint), then collects the schedule/event details. Plain-English schedules are converted to cron and confirmed back to you.
Connectors (optional) — asks which external services the routine needs: GitHub, Linear, Slack, Google Drive (user-confirmation checkpoint), with authorization notes for each.
Validate & deploy — shows the full configuration summary and a validation checklist. If warnings exist, you choose between fixing the prompt or deploying anyway (user-confirmation checkpoint). Requires the Claude Code CLI to be installed — if it's missing, the skill shows install instructions and stops (your draft is preserved). If the CLI is installed but its routines create syntax differs from what the skill expects, it falls back to showing manual web-UI deployment steps instead.
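
Two of the validations in these steps are simple enough to sketch. The exact rules the skill applies are not documented here, so both checks are assumptions: kebab-case as lowercase alphanumeric words joined by single hyphens, and a cron expression as five whitespace-separated fields.

```python
import re

KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_kebab_case(name: str) -> bool:
    """True for names like 'pr-digest'; rejects uppercase, '_' and '--'."""
    return KEBAB_CASE.fullmatch(name) is not None


def looks_like_cron(expression: str) -> bool:
    """Shallow shape check: minute, hour, day-of-month, month, day-of-week."""
    return len(expression.split()) == 5
```

The cron check only counts fields; it does not validate ranges, so "99 99 * * *" would still pass this sketch.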

After deployment you get a post-deployment checklist: test the routine, verify connector authorizations, confirm it appears in --list, and set up a /monitor ci watcher if you want to be notified of routine failures.

Output & artifacts

A deployed cloud-hosted routine running on Anthropic infrastructure
.claude/.routine-draft.json — in-progress setup state (deleted after successful deployment)
For API-triggered routines: an HTTP endpoint and key you can invoke from any system

Troubleshooting

Problem	Fix
`Claude Code CLI not found`	Install the Claude Code CLI and authenticate, then re-run — your draft is saved, nothing is lost
500 error on `--list` / `--delete`	Known API bug (claude-code issue #43440) — routines still execute correctly; only management operations are affected
`gh` CLI not authenticated in Step 2	Type the repo manually as `org/repo`
Prompt flagged as vague or context-dependent	Routines run with no conversation memory — rewrite the prompt to be fully self-contained
Invalid cron expression	The skill shows the format and common examples, then asks again

Monitors — local event-driven watchers; pair /monitor ci with a routine to catch failures
/onboard — session briefing that surfaces what's worth automating

/setup

Bootstrap any project from empty repo to deployed app. Modular — run full or target a module.

Synopsis

/setup [module]

When to use it

Starting a new project and want the full path: repo → deps → Docker → database → credentials → CI → deploy
A specific piece is broken or missing — run just that module (/setup credentials, /setup infra, …)
Checking how far along a project's setup is — /setup status
Not for: teaching the toolkit about an existing codebase — that's /calibrate; or restarting dev servers — that's /restart

Quickstart

/setup

What you'll see: the project's stack is detected (from CLAUDE.md and package/config files), then modules run in order — skipping anything already complete — with a status check after each. Every module is idempotent: re-run safely to fix or verify. "Already complete" is decided per module by a concrete check — e.g. .git/ existing (repo), a running Docker container (infra), a populated migrations table (database), gh CLI + auth present (secrets sync) — not a guess; see /setup status for the same checks surfaced as a table.
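
The "skip what's already complete" behavior can be pictured as a table of per-module checks. This is a sketch covering only two of the checks mentioned above; the function names are hypothetical, and the `gh` check tests presence on PATH only, not authentication.

```python
import shutil
from pathlib import Path
from typing import Callable


def repo_complete(project: Path) -> bool:
    """The repo module counts as done when .git exists in the project."""
    return (project / ".git").exists()


def gh_available() -> bool:
    """Part of the secrets-sync check: is the gh CLI on PATH?"""
    return shutil.which("gh") is not None


def pending_modules(checks: dict[str, Callable[[], bool]]) -> list[str]:
    """Return modules whose completion check fails, in declaration order."""
    return [name for name, is_done in checks.items() if not is_done()]
```

Because each check inspects real state rather than a stored flag, re-running after a partial failure picks up exactly where things stand, which is what makes the modules idempotent.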

Examples

/setup                  # full interactive setup — detect what's needed, skip what's done
/setup status           # readiness table across all 16 modules
/setup prerequisites    # system requirements + account creation
/setup repo             # git init + .gitignore + project scaffolding
/setup deps             # install dependencies (venv, npm, cargo, ...)
/setup infra            # start Docker services (DB, cache, queue)
/setup database         # run migrations + seed data
/setup credentials      # API keys + env files, with per-key validation
/setup profiles         # bot/service identity + social presence
/setup verify-local     # start the app + smoke tests
/setup claude-agents    # install the toolkit + populate CLAUDE.md
/setup arth             # register the project with the Arth dashboard
/setup ci               # generate a GitHub Actions workflow for the stack
/setup discord          # Discord server + bot + webhooks + ChatOps
/setup secrets-sync     # push secrets to GitHub + cloud provider
/setup deploy           # cloud deployment (Railway, Vercel, AWS, ...)
/setup verify-prod      # production smoke tests
/setup monitoring       # Sentry / error tracking setup

What it does

Discovers project context — reads CLAUDE.md and scans package.json, requirements.txt/pyproject.toml, go.mod, Cargo.toml, docker-compose.yml, env files, migration dirs, and deploy configs to adapt every module to your stack.
Prerequisites — checks required software (Python/Node/Go/Rust/Docker/git/gh/CLI tools) and detects required accounts from .env.example (Anthropic, OpenAI, Stripe, Stytch, …) with a status table.
Repo, deps, infra, database — git init + .gitignore + scaffolding per stack; language-specific installs (incl. ML model pre-downloads); Docker services with per-service health checks; migration tool detection (Alembic/Prisma/Django/Knex/Drizzle) + seed data.
Credentials (user-confirmation checkpoint) — creates the env file from the template, auto-generates secret keys, then guides you through each missing credential with get/cost/validate steps. You paste values yourself — it never stores or echoes credentials.
Local verify, toolkit, CI — starts the app, hits health endpoints, runs tests; installs the claude-agents toolkit and populates CLAUDE.md via /scan; generates a stack-matched GitHub Actions workflow with fake test env vars.
Discord, secrets sync, deploy — optional Discord server/bot/webhooks for CI + ChatOps; pushes real secrets to GitHub and the cloud provider — always confirms with you before pushing to remote platforms (a user-confirmation checkpoint); deploys to the detected platform (Railway/Vercel/Fly/AWS/GCP) with post-deploy migrations and webhook registration.
Prod verify + monitoring — production health checks, smoke tests, log review; optional Sentry setup.
Completion verification — spawns an independent verifier that reports PASS / GAPS FOUND / INCONCLUSIVE. Gaps don't block; you decide whether to rerun.

Agents spawned

Agent	Model tier	Role
completion-verifier	sonnet	Independent check that setup actually completed

Output & artifacts

Project files: .gitignore, scaffolding, env file, docker-compose services running, migrations applied, .github/workflows/ci.yml, deploy configuration
CLAUDE.md populated and the claude-agents toolkit installed (module 8)
Secrets in GitHub Secrets + cloud provider env vars (after your confirmation)
/setup status — a 16-row readiness table (✅/❌/⚠️ per module)
A context tip on completion: everything is persisted to files — start a fresh session afterwards (and run /calibrate there)

Troubleshooting

Problem	Fix
`ValidationError` on app startup	Missing env vars — run `/setup credentials`
`ConnectionRefusedError` on :5432 / :6379	Postgres/Redis not running — `docker compose up -d`
`alembic upgrade` fails: tables already exist	`alembic stamp head`, then re-run the migration
`npm ERR! ERESOLVE`	Dependency conflict — `npm install --legacy-peer-deps`
CI fails on service health	Container not ready — add health-check options to the workflow
`install.sh` permission denied	`chmod +x ~/.claude-agents/install.sh`

/calibrate — deep-learn the project you just bootstrapped (run it next, in a fresh session)
/scan — the CLAUDE.md population step used by module 8
/deploy — day-to-day deploys after first-time setup
/restart — restart and validate local dev servers

/share

Format plans and strategies for sharing. Adds attribution, cleans sensitive info, outputs shareable format.

Synopsis

/share [plan-name] [--format md|slack|twitter|html] [--no-attribution]

All arguments are optional — plan-name defaults to the most recently generated artifact this session, and --format defaults to md.

You generated a plan, PRD, or strategy with the toolkit and want to paste it into Slack, a doc, an email, or a tweet thread
You need a sanitized version of a deliverable — file paths, secrets, and internal URLs stripped — before it leaves your terminal
Not for: generating the deliverable itself — that's /templates or /planning

Quickstart

/share user-notifications

What you'll see: the plan from .claude/plans/user-notifications.md, cleaned of internal metadata and sensitive details, printed as a polished markdown document with a one-line attribution footer — ready to copy.

Examples

/share user-notifications                   # most common form, markdown output
/share gtm-strategy --format slack          # Slack-compatible formatting, concise bullets
/share oauth-support --format twitter       # tweet-thread blocks (≤280 chars each)
/share user-notifications --format html     # styled HTML page written to .claude/shared/
/share user-notifications --no-attribution  # skip the attribution footer
/share                                      # no name — shares the session's latest artifact, or lists plans to pick from

Arguments & flags

Flag	Values	Default	What it does
`<plan-name>`	name in `.claude/plans/`	latest session artifact	Which plan to share; if nothing is found, you pick from a list
`--format`	`md`, `slack`, `twitter`, `html`	`md`	Output format for the target platform
`--no-attribution`	—	off	Omits the attribution footer entirely

What it does

Finds the artifact — the named plan in .claude/plans/, otherwise the most recently generated artifact this session; if neither exists, it lists available plans and asks you to pick (user-confirmation checkpoint).
Cleans it — strips YAML frontmatter and toolkit internals, converts absolute file paths to relative ones, removes API keys/tokens/secrets and internal URLs (localhost, staging), and tidies headings and code blocks.
Formats for the target — full markdown (default), Slack-style markdown with concise bullets, a tweet-thread summary with a hook, or a clean responsive HTML page.
Outputs — md/slack/twitter print to the terminal for you to copy; html is written to .claude/shared/<plan-name>.html. The skill never posts anywhere itself.
Suggests next steps — offering the other formats.
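
The cleaning step can be pictured as a small text filter. The sketch below is a minimal illustration, not the skill's actual scrubber: the function name `clean_for_sharing`, the regular expressions, and the `[REDACTED]` placeholder are all assumptions chosen for the example.

```python
import re

# Hypothetical patterns; the real scrubber's rules are not documented here.
FRONTMATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)
SECRET = re.compile(r"(?i)\b(api[_-]?key|token|secret)\b(\s*[:=]\s*)\S+")
INTERNAL_URL = re.compile(r"https?://(?:localhost|127\.0\.0\.1|staging\.[\w.-]+)\S*")


def clean_for_sharing(text: str, project_root: str) -> str:
    """Strip frontmatter, secret values, internal URLs, and absolute paths."""
    text = FRONTMATTER.sub("", text, count=1)
    text = SECRET.sub(r"\1\2[REDACTED]", text)
    text = INTERNAL_URL.sub("[internal URL removed]", text)
    # Absolute paths under the project root become relative ones.
    root = project_root.rstrip("/") + "/"
    return text.replace(root, "")
```

Pattern-based scrubbing only catches values that look like the patterns, which is why reviewing the output before posting still matters.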

A subtle one-line attribution footer is appended unless you pass --no-attribution. The wording adapts to the content: plans with multiple agents credit the "PM + Architect team", GTM strategies credit the "GTM Expert", and design briefs credit "Design Studio".

Output & artifacts

Formatted text in the terminal (md, slack, twitter) — copy and paste it where you need it
.claude/shared/<plan-name>.html for --format html — open in a browser or share the file

Troubleshooting

Problem	Fix
No plan found with the given name	Check `.claude/plans/` — the skill lists what's available so you can pick
Output looks truncated in Slack	Slack truncates long messages; the slack format already shortens sections — share the md or html version for full detail
Twitter blocks exceed the character limit	Each block targets ≤280 chars; trim manually if your handle/links push it over
Sensitive value still visible	The scrubber targets common patterns; review before posting and report anything it missed
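
If you trim a thread by hand, a quick way to re-check the 280-character target is to re-pack the text into blocks. This is a rough sketch under stated assumptions: `split_into_blocks` is a hypothetical helper, it packs whole words greedily, and it does not reproduce how the skill composes a thread or its hook.

```python
def split_into_blocks(text: str, limit: int = 280) -> list[str]:
    """Greedily pack whole words into blocks of at most `limit` characters."""
    blocks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= limit or not current:
            # A single word longer than the limit is kept whole,
            # so that one block can exceed the limit.
            current = candidate
        else:
            blocks.append(current)
            current = word
    if current:
        blocks.append(current)
    return blocks
```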

/templates — generate the deliverables this skill formats
/planning — produces the plans in .claude/plans/
/implementation-plan — locked-scope plans, also shareable

/ship

Commit, push, and open a PR in one command. A shortcut for the full ship workflow: it stages, commits, pushes, and opens a GitHub PR.

Synopsis

/ship [message]

When to use it

You've reviewed your changes and want the fastest path from working tree to open PR
Small, low-risk changes where a full QA gate is overkill
Not for: changes that should pass QA first — that's /pr; or running tests before pushing — that's /precheck
/ship only runs when you invoke it by name — saying "ship it" in prose deliberately routes to /pr, the QA'd path

Quickstart

/ship

What you'll see: a quick revert-safety check, changed files staged, the commit message shown before the commit runs, the branch rebased on latest and pushed, and a final report with the commit subject, files staged, branch, and PR URL — plus a reminder that no local tests ran, so watch CI (/monitor ci or gh pr checks --watch).

Examples

/ship                                  # auto-generate the commit message from the diff
/ship fix flaky retry in webhook poller   # use your text as the commit message

Arguments & flags

Flag	Values	Default	What it does
`<message>`	free text	auto-generated	Used as the commit message; without it, one is written from the diff in your repo's commit style

What it does

Revert check — runs /revert-check in advisory mode first. Silent when clean; if it suspects your tree is undoing recently-merged work, it shows the files and asks before proceeding.
Pre-flight — checks git status and the diff. Refuses to run on main — create a feature branch first. On toolkit projects (those with tests/run.sh, like claude-agents itself), it also requires a fresh .precheck-passed marker from /precheck: if the marker is missing or stale, /ship stops with PRECHECK REQUIRED. If the tree is clean, it skips straight to push + PR.
Stage and commit — stages relevant files (never .env, .env.local, credentials, coverage.xml, or large binaries) and commits with your message or an auto-generated one matched to recent commit style. The generated message is shown before the commit runs, and every commit message ends with a Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> trailer.
Rebase and push — fetches and rebases on the latest default branch (stopping loudly on conflicts — never auto-resolving), then pushes to origin, setting the upstream if needed.
Create the PR — if a PR already exists for the branch, shows its URL; otherwise opens one with a concise title and a Summary + Test plan body.
Final report — commit subject, files staged, branch, and the PR URL (re-fetched as a self-check), plus the CI advisory: no local tests ran, so CI is your only gate.
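
The staging rule in the list above amounts to a deny-list check per changed file. A minimal sketch, assuming a hypothetical `should_stage` helper: the excluded names come from the list above, while the 5 MB cutoff and the size-only treatment of "large binaries" are assumptions, since the actual threshold is not documented.

```python
from pathlib import PurePosixPath

# Names taken from the staging step; the size threshold is an assumption.
EXCLUDED_NAMES = {".env", ".env.local", "coverage.xml"}
MAX_BYTES = 5 * 1024 * 1024  # hypothetical cutoff standing in for "large binaries"


def should_stage(path: str, size_bytes: int) -> bool:
    """Return True if a changed file looks safe to stage."""
    name = PurePosixPath(path).name
    if name in EXCLUDED_NAMES or "credentials" in name.lower():
        return False
    return size_bytes <= MAX_BYTES
```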

Output & artifacts

A commit on your current feature branch, pushed to origin
A GitHub PR (URL returned) — or the existing PR's URL if one is already open for the branch

What it deliberately does not do: run QA/lint/tests (use /pr for that), merge the PR, or deploy.

Troubleshooting

Problem	Fix
Refuses to run on `main`	`git checkout -b my-change` and re-run — `/ship` never pushes to main
`PRECHECK REQUIRED` error	Only on toolkit-shaped projects (repos with `tests/run.sh`, like claude-agents itself). Run /precheck first: it writes a marker that `/ship` checks, and the marker must be fresh (same commit, less than 30 minutes old)
Rebase conflicts before push	`/ship` stops and shows the instructions — resolve, then `git rebase --continue` and re-run (or `git rebase --abort`)
Revert check flags files you deleted on purpose	Confirm the changes are intentional when asked, and `/ship` continues
Pre-commit hook fails	The skill fixes the issue, re-stages, and creates a new commit (it never amends)
`gh` not found or not authenticated	Install the GitHub CLI and run `gh auth login`
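
The freshness rule for the precheck marker (same commit, less than 30 minutes old) can be written as a single predicate. This is an illustrative sketch only: how the marker file actually stores its commit and timestamp is not documented here, so `marker_commit` and `marker_time` are hypothetical inputs.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(minutes=30)


def marker_is_fresh(marker_commit: str, marker_time: datetime,
                    head_commit: str, now: datetime) -> bool:
    """Fresh means: written for the current commit, less than 30 minutes ago."""
    age = now - marker_time
    return marker_commit == head_commit and timedelta(0) <= age < MAX_AGE
```

A new commit or a half-hour pause both invalidate the marker, so re-running /precheck right before /ship is the simplest way to stay inside the window.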

/pr — the full ship workflow with a QA gate — prefer it for non-trivial changes
/precheck — run local tests first if you want confidence before shipping
Workflow guide

/skills

Browse the installed skill catalog — filter by category or keyword, see what each slash command does.

Synopsis

/skills [category|keyword]

When to use it

You forgot which slash command does the thing you want and don't want to scroll back to the once-per-session routing table
You just installed a bundle and want to see which slash commands it added
You want to scope a category — "what's in Quality?" — before picking a workflow
Not for: deciding which command fits a specific task — that's /routing, which scores the task and recommends a route

Quickstart

/skills

What you'll see: every user-invocable skill installed in this project, grouped by category, each with its one-line description — generated live from the manifest, so it always matches what's actually installed.

Examples

/skills              # full catalog, grouped by category
/skills quality      # skills matching "quality" in name, description, or category (currently: the Quality category)
/skills deploy       # keyword match across name + description (deploy, deploy-ios…)
/skills consulting   # everything in the consulting bundle

Example output for /skills quality:

## Skills catalog (2 skills, filter: 'quality')

### Quality
- `/qa` — Run QA checks.
- `/qa-learn` — Review QA knowledge base stats, prune stale entries.

Arguments

Argument	Values	Default	What it does
filter	category name or keyword	none (show all)	Narrows the catalog: matches if the filter string appears anywhere in a skill's name, description, or category name — a single combined search, not separate strategies per field. Results are still grouped by each skill's real install category, so a match on a different field can pull in a skill from a category you didn't expect.

What it does

Reads portable.manifest (the source of truth) and each skill's frontmatter, so the catalog can never drift from what is installed.
Skips skills marked user-invocable: false (internal helpers other skills use).
Maps each skill to its install category so the grouping matches how it was installed.
Applies the optional filter (category or keyword) and prints the grouped catalog. No agents are spawned — it's a local read, effectively zero cost.
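
The combined filter described above can be sketched in a few lines. The dictionary keys (`name`, `description`, `category`, `user_invocable`) and the case-sensitive substring match are assumptions made for illustration; the real skill reads portable.manifest and skill frontmatter, whose exact shapes are not shown here.

```python
from collections import defaultdict
from typing import Optional


def build_catalog(skills: list[dict], query: Optional[str] = None) -> dict[str, list[dict]]:
    """Group user-invocable skills by category, optionally filtered."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for skill in skills:
        if skill.get("user_invocable") is False:
            continue  # internal helper, never listed
        # One combined search across name, description, and category.
        haystack = " ".join([skill["name"], skill["description"], skill["category"]])
        if query is None or query in haystack:
            grouped[skill["category"]].append(skill)
    return dict(grouped)
```

Because the match runs over all three fields at once, a filter such as `quality` can return a skill from another category whose description happens to mention the word.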

Output & artifacts

A grouped, formatted catalog printed into the conversation. Nothing is written to disk.
If the filter matches nothing, the list of available categories is shown instead.

Troubleshooting

Problem	Fix
`skills catalog: manifest not found`	The toolkit isn't installed where expected; run `~/.claude-agents/install.sh --status` to confirm the install path
A skill you expect is missing	It may be `user-invocable: false` (internal) or not installed in this project's bundle — check with `/skills` unfiltered, or install the bundle that contains it
Empty result for a real category	Category names are lowercase (e.g. `quality`, `operations`) and come from `install.sh`'s install categories/bundles, not from the skill's own frontmatter — try `/skills` unfiltered to see valid category names, or use a keyword instead

/routing — score a specific task and get a recommended route
/onboard — full project briefing with prioritized next steps
Skills reference — the complete generated index

/solution-architect

Design AI solutions for consulting engagements.

Synopsis

/solution-architect <client-name> [--initiative name]

When to use it

Turning approved initiatives into concrete technical designs the client's team can build
Producing technology selections, build-vs-buy calls, architecture diagrams, and an implementation plan
Not for: deciding which initiatives are worth building — that's /opportunity-map; or the business-side ROI case — that's /roi-calculator

Quickstart

/solution-architect acme-corp

What you'll see: solution designs for the client's top 3 initiatives by score — technology selection matrices, build-vs-buy analyses, Mermaid architecture diagrams, a Gantt-charted implementation plan, and a 3-year cost model — written to the client's architecture directory.

Examples

/solution-architect acme-corp                                  # architect the top 3 initiatives by score
/solution-architect acme-corp --initiative "Support Chatbot"   # architect one named initiative

Arguments & flags

Flag	Values	Default	What it does
`--initiative`	initiative name	top 3 by score	Filter the design work to a single initiative

What it does

Loads context — synthesizes prior outputs from the client's discovery/ and assessment/ directories: discovery/market-research.md, discovery/competitive-landscape.md, discovery/industry-trends.md, assessment/current-state.md, assessment/ai-readiness.md, assessment/stakeholder-map.md, and assessment/opportunity-matrix.md (the ranking source for "top 3 by score"); extracts objectives, data requirements, integration points, constraints, and budget per selected initiative.
Technology selection matrix — evaluates options per component (LLM provider, vector database, orchestration, data pipeline, monitoring, frontend) on capability, cost, hosting, data privacy level, integration effort, and lock-in risk. Data privacy levels: 1 = public data only (cloud OK), 2 = internal data (SOC 2 required), 3 = PII/sensitive data (encryption at rest/transit required), 4 = regulated data — HIPAA/GDPR/CCPA (on-prem or private cloud required). Level 4 data rules out cloud-only and SaaS options outright, so it's worth confirming a client's privacy level before expecting cloud recommendations.
Build vs buy analysis — for each major component: build custom vs SaaS vs open source, compared on cost, time to deploy, customizability, maintenance burden, data control, vendor risk, and skill required, with a recommendation tied to the client's constraints.
Architecture design — Mermaid system architecture and data pipeline diagrams (chosen from five reference templates: RAG pipeline, ML prediction service, document processing, chatbot/agent, analytics dashboard), an integration points table, and a security and privacy design covering auth, data protection, compliance, and AI-specific risks like prompt injection.
Implementation plan — a Mermaid Gantt build plan, resource requirements table, a risk register with mitigations and owners, and a testing strategy including AI-specific tests (golden-dataset benchmarks, hallucination detection, prompt robustness).
Cost model — 3-year TCO by year, break-even analysis, and a cost scaling projection from pilot to full scale.

A quality checklist runs before output: diagrams render, selections justified, timeline achievable, risk mitigations actionable, security matches the client's compliance needs.
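
To make the effect of the data privacy level concrete, here is a small sketch of the level 4 rule. The `Option` class and its `hosting` values are hypothetical; only the level descriptions and the level 4 restriction come from the technology selection step, and the level 2 and 3 requirements (SOC 2, encryption) are deliberately not modeled.

```python
from dataclasses import dataclass

# Requirement text per level, as listed in the technology selection step.
PRIVACY_LEVELS = {
    1: "public data only (cloud OK)",
    2: "internal data (SOC 2 required)",
    3: "PII/sensitive data (encryption at rest/transit required)",
    4: "regulated data (on-prem or private cloud required)",
}


@dataclass
class Option:
    name: str
    hosting: str  # hypothetical values: "saas", "cloud", "private-cloud", "on-prem"


def eligible(options: list[Option], level: int) -> list[Option]:
    """Drop options whose hosting model is ruled out at this privacy level."""
    if level not in PRIVACY_LEVELS:
        raise ValueError(f"unknown privacy level: {level}")
    if level == 4:
        return [o for o in options if o.hosting in {"private-cloud", "on-prem"}]
    return list(options)
```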

Output & artifacts

Written to clients/<client-name>/architecture/:

architecture/solution-design.md — system architecture, integration points, security design, Mermaid diagrams
architecture/technology-selection.md — selection matrix, build-vs-buy, vendor comparisons
architecture/implementation-plan.md — Gantt chart, resource plan, risk register, testing strategy
architecture/cost-model.md — TCO, break-even, scaling projections

Troubleshooting

Problem	Fix
Designs lack data that the prerequisite skills should provide	This skill reads discovery and assessment outputs. Run /client-discovery first (writes `discovery/`), then /opportunity-map (writes `assessment/`, including `opportunity-matrix.md`); both must complete before `/solution-architect` has what it needs
`--initiative` doesn't match anything	Use the exact initiative name from `assessment/opportunity-matrix.md`
A Mermaid diagram doesn't render	Re-ask for the diagram — valid, renderable Mermaid is part of the skill's quality checklist
Recommendation conflicts with the client's compliance needs	Check the data privacy level assigned (1-4) — regulated data (level 4) forces on-prem or private cloud options

/consulting — the engagement orchestrator; runs this as the design phase
/opportunity-map — selects and scores the initiatives this skill designs
/roi-calculator — the business-case counterpart to this technical cost model
/deliverable-builder — packages these designs into client-ready documents

/sre

Run SRE operations.

Synopsis

/sre [command]

Commands: status, health, logs, deploy-check, ci, sleep, wake, debug, rebuild, watch. With no command, defaults to status. health accepts --monitor-mode (report only on failure — used by scheduled checks registered via watch).

When to use it

Checking whether your services and environments are up — before or after a deploy
Investigating a production or staging issue with a systematic, evidence-first approach
Day-to-day infra chores: viewing logs, sleeping/waking staging, clean monorepo rebuilds
Not for: fixing code bugs — that's /fix (and /sre debug will route you there automatically); CI auto-remediation — that's /ci-fix; "something is broken and I don't know what" — start with /incident

Quickstart

/sre

What you'll see: a structured pass/fail table covering every environment from your CLAUDE.md — deployment platform status, health endpoint results, and the latest CI runs.

Examples

/sre                          # full system status — all services, all environments
/sre health                   # quick check — just hit the health endpoints
/sre logs backend             # recent deploy logs for a service, errors summarized
/sre debug "500s on /api"     # systematic diagnosis of a production issue
/sre sleep                    # scale staging down to save costs
/sre rebuild                  # nuclear option for stale build caches

Subcommands

Command	What it does (one line)
`status`	Full system status — checks platform services, hits all health endpoints, and reviews recent CI runs for every environment
`health`	Quick health check — hits the health endpoint of every environment listed in CLAUDE.md and reports pass/fail
`logs [service]`	Fetches recent deploy logs for a service, filters for errors, and summarizes issues found
`deploy-check`	End-to-end deployment verification — recent CI runs, latest deploys, health endpoints, and migration state
`ci`	Checks CI/CD pipeline status, pulls failure logs for any failed runs, and suggests fixes
`sleep`	Sleeps the staging environment to save costs and verifies services scaled to zero
`wake`	Wakes the staging environment and waits for health checks to pass
`debug [description]`	Systematic production debugging — confirm, scope, diagnose, find root cause, then apply infra fixes directly or escalate code bugs to `/fix`
`rebuild`	Clean monorepo rebuild — stops services, nukes build caches, rebuilds packages in dependency order, restarts, and verifies
`watch [service]`	Registers a background health watcher via the Monitor tool — wakes the agent only on a status change; delegates to `/monitor deploy` when that skill is installed (falls back to a session-scoped scheduled check where Monitor is unavailable)

What it does

Discovers project context — reads CLAUDE.md, the project profile, and the knowledge base; detects your deploy platform, databases, monitoring, and CI from project files if not already calibrated, and recommends anything that's missing.
Runs the command — most commands spawn an SRE agent with platform-appropriate tools (Railway MCP, Vercel CLI, gh, etc.); rebuild and watch run inline without an agent. Discovery is scoped per command — local-only commands read just the CLAUDE.md tables.
Classifies fixes during debug — infra-only fixes (restart, config, scale) are applied directly and health-verified; if config changed, it asks you to smoke-test — a user-confirmation checkpoint. Code bugs are never patched from /sre — they're escalated to /fix with all gathered evidence.
Writes back what it learned — incidents, platform patterns, and infra decisions go to the project knowledge base so future runs diagnose faster.
After an infra fix, presents a "What's next?" menu (monitor, investigate further, project status, done) — a user-confirmation checkpoint.

Agents spawned

Agent	Model tier	Role
sre	sonnet (from `model-policy.yml`)	Executes the command against your platform — status checks, log analysis, debugging, sleep/wake (one agent per invocation; `rebuild` and `watch` spawn none — they run inline)

Output & artifacts

Structured status/health reports in the conversation (pass/fail per environment)
Incident reports at .claude/qa-knowledge/incidents/<date>-<slug>.md after debug finds a root cause
Knowledge base appends: .claude/knowledge/agents/sre.md (platform patterns, resolutions) and .claude/knowledge/shared/decisions.md (infra decisions)
For code bugs: a documented handoff into /fix with root cause, affected files, and evidence

Troubleshooting

Problem	Fix
Environments are skipped during health checks	Their URL in the CLAUDE.md `## Environments` table is still a TODO placeholder — fill it in or run `/scan`
Platform not detected / wrong tools used	Run /calibrate — `/sre` reads its project profile instead of re-discovering
`/sre debug` found the bug but didn't fix the code	By design — code fixes go through `/fix`, which has the verification pipeline. Accept the escalation it offers
Stale builds or phantom errors persist after restart	Use `/sre rebuild` — it clears turbo/next/tsc caches and rebuilds from scratch
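
The health check behavior (pass/fail per environment, placeholder URLs skipped, and report-only-on-failure in monitor mode) can be sketched as follows. This is an assumption-laden illustration: `health_report`, the `SKIPPED` label, and the injected `check` callable are hypothetical, and the real command runs through an SRE agent rather than a function like this.

```python
from typing import Callable


def health_report(environments: dict[str, str], check: Callable[[str], bool],
                  monitor_mode: bool = False) -> dict[str, str]:
    """Map each environment name to PASS, FAIL, or SKIPPED."""
    report: dict[str, str] = {}
    for name, url in environments.items():
        if not url or "TODO" in url:
            report[name] = "SKIPPED"  # placeholder URL, nothing to probe
        else:
            report[name] = "PASS" if check(url) else "FAIL"
    if monitor_mode:
        # Report only on failure, as scheduled checks do.
        report = {name: status for name, status in report.items() if status == "FAIL"}
    return report
```

Passing the probe in as `check` keeps the sketch free of network calls; in practice it could wrap any HTTP client that returns True on a healthy response.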

/incident — the triage orchestrator; routes to /sre debug when the evidence points at infrastructure
/restart — local dev servers only; /sre covers deployed environments
/deploy — deploys to local/staging; /sre deploy-check verifies the result
/ci-fix — auto-remediation when /sre ci finds failures

/tech-debt

Holistic tech-debt audit — DRY violations, dead code, design smells, coupling, deprecated patterns. File-cited findings with severity tiers.

Synopsis

/tech-debt [scope] [--audit-only] [--workflow|--classic] [--invoked-by VALUE]

When to use it

Periodically on AI-heavy codebases — incremental generation accumulates duplication, dead code, and pattern drift fast
Before a refactor sprint, to get a file-cited, severity-tiered inventory of what's actually rotten
To track debt over time — each run writes a baseline and reports the delta (new vs resolved) against the previous one
Not for: performance issues — that's /perf; security — that's the code-reviewer in /review-pr; test coverage — that's /qa

Quickstart

/tech-debt

What you'll see: a quick structural scan, then three parallel auditors covering DRY/dead code, smells/coupling, and deprecated patterns — each auditor works its own slice of the codebase independently ("isolated"), so they don't wait on each other or duplicate context, which is what makes the workflow path faster on large repos. The result is a report with Critical / Important / Suggestion findings (each cited as file:line), a per-dimension summary table, and a baseline saved for future delta comparison.

Examples

/tech-debt                       # full codebase audit
/tech-debt backend/services/     # audit one module
/tech-debt auth                  # audit a feature area
/tech-debt --audit-only          # report only, no follow-up prompts (good for automation)
/tech-debt --classic             # force the legacy parallel-agent path instead of the workflow path

Arguments & flags

Flag	Values	Default	What it does
`scope`	file, directory, feature, `all`	`all`	What to audit
`--audit-only`	—	off	Skip interactive prompts; just present the report
`--workflow` / `--classic`	—	auto by repo size	Force the execution path; workflow fans out isolated auditors in parallel (~3-4x faster on large repos). Auto rule: <200 files → classic, ≥500 files → workflow, 200-499 files → workflow if scope is `all`, else classic. Workflow path requires Claude Code ≥ 2.1.154; falls back to classic with a prompt to run `claude update` if the version is too old
`--invoked-by`	`direct`, `calibrate_rescan`, `autopilot`, `pr`, `implement`, `onboard`	`direct`	Caller identity for telemetry — set automatically by cross-skill hooks
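
The auto rule for choosing the execution path reads naturally as a small decision function. A minimal sketch, assuming a hypothetical `pick_path` helper; it treats exactly 500 files as "workflow" (following the ≥500 clause) and leaves out the Claude Code version fallback.

```python
from typing import Optional


def pick_path(file_count: int, scope: str = "all",
              forced: Optional[str] = None) -> str:
    """Return "workflow" or "classic" following the documented auto rule."""
    if forced in ("workflow", "classic"):
        return forced  # --workflow / --classic override the auto rule
    if file_count < 200:
        return "classic"
    if file_count >= 500:
        return "workflow"
    # Mid-sized repos: workflow only for a full-codebase audit.
    return "workflow" if scope == "all" else "classic"
```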

What it does

Loads context — CLAUDE.md, project profile, your coding conventions and patterns (to detect violations against them), the prior baseline, and .claude/tech-debt-config.json (severity caps, ignore paths) if present.
Quick scan — explore-light maps hotspots: duplicate blocks, oversized files, TODO/FIXME density, coupling hubs, dead files, and untested source files.
Picks the execution path — workflow (parallel isolated auditors, chosen automatically for large repos) or classic; may prompt once to enable workflows if your Claude Code settings have them disabled — user-confirmation checkpoint in that case. The workflow path requires Claude Code ≥ 2.1.154 — if your version is older, you'll see a one-time prompt to run claude update, and the audit falls back to the classic path automatically (no data is lost, just less parallelism).
Deep audit — three auditors run in parallel: DRY violations + dead code; code smells + design-pattern decay (deviations from your codebase's own patterns, not generic preferences); deprecated patterns (flagged only when the same repo already does it the modern way). No code is modified.
Dedupes and prioritizes — tags findings already tracked in QA incidents as [ALREADY TRACKED] and excludes them from counts, applies severity caps, and computes the delta against the prior baseline ([NEW] / [RESOLVED] tags).
Writes the baseline — human-readable and machine-readable baseline files for trend tracking; other skills (/pr, /implement, /review-pr) reference it automatically.
Presents the report and offers next steps (plan the fixes, file issues, or stop). The prompt is skipped with --audit-only or when invoked by another skill.
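
The delta against the prior baseline is, at its core, a set difference. The sketch below assumes each finding can be reduced to a stable string key; that key format and the `compute_delta` name are hypothetical, and the real baseline schema is not shown here.

```python
def compute_delta(previous: set[str], current: set[str]) -> dict[str, list[str]]:
    """Tag findings as new or resolved relative to the prior baseline."""
    return {
        "[NEW]": sorted(current - previous),
        "[RESOLVED]": sorted(previous - current),
    }
```

Findings present in both runs appear under neither tag, which keeps the delta focused on what changed since the last audit.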

Configuration

Optional file: .claude/tech-debt-config.json. If it doesn't exist, defaults apply (severity caps: critical 25 / important 50 / suggestion unlimited; no ignored paths).

{
  "severity_caps": {
    "critical": 25,
    "important": 50,
    "suggestion": null
  },
  "triggers": {
    "pr": "enabled",
    "implement": "enabled",
    "onboard": "enabled",
    "autopilot": "enabled",
    "calibrate_rescan": "enabled"
  },
  "ignore_paths": [],
  "ignore_patterns": []
}

Field	What it does
`severity_caps`	Max findings shown per tier before the report truncates and notes how many were omitted. `null`/absent = unlimited. Defaults: critical 25, important 50, suggestion unlimited
`triggers`	Per-caller mute switches — set a caller (e.g. `"pr": "disabled"`) to skip auto-invocation from that cross-skill hook. All default to `enabled` if the key is absent
`ignore_paths`	Paths excluded from the audit entirely
`ignore_patterns`	Specific code patterns to treat as known false positives, so they stop reappearing in every run
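
As a rough picture of how the severity caps behave, the sketch below merges a config file over the documented defaults and truncates a tier. The helper names `load_caps` and `apply_cap` are assumptions for illustration, not the skill's internals.

```python
import json
from pathlib import Path
from typing import Optional

# Documented defaults: critical 25, important 50, suggestion unlimited.
DEFAULT_CAPS = {"critical": 25, "important": 50, "suggestion": None}


def load_caps(config_path: str = ".claude/tech-debt-config.json") -> dict:
    """Merge severity caps from the config file over the defaults."""
    caps = dict(DEFAULT_CAPS)
    path = Path(config_path)
    if path.exists():
        caps.update(json.loads(path.read_text()).get("severity_caps", {}))
    return caps


def apply_cap(findings: list[str], cap: Optional[int]) -> tuple[list[str], int]:
    """Return the findings to show and how many were omitted."""
    if cap is None:
        return findings, 0  # null or absent means unlimited
    return findings[:cap], max(0, len(findings) - cap)
```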

Agents spawned

Agent	Model tier	Role
explore-light	haiku	Structural hotspot scan + deprecated-pattern audit
code-reviewer ×2	sonnet	DRY/dead-code auditor and smells/coupling auditor
completion-verifier	sonnet	Verifies the audit completed what it claimed

Output & artifacts

The audit report in the conversation: Critical / Important / Suggestions, per-dimension summary table, delta vs baseline
.claude/knowledge/shared/tech-debt-baseline.md — human-readable baseline
.claude/knowledge/shared/tech-debt-baseline.json — machine-readable (for tooling and telemetry)
Run history appended to .claude/knowledge/agents/tech-debt.md
No code changes, ever — this skill is report-only by design

Troubleshooting

Problem	Fix
Too many findings to act on	Set `severity_caps` in `.claude/tech-debt-config.json` to cap critical/important counts (defaults: critical 25, important 50, suggestions unlimited if no config file exists)
Known false positives keep appearing	Add them to `ignore_paths` / `ignore_patterns` in the config file
`Dynamic workflows need Claude Code ≥ 2.1.154`	Run `claude update`, or pass `--classic` — the audit runs either way
Findings duplicate things you already track	Findings matching existing QA incidents are tagged `[ALREADY TRACKED]` and excluded from counts automatically

/perf — performance-focused sibling; /tech-debt is structural hygiene
/planning — turn critical findings into a fix plan
/issue — file findings as GitHub issues
/review-pr — surfaces baseline findings during PR review

/templates

Generate structured deliverables from templates.

Synopsis

/templates [type] [topic]

When to use it

You need a professional non-engineering deliverable fast: a product brief, launch plan, status report, design critique, meeting briefing, blog post, or user persona
You're a PM, marketer, or stakeholder who wants outcome-focused documents grounded in real project data
Not for: full feature PRDs with engineering feasibility — that's /planning; or formatting an existing document for distribution — that's /share
Prerequisite: status-report reads your git history, PRs, and issues — a sparse repo with little activity will produce a thin report; the other templates just need a topic

Quickstart

/templates product-brief notifications

What you'll see: a specialist is brought in and a plain-language product brief (problem, target user, desired outcome, success metrics, scope, open questions) is generated for "notifications", under 500 words, followed by next-step options.

Examples

/templates                                  # show the selection menu
/templates product-brief notifications     # product brief for a feature
/templates launch-plan dark-mode            # go-to-market launch plan
/templates status-report                    # stakeholder status report from git/PR/issue data — no topic argument needed
/templates design-critique login-page       # design evaluation against proven principles
/templates meeting-briefing sprint-planning # pre-meeting briefing with project context
/templates blog-post launch-announcement    # blog post grounded in real product capabilities
/templates user-persona pm-users            # behavior-based personas; pass a file path, pasted notes, or a segment name as the topic

Arguments & flags

Argument	Values	Default	What it does
`[type]`	`product-brief`, `launch-plan`, `status-report`, `design-critique`, `meeting-briefing`, `blog-post`, `user-persona`	none; shows a selection menu if omitted	Which deliverable to generate
`[topic]`	free text	none; asked if needed	What the deliverable is about; `status-report` needs none

What it does

Selects the template — if no type is given, presents a numbered menu of the seven templates (user-confirmation checkpoint). If the type needs a topic and none was provided, it asks for one (user-confirmation checkpoint); with a topic inline it goes straight to generation.
Brings in the right specialist — each template routes to a dedicated agent with a structured prompt that reads your project context (CLAUDE.md, git history, PRs, issues as relevant). Output is plain language, jargon-free, and length-bounded per template.
Offers next steps — after every output: "share this" (formats it via /share), "revise [section]", or "try another template".
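
The selection logic in the first step can be summarized as a short decision function. This is a sketch under assumptions: `next_step` and its return labels are hypothetical, and sending an unrecognized type back to the menu is a guess, since that case is not documented.

```python
TEMPLATES = ("product-brief", "launch-plan", "status-report", "design-critique",
             "meeting-briefing", "blog-post", "user-persona")
NO_TOPIC_NEEDED = {"status-report"}


def next_step(args: list[str]) -> str:
    """Decide what happens for a given argument list."""
    if not args:
        return "show-menu"
    template, topic = args[0], " ".join(args[1:])
    if template not in TEMPLATES:
        return "show-menu"  # assumption: unknown types fall back to the menu
    if not topic and template not in NO_TOPIC_NEEDED:
        return "ask-for-topic"
    return "generate"
```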

Output & artifacts

The finished deliverable in the conversation, ready to revise or hand to /share for distribution
No files are written by default

Troubleshooting

Problem	Fix
Menu appears when you expected generation	Pass the type (and topic) inline: `/templates product-brief notifications`
Status report looks thin	It reads git history, PRs, and issues — sparse repos produce sparse reports; add a topic to focus it
Design critique asks what to critique	Provide a URL, screenshot path, or description of the design
User persona output feels generic	Point it at real research data (interview notes, support tickets, survey responses) rather than just a segment name
A specialist is unavailable	The skill explains what happened in plain English and suggests an alternative

/share — format any template output for Slack, Twitter, HTML, or markdown
/planning — full PRD with engineering feasibility, for features headed to implementation
/welcome and /wizard — non-engineer entry points that route into these templates

/welcome

Role-aware welcome for non-engineers. Presents an intent-based menu instead of a blank terminal.

Synopsis

/welcome

When to use it

You're a PM, founder, designer, marketer, or ops person opening Claude Code and want clear options instead of a blank prompt
You're not sure which workflow fits what you want to do — pick an outcome and the system routes it
Not for: engineers — the engineer briefing is /onboard (welcome falls through to it automatically); or guided multi-step workflows — that's /wizard

Quickstart

/welcome

What you'll see: a short menu of 4 outcome-oriented options tailored to your role (e.g. "Plan a feature", "Check project status", "Something else"). Pick one and the right workflow runs — no jargon, no agent or model names, no slash-command syntax required.

Examples

/welcome      # the only form — the menu adapts to your stored or detected role

You may also see this menu appear automatically — without typing /welcome — on your first message if it's a vague greeting like "hey" or "what should I do?" and your role resolves to non-engineer, or right after an engineer runs project setup and stores your role. That's the same skill routing you in; it's not an error.

What it does

Detects your role — reads USER_ROLE from .claude/.claude-agents.conf; if absent, infers it from which specialist agents are installed in the project (e.g. a GTM agent without QA agents suggests PM/founder/marketer; design-studio agents without backend agents suggest designer; every category installed suggests engineer). If it still can't tell, it asks via AskUserQuestion ("What's your role on this team?" — Product Manager / Founder / Designer / Marketer / Operations / Engineer) — a user-confirmation checkpoint. The answer is stored so future sessions skip the question.
Presents a role-based menu (user-confirmation checkpoint) — an AskUserQuestion with 4 options per role:

PM: Plan a feature · Check project status · Create or review an issue · Something else
Founder: Plan a feature or strategy · Check project status · Review positioning or GTM · Something else
Designer: Design thinking session · Get a design critique · Plan a UI feature · Something else
Marketer: Create a launch strategy · Analyze positioning · Plan content or campaign · Something else
Ops: Check project status · Triage open issues · Review recent changes · Something else
Engineer / unknown: falls through to /onboard

Routes your choice — planning options ask you to type a feature name and a short brief in plain English, then invoke /planning with a design spec included by default (founders get --gtm; add --no-design if you want to skip the design spec); status options invoke /onboard; issue options list GitHub issues; positioning/launch/content and design options bring in the matching specialist with project context. "Something else" lets you describe what you need in plain English. Max one question before action — if you type something specific at any point, the menu gets out of the way.
Offers follow-ups — after the routed action completes: "share this", "revise [section]", or "do something else" to return to the menu.
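The first step above, reading a stored role and deciding where to send the user, can be sketched in a few lines. This is a minimal illustration, not the skill's implementation: it assumes the config file holds a plain `USER_ROLE="pm"` style line (the form shown in Troubleshooting), and it skips the agent-based inference and the role question entirely. The function names `read_user_role` and `route` are hypothetical.

```python
import re
from typing import Optional

# Role values listed in the Troubleshooting table (lowercase).
VALID_ROLES = {"pm", "founder", "designer", "marketer", "ops", "engineer"}


def read_user_role(conf_text: str) -> Optional[str]:
    """Return the stored USER_ROLE, or None if it is absent or unrecognized.

    Assumption: the file uses simple KEY=value or KEY="value" lines.
    """
    for line in conf_text.splitlines():
        match = re.match(r'^\s*USER_ROLE\s*=\s*"?([a-z]+)"?\s*$', line)
        if match:
            role = match.group(1)
            return role if role in VALID_ROLES else None
    return None


def route(role: Optional[str]) -> str:
    """Engineers and unknown roles fall through to /onboard; others get a menu."""
    if role is None or role == "engineer":
        return "/onboard"
    return f"menu:{role}"  # hypothetical label for the role-based menu
```

For example, `route(read_user_role('USER_ROLE="pm"'))` yields the PM menu label, while an empty config falls through to `/onboard`.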

Output & artifacts

USER_ROLE saved in .claude/.claude-agents.conf (so the role question is asked once)
Whatever the routed workflow produces (e.g. a plan from /planning, a status briefing from /onboard, a GTM or design document in the conversation)

Troubleshooting

Problem	Fix
It went straight to an engineer briefing	Your role resolved to Engineer (or couldn't be determined as a non-engineer role) — welcome hands engineers to `/onboard` by design
It guessed my role wrong	Edit `USER_ROLE` in `.claude/.claude-agents.conf` — use one of `pm`, `founder`, `designer`, `marketer`, `ops`, or `engineer` (lowercase, e.g. `USER_ROLE="pm"`) — and run `/welcome` again
None of the menu options fit	Pick "Something else" and describe what you need — the menu is a suggestion, not a constraint

/wizard — guided multi-step workflows for the same roles (welcome is a single-choice menu; wizard walks a full process)
/onboard — the engineer-facing session briefing welcome defers to
/planning — where "Plan a feature" choices land

/wiki-knowledge-base

Build and maintain LLM-powered topic wikis.

Synopsis

/wiki-knowledge-base <init|fetch|ingest|query|lint|status> <topic> [-- <text>]

When to use it

Researching a technology or vendor landscape before adopting it
Competitive analysis or domain deep-dives where knowledge accumulates over weeks
Building durable domain context that /planning, /fix, /implement, and /review-pr automatically pick up
Not for: project code knowledge, which lives in the /calibrate knowledge base

Quickstart

/wiki-knowledge-base init stripe-payments

What you'll see: the skill asks you to describe the wiki's focus in 1-2 sentences, then scaffolds .claude/wikis/stripe-payments/ with raw/ (for your source documents), raw/assets/ (for images/PDFs), wiki/ (index, log, pages), and a schema.md. Add source documents to raw/ (or use fetch), then run ingest.

Examples

/wiki-knowledge-base init video-research                                    # scaffold a new wiki
/wiki-knowledge-base fetch video-research -- https://docs.livekit.io/agents # fetch a specific page via Chrome into raw/
/wiki-knowledge-base fetch video-research -- https://docs.livekit.io       # fetch a docs root: grabs the 3-5 key sub-pages
/wiki-knowledge-base ingest video-research                                  # process new raw sources into wiki pages
/wiki-knowledge-base query video-research -- compare vendor latency         # search wiki, synthesize an answer
/wiki-knowledge-base lint video-research                                    # health-check the wiki
/wiki-knowledge-base status                                                 # list all wikis with stats

Arguments & flags

Command	What it does
`init <topic>`	Scaffold a new wiki (asks for the wiki's focus)
`fetch <topic> -- <url> [url2] ...`	Fetch web pages via the Chrome browser extension into `raw/` — bypasses bot protection that blocks plain fetching. Point it at a docs root URL (rather than a specific page) and it navigates the nav/sidebar to fetch the 3-5 most important sub-pages instead of a single page
`ingest <topic>`	Process unprocessed `raw/` sources into interlinked wiki pages
`query <topic> -- <question>`	Search the wiki and synthesize a cited answer
`lint <topic>`	Find dead links, orphan pages, contradictions, stale sources
`status [topic]`	Dashboard of all wikis, or detail for one

What it does

Each wiki has three layers under .claude/wikis/<topic>/: immutable raw/ sources, the LLM-maintained wiki/ (index, log, pages), and a schema.md of conventions.

init — scaffolds the directory structure, then asks you to describe the wiki's focus (user-confirmation checkpoint) and generates the schema, index, and log.
fetch — opens each URL in a Chrome tab, extracts the page text, and saves it as a markdown source in raw/. If a URL is a docs root rather than a specific page, it identifies and fetches the 3-5 most important sub-pages instead of crawling the whole site. Falls back to direct fetching if Chrome tools aren't connected.
ingest — finds raw sources not yet in the log, reads each one, and presents 3-5 key takeaways for you to confirm or correct (user-confirmation checkpoint) before writing summary, entity, and concept pages, updating cross-references, the index, and the log.
query — reads the index, greps for relevant pages, reads only the matches, and synthesizes a cited answer. If the answer is novel synthesis, it offers to file it back as a new wiki page (user-confirmation checkpoint).
lint — checks for orphan pages, dead links, contradictions, index drift, stale sources, and missing cross-references; reports issues by severity with suggested next steps.
status — shows page/source counts, recent activity, and a page breakdown per wiki.

Agents spawned

Agent	Model tier	Role
explore-light	haiku	`lint` scan for large wikis (> 20 pages) — keeps lint cost flat

All other operations run inline in your session.

Output & artifacts

.claude/wikis/<topic>/raw/ — your immutable source documents (the LLM never modifies these)
.claude/wikis/<topic>/wiki/index.md — content catalog; wiki/log.md — append-only operation record; wiki/pages/ — entity, concept, summary, comparison, and synthesis pages
.claude/wikis/<topic>/schema.md — conventions, co-evolved with you
Other skills discover wikis automatically: /planning, /fix, /implement, and /review-pr grep the index and read only matching pages
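To make the layout concrete, here is a rough sketch that creates the same directory tree by hand. It only mirrors the paths listed above; the real `init` also asks for the wiki's focus and generates the schema, index, and log content, which this sketch replaces with empty placeholder files. The function name `scaffold_wiki` is hypothetical.

```python
from pathlib import Path


def scaffold_wiki(project_root: Path, topic: str) -> Path:
    """Create the three-layer wiki layout under .claude/wikis/<topic>/."""
    base = project_root / ".claude" / "wikis" / topic

    # raw/ holds source documents, raw/assets/ holds images and PDFs,
    # wiki/pages/ holds the generated pages.
    for sub in ("raw/assets", "wiki/pages"):
        (base / sub).mkdir(parents=True, exist_ok=True)

    # Placeholder files only: the skill itself writes real content here.
    for rel in ("wiki/index.md", "wiki/log.md", "schema.md"):
        path = base / rel
        if not path.exists():
            path.write_text("")

    return base
```

Calling `scaffold_wiki(Path.cwd(), "stripe-payments")` would produce the tree that the Quickstart describes, minus the generated content.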

Troubleshooting

Problem	Fix
`No new sources found` on ingest	Add documents to `.claude/wikis/<topic>/raw/` first (markdown, text, PDF)
`fetch` can't reach a page (Chrome MCP not connected)	It falls back to direct fetching; if that 403s, save the page with Obsidian Web Clipper and drop the file into `raw/`
Lint reports dead links or index drift	Re-run `ingest` or fix the flagged pages — HIGH-severity issues list the suggested fix
Wiki getting slow to search (100+ pages)	The skill's index-first protocol keeps reads selective; consider an external local-search tool for very large wikis

/kb-architecture-diagram — diagram the project knowledge base (a different KB than topic wikis)
/planning, /fix, /implement — consume wiki context automatically
/calibrate — the project knowledge base; wikis research, the project KB applies

/wizard

Guided step-by-step workflows for non-engineers. Role-based flows that walk you through common tasks.

Synopsis

/wizard [role] [step]

When to use it

You're a PM, founder, designer, or marketer and want to be walked through a full process (idea → plan → launch) step by step, with progress saved between sessions
You want to resume a multi-step process you started earlier — /wizard resume
Not for: a one-off choice menu — that's /welcome; or a direct feature plan when you already know what you want — that's /planning

Quickstart

/wizard

What you'll see: your role is detected (or asked once), then the matching 5-step workflow overview — for a PM, "Plan and Ship a Feature": describe your idea → understand your users → write the plan → create a launch plan → track progress. You pick a step; each one explains itself, asks before running, and saves its output.

Examples

/wizard               # detect role or ask, then show available workflows
/wizard pm            # show the PM workflow steps
/wizard pm 1          # jump straight to PM step 1 (Describe Your Idea)
/wizard pm next       # continue to the next incomplete step
/wizard founder       # Founder workflow: Build and Ship
/wizard designer      # Designer workflow: Design, Validate, Ship
/wizard marketer      # Marketer workflow: Research, Position, Launch
/wizard status        # progress table across workflows
/wizard resume        # resume where you left off
/wizard reset         # clear saved progress and start fresh

What it does

Detects your role — reads USER_ROLE from .claude/.claude-agents.conf (a plain-text config file in your project's .claude folder — your Claude Code session can create or edit it for you); if unset, asks via AskUserQuestion (Product Manager / Founder / Designer / Marketer) — a user-confirmation checkpoint. The answer is not automatically saved back to the config file, so you'll be asked again next session unless you (or Claude) add USER_ROLE to it manually.
Presents the role's 5-step workflow and lets you start anywhere (out-of-order is allowed — it warns but doesn't block):

PM — Plan and Ship a Feature: feature brief → user personas + assumptions to validate → implementation plan (via /planning) → launch plan → progress report from real git/PR/issue data
Founder — Build and Ship: vision → competitive landscape (including how you're positioned against similar products, using frameworks like Strategy Canvas / 7 Powers) → build plan (via /planning) → build (via /implement) → launch plan + status
Designer — Design, Validate, Ship: design brief → expert critique → improved iteration → implementation plan → final review against the original intent
Marketer — Research, Position, Launch: landscape analysis → positioning statement (how you describe the product vs. alternatives) → content drafts (blog, social, announcement, grounded in the real product) → launch plan → results + recommendations

Runs each step with the same protocol — loads previous steps' outputs as context, explains what the step does in plain language, then asks "Ready to start? Or would you like to skip this step?" — a user-confirmation checkpoint — before bringing in the right specialist. Everything is in everyday language: no agent names, model names, or command syntax.
Saves state after every step to .claude/.wizard-state.json (role, workflow, current step, per-step summaries and file references), then offers: "next", "revise" (redo this step with changes, overwriting its saved output — no history of prior attempts is kept), "share this", "save and stop", or "skip to step N".
Resumes across sessions — /wizard resume shows completed steps with summaries and continues from where you stopped; /wizard status shows the progress table; /wizard reset deletes the saved state.
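The saved state can be pictured as a small JSON document. The sketch below is an assumption about its shape, built only from the fields named above (role, workflow, current step, per-step summaries and file references); the actual key names in `.claude/.wizard-state.json` may differ. The helper `next_incomplete_step` is hypothetical and illustrates what `/wizard pm next` has to work out. It treats only steps marked done as complete.

```python
import json
from typing import Optional

TOTAL_STEPS = 5  # every role workflow described above has five steps

# Hypothetical state shape; key names and the file path are illustrative.
example_state = {
    "role": "pm",
    "workflow": "Plan and Ship a Feature",
    "current_step": 2,
    "steps": {
        "1": {
            "status": "done",
            "summary": "Feature brief written",
            "files": [".claude/plans/notifications-brief.md"],
        },
    },
}


def next_incomplete_step(state: dict) -> Optional[int]:
    """Return the lowest step number not marked done, or None if all are done."""
    done = {
        int(number)
        for number, step in state.get("steps", {}).items()
        if step.get("status") == "done"
    }
    for number in range(1, TOTAL_STEPS + 1):
        if number not in done:
            return number
    return None


if __name__ == "__main__":
    print(json.dumps(example_state, indent=2))
    print("next step:", next_incomplete_step(example_state))
```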

Output & artifacts

.claude/.wizard-state.json — saved progress (survives session restarts)
Step deliverables saved as files where applicable (e.g. .claude/plans/<topic>-brief.md; planning/implementation steps produce their own plan files) — summaries are kept in state with file references
Briefs, personas, positioning, content drafts, launch plans, and status reports presented in the conversation

Troubleshooting

Problem	Fix
"No workflow in progress" on `/wizard resume`	There's no saved state yet — run `/wizard` to start a workflow
It asked for my role again	Nothing writes your answer back to config — set `USER_ROLE` in `.claude/.claude-agents.conf` (a config file in your project's `.claude` folder; your Claude Code session can edit it for you) if you don't want to be asked each time
Want to redo a step	Run that step again (`/wizard pm 2`) or say "revise" at the checkpoint — this redoes the step and overwrites its output in state; no earlier version is kept
Progress feels stale or wrong	`/wizard reset` clears `.claude/.wizard-state.json` and starts fresh
A step failed mid-run	State was saved before the step ran — it explains what happened in plain English and offers to retry

/welcome — single-choice role menu (wizard is the full guided process)
/planning — the planning engine behind the "write the plan" steps
/implement — the build engine behind the founder workflow's Step 4
/onboard — engineer-facing session briefing


Tutorial: Calibrate a Project

In this tutorial you'll run /calibrate on a project for the first time and verify everything it created. Calibration is how the toolkit stops being generic: it deep-learns your codebase, seeds a knowledge base, and configures agents, skills, and MCP servers to match how your project actually works.

Time: ~15 minutes (most of it is the calibration running)
You need: the toolkit installed in a real project (Getting Started).

Step 1: Install the toolkit (skip if done)

If you haven't installed yet, follow Getting Started Steps 1–6: activate your license key, add the arthai-marketplace marketplace, install a bundle (start with prime), and /reload-plugins.

Step 2: Run calibration

In Claude Code, inside your project:

/calibrate

First you'll see a mode line explaining which mode was picked and why — with no existing profile, that's a full calibration. Then the phases run:

Scan — a scanner agent deep-reads the project: platform, architecture patterns, coding conventions, testing patterns, domain model, integrations, environments.
Evaluate — an evaluator scores every toolkit agent, skill, and hook against the scan and identifies gaps.
Profile — a profiler writes .claude/project-profile.md.
Recommend — the report is presented and you're asked: Install all / Pick items / Skip.

If your CLAUDE.md is missing or still has placeholders, calibrate runs /scan first to populate it.

Step 3: Approve the recommendations

The Phase 4 prompt is a real checkpoint — nothing is installed without your approval. The report covers recommended MCP servers, toolkit categories, workflows, and custom agents/skills tailored to your stack. For a first run, Install all is the simplest choice; Pick items opens a multi-select if you want to be choosy; Skip writes the profile only.

After you approve, the install phase runs in parallel: the installer applies your approved items, a knowledge agent seeds .claude/knowledge/, and a best-practices agent writes .claude/knowledge/shared/best-practices.md. A final verification phase confirms everything landed and prints a completion report. An independent verifier then reports PASS / GAPS FOUND / INCONCLUSIVE — gaps don't block; you decide whether to rerun.

Step 4: What got created

Artifact	What it is
`.claude/project-profile.md`	Architecture, conventions, domain model — the file agents read at runtime
`.claude/knowledge/`	The seeded knowledge base, including `shared/best-practices.md`
`.claude/settings.json`	Recommended MCP servers added (never removed)
`.claude/agents/`, `.claude/skills/`	Any custom agents/skills created, as regular files you own
`.claude/monitors/`	Pre-adapted Monitor configs for your CI/deploy platform, with activation steps printed (Monitors)

Step 5: Start a fresh session

Calibration is the heaviest context operation in the toolkit — everything of value is in the files, so start a fresh Claude Code session when it finishes. At the start of that next session, the knowledge graph is built automatically from your knowledge base: a ranked index of your conventions, domain rules, patterns, and vocabulary that workflows like /fix, /planning, /implement, and /qa query to pull in only the most relevant context. The graph auto-rebuilds whenever your knowledge base changes — no action needed from you.

Step 6: Verify it worked

In your terminal:

ls .claude/project-profile.md
ls .claude/knowledge/shared/

You should see the profile and the seeded knowledge files. Then, in a Claude Code session:

/calibrate status

This prints the current calibration state without spawning any sub-agents. If the report ever says calibration_status: partial, a phase failed and its dependents were skipped — re-run /calibrate full.
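If you want the terminal check from this step in script form, a sketch like the following would do. It only tests for the two paths named in this tutorial; it does not inspect their contents and is not part of the toolkit. The function name `missing_artifacts` is hypothetical.

```python
from pathlib import Path
from typing import List

# Paths taken from the "What got created" table in this tutorial.
EXPECTED = [
    ".claude/project-profile.md",
    ".claude/knowledge/shared/best-practices.md",
]


def missing_artifacts(project_root: Path) -> List[str]:
    """Return the expected calibration files that do not exist yet."""
    return [rel for rel in EXPECTED if not (project_root / rel).exists()]


if __name__ == "__main__":
    missing = missing_artifacts(Path.cwd())
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("calibration artifacts present")
```

Run it from the project root. An empty result means both files landed.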

Finally, run /onboard — with a profile in place you get a project briefing grounded in what calibration learned.

What you learned

/calibrate is the single entry point for project adaptation: scan → evaluate → profile → recommend → install → verify.
Installs always go through your approval — the Phase 4 prompt is a contract, not decoration.
The outputs live in .claude/ as regular files you own: profile, knowledge base, MCP config, custom agents/skills.
The knowledge graph builds automatically at your next session start and feeds every major workflow. (More depth: The knowledge system.)
Re-run /calibrate after the project evolves significantly — it auto-detects rescan mode and shows only what changed.

Next tutorial: Plan and implement a feature — the full PRD → plan → build → PR workflow.


Tutorial: Plan and Implement a Feature

In this tutorial you'll build a feature the toolkit way: /planning writes the PRD, you review it, /implementation-plan debates and locks the scope, and /implement spawns the team that builds it — running /qa and creating the PR via /pr at the end. You'll also see every user-confirmation checkpoint along the way, so none of them surprises you.

Time: ~45–90 minutes depending on feature size
You need: the toolkit installed and ideally calibrated (previous tutorial), and a feature idea you can describe in 2–3 sentences.

Step 1: Write the PRD

/planning dark-mode -- Users want a dark theme that follows system preference and persists per account

Everything after -- is the inline brief. (Leave it off and the skill asks for one interactively — every question includes Cancel.)

What happens: an explore-light agent scans your codebase for related routes, components, models, and tests; a Design Thinker writes a UX brief; the Product Manager writes the PRD (user stories with priorities and acceptance criteria, user journey, edge cases, success criteria); and the Architect writes a tech feasibility note in parallel.

You'll end with:

.claude/specs/dark-mode.md — the PRD
.claude/specs/dark-mode-design.html — the design spec (skip with --no-design)
A feasibility verdict: GREEN / YELLOW / RED

Then it stops. This is deliberate — /implementation-plan is never auto-invoked, because PRD review is the whole point of the two-phase split.

Step 2: Review the PRD and design spec

Open .claude/specs/dark-mode.md and read it like you'd review a teammate's PRD: are the user stories right? Are the edge cases real? Edit the file directly if anything's off. Open the -design.html sibling in a browser for the user journeys and key screens.

If feasibility came back YELLOW or RED, read the printed hard constraints now — revising the PRD at this stage is far cheaper than after the architecture debate.

Step 3: Lock the scope

/implementation-plan dark-mode

Checkpoints you'll hit:

Debate depth prompt — Auto / Fast / Lite / Full. Auto is recommended; it picks from PRD signals. (If the PRD's feasibility was RED, you're first asked: continue anyway, cancel, or open the PRD.)
Escalation protocol — before any PRD-traced item is deferred or rejected, you see the team's reasoning and choose: keep it, accept the recommendation, or reduce scope. Your overrides are recorded in the plan.
Handoff — after the plan is verified against the PRD, the skill asks before proceeding to /implement.

Between those, the PM and Architect (plus a Devil's Advocate, unless --fast) run a structured debate — Round 1 on scope, Round 2 on feasibility in Full mode — and the result is a locked-scope plan at .claude/plans/dark-mode.md with must-haves, exclusions, cost estimates, a debate record, and a task breakdown.

Step 4: Build it

/implement dark-mode

Checkpoints you'll hit:

Mode prompt — Auto / Guarded / Fast / Strict (how aggressively the red team challenges the build). Auto picks from plan size and risk keywords — auth, payments, and migrations escalate to Strict.
Red-team blocks — unresolved CRITICAL findings stop the flow and ask: fix / override / abort.
Phase prompts — multi-phase plans pause between phases and ask whether to proceed.
QA level — after the build: commit / full / staging / skip.
Manual test sign-off — local servers are restarted and the skill waits for your "ready" before shipping.
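The risk-keyword part of the Auto mode choice can be illustrated with a simple check. This sketch covers only the escalation to Strict that the list above mentions. How Auto weighs plan size, and which mode it picks when nothing escalates, is not described here, so the sketch leaves that out. The substring matching is an assumption and is cruder than a real classifier would be (it would also match a word like "author").

```python
# Keywords from the checkpoint description: auth, payments, and migrations.
RISK_KEYWORDS = ("auth", "payment", "migration")


def escalates_to_strict(plan_text: str) -> bool:
    """Return True if the plan mentions any of the high-risk areas."""
    lowered = plan_text.lower()
    return any(keyword in lowered for keyword in RISK_KEYWORDS)
```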

The team works in parallel: backend and/or frontend agents per the plan's layers (backend shares the API contract with frontend before either implements), a QA agent traces every user story and edge case to code, and the red team attacks the diff and checks plan compliance.

Step 5: QA and PR

These run inside /implement's post-implementation workflow, so you don't invoke them separately: your chosen /qa level runs, and after your manual sign-off the PR is created via /pr --skip-qa (QA already ran, so /pr does only a quick lint + type sanity check). You get an implementation report, QA results, and a GitHub PR URL — then the skill asks what's next.

What you learned

The feature workflow is three phases with review gates between them: PRD → locked plan → build, each producing a file you own (.claude/specs/, .claude/plans/, then code + PR).
/planning stops after the PRD on purpose — your review is the input to the next phase.
/implementation-plan won't silently drop scope: every deferral of a PRD-traced item goes through you.
/implement chains QA and the PR for you, with checkpoints at mode selection, red-team blocks, QA level, and manual sign-off.
For work that doesn't fit this shape — a single fuzzy objective, or a ticket backlog — see Run autonomous work and the workflow comparison.

Next: you've finished the tutorials — the how-to guides cover incidents, CI recovery, autonomous work, and more.


Tutorial: Ship Your First PR

In this tutorial you'll take a small code change from your working tree to a merge-ready GitHub PR using three skills: /precheck → /qa → /pr. By the end you'll understand the toolkit's standard ship path and what each gate protects you from.

Time: ~10 minutes
You need: the toolkit installed (Getting Started), a project with a test suite, and a small change you're ready to ship.

Step 1: Get on a feature branch

Both /precheck and /pr refuse to run on main — the toolkit never pushes to your default branch. If you're on main, branch first:

git checkout -b my-first-change

Make your small change now if you haven't already — a one-line fix or a comment tweak is fine for a first run.

Step 2: Run the fast local gate

/precheck

This catches CI failures locally in ~30 seconds instead of a 4-minute CI round-trip. You'll see:

A revert-safety check — verifies your working tree doesn't accidentally undo a recently-merged PR (a stale editor buffer or bad stash-pop can do this silently).
The test suites relevant to your changed files — it diffs against main and picks only the suites that cover what you touched.
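The idea behind the second check, running only the suites that cover what you touched, can be sketched as a lookup from changed paths to suites. How /precheck actually maps files to suites is not documented here. The prefix table and suite names below are hypothetical, and the list of changed files is assumed to come from something like `git diff --name-only main...HEAD`.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping from source path prefixes to test suite names.
SUITE_BY_PREFIX: Dict[str, str] = {
    "backend/": "backend-tests",
    "frontend/": "frontend-tests",
}


def suites_for_changes(
    changed_files: Iterable[str],
    suite_by_prefix: Dict[str, str] = SUITE_BY_PREFIX,
) -> List[str]:
    """Return the sorted, de-duplicated suites covering the changed files."""
    selected = {
        suite
        for path in changed_files
        for prefix, suite in suite_by_prefix.items()
        if path.startswith(prefix)
    }
    return sorted(selected)
```

A change that touches only `backend/services/cart.py` would select just the backend suite, which is where the time saving over a full CI run comes from.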

On success the run ends with:

✓ Precheck passed (N tests, Xs) — Ready to push.

It also writes a pass marker at .claude/.precheck-passed — /pr checks this before allowing a PR.

Note: on a passing run, /precheck doesn't stop at a pass marker — it carries straight through the full ship sequence on its own: commit → push → PR → merge → branch cleanup, with no /qa or /pr review in between. If you want to follow the guarded /qa → /pr path in this tutorial (recommended for anything beyond a trivial change), interrupt it before it merges, or run /qa and /pr yourself instead of /precheck. If you let it run to completion, your change is already merged to main — skip ahead to What you learned.

If precheck fails, it removes any stale pass marker and lists the failing tests — it will not push with failures. Fix them and re-run.

Step 3: Run QA on your change

/qa

With no argument this runs commit mode: targeted checks on exactly what you changed (~1–3 minutes). You'll see a mode line confirming commit mode, then 2–4 QA agents running targeted checks on your changed files and generating 5–8 fresh test scenarios that approach your change the way a real user would, looking for what it could break.

The run ends with a structured QA report: pass/fail per check, any failures or warnings, coverage gaps (new code without tests), and a suggested next step. On pass, it asks whether to run /review-pr now or skip to /pr — that's a deliberate confirmation checkpoint, not a glitch.

If you run /qa and nothing has changed since the last commit, it shows a one-line picker instead of running — commit your change first, or pick a different mode.

Step 4: Create the PR

/pr

This is the full safe path to a PR. You'll see, in order:

A revert check (advisory — it asks before proceeding if anything looks suspicious)
/qa in commit mode as a hard gate — no PR is created on a failing QA run
A commit in your project's detected style (it reads CLAUDE.md and git history to match your team's conventions; it never stages .env, credentials, or large binaries)
A tracking issue found or created, so the PR auto-closes it on merge via Closes #N
Your branch rebased on the latest default branch and pushed (it fails loudly on conflicts — never auto-resolves)
A PR URL with Summary, QA Results, and Test plan sections

Step 5: After the merge

When the PR is merged, tell the session "merged" (or let a configured Monitor watcher detect it). /pr then verifies the issue closed, deletes the remote and local branches, checks out main, pulls, and runs /onboard so you know what's next.

What you learned

The toolkit's ship path is /precheck → /qa → /pr — fast local gate, targeted quality checks, then a guarded PR.
/precheck writes a pass marker that /pr consumes; a failing precheck blocks the push.
/qa commit mode checks only your diff and generates fresh scenarios beyond your existing tests.
/pr never pushes to main, never ships failing QA, and handles issue linkage and post-merge cleanup for you.
For a one-shot commit + push + PR with no QA gate, there's /ship — but the guarded path is the default for a reason.

Next tutorial: Calibrate a project — teach the toolkit your codebase so every workflow gets smarter.

Audit Code Health

Two skills audit different dimensions of the same question — is this codebase aging well? /tech-debt finds structural rot; /perf finds slowness. Run them together for a full health picture, then turn the findings into a plan.

/tech-debt — structural audit

/tech-debt                          # full codebase audit
/tech-debt backend/services/        # scope to a module
/tech-debt --audit-only             # report only, change nothing

It scans for DRY violations, dead code, design smells, coupling, and deprecated patterns, and produces a file-cited report — every finding names a file and line, so nothing is hand-wavy. A stable baseline is written to .claude/knowledge/shared/tech-debt-baseline.md, which later runs diff against.

/perf — performance audit

/perf search                        # optimize search-related code paths
/perf GET /api/products             # a specific endpoint
/perf all --audit-only              # full audit, read-only
/perf backend/services/ --deep      # deep profiling: benchmarks + load tests (slower)

It spins up a cross-functional team — performance engineer, architect, backend, frontend, QA — to audit, optimize, and validate. Scope can be a file, directory, feature, or endpoint; --backend-only / --frontend-only narrow the layers. Start with --audit-only if you only want the findings.

Reading the severity tiers

Tech-debt findings come back in three tiers:

Tier	Meaning	How to treat it
CRITICAL	Actively harmful — fix-worthy on its own	These are your plan candidates
IMPORTANT	Real debt, not urgent	Batch into cleanup work
SUGGESTION	Nice-to-have polish	Take opportunistically

Counts are capped per tier (by default 25 critical / 50 important / suggestions unlimited), so the report stays prioritized instead of becoming a 400-item dump — if you hit the critical cap, that itself is the headline finding.
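As an illustration of the capping rule, the sketch below keeps at most 25 CRITICAL and 50 IMPORTANT findings and leaves SUGGESTION uncapped. The caps come from the defaults stated above. The finding structure (a dict with a `tier` key) and the function names are assumptions made for the example, and the sketch keeps findings in arrival order rather than ranking them.

```python
from typing import Dict, List, Optional

# Default caps from the text; None means unlimited.
TIER_CAPS: Dict[str, Optional[int]] = {
    "CRITICAL": 25,
    "IMPORTANT": 50,
    "SUGGESTION": None,
}


def cap_findings(findings: List[dict]) -> Dict[str, List[dict]]:
    """Group findings by tier, dropping any beyond that tier's cap."""
    report: Dict[str, List[dict]] = {tier: [] for tier in TIER_CAPS}
    for finding in findings:
        tier = finding["tier"]
        cap = TIER_CAPS[tier]
        if cap is None or len(report[tier]) < cap:
            report[tier].append(finding)
    return report


def hit_critical_cap(report: Dict[str, List[dict]]) -> bool:
    """True when the CRITICAL tier is full, which is itself a headline finding."""
    return len(report["CRITICAL"]) == TIER_CAPS["CRITICAL"]
```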

From audit to plan

An audit-only report changes nothing; the value is in what you schedule. The path into the toolkit's planning machinery:

Pick one cluster of related findings — e.g. the CRITICAL coupling issues in one module. Don't plan "fix all debt".

Write the brief from the findings with /planning, citing the file:line evidence:

/planning service-decoupling -- Break the circular dependencies the tech-debt audit found in backend/services/ (see tech-debt-baseline.md)

Lock scope with /implementation-plan — the Devil's Advocate is particularly useful on refactors, where scope creep is the default failure mode.
Re-run the audit after the work merges. The baseline diff shows what actually got paid down.

/tech-debt also runs conditionally during /calibrate rescans when drift signals warrant it (--force-debt / --skip-debt to override).

/tech-debt and /perf — full references
/planning → /implementation-plan — findings to locked plan
/qa — full mode is the validation pass after a big cleanup
Plan and implement a feature — the build workflow the plan feeds

Automate CI Recovery

Red CI doesn't need you to debug it by hand. /ci-fix remediates failures with a bounded retry loop, and a Monitor watcher can wake it the moment CI fails — so the pipeline often repairs itself before you've noticed it broke.

The three modes

/ci-fix                        # CI failures on the current branch
/ci-fix ci feature/my-branch   # CI on a specific branch
/ci-fix staging                # staging deploy failure — reads deploy logs + health endpoints
/ci-fix prod                   # production deploy failure — read-only investigation first

Staging mode patches code or env vars based on deploy logs and health endpoints. Prod mode is the most conservative: read-only first, never touches the production database, and considers a git revert on the final attempt.

The 3-attempt loop

Each attempt pulls the failed logs, classifies the failure (lint / types / tests / build / migration / dependency), applies a fix scoped to only the failing files, verifies locally before pushing, then commits, pushes, and waits for the new CI result. Each retry uses a different strategy:

Attempt 1 — direct fix from the failure classification
Attempt 2 — reads more context around the failure
Attempt 3 — deep investigation against the last green commit

Known flaky tests are checked first — a match is reported instead of debugged. Hard rules throughout: never repo-wide lint auto-fix, never # noqa / # type: ignore suppressions, never deleted or skipped tests, never direct deploy commands — fixes always go through git.

On exhaustion (3 failed attempts): a Discord alert posts to #deployments (when discord-ops is configured), a QA incident file is written with the diagnosis trail, and it hands back to you. The attempt counter persists per-branch in .claude/monitors/.ci-fix-state.json, so a Monitor-triggered run won't loop forever on the same failure — a green run resets it.
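The bounded loop and its persistent counter can be sketched as follows. This is a simplified model under stated assumptions: the state file is taken to be a JSON object mapping branch names to attempt counts (the real layout of `.ci-fix-state.json` is not documented here), the strategy labels paraphrase the three attempts above, and `attempt_fix` is a hypothetical callback standing in for the diagnose, patch, push, and wait cycle.

```python
import json
from pathlib import Path
from typing import Callable, Dict

MAX_ATTEMPTS = 3
STRATEGIES = ["direct fix", "wider context", "compare with last green commit"]


def load_state(state_path: Path) -> Dict[str, int]:
    """Read the per-branch attempt counters, or start empty."""
    return json.loads(state_path.read_text()) if state_path.exists() else {}


def run_ci_fix(branch: str, state_path: Path,
               attempt_fix: Callable[[str], bool]) -> str:
    """Try up to three strategies; persist the counter so reruns cannot loop."""
    state = load_state(state_path)
    while state.get(branch, 0) < MAX_ATTEMPTS:
        attempt = state.get(branch, 0)
        state[branch] = attempt + 1
        state_path.write_text(json.dumps(state))  # persist before trying
        if attempt_fix(STRATEGIES[attempt]):
            state.pop(branch)  # a green run resets the counter
            state_path.write_text(json.dumps(state))
            return "green"
    return "exhausted"  # hand back to the user; alerting is out of scope here
```

Because the counter is written before each attempt, a second trigger on an exhausted branch returns immediately without calling `attempt_fix` again.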

Event-driven watching with Monitor

Polling CI wastes tokens. Monitors register background watchers that fire on external events — zero token cost while idle:

Event	What happens
CI fails on any branch	`/ci-fix` wakes, diagnoses, patches, resubmits. Discord alert after 3 failed attempts.
Deploy fails or service crashes	`/sre debug` wakes, applies an infra fix or escalates to `/ci-fix`.
Staging deploy succeeds	`/qa staging` runs automatically.

Setup is part of /calibrate: it detects your CI system and deploy platform, generates pre-adapted configs in .claude/monitors/, and prints the exact webhook URL and platform steps. Add the webhook once on your platform and it never needs to change. All of this is additive — if you never configure a monitor, /ci-fix behaves exactly as before, just manually invoked.

Prevention beats recovery

The cheapest CI fix is the one that never reaches CI: /precheck runs the relevant suites locally in ~30 seconds before you push. And after every fix, /ci-fix leaves CI improvement recommendations (caching, parallelism, timeouts, flaky-test retries) — recurring failures across branches usually mean a flaky test, not a code problem.

/ci-fix — full reference, troubleshooting table
Monitors — setup, supported platforms, which bundles include which watchers
/precheck — catch failures locally before CI sees them
/incident — when you're not sure it's CI at all
/sre — /sre ci for pipeline status without auto-remediation

Fix a Production Incident

When something is broken and you don't yet know what, start with /incident. It triages, diagnoses in parallel, and routes to the right resolution skill with all the evidence attached. When you already know what kind of problem you have, go to the resolution skill directly and skip the triage.

Start with triage

/incident "500 errors on the checkout page"

What happens:

Classify (instant) — severity (CRITICAL/HIGH/MEDIUM/LOW) and type (infra, CI, code bug, performance, local ops, data, auth). CRITICAL proceeds with no questions; LOW asks whether you want full triage or a quick check.
Parallel diagnosis (< 60s) — four cheap agents at once: health endpoints, recent deploys + CI, error signals/logs, and a knowledge-base lookup for similar past incidents.
Challenge — devil's-advocate agents adversarially test the diagnosis (skipped for high-confidence CRITICALs; saying "just fix it" skips it too).
Route — the right skill is invoked automatically with the gathered evidence.
Verify and learn — health checks re-run, an incident report lands in .claude/qa-knowledge/incidents/, and the knowledge base is updated.
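
The parallel diagnosis step can be pictured as a fan-out over independent checks. This is a sketch under stated assumptions: `diagnose_in_parallel` and the check names are hypothetical, and threads stand in for whatever agent mechanism the skill really uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def diagnose_in_parallel(checks: Dict[str, Callable[[], str]]) -> Dict[str, str]:
    """Run every check concurrently and collect findings by check name.

    A check that raises is recorded as an error string, so one failing
    data source does not sink the whole triage.
    """
    def safe(fn: Callable[[], str]) -> str:
        try:
            return fn()
        except Exception as exc:  # keep going with partial evidence
            return f"error: {exc}"

    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        futures = {name: pool.submit(safe, fn) for name, fn in checks.items()}
        return {name: fut.result() for name, fut in futures.items()}
```

With four checks (health, deploys, logs, knowledge base), total latency is roughly that of the slowest check rather than the sum of all four.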

Other entry forms: /incident #234 loads a GitHub issue; bare /incident auto-detects by checking health, CI, and recent deploys.

Where it routes — and when to go there directly

Evidence points at	Routes to	Go direct when
Infrastructure, performance, data	/sre `debug`	You already suspect infra — bad deploy, resource exhaustion, a service down
CI pipeline	/ci-fix	You know CI is red (see Automate CI Recovery)
A code bug	/fix	You have a diagnosed bug or reproduction — the formal pipeline takes it from there
Local dev environment	/restart	Your local servers are the problem, not production
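
Read as data, the routing table is a lookup from incident type to resolution skill. In the sketch below the type keys are hypothetical identifiers and the fallback to /incident for anything unrecognized is an assumption; only the target commands come from the table.

```python
# Incident-type keys are hypothetical; targets come from the routing table.
ROUTES = {
    "infra": "/sre debug",
    "performance": "/sre debug",
    "data": "/sre debug",
    "ci": "/ci-fix",
    "code_bug": "/fix",
    "local_ops": "/restart",
}


def route(incident_type: str) -> str:
    """Return the resolution skill for a known type, else fall back to triage."""
    return ROUTES.get(incident_type, "/incident")
```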

Two things worth knowing about the routes:

/sre debug never patches code. Infra-only fixes (restart, config, scale) are applied directly and health-verified; code bugs are escalated to /fix with the evidence — that's by design, because /fix has the verification pipeline (root cause, scope lock, regression proof).
Escalations between skills are automatic. If /sre finds a code bug or /ci-fix hits an architectural problem, the handoff carries the diagnosis with it.

If it stalls

If a routed skill doesn't resolve in ~15 minutes, the orchestrator alerts and presents options; for CRITICAL incidents unresolved in 30 minutes it suggests a revert. If triage reports missing data sources (no monitoring, no DB access), run /calibrate — it installs the recommended MCP servers so future triage gets richer evidence.
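
The two stall thresholds reduce to a small decision function. A sketch only: `stall_action` and its return values are hypothetical names, and the 15-minute figure is approximate in the text.

```python
def stall_action(minutes_elapsed: float, severity: str) -> str:
    """Decide what the orchestrator does for an incident that is still unresolved."""
    if severity == "CRITICAL" and minutes_elapsed >= 30:
        return "suggest-revert"      # CRITICAL and unresolved at 30 minutes
    if minutes_elapsed >= 15:
        return "alert-with-options"  # any severity, roughly 15 minutes in
    return "keep-waiting"
```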

For a known issue you want recorded without running triage, use /qa-incident.

/incident — full triage reference
/sre — infra route; also standalone health checks (/sre status)
/ci-fix — CI route, with the 3-attempt loop
/fix — code-bug route, formal pipeline
/restart — local-ops route
Monitors — wake these skills automatically on failure events

Measure What AI Costs You

Arth Intelligence records every Claude Code session — tool calls, agent spawns, tokens, dollars — on your own machine. Its Experiments page answers the question this guide is about: what does the toolkit actually change, and what does my AI usage cost? This page is the condensed workflow; the full guide has setup, dashboard tour, and privacy details.

Prerequisite: observability enabled via /otel-setup — see Getting Started Step 7. Cost and token data require Claude Code's native OTEL (CLAUDE_CODE_ENABLE_TELEMETRY=1, which /otel-setup writes); without it the cost columns stay empty.

Run a baseline vs toolkit A/B

Every session is auto-tagged with an arth.experiment label — no setup before each launch:

Mode	Label format
No-toolkit baseline (plain `claude`, no plugin)	`auto-baseline-<git-branch>-<first-prompt-slug>-<unix>`
Toolkit-on session	`auto-toolkit-<git-branch>-<unix>`

The zero-config flow:

Run the task without the toolkit (plain claude in a project with no plugin).
Run the same task with the toolkit installed.
Open http://localhost:3100/experiments, pick one label on each side, click Compare.

Want telemetry on but toolkit side effects off for the baseline? export OTEL_OBSERVE_ONLY=true before launching. Want a human-readable label instead of the auto one? Set arth.experiment in OTEL_RESOURCE_ATTRIBUTES before launching — your value wins. Auto-tagging can be disabled with ARTH_AUTO_EXPERIMENT_DISABLED=1. Full details: Getting Started 7f.
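
The label precedence described above can be sketched as follows. The function names, the slug rules, and the choice to let an explicit label win even when auto-tagging is disabled are assumptions; the label formats and variable names come from the text, and OTEL_RESOURCE_ATTRIBUTES is parsed as the comma-separated key=value list that OpenTelemetry defines.

```python
import re
import time
from typing import Mapping, Optional


def slugify(text: str, max_len: int = 30) -> str:
    """Lowercase, collapse non-alphanumerics to '-', and trim (assumed slug rules)."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].strip("-")


def parse_resource_attributes(raw: str) -> dict:
    """Parse a comma-separated list of key=value pairs into a dict."""
    pairs = (item.split("=", 1) for item in raw.split(",") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def experiment_label(env: Mapping[str, str], branch: str, toolkit: bool,
                     first_prompt: str = "", now: Optional[int] = None) -> Optional[str]:
    """Pick the label: explicit value first, then the disable switch, then auto."""
    attrs = parse_resource_attributes(env.get("OTEL_RESOURCE_ATTRIBUTES", ""))
    if attrs.get("arth.experiment"):
        return attrs["arth.experiment"]  # your value wins
    if env.get("ARTH_AUTO_EXPERIMENT_DISABLED") == "1":
        return None  # auto-tagging switched off
    unix = int(time.time()) if now is None else now
    if toolkit:
        return f"auto-toolkit-{branch}-{unix}"
    return f"auto-baseline-{branch}-{slugify(first_prompt)}-{unix}"
```

Passing `env` as a mapping (for example `os.environ`) keeps the function easy to test without touching the real environment.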

Reading the six metrics

The comparison fills in side-by-side:

Metric	What it tells you
Cost (USD)	Total spend per run — the bottom line
Tokens	Input/output/cache volume behind that cost
API calls	How many model round-trips the run took
Cache hit rate	How much context was reused instead of re-paid
Lines edited	Output volume of actual code change
Active time	Wall-clock engagement, not idle time

Plus workflow attribution — which skills and agents the toolkit-on run used. Two caveats: cost/token/cache metrics need native OTEL (sessions without it show 0), and lines-edited only populates for toolkit-instrumented sessions.
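
To make the comparison concrete, here is one way two of the metrics could be derived from raw counts. This is a sketch, not the dashboard's code: the `Run` fields, the cache-hit-rate definition (cached input divided by all input), and the function names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    cost_usd: float
    input_tokens: int        # uncached input tokens
    cache_read_tokens: int   # input tokens served from cache
    api_calls: int


def cache_hit_rate(run: Run) -> float:
    """Share of input served from cache (assumed definition)."""
    total = run.input_tokens + run.cache_read_tokens
    return run.cache_read_tokens / total if total else 0.0


def pct_change(baseline: float, toolkit: float) -> Optional[float]:
    """Relative change from baseline to toolkit, in percent.

    Returns None when the baseline is 0, since a 0 here usually means the
    session lacked native OTEL data rather than a genuinely free run.
    """
    if not baseline:
        return None
    return (toolkit - baseline) / baseline * 100
```

Treating a zero baseline as missing data follows the caveat above: sessions without native OTEL report 0 for cost, tokens, and cache metrics.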

Going deeper

Mark moments mid-run — /marker "spike here" drops an amber ◆ on the session's DAG timeline within ~5s, so you can correlate a cost spike with what you were doing.
Cost by owner — each session's detail view includes an owner × model cross-tab: spend by toolkit skills vs Claude Code built-ins vs your own config, per model tier (Opus 60x / Sonnet 10x / Haiku 1x).
Velocity score — the project view's Today summary composites outcome rate, toolkit coverage, and cost efficiency, with data-backed coaching insights.
Export a slice — /arth logs export --since 1h or the sidebar's diagnostic bundle; filter by experiment, marker, session ID, and time range.

Arth Intelligence — the full observability guide (source for everything above)
Getting Started Step 7 — install walkthrough, observer-only mode, auto-tagging controls
/otel-setup — the one-command setup skill
/marker — mid-session timeline annotations
Architecture — why model tiers make cost controllable in the first place

Beyond measuring: route by cost

Measuring tells you what you spent; the Arth Router decides what to spend before the work runs. It routes each task to a model/harness/platform with awareness of the capacity you already paid for: an under-used Max plan window routes to the best models at $0 marginal cost, while a burned-down window right-sizes to cheaper ones. Routing is governed by budgets and tiers, and every decision is explained.

/router-setup — one-command install: discovers your plan, generates a real pool config, wires the triage hook
/router — operate it: summary (override rate + estimate bias), pools (window burn), calibrate, outcome reporting

Pairs with this guide's telemetry: the outcomes bridge feeds actual usage back into the router, so its cost estimates are continuously reconciled against what you really spent.

Run Autonomous Work

The toolkit has two autonomous loops. Both work without you babysitting them, both auto-continue between phases, and both stop at PR creation — merging is always yours. The difference is the shape of the work.

Which one?

You have...	Run
One objective, fuzzy path ("Cut homepage LCP under 2s")	/goal
Multiple issues, ranked queue ("Work through these 8 issues")	/autopilot
One feature with a written plan	/implement — not an autonomous loop
One bug with a clear repro	/fix

If the work is "I have a destination, figure out the path" — /goal. If it's "I have a stack of well-scoped tickets, work them" — /autopilot.

/goal — speed-first, single objective

/goal Cut homepage LCP below 2s on mobile

A scout reads your knowledge base and project profile first, then scans the codebase to fill gaps; you answer 3–5 context-aware clarifying questions (or say "go" for defaults) and confirm the plan. Then it loops: pick action → execute → mandatory verify (lint + types + tests) → capture evidence → self-evaluate against each subtask's done_when clause. It runs inline, pulling in Sonnet agents only when needed, with a budget of 6 agents per goal. Lifecycle: /goal pause, resume, clear, status; state lives in .claude/.goals/current.json.

/autopilot — rigor-first, backlog queue

/autopilot --urgent-first

It ranks open issues P0–P5, then per item: classify → verify repro → plan → implement (with backend/frontend/QA agents as needed) → QA → self-review → PR. The PR body is assembled directly from the captured evidence array (git diffs, test results, lint, types). --dry-run shows what it would do without doing it — useful for trust-building. State lives in .claude/.workflow-state.json.

The safety gates

Merge approval is a hard stop. Both loops stop at the PR and wait: /autopilot opens the PR as its final step, while /goal stops at PR creation and you run /pr yourself. Nothing merges without a human.
Risk classification (/autopilot). Each item is scored 0–12 on blast radius, reversibility, confidence, and domain sensitivity: score ≥ 11 is refused, ≥ 9 escalates to you before work starts.
Blocking conditions (/autopilot). Scope drift, QA failures after 2 attempts, or a per-item budget breach (> 6 agents) block the item rather than pushing through.
Loop guards. The auto-continue Stop hooks bail after consecutive no-progress turns (12 for /goal, 15 for /autopilot) and wait for you. They stay silent at human-gate phases — awaiting merge, blocked, paused.
Interruptions are safe. If you type an unrelated command mid-loop, the active workflow auto-pauses and your request runs normally.
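
Two of these gates, the risk thresholds and the no-progress guard, are easy to picture in code. The sketch below is illustrative only: the class and function names are hypothetical, and the assumption that each of the four risk dimensions contributes 0 to 3 points is inferred from the 0–12 range, not stated in the text.

```python
from dataclasses import dataclass

REFUSE_AT = 11    # from the text: score >= 11 is refused
ESCALATE_AT = 9   # from the text: score >= 9 escalates before work starts
NO_PROGRESS_LIMIT = {"/goal": 12, "/autopilot": 15}  # from the text


@dataclass
class Risk:
    # Assumption: each dimension contributes 0-3 points, giving the 0-12 total.
    blast_radius: int
    reversibility: int
    confidence: int
    domain_sensitivity: int

    def score(self) -> int:
        parts = (self.blast_radius, self.reversibility,
                 self.confidence, self.domain_sensitivity)
        if any(not 0 <= p <= 3 for p in parts):
            raise ValueError("each dimension must be between 0 and 3")
        return sum(parts)


def gate(risk: Risk) -> str:
    """Map a risk score to 'refuse', 'escalate', or 'proceed'."""
    total = risk.score()
    if total >= REFUSE_AT:
        return "refuse"
    if total >= ESCALATE_AT:
        return "escalate"
    return "proceed"


class LoopGuard:
    """Counts consecutive no-progress turns and says when to stop and wait."""

    def __init__(self, skill: str) -> None:
        self.limit = NO_PROGRESS_LIMIT[skill]
        self.stalled = 0

    def should_continue(self, made_progress: bool) -> bool:
        # Any progress resets the streak; only consecutive stalls count.
        self.stalled = 0 if made_progress else self.stalled + 1
        return self.stalled < self.limit
```

Checking the refusal threshold before the escalation threshold matters, since a score of 11 or 12 satisfies both conditions.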

Both skills ship in the cruise bundle, which requires forge + scalpel + sentinel (see the plugin catalog).

Workflow comparison — the full decision tree and a dimension-by-dimension /goal vs /autopilot table
/goal and /autopilot — full references
/implement — the plan-driven alternative for well-specified features
/pr — where every loop hands back to you
Measure what AI costs you — watch an autonomous run live in the dashboard