Email productive@getarth.ai with your GitHub username to request a license key.
You’ll receive:
- A license key (ARTH-XXXX-XXXX-XXXX-XXXX)
- An invite to the arthai-marketplace private repo on GitHub

Accept the GitHub invite before continuing; Step 3 requires repo access.
Run this in your terminal (not inside Claude Code):
npx arthai-activate ARTH-XXXX-XXXX-XXXX-XXXX
This stores your key at ~/.arthai/license. You only need to do this once.
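One way to confirm that activation worked is to check that the key file exists and is non-empty. The sketch below relies only on the path stated above (~/.arthai/license). The function name license_status and its optional path argument are illustrative, and nothing is assumed about the file's internal format.

```shell
#!/bin/sh
# Hypothetical helper: report whether the license file written by
# `npx arthai-activate` is present. Only the default path comes from this guide.
license_status() (
  file="${1:-$HOME/.arthai/license}"
  if [ -s "$file" ]; then
    echo "present"
  else
    echo "missing"
  fi
)
```

Calling license_status with no argument checks the default location; passing a path checks that file instead.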
Inside Claude Code:
/plugin marketplace add ArthTech-AI/arthai-marketplace
Start with prime — the everything bundle. Includes all agents, skills, and hooks:
/plugin install prime@arthai-marketplace
If you want a smaller, focused bundle instead, see the plugin catalog for all available bundles.
/plugin update-policy arthai-marketplace auto
This keeps your toolkit up to date automatically — new agents, skills, and bug fixes land without manual intervention.
/reload-plugins
Restart Claude Code if skills don’t appear immediately.
You can skip this step and jump to Step 8: Calibrate. The toolkit works without observability. Come back here whenever you want a dashboard view of what Claude Code is doing — sessions, tool calls, agent spawns, cost.
⚠️ Experimental — limited preview. Observability is in active development. Expect rough edges, breaking changes between releases, and gaps in coverage. Feedback welcome at productive@getarth.ai.
See what Claude Code did — every tool call, agent spawn, and workflow phase visualized in a dashboard.
Two telemetry streams, both required for full data:
- Toolkit hook: adds session/prompt/tool/agent/skill spans (installed by sentinel below).
- Claude Code native OTEL: adds cost USD, input/output/cache tokens, and model to those spans (gated by the env var CLAUDE_CODE_ENABLE_TELEMETRY=1).

/otel-setup turns both on for you. Without CLAUDE_CODE_ENABLE_TELEMETRY=1, the dashboard’s cost and token columns stay empty. That is the most common cause of the “why is my dashboard half-broken?” question.
- Docker is running. Run docker info. If you see version info, you’re good. If you see an error, open Docker Desktop first.
- The stack’s ports are free on localhost (engine, dashboard, postgres). If any of these are in use, stop the conflicting service before continuing.

Inside Claude Code:
/plugin install sentinel@arthai-marketplace
This adds the OTEL tracing hook and the /otel-setup skill to your project.
If you already have sentinel installed, update your plugins:
/plugin marketplace remove arthai-marketplace
rm -rf ~/.claude/plugins/cache/*arthai*
/plugin marketplace add ArthTech-AI/arthai-marketplace
/plugin install sentinel@arthai-marketplace
/reload-plugins
Close and reopen Claude Code (or start a new session). When the session starts, you’ll see this message:
OTEL_SETUP_REQUIRED: Observability is installed but not configured.
Run /otel-setup now.
Type:
/otel-setup
The skill asks how you want to set up. Pick “Local” (option 2).
The skill then does everything for you automatically:
- Writes ~/.arthai/docker-compose.yml
- Starts the stack: engine on :4319 (receives traces) + dashboard on :3100 (shows traces) + database (stores traces) + Watchtower (auto-updates the engine image daily)
- Adds the telemetry env vars, including CLAUDE_CODE_ENABLE_TELEMETRY=1, to .claude/settings.local.json in your project (project-local, git-ignored). If you pick the “global” scope option instead, they go to ~/.zshrc/~/.bashrc.

You don’t need to do anything during this step; just wait for it to finish.
Close and reopen Claude Code so it picks up the new env block from .claude/settings.local.json. Without this restart, the env vars aren’t loaded and traces won’t flow. (If you chose the global scope instead of project-local, source ~/.zshrc works too.)
You’ve just restarted Claude Code. The dashboard exists but is empty because no session has sent telemetry yet. Walk through these in order:
Open the dashboard in your browser. Go to http://localhost:3100. You should see the Arth Intelligence UI with Sessions / Traces / Insights tabs. The Sessions list will probably be empty at this point — that’s expected, you haven’t run anything yet.
If the page doesn’t load at all, the Docker stack may not be up. Run:
docker ps
You should see arthai-intelligence and arthai-db. If they’re missing:
docker compose -f ~/.arthai/docker-compose.yml up -d
Run something in Claude Code so there is a session to trace. For example:
what's in package.json?
Or:
/onboard
The toolkit’s OTEL hook emits trace spans for every prompt, tool call, agent spawn, and stop event. Native OTEL emits cost and token data alongside.
curl -s http://localhost:4319/api/health | jq .
Expect a JSON response with "status":"healthy". If it fails, check logs:
docker logs --tail 50 arthai-intelligence
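The health check can also be scripted without jq. This is a minimal sketch that assumes only what the step above states: the endpoint is http://localhost:4319/api/health and a healthy response contains "status":"healthy". The function names and the retry count are illustrative, and nothing else about the response shape is assumed.

```shell
#!/bin/sh
# Reads a health response on stdin and succeeds if it reports "status":"healthy".
# Whitespace around the colon is tolerated.
is_healthy() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Polls the engine once per second until it is healthy or attempts run out.
# $1 = health URL (defaults to the one in this guide), $2 = max attempts.
wait_for_engine() {
  health_url="${1:-http://localhost:4319/api/health}"
  max_attempts="${2:-10}"
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    if curl -s "$health_url" | is_healthy; then
      echo "engine healthy"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  echo "engine not healthy after $max_attempts attempts" >&2
  return 1
}
```

Polling is useful right after starting the stack, when the containers are up but the engine may not yet accept requests.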
Click into your session. You should see a waterfall of spans — your prompt, the tool calls Claude Code made, any agent spawns, etc. Each span shows duration and metadata.
Check the cost and token columns on the session:
- They show values (e.g. $0.0023, 1,847 tokens) → native OTEL is flowing. You’re done.
- They show a dash or are empty → only the toolkit hook is on. Native OTEL needs CLAUDE_CODE_ENABLE_TELEMETRY=1. Verify:
grep CLAUDE_CODE_ENABLE_TELEMETRY .claude/settings.local.json
# should print: "CLAUDE_CODE_ENABLE_TELEMETRY": "1"
If missing, re-run /otel-setup, pick Local again, then restart Claude Code.
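The same check can be wrapped in a small function that reports a result instead of printing the raw line. This sketch assumes the settings file contains the key exactly as shown in the grep output above ("CLAUDE_CODE_ENABLE_TELEMETRY": "1"). The function names are hypothetical.

```shell
#!/bin/sh
# Succeeds if the settings file enables native telemetry.
# $1 = settings file (defaults to the project-local file from this guide).
telemetry_enabled() {
  grep -Eq '"CLAUDE_CODE_ENABLE_TELEMETRY"[[:space:]]*:[[:space:]]*"1"' \
    "${1:-.claude/settings.local.json}" 2>/dev/null
}

# Prints a one-line verdict plus the remedy described in this guide.
check_native_otel() {
  if telemetry_enabled "$@"; then
    echo "native OTEL enabled"
  else
    echo "native OTEL not enabled: re-run /otel-setup, then restart Claude Code"
  fi
}
```

Run check_native_otel from the project root, or pass an explicit path to inspect another project's settings file.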
If all 5 steps work, observability is working end-to-end. Future Claude Code sessions automatically send traces to the engine, and they appear in the dashboard at http://localhost:3100. You don’t need to do anything else.
Sometimes you want to know: what does Claude Code do on its own, vs. what changes when the toolkit is active? Maybe you’re evaluating whether to keep the toolkit on for a particular workflow, or you want to debug a behavior and need to isolate “is this the toolkit or is this Claude itself?”
Observer-only mode is for that. It keeps the OTEL hook emitting telemetry (so you still see the run in the dashboard), but suppresses every toolkit-specific side effect:
export OTEL_OBSERVE_ONLY=true
Then launch Claude Code as you normally would. That session’s spans land in the dashboard exactly like a regular run, but:
- No skill.current.json written (the file that tracks active slash commands for span attribution)
- No agent.<id>.json written (the file that brackets subagent spans)
- No ~/.arthai/otel-configured marker
- Spans carry arth.observe_only=true

To clear it, unset OTEL_OBSERVE_ONLY (or just open a new shell). It only applies to sessions started while the env var is set.
Typical A/B workflow:
1. Run a session with the toolkit active (e.g. /onboard to brief you, then ask a follow-up).
2. Exit, export OTEL_OBSERVE_ONLY=true, and launch Claude Code again. Run the same prompts.
3. Open both sessions in the dashboard. The toolkit run carries skill attribution (e.g. skill.name = "onboard" on the tool spans); the observer run does not. Compare the trace shapes, durations, and span counts.

That diff is the toolkit’s contribution to your workflow.
Precedence — which env wins:
| Set | Behavior |
|---|---|
| Nothing | Default — toolkit on, telemetry on |
| OTEL_OBSERVE_ONLY=true | Telemetry on, toolkit side effects off (this section) |
| OTEL_DISABLED=true | Telemetry off, toolkit off (overrides observer mode) |
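The precedence in the table can be expressed as a short function that inspects the current environment. This is a sketch, not the hook's actual logic: it assumes only the literal value true switches a mode on (the only value this guide shows), and the function name and mode names are illustrative.

```shell
#!/bin/sh
# Prints the effective mode for the current environment.
# OTEL_DISABLED is checked first because it overrides observer mode.
arth_mode() {
  if [ "${OTEL_DISABLED:-}" = "true" ]; then
    echo "disabled"        # telemetry off, toolkit off
  elif [ "${OTEL_OBSERVE_ONLY:-}" = "true" ]; then
    echo "observe-only"    # telemetry on, toolkit side effects off
  else
    echo "default"         # toolkit on, telemetry on
  fi
}
```

Running arth_mode in the shell you are about to launch Claude Code from shows which row of the table applies to that session.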
Filtering observer runs in the dashboard UI ships separately (arth-intelligence#101). Until that lands, the arth.observe_only attribute is in every OTLP payload. The easiest way to see it today is psql -d arth_engine -c "SELECT trace_id, name FROM spans WHERE start_ns > NOW()..." and cross-reference against the run you just kicked off.
Once you’ve captured a baseline run AND a regular toolkit run, the arth dashboard’s /experiments page renders them side-by-side across cost, tokens, calls, cache hit rate, lines edited, and active time.
Auto-tagging (default ON for both modes):
Every session you run automatically gets an arth.experiment label so it shows up in /experiments dropdowns without you having to set anything before each launch. The label format makes baseline vs toolkit easy to scan:
| Mode | Label format | Generated by |
|---|---|---|
| no-toolkit baseline (just claude, no plugin) | auto-baseline-<git-branch>-<first-prompt-slug>-<unix> | Engine’s session-watcher (reads CC’s session JSONL) |
| toolkit-on session (prime@arthai installed) | auto-toolkit-<git-branch>-<unix> | Toolkit’s OTEL hook (hooks/otel-telemetry.sh) |
Example after running the same task twice:
auto-baseline-main-debug-login-failure-1715890123
auto-toolkit-main-1715891456
Pick one as left, the other as right, click Compare.
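To make the two label formats concrete, here is a sketch that assembles them from their parts. The slug rule (lowercase, runs of non-alphanumeric characters collapsed to one hyphen) is an assumption inferred from the example label above; the real watcher and hook may truncate or sanitize differently. All function names are hypothetical.

```shell
#!/bin/sh
# Assumed slug rule: lowercase, then collapse every run of characters outside
# a-z and 0-9 into a single hyphen, trimming hyphens at either end.
slugify() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# $1 = git branch, $2 = unix timestamp
auto_toolkit_label() {
  echo "auto-toolkit-$1-$2"
}

# $1 = git branch, $2 = first prompt text, $3 = unix timestamp
auto_baseline_label() {
  echo "auto-baseline-$1-$(slugify "$2")-$3"
}
```

With branch main and the prompt "Debug login failure", these helpers produce labels in the same shape as the two examples above.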
Quick walkthrough — zero-config path:
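1. Run claude in your project without the toolkit installed. This is the baseline run.
2. Install the toolkit and run claude again; the toolkit auto-installs the hook.
3. Open http://localhost:3100/experiments. Both runs are already in the dropdowns.

Custom labels (override auto-tag):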
If you want a specific human-readable label instead of the auto one, set arth.experiment before launching — your value wins:
export OTEL_RESOURCE_ATTRIBUTES="$OTEL_RESOURCE_ATTRIBUTES,arth.experiment=prepme-credit-bug-baseline"
claude
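When OTEL_RESOURCE_ATTRIBUTES is unset, the export above produces a value that starts with a comma. Whether that matters depends on the consumer, so the sketch below builds the list without the stray separator. The helper name with_resource_attr is hypothetical; the variable and the arth.experiment key are the ones used above.

```shell
#!/bin/sh
# Appends key=value to a comma-separated attribute list, without leaving a
# leading comma when the existing list is empty.
# $1 = existing list (may be empty), $2 = key=value to add
with_resource_attr() {
  if [ -n "$1" ]; then
    echo "$1,$2"
  else
    echo "$2"
  fi
}

# Usage (run before launching claude):
#   export OTEL_RESOURCE_ATTRIBUTES="$(with_resource_attr \
#     "${OTEL_RESOURCE_ATTRIBUTES:-}" "arth.experiment=prepme-credit-bug-baseline")"
```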
Turn auto-tagging OFF:
Set ARTH_AUTO_EXPERIMENT_DISABLED=1 — same env var honored by both modes:
- In the ~/.arthai/docker-compose.yml environment: block → disables the engine watcher’s auto-tag (no-toolkit / orphaned-toolkit sessions)
- In the <project>/.claude/settings.local.json env block → disables the toolkit hook’s auto-tag (toolkit sessions)

/otel-setup asks you about this at install time; you can flip it then or at any point later.
- Markers: run /marker "spike here" inside any session. An amber ◆ glyph appears on the dashboard’s DAG timeline within ~5s. From the dashboard, you can also click Drop marker on any session detail page.
- Export: run /arth logs export --since 1h from inside Claude Code, or use the dashboard sidebar’s “Download diagnostic bundle”. Filter by experiment, marker, session ID, and time range; all filters AND-compose.

For the complete walkthrough (install from scratch, OTEL env wiring, telemetry verification, CLI + UI marker flows, export bundle internals, troubleshooting table), see the dedicated guide in the arth-intelligence repo: docs/compare-toolkit-vs-baseline.md.
After you reboot your Mac (or restart Docker Desktop), here’s what comes back automatically and what doesn’t:
| Layer | Survives reboot? | Why |
|---|---|---|
| Env vars in .claude/settings.local.json | ✅ Yes | File on disk. Claude Code reads it on every session start |
| ~/.arthai/docker-compose.yml | ✅ Yes | File on disk |
| arthai_data volume (your traces, scores, patterns) | ✅ Yes | Docker named volume, persistent across container restarts and reboots |
| Engine container (arthai-intelligence) | ✅ Yes, if set up after this guide | Compose template sets restart: unless-stopped. Customers who ran /otel-setup before this fix will need a one-time migration (see below). |
| Postgres container (arthai-db) | ✅ Yes, same caveat | Depends on the compose template having restart: unless-stopped. |
| Watchtower auto-updater | ✅ Yes | Already had restart: unless-stopped in older compose files. |
| Docker Desktop itself | ⚠️ Depends on YOU | Docker Desktop has a per-user “Start Docker Desktop when you log in” toggle (Settings → General). If it’s off, nothing comes back until you launch Docker Desktop manually. We can’t set this for you — it’s an OS-level user preference. |
Quick verify after a reboot:
docker ps --filter 'name=arthai'
# Should show 3 running containers: arthai-intelligence, arthai-db, arthai-watchtower
If any are missing, start them:
docker compose -f ~/.arthai/docker-compose.yml up -d
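The post-reboot check can be automated by comparing the running container names against the three expected ones. A minimal sketch: the container names come from this guide, the function name missing_containers is illustrative, and the names are fed in on stdin so the comparison itself does not depend on Docker.

```shell
#!/bin/sh
# Reads running container names (one per line) on stdin and prints each
# expected Arth container that is absent. Prints nothing if all are present.
missing_containers() {
  running_names="$(cat)"
  for expected in arthai-intelligence arthai-db arthai-watchtower; do
    printf '%s\n' "$running_names" | grep -qxF "$expected" || echo "$expected"
  done
}

# Usage:
#   docker ps --filter 'name=arthai' --format '{{.Names}}' | missing_containers
```

An empty result means all three containers came back; any printed name is a container to start with the compose command above.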
Migration for existing customers (set up before this fix):
If docker inspect shows your containers have RestartPolicy: no, run this one-liner — no data loss, no re-setup:
docker update --restart unless-stopped arthai-db arthai-intelligence
Or re-run /otel-setup and pick Local — the new compose template will overwrite ~/.arthai/docker-compose.yml with the right policy.
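The migration can be made conditional, so the policy is only changed on containers that need it. This sketch reads the policy with docker inspect and a Go-template format string. The function names are hypothetical, and treating an empty policy name the same as no is an assumption.

```shell
#!/bin/sh
# Succeeds if the given restart policy name means "will not restart".
# $1 = policy name as reported by docker inspect
needs_migration() {
  [ "$1" = "no" ] || [ -z "$1" ]
}

# Applies restart: unless-stopped only where the current policy requires it.
migrate_restart_policy() {
  for name in arthai-db arthai-intelligence; do
    if ! policy="$(docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' "$name" 2>/dev/null)"; then
      echo "$name: container not found, skipping"
      continue
    fi
    if needs_migration "$policy"; then
      docker update --restart unless-stopped "$name"
    else
      echo "$name: restart policy is already $policy"
    fi
  done
}
```

Running migrate_restart_policy a second time leaves already-migrated containers untouched.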
Opting out of auto-restart:
If you’d rather start Arth Intelligence manually each session (e.g., to save resources when not coding):
docker update --restart no arthai-db arthai-intelligence arthai-watchtower
You’ll need to docker compose -f ~/.arthai/docker-compose.yml up -d whenever you want the dashboard back.
There are three things that can update independently. Each has its own update path. None of them touch your trace data — your sessions, scores, and patterns live in the arthai_data Docker volume and are preserved across all updates.
| Layer | What updates it | How often |
|---|---|---|
| The container image (arthai/intelligence) | Watchtower sidecar: pulls + restarts the container | Daily, automatic |
| The skill on disk (/otel-setup) | Standard plugin update: /plugin update sentinel@arthai-marketplace | When you update plugins |
| Your local compose file (~/.arthai/docker-compose.yml) | Re-running /otel-setup, which overwrites it with the latest template | Only when you run the skill again |
Automatic (default). A watchtower sidecar shipped in the compose template checks once a day, pulls the latest arthai/intelligence image, and restarts only that container. You don’t need to do anything. To verify it’s running:
docker ps --filter name=arthai-watchtower
Manual — force an update right now. Run the hosted update script:
curl -fsSL https://arthtech-ai.github.io/arthai-marketplace/scripts/update.sh | sh
Or paste the two commands directly if you’d rather not pipe to shell:
docker compose -f ~/.arthai/docker-compose.yml pull
docker compose -f ~/.arthai/docker-compose.yml up -d
Both do the same thing — pull the latest image and recreate the container against it. The named volume arthai_data is left untouched.
To opt out of auto-updates:
docker stop arthai-watchtower
docker rm arthai-watchtower
You’ll then need to update manually using the script above whenever you want a new version.
To update the /otel-setup skill:
/plugin update sentinel@arthai-marketplace
/reload-plugins
If /plugin update doesn’t pick up the change, fall back to the marketplace remove + re-add flow shown in the FAQ.
If a sentinel release changes the compose template (e.g. adds a new service), you’ll need to re-run /otel-setup to regenerate ~/.arthai/docker-compose.yml with the new content. Re-running the skill is safe — it overwrites the compose file but never touches the arthai_data volume.
docker compose -f ~/.arthai/docker-compose.yml down -v # ← DON'T DO THIS
The -v (--volumes) flag drops the arthai_data volume and erases every session, score, and pattern. Plain docker compose down (without -v) stops the containers but keeps the data — that’s safe and reversible. Only use down -v if you intentionally want to wipe everything and start fresh.
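If you want a guard against typing the destructive form by accident, a small wrapper can refuse it unless you opt in. This is purely a sketch: arth_down, has_volume_flag, and the ARTH_CONFIRM_WIPE variable are hypothetical names, not part of the toolkit.

```shell
#!/bin/sh
# Succeeds if any argument is the volume-removing flag.
has_volume_flag() {
  for arg in "$@"; do
    case "$arg" in
      -v|--volumes) return 0 ;;
    esac
  done
  return 1
}

# Hypothetical wrapper around `docker compose ... down`.
# Refuses -v/--volumes unless ARTH_CONFIRM_WIPE=yes is set.
arth_down() {
  if has_volume_flag "$@" && [ "${ARTH_CONFIRM_WIPE:-}" != "yes" ]; then
    echo "refusing: -v would delete the arthai_data volume (set ARTH_CONFIRM_WIPE=yes to proceed)" >&2
    return 1
  fi
  docker compose -f "$HOME/.arthai/docker-compose.yml" down "$@"
}
```

Plain arth_down behaves like the safe form of the command; arth_down -v stops with an error until the confirmation variable is set.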
| Problem | Fix |
|---|---|
| /otel-setup says “Docker is not running” | Open Docker Desktop, wait for it to start, then re-run /otel-setup |
| Dashboard at localhost:3100 shows nothing | Run a Claude Code session first (traces are sent at the end of each session). Refresh the dashboard page. |
| Dashboard doesn’t load at all | Check Docker is running: docker ps should show arthai-intelligence and arthai-db containers. If not, run: docker compose -f ~/.arthai/docker-compose.yml up -d |
| Traces stop appearing after a restart | Restart Claude Code so it reloads the env block from .claude/settings.local.json (or run source ~/.zshrc if you chose the global scope), and check that the Docker containers are still running: docker ps |
| Want to stop the dashboard | Run: docker compose -f ~/.arthai/docker-compose.yml down |
| Want to restart the dashboard | Run: docker compose -f ~/.arthai/docker-compose.yml up -d |
| Want to update to the latest version | See 7h: Updating Arth Intelligence — auto-updates daily, force now with update.sh |
| Want to remove everything | Run: docker compose -f ~/.arthai/docker-compose.yml down -v (this deletes all trace data) |
On first use in a project, run:
/calibrate
This scans your codebase and configures the toolkit to match your project’s
patterns, conventions, and tech stack. It also builds a knowledge graph —
a ranked index of your project’s conventions, domain rules, and patterns that
workflows like /fix query automatically to get the most relevant context for
each task. The graph auto-rebuilds whenever your knowledge base changes.
Restart your Claude Code session so the knowledge graph gets built and the OTEL env block from Step 7 is picked up. Then:
/onboard # prioritized briefing on what to work on
/planning my-feature # start building with the toolkit
You’re ready.
After installing and calibrating:
/onboard # get a briefing on your project
/planning my-first-feature # try the planning workflow
/implement my-first-feature # spawn the team that builds it
/qa # commit-mode QA on the diff
/pr # create the PR
If you’re not building something new, two good explorations:
/tech-debt # survey, prioritize, and propose plans for tech debt
/perf <scope> # cross-functional performance pass
The full list of every skill in your installed bundles is at skills-reference.md. The most common ones grouped by what you’re doing:
| You want to… | Use |
|---|---|
| Onboard / decide what to work on | /onboard, /welcome, /wizard |
| Plan and build a feature | /planning (includes design spec HTML by default), /implement, /qa, /pr |
| Fix a bug formally | /fix <description\|#issue> |
| Ship code | /precheck, /qa, /revert-check, /pr (or /ship for the one-shot) |
| Review a PR | /review-pr <#N> |
| Audit code health | /tech-debt, /perf, /lighthouse |
| Generate or audit docs | /docs <audit\|write\|check> |
| Repair a broken pipeline | /incident, /ci-fix, /sre |
| Restart local servers | /restart [service] |
| Deploy | /deploy <local\|staging\|...>, /deploy-ios |
| Schedule recurring agents | /schedule-routine, /autopilot |
| Manage GitHub issues | /issue <title>, /issue list, /issue close #N |
| Share a plan or strategy | /share <plan> --format md\|slack\|twitter |
| Generate from templates | /templates <type> <topic> |
Bundle-specific skills (consulting, design, etc.) live in their respective bundles — install the bundle to surface them. See the plugin catalog.
The prime bundle includes a Cowork Dispatch skill: paste a tweet URL in Cowork
and it automatically queues /monitor-tweet on your desktop Claude Code.
Additional requirement: the Cowork skill dispatches to ~/.claude-agents on
your Mac, a clone of the original toolkit repo. You need both:
# 1. Plugin install (above) — surfaces the skill in Cowork
# 2. Clone the toolkit to ~/.claude-agents — provides /monitor-tweet on desktop
git clone git@github.com:ArthTech-AI/claude-agents.git ~/.claude-agents
~/.claude-agents/install.sh --key ARTH-XXXX-XXXX-XXXX-XXXX ~/.claude-agents
Without the clone, the Cowork skill fires but the desktop pipeline has no
/monitor-tweet to run.
When you run /calibrate, the toolkit builds a knowledge graph from your
project’s knowledge base (.claude/knowledge/). This is a ranked index of your
conventions, domain rules, patterns, and vocabulary that agents query automatically.
How it works:
1. /calibrate scans your codebase and populates .claude/knowledge/shared/ with conventions, domain rules, patterns, and vocabulary
2. A ranked graph is built from those files (.claude/knowledge/graph/)
3. When you run a workflow (/fix, /planning, /implement, /qa, etc.), it queries the graph to pull in only the most relevant context instead of loading every knowledge file in full

What this means for you:
/calibrate