arthai

AI-powered development toolkit for Claude Code

38 agents · 53 skills · 29 hooks — 12 installable bundles

Get Started

Prerequisites

  • Claude Code installed and working
  • Node.js 22+ (for npx arthai-activate)
  • Git
  • macOS or Linux
  • An Arth AI license key (ARTH-XXXX-XXXX-XXXX-XXXX)
  • For observability: Docker Desktop running, ports 4319, 3100, 5432 available on localhost

Eight steps from zero to your first AI-assisted session.

Step 1: Get a license key

Email productive@getarth.ai to request your license key.

Step 2: Activate

npx arthai-activate ARTH-XXXX-XXXX-XXXX-XXXX

Step 3: Add the marketplace

/plugin marketplace add ArthTech-AI/arthai-marketplace

Step 4: Install a bundle

/plugin install prime@arthai-marketplace

prime is the everything bundle: all agents, skills, and hooks. To pick a focused bundle instead, see the Bundles section below.

Step 5: Enable auto-updates

/plugin update-policy arthai-marketplace auto

Keeps your toolkit up to date automatically — new agents, skills, and fixes land without manual intervention.

Step 6: Enable observability · optional · experimental, limited preview

Optional — you can skip this and jump to Step 7. The toolkit works without observability. Come back here whenever you want a dashboard view of what Claude Code is doing.

/otel-setup

If you installed prime, /otel-setup is already available. Otherwise install sentinel@arthai-marketplace first. Pick Local in the prompt. This starts the local Docker stack (engine on :4319, dashboard on :3100) and writes CLAUDE_CODE_ENABLE_TELEMETRY=1 plus the OTLP env vars to .claude/settings.local.json.

Then restart your Claude Code session so it picks up the new env block — without the restart, the env vars aren't loaded and traces won't flow.

Then verify it’s working — do these in order:

  1. Open the dashboard. Go to http://localhost:3100 in your browser. You should see the Arth Intelligence UI (Sessions / Traces / Insights tabs). If the page doesn’t load, run docker ps — you should see arthai-intelligence and arthai-db. If they’re missing, run docker compose -f ~/.arthai/docker-compose.yml up -d.
  2. Generate some activity. Back in Claude Code, run any prompt — even something trivial like “what’s in package.json?”. The toolkit emits trace spans for every prompt, tool call, agent spawn, and stop event.
  3. Refresh the dashboard. Your session appears in the Sessions list with a recent timestamp. If nothing shows after 10 seconds, check the engine: curl -s http://localhost:4319/api/health.
  4. Click into your session. You see a waterfall of spans — your prompt, tool calls, agent spawns — each with duration and metadata.
  5. Confirm cost columns are populated. If the cost_usd / token columns show values, native OTEL is flowing and you’re done. If they are empty, only the toolkit hook is on. Check:
    grep CLAUDE_CODE_ENABLE_TELEMETRY .claude/settings.local.json
    Should print "CLAUDE_CODE_ENABLE_TELEMETRY": "1". If missing, re-run /otel-setup, pick Local again, then restart Claude Code.

Observability is in active development, so expect rough edges. The full guide is in the Observability section below.

Step 7: Calibrate

/calibrate

Deep-learns your project’s architecture, patterns, and domain. Builds a knowledge graph that all agents query.

Step 8: Start a new session

Restart your Claude Code session so the knowledge graph gets built and the OTEL env block is picked up. Then run /onboard for a prioritized work briefing.

Observability · Experimental, limited preview

⚠ Experimental — limited preview. Observability is in active development. Expect rough edges, breaking changes between releases, and gaps in coverage. Feedback welcome at productive@getarth.ai.

The toolkit ships an OTEL hook that emits a span for every prompt, tool call, agent spawn, skill invocation, and stop event. Paired with Claude Code's native OTEL, you get cost USD and token data on those spans — both streams flow into the same local Arth Intelligence container.

Prerequisites

  • Docker Desktop installed and running
  • Ports 4319, 3100, 5432 available on localhost (engine, dashboard, postgres)
  • The sentinel plugin installed (or prime, which includes sentinel)

Setup (one time)

/plugin install sentinel@arthai-marketplace
/otel-setup

The skill verifies Docker is running, writes ~/.arthai/docker-compose.yml, starts the engine + dashboard + Postgres + Watchtower auto-updater, and writes the OTEL env vars to .claude/settings.local.json (project-local, git-ignored):

CLAUDE_CODE_ENABLE_TELEMETRY=1
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4319
OTEL_EXPORTER_OTLP_PROTOCOL=http/json
OTEL_METRICS_EXPORTER=otlp
OTEL_LOGS_EXPORTER=otlp
OTEL_TRACES_EXPORTER=otlp
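
As a quick sanity check, the sketch below reports which of the variables above are missing from a settings file. It assumes the variables sit under a top-level "env" object in the JSON, which this page only implies when it talks about the "env block". The helper names missing_otel_vars and check_settings_file are hypothetical and not part of the toolkit.

```python
import json
from pathlib import Path

# Variables and values listed above as written by /otel-setup.
EXPECTED_ENV = {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4319",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/json",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_TRACES_EXPORTER": "otlp",
}


def missing_otel_vars(settings: dict) -> list[str]:
    """Return expected variables that are absent or hold a different value.

    Assumption: the variables live under a top-level "env" object.
    """
    env = settings.get("env", {})
    return [name for name, value in EXPECTED_ENV.items() if env.get(name) != value]


def check_settings_file(path: str = ".claude/settings.local.json") -> list[str]:
    """Load the settings file if it exists and report missing variables."""
    file = Path(path)
    if not file.exists():
        return list(EXPECTED_ENV)
    return missing_otel_vars(json.loads(file.read_text()))
```

An empty list from check_settings_file() would suggest the env block is complete. Any names it returns are candidates for re-running /otel-setup.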

Why CLAUDE_CODE_ENABLE_TELEMETRY=1 is required

Two telemetry streams, complementary:

Stream | Source | Carries
Trace spans | toolkit otel-telemetry hook | session, prompt, tool calls, agent spawns, skill invocations, stop events
Cost + token metrics | Claude Code native OTEL (env-gated) | per-call cost USD, input/output/cache tokens, model

If native OTEL is off, traces still flow but the dashboard's cost and token columns stay empty.

Verify

After /otel-setup finishes, start a new Claude Code session in the project, run any prompt, then visit:

  • Dashboard: http://localhost:3100 — Sessions, Traces, Insights tabs
  • Engine health: curl -s http://localhost:4319/api/health | jq .

If the dashboard shows your session with non-empty cost, you're done.

What survives a reboot

After a Mac reboot or Docker Desktop restart:

  • ✅ Env vars in .claude/settings.local.json — file on disk, picked up on next session
  • ✅ Compose file at ~/.arthai/docker-compose.yml — file on disk
  • ✅ Trace data in the arthai_data Docker volume — preserved across container restarts
  • ✅ Engine + DB + Watchtower containers — auto-restart because the compose template sets restart: unless-stopped on every service
  • ⚠ Docker Desktop itself — depends on a per-user OS toggle (Settings → General → "Start Docker Desktop when you log in"). We can't set this for you.

Quick verify after a reboot:

docker ps --filter 'name=arthai'

Should show three running containers (arthai-intelligence, arthai-db, arthai-watchtower). If any are missing:

docker compose -f ~/.arthai/docker-compose.yml up -d
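
The same check can be scripted. The sketch below compares the names Docker reports against the three containers named above. The function names are hypothetical, and it assumes docker is on PATH.

```python
import subprocess

EXPECTED = ("arthai-intelligence", "arthai-db", "arthai-watchtower")


def missing_containers(ps_output: str) -> list[str]:
    """Given one container name per line, return expected names that are not running."""
    running = {line.strip() for line in ps_output.splitlines() if line.strip()}
    return [name for name in EXPECTED if name not in running]


def running_arthai_names() -> str:
    """Ask Docker for the names of running containers whose name matches 'arthai'."""
    result = subprocess.run(
        ["docker", "ps", "--filter", "name=arthai", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling missing_containers(running_arthai_names()) returns an empty list when all three are up. If it returns anything, the compose command above is the next step.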

Migration for existing customers (set up before the reboot-durability fix landed) — your engine and DB may have RestartPolicy: no. One-line fix, no data loss:

docker update --restart unless-stopped arthai-db arthai-intelligence

Or re-run /otel-setup → Local. The skill detects the legacy compose, prints the migration command, and the new template overwrites ~/.arthai/docker-compose.yml with the right policy on every service.

Updating & disabling

The Watchtower sidecar pulls the latest arthai/intelligence:latest image once a day. Trace data lives in the arthai_data Docker volume and is preserved across updates. Force-update now: docker compose -f ~/.arthai/docker-compose.yml pull && docker compose -f ~/.arthai/docker-compose.yml up -d.

To disable telemetry: export OTEL_DISABLED=true or remove the env block from .claude/settings.local.json. To opt out of auto-restart: docker update --restart no arthai-db arthai-intelligence arthai-watchtower.

What Can I Do?

Pick an intent to see the recommended workflow.

Learn

Set up your project and learn the codebase

/calibrate → /onboard
calibrate · onboard · scan · wiki-knowledge-base · welcome

Build

Plan features, implement with an AI team, ship PRs

/planning my-feature
planning · implement · qa · pr · precheck · review-pr

Fix

Debug bugs, repair CI, triage incidents

/fix #42 or /incident "site down"
fix · ci-fix · incident · sre

Test

4-layer QA, E2E generation, visual regression, 8 agents

/qa commit or /qa full
qa · qa-e2e-gen · qa-visual · qa-challenger · qa-domain

Automate

Autonomous mode and event-driven remediation

/autopilot
autopilot · deploy-announce · ci-fix

Learn Your Project

New to a project? /calibrate deep-scans your codebase — architecture, conventions, stack, domain model. Then /onboard gives you a prioritized briefing: what’s broken, what’s waiting, what to work on.

/calibrate → /onboard → start working

/calibrate
Scans source code, detects platform (web, mobile, CLI), maps integrations, recommends MCP servers and agents. Writes .claude/project-profile.md. Run once per project.
/onboard
Gathers git state, GitHub PRs/issues, environment health. Presents 4-tier briefing: Fix first → Waiting on you → Continue → Available. Suggests what to work on.

Build a Feature

Full feature development with an adversarial planning team, parallel implementation agents, QA, and automated PR creation.

/planning my-feature → /implement my-feature → /qa commit → /pr

/planning
Spawns PM (Opus) + Architect (Opus) + Devil's Advocate (Sonnet). 3 rounds of structured debate. Produces a hardened plan. Modes: --lite, --fast, full.
/implement
Reads the plan, spawns backend + frontend + QA + red-team agents in parallel. Builds, tests, creates PR-ready diff.
/qa commit
Targeted QA on changed files — unit tests, regressions, acceptance criteria.
/pr
Stages, commits, pushes, creates GitHub PR with QA results.

Fix a Bug

6-step formal pipeline. Root cause analysis, scope lock, behavior contract, fix, verify, PR.

/fix "description" → RCA → scope lock → contract → implement → verify → /pr

Hotfix mode
Use --hotfix for production emergencies (skips non-essential steps).
Severity levels
Use --severity critical|high|medium to set priority.

Fix CI

Auto-reads CI failure, diagnoses root cause, patches, resubmits. 3 retry attempts. Discord alert if all fail.

/ci-fix → reads logs → diagnoses → patches → resubmits (up to 3x)

/ci-fix details
Reads CI logs from GitHub Actions, diagnoses the root cause, applies a targeted patch, and resubmits. Retries up to 3 times before sending a Discord alert. Supports ci, staging, and prod targets.
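
The retry behavior described here can be pictured as a bounded loop. This is a minimal sketch and not the toolkit's implementation. attempt_fix and send_alert are hypothetical stand-ins for the diagnose/patch/resubmit step and the Discord notification.

```python
from typing import Callable

MAX_ATTEMPTS = 3  # matches the "up to 3 times" described above


def run_with_retries(
    attempt_fix: Callable[[int], bool],
    send_alert: Callable[[str], None],
    max_attempts: int = MAX_ATTEMPTS,
) -> bool:
    """Call attempt_fix(attempt_number) until it succeeds or attempts run out.

    Both callables are hypothetical: attempt_fix would diagnose, patch, and
    resubmit CI; send_alert would post to a channel such as Discord.
    """
    for attempt in range(1, max_attempts + 1):
        if attempt_fix(attempt):
            return True
    # Every attempt failed: escalate to a human instead of retrying forever.
    send_alert(f"CI still failing after {max_attempts} attempts")
    return False
```

Passing the attempt number to attempt_fix leaves room for a different strategy on each attempt.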

Quality Assurance

Four-layer test strategy with 8 specialized agents. Commit mode for fast checks, full mode for comprehensive validation, plus opt-in E2E generation and visual regression.

Four-Layer Test Strategy

1. Baseline Tests

Existing test suites — regression anchor, same every run

2. Generated Scenarios

Fresh every run — thinks like real users based on the diff

3. Property-Based Tests

Infer invariants from code changes, test with random/edge-case inputs

4. Coverage Audit

Reviews if existing tests still match the codebase
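
To make layer 3 concrete, here is a minimal property-based check written with only the standard library. It is an illustration of the idea, not output from the toolkit. normalize_whitespace is a hypothetical function under test, and the invariant (normalizing twice equals normalizing once) is chosen for the example.

```python
import random
import string


def normalize_whitespace(text: str) -> str:
    """Hypothetical function under test: collapse whitespace runs to one space."""
    return " ".join(text.split())


def check_idempotent(trials: int = 200, seed: int = 0) -> None:
    """Property: normalizing an already-normalized string changes nothing."""
    rng = random.Random(seed)  # seeded so a failure can be reproduced
    alphabet = string.ascii_letters + " \t\n"
    # Fixed edge cases first, then random inputs of varying length.
    cases = ["", " ", "\t\n", "a", "  a  b  "]
    cases += [
        "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 40)))
        for _ in range(trials)
    ]
    for case in cases:
        once = normalize_whitespace(case)
        assert normalize_whitespace(once) == once, repr(case)


check_idempotent()
```

A dedicated library such as Hypothesis would add input shrinking and richer generators. The standard-library version keeps the sketch self-contained.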

Modes

Command | Description
/qa | Commit mode: targeted checks on changed files (~1-3 min)
/qa full | Comprehensive: all checks across full codebase (~10-20 min)
/qa staging | Health + smoke + E2E against deployed staging
/qa prod | Read-only health + smoke against production
/qa e2e-gen | Generate exploratory Playwright tests for changed components (opt-in)
/qa visual | Computer-use visual regression at desktop + mobile viewports (opt-in)

QA Agents (8)

Agent | Model | Role
qa | sonnet | Orchestrator: spawns sub-agents, collects results, produces report
qa-e2e | sonnet | Playwright E2E tests for user workflows
qa-e2e-gen | sonnet | Generates exploratory Playwright tests from diffs
qa-visual | sonnet | Computer-use visual regression (desktop + mobile)
qa-challenger | sonnet | Adversarial red-teaming of test plans
qa-domain | sonnet | Domain logic validation (state machines, constraints)
qa-baseline-updater | sonnet | Manages test snapshots and golden files
qa-test-promoter | haiku | Promotes generated tests that caught bugs to baselines

Typical Flow

/qa commit → 4-layer analysis → report → /qa e2e-gen (opt-in) → /qa visual (opt-in)

Related skills
/qa-incident — log a QA incident from a known issue
/qa-learn — review QA knowledge base stats, prune stale entries
/ci-fix — auto-remediate CI failures (3 retry attempts)

Ship Code

/precheck runs tests locally in ~30s. /qa validates changed files. /pr creates the PR.

/precheck → /qa commit → /pr

/precheck
Runs lint, type-check, and test suite locally. Catches CI failures in ~30 seconds instead of the 4-minute GitHub Actions round-trip.
/qa commit
Targeted QA on changed files only — fast validation before shipping.
/pr
Stages, commits with a conventional message, pushes, and opens a GitHub PR with QA summary in the description.

Autonomous Mode

Fully autonomous. Picks highest-priority unassigned issue, plans, implements, QAs, PRs. Stops for merge approval.

/autopilot → picks issue → plans → implements → QA → PR → waits for merge → repeats

/autopilot details
Starts a continuous loop: fetches the highest-priority unassigned GitHub issue, creates a plan, implements, runs QA, and opens a PR. Pauses for human merge approval before picking the next issue. No other intervention is required between cycles.
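
The issue-selection step might look like the sketch below. How the toolkit actually ranks issues is not documented here, so the Issue shape, the numeric priority field, and the tie-break rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Issue:
    number: int
    priority: int            # assumption: lower number means higher priority
    assignee: Optional[str]  # None means unassigned


def pick_next_issue(issues: list[Issue]) -> Optional[Issue]:
    """Return the highest-priority unassigned issue, or None if there is none.

    Ties are broken by the lower issue number, an arbitrary choice for this sketch.
    """
    unassigned = [issue for issue in issues if issue.assignee is None]
    if not unassigned:
        return None
    return min(unassigned, key=lambda issue: (issue.priority, issue.number))
```

Returning None gives the loop a natural stopping point once no unassigned work is left.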

Research & Knowledge

Build curated topic wikis. Init scaffolds, ingest processes sources, query synthesizes answers, lint health-checks.

/wiki-knowledge-base init topic → ingest → query → lint

init
Scaffolds a new wiki directory structure for a topic with index, pages, and metadata files.
ingest
Processes source material (docs, URLs, text) and adds structured entries to the wiki.
query
Synthesizes answers from wiki content with source citations.
lint
Health-checks the wiki — finds stale entries, broken references, coverage gaps.

Operations

Health checks, log tailing, deploy watching, incident triage, server restarts.

Command | Description
/sre status | Health check across all services and infrastructure
/sre logs | Tail and analyze logs from running services
/sre watch | Watch an active deployment for issues
/incident | Classify severity, diagnose in parallel, route to the right fix skill
/restart | Discover, restart, and validate local dev servers with health checks

Event-Driven Monitors

Instead of polling ("check CI every 5 minutes"), monitors sleep until something happens and then wake the toolkit to respond automatically. Zero API calls while idle — you only pay when an event fires.

How it works

  1. /calibrate detects your stack (GitHub Actions, Railway, Sentry...)
  2. Writes config to .claude/monitors/; you add the webhook URL on the platform
  3. Platform event fires → skill auto-runs (/ci-fix, /sre, /qa, /fix)

Each monitor is a JSON config in .claude/monitors/. Calibrate generates them from templates in monitors/ (repo root), adapted to your project's platform and branch.

Available monitor templates

github-ci.json

When CI fails on any non-default branch, /ci-fix automatically reads the failure log, diagnoses the issue, patches, and resubmits. Up to 3 retries with different strategies.

Event: workflow_run.conclusion == failure

deploy-health.json

When a deploy fails or a service crashes, /sre debug investigates logs, checks health endpoints, and attempts remediation before paging you.

Event: status == FAILED or CRASHED

staging-qa.json

When staging deploys successfully, /qa staging automatically runs the full QA suite against the live staging environment. No manual trigger needed.

Event: status == SUCCESS and environment == staging

runtime-errors.json

When runtime errors exceed a threshold (e.g., 5+ occurrences of a new error), /fix auto-runs with the error details. Includes 60-min cooldown and deduplication by fingerprint.

Event: occurrences >= 5 and status == unresolved
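
The threshold, cooldown, and fingerprint deduplication described for this monitor can be sketched as a small gate. The monitor's real config format and internals are not shown on this page, so the class and field names below are hypothetical. Only the numbers (5 occurrences, 60 minutes) come from the description.

```python
from dataclasses import dataclass, field

THRESHOLD = 5            # occurrences needed before triggering
COOLDOWN_SECONDS = 3600  # 60-minute cooldown per fingerprint


@dataclass
class ErrorGate:
    """Decides whether an error event should trigger an automatic fix."""

    # Maps error fingerprint to the time it last triggered a fix.
    last_triggered: dict[str, float] = field(default_factory=dict)

    def should_trigger(
        self, fingerprint: str, occurrences: int, status: str, now: float
    ) -> bool:
        if occurrences < THRESHOLD or status != "unresolved":
            return False
        last = self.last_triggered.get(fingerprint)
        if last is not None and now - last < COOLDOWN_SECONDS:
            return False  # same fingerprint fired recently: treat as duplicate
        self.last_triggered[fingerprint] = now
        return True
```

Keying the cooldown on the fingerprint means a burst of the same error triggers one fix, while a different error arriving during the cooldown still gets through.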

Setup

# Step 1: Run /calibrate — it auto-detects your stack and generates
# the right monitor configs in .claude/monitors/
/calibrate

# Step 2: Calibrate shows which monitors it generated:
#   ✓ .claude/monitors/github-ci.json (GitHub Actions detected)
#   ✓ .claude/monitors/deploy-health.json (Railway detected)
#   ✗ runtime-errors.json (no Sentry/Datadog detected)

# Step 3: Add webhook URLs on your platforms
#   GitHub:  Repo Settings → Webhooks → paste monitor endpoint URL
#   Railway: Project Settings → Webhooks → paste monitor endpoint URL
#   Sentry:  Settings → Integrations → Webhooks

# Step 4: Set webhook secrets in your environment
export GITHUB_WEBHOOK_SECRET="your-secret-here"
export DEPLOY_WEBHOOK_SECRET="your-secret-here"

# Done. Events fire → toolkit responds automatically.
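
This page does not say how the toolkit uses the webhook secrets. For context, the sketch below shows how a receiver typically verifies a GitHub delivery: GitHub signs the raw request body with HMAC-SHA256 and sends the result in the X-Hub-Signature-256 header, prefixed with "sha256=". The function name is hypothetical, and other platforms may sign differently.

```python
import hashlib
import hmac


def github_signature_is_valid(secret: str, body: bytes, header_value: str) -> bool:
    """Check a GitHub X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, header_value)
```

A receiver would pass in the value of GITHUB_WEBHOOK_SECRET and reject any delivery for which this returns False.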

Safety: Monitors include built-in loop guards — after 3 failed auto-fix attempts on the same branch, the monitor suspends itself and sends a Discord alert instead of retrying forever.
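
A loop guard of this kind can be sketched as a per-branch failure counter. This is an illustration under stated assumptions, not the toolkit's code. The class name is hypothetical, and resetting the counter after a successful fix is an assumption.

```python
MAX_FAILED_ATTEMPTS = 3  # matches the limit described above


class LoopGuard:
    """Tracks failed auto-fix attempts per branch and suspends at the limit."""

    def __init__(self, limit: int = MAX_FAILED_ATTEMPTS) -> None:
        self.limit = limit
        self.failures: dict[str, int] = {}

    def record_failure(self, branch: str) -> None:
        self.failures[branch] = self.failures.get(branch, 0) + 1

    def record_success(self, branch: str) -> None:
        # Assumption: a successful fix clears the counter for that branch.
        self.failures.pop(branch, None)

    def is_suspended(self, branch: str) -> bool:
        return self.failures.get(branch, 0) >= self.limit
```

A monitor would check is_suspended(branch) before each auto-fix and send the alert instead of retrying once it returns True.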

Re-running: If you add Sentry or change deploy platforms later, run /calibrate rescan to regenerate monitors for the new stack.

Consulting

Full consulting engagement pipeline. Discovery, assessment, opportunity mapping, solution design, deliverables.

/client-discovery → /consulting → /opportunity-map → /solution-architect → /deliverable-builder

/client-discovery
Runs an AI readiness assessment for a client — current capabilities, gaps, maturity level.
/consulting
Full engagement management with phase tracking: discovery, assess, propose, design, deliver, track.
/opportunity-map
Maps AI opportunities by ROI and implementation effort. Recommends target initiatives.
/solution-architect
Designs technical AI solutions with architecture diagrams, tech stack, timeline, and risk analysis.
/deliverable-builder
Generates client-ready deliverables: final reports, board decks, implementation guides, training plans.

Design

Add --design to planning for UX research and design critique before implementation.

/planning --design → design thinker + critic → /implement

Design workflow details
The --design flag adds UX researcher and design critic agents to the planning team. They conduct user journey mapping, heuristic evaluation, and accessibility review before any code is written.

All Skills (53)

All Agents (38)

Bundles (12)

Bundles are curated packages of agents, skills, and hooks that work together. Install one bundle to get a complete workflow. Bundles compose — install multiple without conflicts.

Which bundle should I pick?

Need | Bundle
Research + knowledge | atlas
Design-driven development | canvas
Product strategy | compass
AI consulting toolkit | counsel
Autonomous orchestration | cruise
Full development workflow | forge
Everything | prime
Deep QA | prism
Surgical bug fixing | scalpel
Operations and reliability | sentinel
Safety guardrails | shield
Project setup and onboarding | spark

Hooks (29)

Lifecycle hooks fire automatically at key moments. No manual invocation needed.