LESSON 01 START

Installation

Nanika is a multi-agent orchestration layer that runs on top of Claude Code. The installer is interactive — it checks prerequisites, lets you choose which plugins to include, builds the binaries, and runs doctor checks before finishing.

Prerequisites

Before running the installer, make sure the following are available on your system:

Dependency	Version	Required for
`Go`	>= 1.25	Skills and most plugins
`Claude Code`	latest	Agent integration
`Rust / Cargo`	latest	`tracker` plugin only

Rust is only needed if you plan to install the tracker plugin. The core system runs entirely on Go and Claude Code.

Clone and Install

Clone the repository and run the interactive installer:

git clone https://github.com/joeyhipolito/nanika
cd nanika
scripts/install.sh

The installer will walk you through each step, showing you what it's about to do before doing it. If a prerequisite is missing, it will tell you exactly what to install and how.

Install Flags

For automated environments or when you know exactly what you want, the installer accepts flags that skip the interactive prompts:

Flag	Behavior
`scripts/install.sh`	Interactive: pick what to install
`scripts/install.sh --core`	Core only (orchestrator + nen + tracker + scheduler)
`scripts/install.sh --all`	Core + discord + telegram
`scripts/install.sh --plugins discord`	Core + specific plugins
`scripts/install.sh --no-interactive`	CI: core only, no prompts
`scripts/install.sh --dry-run`	Show what would be installed
`scripts/install.sh --repair`	Re-check prereqs, rebuild broken plugins

After Installation

Once the installer finishes, open the nanika directory inside Claude Code:

cd nanika
claude

Claude Code reads the CLAUDE.md file at the root of the repository and automatically discovers all skills. You don't need to register anything manually — the skills index is built into the project structure.

Tip: Run scripts/install.sh --dry-run first if you want to see exactly what the installer will do before committing to it. This is especially useful in shared environments.

Repair Mode

If something breaks after installation — a plugin fails to build, a prerequisite was updated — run the installer in repair mode:

scripts/install.sh --repair

Repair mode re-checks all prerequisites and rebuilds any plugin binaries that are missing or outdated. It won't touch plugins that are already working.

LESSON 02 START

Quick Start

From zero to your first agent mission in under five minutes. This page covers the essential steps to get nanika running and explains what's happening under the hood.

The Three Steps

The entire setup is three commands:

git clone https://github.com/joeyhipolito/nanika
cd nanika
scripts/install.sh

Once installation completes, open the directory in Claude Code:

claude

Claude Code reads CLAUDE.md at startup and discovers all installed skills automatically. No manual registration, no config files to edit.

Your First Prompt

With Claude Code open inside the nanika directory, try this prompt:

research golang error handling best practices and write a report

Nanika decomposes the task into phases, spawns specialized worker agents, and coordinates results between them. The output is a finished report — not just a single response, but a structured artifact produced by multiple agents working in sequence.

What just happened: The orchestrator broke your request into phases (research, write, review), assigned each phase to a persona with the right specialization, ran them in dependency order, and collected the results. You gave one prompt; three agents handled it.

Key Scripts

The scripts/ directory contains utilities you'll use regularly:

Script	What it does
`scripts/install.sh`	Interactive installer — checks prereqs, picks plugins, builds, and runs doctor
`scripts/new-mission.sh <slug>`	Creates a new mission file at `~/.alluka/missions/<slug>.md`
`scripts/generate-agents-md.sh`	Regenerates the `AGENTS.md` routing index from the current persona set

How Skill Discovery Works

When you open nanika in Claude Code, the agent reads CLAUDE.md. That file contains a skills index — a table mapping skill names to their SKILL.md files. Claude Code loads each skill file, which teaches the agent what CLI commands are available, how to invoke them, and what each does.

This means skills are self-documenting. Adding a new plugin is as simple as dropping a SKILL.md into the right directory and running scripts/generate-agents-md.sh to update the index.

How Missions Flow

Every task you give nanika goes through the same pipeline:

Decompose — The task is broken into PHASE lines, each with a persona and objective.
Spawn — Workers execute in parallel where dependencies allow.
Collect — Artifacts from each phase flow into dependent phases.
Review — Quality gates check the output before the mission completes.

The next lesson walks through a real mission step by step, showing exactly what each phase produces and how results flow between them.

LESSON 03 START

First Mission

A mission is a task decomposed into phases, with each phase handled by a specialized agent. This lesson walks through what happens when you send nanika its first real task — from the moment you type your prompt to the moment the output lands.

The Mission Pipeline

Every mission follows the same high-level flow:

task → decompose → plan → spawn workers → collect artifacts → review → done

None of this is visible as individual commands — it all happens inside the orchestrator when you submit a prompt. Understanding the pipeline helps you predict what the system will do, write better prompts, and debug when something doesn't go as expected.

A Real Example

Here's what happens when you send:

research AI agent memory systems and write a report

The orchestrator decomposes this into three phases:

PHASE: research  | PERSONA: architect           | OBJECTIVE: Compare 5 agent memory approaches
PHASE: write     | PERSONA: technical-writer    | OBJECTIVE: Draft the report | DEPENDS: research
PHASE: review    | PERSONA: staff-code-reviewer | OBJECTIVE: Review for accuracy | DEPENDS: write

Three specialized workers execute in dependency order. Each worker is a full Claude Code session loaded with a persona prompt that shapes its behavior and priorities.

Key insight: The DEPENDS: field controls execution order. Phases with no dependencies run in parallel. Phases that depend on earlier phases wait for those artifacts to be written before starting.

The Five Stages

1. Decompose

The task is broken into PHASE lines. Each phase has a name, a persona, and an objective. Dependencies between phases are declared explicitly — nothing is inferred at runtime. Decomposition can be done by the orchestrator's LLM (for open-ended prompts) or from a pre-written mission file (deterministic, good for repeatable workflows).

2. Route

Each phase is assigned a model tier: think for complex reasoning, work for standard implementation, quick for fast, cheap tasks. The orchestrator also picks the runtime — Claude Code for agentic tasks that need file access and tool use, or Codex for code-only tasks.

3. Spawn

Workers launch in parallel wherever the dependency graph allows. Each worker gets:

A persona CLAUDE.md that defines its role, constraints, and output contract
Access to the skills its role needs
A private workspace at ~/.alluka/workspaces/<mission-id>/workers/<persona-phase>/
The artifact outputs from any phases it depends on

4. Gate

Review phases are special — they block the mission from completing until quality criteria are met. If a review phase fails, the orchestrator injects a fix phase followed by a re-review cycle. Gates exist to prevent bad output from being marked as done.

5. Learn

When the mission finishes, metrics are recorded: duration, retry count, which phases failed and why. Nen observers scan these metrics for patterns. Anomalies surface as findings. Findings feed back into improved decomposition prompts and persona configurations over time.

Where Artifacts Live

Each worker writes its output to its workspace directory. Dependent phases read from those directories. The mission report and final artifacts are collected at:

~/.alluka/workspaces/<mission-id>/

You can inspect a running mission's partial output at any time by reading from that directory. Nothing is buffered in memory — everything is written to disk as it's produced.

LESSON 04 START

Doctor Check

Before running missions, confirm your installation is healthy. There's no single nanika doctor command — health checks happen at two levels: the installer (full stack) and individual plugins (per-plugin dependencies).

Full Stack Check

To re-verify prerequisites and rebuild anything broken, run the installer in repair mode:

scripts/install.sh --repair

Repair mode re-checks Go, Claude Code, and Rust/Cargo, then rebuilds any plugin binaries that are missing or outdated. It skips plugins that are already healthy, so it's safe to run after any system update.

Plugin Doctor Commands

Every plugin exposes a doctor subcommand that checks its specific dependencies — API keys, browser cookies, external service connectivity:

scheduler doctor
tracker doctor
discord doctor
telegram doctor

# Machine-readable output for scripting
discord doctor --json

Run the doctor command for any plugin you plan to use before relying on it in a mission. These checks catch issues the installer can't see, like expired OAuth tokens or missing credentials.

Nen Health Score

For a broader view of system health, query the Shu scanner:

shu query status --json

This returns an overall health score (0–100), a count of critical findings, and whether the Nen daemon is running. It's the fastest way to check if anything has degraded since your last session.

Runtime Data Directory

Nanika stores all runtime data under ~/.alluka/:

Path	Contents
`~/.alluka/missions/`	Mission definition files (`.md` format)
`~/.alluka/workspaces/`	Per-mission worker workspaces and artifacts
`~/.alluka/metrics.db`	Mission metrics: duration, retries, failures
`~/.alluka/nen/findings.db`	Nen observer findings and anomaly records

Diagnosing and Repairing Issues

If a boot check fails or a plugin stops working after a system update, run the installer in repair mode:

scripts/install.sh --repair

Repair mode re-checks all prerequisites and rebuilds any plugin binaries that are broken or missing. It skips plugins that are already healthy, so it's safe to run even if only one thing is broken.

Plugin-Level Doctor Commands

Individual plugins also expose their own doctor commands. These check the plugin's specific dependencies — API keys, browser cookies, external services — and report whether the plugin is ready to use:

# Example: check the discord plugin
discord doctor

# Check with JSON output for scripting
discord doctor --json

Run the doctor command for any plugin you plan to use before relying on it in a mission. Plugin-level checks catch issues that the global boot sequence doesn't cover, like missing OAuth tokens or expired session cookies.

Good practice: After any system update (Go version bump, Claude Code update, OS upgrade), run scripts/install.sh --repair and shu query status --json. Catching a broken binary before a mission is much less disruptive than mid-run.

LESSON 05 ORCH

How Missions Work

The orchestrator is the engine that runs nanika. It takes a task, decomposes it into phases, routes each phase to a specialized agent, coordinates execution, enforces quality gates, and records what it learned. This lesson explains the full pipeline in detail.

The Full Pipeline

task → decompose → plan → spawn workers → collect artifacts → review → done

This isn't just a conceptual model — it's the literal execution path every mission takes. Understanding each step lets you write better missions, predict behavior, and debug failures.

System Architecture

Before the pipeline, it helps to see how the layers fit together:

┌────────────────────────────────────────────────────────┐
│  Claude Code  (reads CLAUDE.md → discovers skills)     │
├────────────────────────────────────────────────────────┤
│  Orchestrator                                          │
│  ┌─────────────┐  decomposes task into phases          │
│  │ decomposer  │  assigns personas + dependencies      │
│  └─────────────┘  spawns workers (Claude Code / Codex) │
│         │                                              │
│         ▼  workers call plugins via SKILL.md           │
├────────────────────────────────────────────────────────┤
│  Plugins  (CLIs in ~/bin, via plugin.json)             │
│  nen / tracker / scheduler / discord / telegram        │
│         ▲                                              │
│         │  subscribe to events                         │
├────────────────────────────────────────────────────────┤
│  Event Bus  (JSONL files + UDS socket)                 │
├────────────────────────────────────────────────────────┤
│  ~/.alluka/                                            │
│  missions/ · workspaces/ · metrics.db · nen/findings.db │
└────────────────────────────────────────────────────────┘

Claude Code sits at the top as the human-facing interface. The orchestrator runs beneath it as the coordination engine. Plugins are subscribers on the event bus — they observe and react, but the orchestrator doesn't depend on them being present.

Step 1: Decompose

The orchestrator receives a task and converts it into a set of PHASE lines. Each phase has a name, a persona, an objective, and an optional list of dependencies. Two decomposition modes are available:

LLM decomposition — for open-ended prompts, the orchestrator uses a model to produce PHASE lines. The resulting plan is reviewed before spawning starts.
Pre-decomposed — for repeatable workflows, you write the PHASE lines yourself in a mission file. Deterministic and faster to start.

Step 2: Route

Each phase is assigned:

A model tier: think for complex multi-step reasoning, work for standard tasks, quick for cheap fast responses
A runtime: Claude Code for agentic tasks (file access, tool use, multi-turn), or Codex for pure code generation

Routing decisions are made once at plan time, not dynamically during execution.

Step 3: Spawn

Workers launch in parallel wherever the dependency graph permits. A worker that depends on phase A and phase B won't start until both have written their artifact outputs. Each worker receives:

Its persona CLAUDE.md (defines role, constraints, output contract, methodology)
Skill access appropriate to its role
A private workspace at ~/.alluka/workspaces/<mission-id>/workers/<persona-phase>/
The artifact files from phases it depends on, injected as prior context

Workers are isolated: Each worker is a separate Claude Code session with its own workspace directory. Workers don't share state or communicate directly — all coordination happens through artifact files written to disk.

Step 4: Gate

Review phases are quality gates. A review phase reads the output of a preceding phase and evaluates it against defined criteria. If the review fails:

The orchestrator injects a fix phase targeting the specific failures
The fix phase runs and produces a corrected artifact
The review phase runs again against the corrected output
This cycle repeats until the review passes or a retry limit is reached

Gates prevent missions from completing with known-bad output. They're especially important for code generation, security reviews, and documentation that needs to meet specific standards.

Step 5: Learn

When a mission completes (successfully or not), the orchestrator records metrics to ~/.alluka/metrics.db:

Total duration and per-phase duration
Retry counts per phase
Which phases failed and what the failure reason was
Model tier and runtime used per phase

Nen observers scan these metrics continuously for patterns. When a pattern is significant enough, it surfaces as a finding in ~/.alluka/nen/findings.db. Findings accumulate over time and feed back into better defaults for decomposition and routing.

LESSON 06 ORCH

Phase Lines

A PHASE line is the atomic unit of a mission plan. Each line declares one unit of work: who does it, what they should accomplish, and what they need before they can start. Understanding the syntax lets you write precise, predictable mission files.

Syntax

PHASE: <name> | PERSONA: <persona> | OBJECTIVE: <objective> [| DEPENDS: <phase1,phase2>]

Fields are separated by pipe characters (|). The DEPENDS field is optional — phases with no declared dependencies can start immediately and run in parallel with other independent phases.

A Three-Phase Mission

The simplest useful pattern is research → write → review:

PHASE: research  | PERSONA: architect           | OBJECTIVE: Compare 5 agent memory approaches
PHASE: write     | PERSONA: technical-writer    | OBJECTIVE: Draft the report | DEPENDS: research
PHASE: review    | PERSONA: staff-code-reviewer | OBJECTIVE: Review for accuracy | DEPENDS: write

Here, research starts immediately. write waits for research to finish and then receives its output as context. review waits for write. This produces a linear chain where each phase builds on the previous one.

Parallel Phases

Phases without dependencies run simultaneously. A more complex mission might separate concerns and implement in parallel:

PHASE: design    | PERSONA: architect                | OBJECTIVE: Define the API contract
PHASE: implement | PERSONA: senior-backend-engineer  | OBJECTIVE: Build the service
PHASE: review    | PERSONA: security-auditor         | OBJECTIVE: Audit auth flow | DEPENDS: implement

In this plan, design and implement could potentially run in parallel (neither declares a dependency on the other). review waits for implement. In practice you'd want implement to depend on design, but the syntax gives you full control over which phases must be sequential and which can overlap.

Available Personas

Each persona has a CLAUDE.md file that defines its role, constraints, methodology, and output contract. Pick the persona whose specialization matches the work of the phase:

Persona	Best for
`academic-researcher`	Literature reviews, comparative analysis, citations
`architect`	System design, API contracts, architectural decisions
`data-analyst`	Data processing, statistical analysis, visualization
`devops-engineer`	Infrastructure, CI/CD, deployment configuration
`qa-engineer`	Test planning, test writing, coverage analysis
`security-auditor`	Security review, threat modeling, vulnerability analysis
`senior-backend-engineer`	Server-side implementation, APIs, database work
`senior-frontend-engineer`	UI implementation, accessibility, client-side logic
`staff-code-reviewer`	Code review, quality gates, standards enforcement
`technical-writer`	Documentation, reports, structured prose

Writing Mission Files

For repeatable workflows, write PHASE lines in a mission file rather than prompting the orchestrator to decompose on the fly. Create a new mission file with:

scripts/new-mission.sh my-feature

This creates ~/.alluka/missions/my-feature.md with a template structure. Edit the PHASE lines to match your workflow, then run it:

orchestrator run ~/.alluka/missions/my-feature.md

Pre-decomposed vs. LLM decomposition: Mission files with explicit PHASE lines are deterministic — the same file produces the same plan every time. LLM decomposition is flexible but non-deterministic. Use mission files for workflows you run repeatedly; use prompts for one-off explorations.

Objective Writing Tips

The OBJECTIVE field is what the worker agent reads as its primary instruction. Write it like a clear deliverable, not a vague description:

Good: Compare 5 agent memory approaches — output a markdown table with pros/cons and a recommended approach
Too vague: Research memory systems
Good: Audit the authentication flow in auth/middleware.go for OWASP Top 10 vulnerabilities — output findings as a numbered list with severity ratings
Too vague: Review the auth code

Specific objectives produce specific outputs. Vague objectives produce vague outputs that will fail review gates.

LESSON 07 ORCH

Dry Run

Before committing to a full mission execution, you can preview exactly what the orchestrator plans to do. Dry run mode shows the decomposed phases, persona assignments, and dependency graph — without spawning a single worker.

Running a Dry Run

orchestrator run --dry-run "task description"

Pass the same prompt you'd use for a real mission. The orchestrator decomposes it fully and prints the plan: every phase, its assigned persona, its objective, and what it depends on. Nothing executes.

Dry run is also available for mission files:

orchestrator run --dry-run ~/.alluka/missions/my-feature.md

Why dry run matters: Missions with 6–10 phases can take significant time and use substantial model credits. Dry run lets you verify the plan looks correct before spending either. It's also the fastest way to understand how the orchestrator interpreted an open-ended prompt.

Full Orchestrator Command Reference

Dry run is one flag among many. Here's the full set of orchestrator commands:

Running Missions

# Run from a natural language prompt
orchestrator run "research golang error handling best practices"

# Run with a domain context (changes default persona routing)
orchestrator run --domain personal "plan my Japan trip"

# Run a pre-written mission file
orchestrator run ~/.alluka/missions/FEATURE.md

# Preview without executing
orchestrator run --dry-run "task description"

Checking Status

# Show all active missions
orchestrator status

Cleanup

# Remove completed mission workspaces
orchestrator cleanup

# Remove workspaces older than 7 days
orchestrator cleanup --older 7d

Metrics

# Show recent mission metrics
orchestrator metrics

# Last 10 missions
orchestrator metrics --last 10

# Filter by domain
orchestrator metrics --domain dev

# Filter by status
orchestrator metrics --status failed

# Show a specific mission
orchestrator metrics --mission <id>

# Missions in the last 30 days
orchestrator metrics --days 30

Using Metrics to Improve Missions

The metrics subcommand is more useful than it looks at first. Filtering by --status failed shows which missions didn't complete and which phase caused the failure. Patterns here often reveal:

Objectives that are too vague for the assigned persona to interpret correctly
Phases that consistently need more retries, suggesting a model tier upgrade
Review gates that always fail on the first pass, indicating the preceding phase needs a more specific output contract

Cross-referencing failed missions with the findings in ~/.alluka/nen/findings.db (surfaced by Nen) gives a fuller picture of what's going wrong and why.

Domain Flag

The --domain flag changes the default context the orchestrator uses when routing phases to personas. Without a domain, it defaults to dev. Setting --domain personal adjusts routing toward personas suited to planning, research, and writing rather than engineering.

Domains are a lightweight way to shift the orchestrator's defaults without rewriting PHASE lines. They're most useful for recurring mission types that differ significantly in their persona requirements.

LESSON 08 ORCH

Daemon & Events

The orchestrator daemon is the long-running background process that emits events as missions progress. Nen scanners, notification channels, and other plugins subscribe to these events — giving you real-time visibility into what the system is doing without polling or manual status checks.

Starting the Daemon

orchestrator daemon

The daemon starts in the foreground. To run it persistently in the background, redirect its output to a log file:

orchestrator daemon >> ~/.alluka/logs/scheduler.log 2>&1 &

Once running, the daemon listens for mission commands, coordinates phase execution, and emits events to both a Unix domain socket and per-mission JSONL log files.

Event Flow

Events travel two paths simultaneously:

orchestrator daemon  →  events.sock (UDS)  →  nen-daemon (scanners)
                     →  events/*.jsonl     →  discord/telegram (notifications)

The Unix domain socket (events.sock) is for low-latency subscribers like the Nen daemon, which needs to react to events in near real-time. The JSONL files are for durability — they persist after the daemon exits and can be replayed or analyzed after the fact.

Event Locations

Path	Contents
`~/.alluka/events.sock`	Unix domain socket — live event stream for subscribers
`~/.alluka/events/<mission_id>.jsonl`	Per-mission event log — one JSON object per line

Plugins Are Subscribers, Not Dependencies

This is a critical architectural point: the orchestrator does not depend on any plugin being installed. Plugins are subscribers — they watch the event bus and react, but the orchestrator emits events regardless of whether anyone is listening.

This means:

You can run nanika without Discord, Telegram, or any notification plugin installed
Adding a new notification channel doesn't require changing the orchestrator
A plugin crashing doesn't affect mission execution — events keep flowing
You can add subscribers retroactively and they'll process future events without any core changes

Design principle: The event bus is a one-way broadcast. The orchestrator writes; subscribers read. No plugin can send commands back through the event bus — plugins that need to affect mission behavior do so through their own CLIs, not through events.

What Events Look Like

Each event in the JSONL log is a single line of JSON. Events carry a type, a mission ID, a timestamp, and a payload appropriate to the event type. Common event types include phase started, phase completed, phase failed, review passed, review failed, and mission done.

You can tail a mission's event log to watch it progress in real time:

tail -f ~/.alluka/events/<mission_id>.jsonl

Nen and the Daemon

The Nen daemon subscribes to the event socket and runs anomaly scanners against the event stream. When a scanner detects something significant — a phase taking unusually long, repeated retries on the same phase, a review gate looping — it records a finding in ~/.alluka/nen/findings.db.

These findings are passive: Nen observes and records, but doesn't intervene in running missions. The findings accumulate over time and surface patterns that inform how you configure personas, write objectives, and structure mission files.

Notification Plugins

If you have Discord or Telegram configured, their plugins subscribe to the JSONL event files and send you messages when notable events occur — mission started, mission completed, phase failed, review passed. Configuration for what triggers a notification lives in each plugin's config, not in the orchestrator.

To set up notifications, configure the relevant plugin and let it subscribe to events. The orchestrator doesn't need to know the plugin exists.

LESSON 09 MEM

Memory System Overview

Every persona in nanika accumulates a working memory as it executes missions. This memory is persisted across sessions so that lessons learned in one mission carry forward into the next — without you having to re-explain context or repeat corrections.

Two Memory Scopes

The memory system has two levels:

Scope	Path	Contents
Persona	`~/.alluka/personas/<name>/MEMORY.md`	Learnings specific to one persona
Global	`~/nanika/global/MEMORY.md`	Cross-cutting learnings promoted from any persona

Global entries are seeded into every worker session regardless of which persona is running. Persona entries are only seeded into sessions for that specific persona.

Entry Format

Each memory entry is a single line with optional inline metadata:

Use %w not %v in error constructors | filed: 2026-04-09 | by: senior-backend-engineer | type: feedback | used: 5

Metadata fields are optional — plain text entries without metadata are fully supported and backward-compatible.

How workers use memory: At the start of each phase, the orchestrator seeds the worker's Claude Code session with relevant entries from the persona and global MEMORY.md files. After the phase completes, any new memories the worker captured are merged back into the canonical file.

LESSON 10 MEM

Lifecycle

Memory flows through four stages for every phase execution: seed, execute, merge, and promote. Understanding this cycle tells you exactly when and how learnings propagate.

The Four Stages

Stage	What happens
Seed	Orchestrator writes `MEMORY.md` (read-only) and creates `MEMORY_NEW.md` (writable scratchpad) in the worker session directory
Execute	Worker runs the phase. New learnings go into `MEMORY_NEW.md`. The seeded `MEMORY.md` is read-only — the worker cannot corrupt the canonical snapshot
Merge	After the phase completes, the orchestrator reads `MEMORY_NEW.md`, deduplicates against the canonical file, and appends new entries. Duplicates are detected by normalized content hash
Promote	Entries with a `used` counter ≥ 3 are automatically promoted from the persona file to `global/MEMORY.md`

Read-Only Seeding

The seeded MEMORY.md is written with mode 0400 (read-only). This is intentional: it prevents a worker session from accidentally overwriting the canonical snapshot. New memories go into the separate MEMORY_NEW.md scratchpad, which is writable.

Merge Deduplication

When merging MEMORY_NEW.md back into the canonical file, the orchestrator computes a SHA-256 hash of each entry's normalized content (lowercased, whitespace-collapsed). An entry is skipped if its hash already exists in the canonical file. This catches semantically identical entries even when formatting differs.

Auto-Promotion

When an entry's used counter reaches 3, it is automatically promoted from the persona MEMORY.md to the global MEMORY.md during the next merge. The entry is removed from the persona file after promotion. This means frequently-useful persona learnings naturally become global without any manual intervention.

The used counter: The used field is incremented each time an entry is selected for seeding. High-used entries have proven their value across multiple sessions — that's the signal for global promotion.

LESSON 11 MEM

Safety & Limits

Memory entries come from worker sessions that process external content — web pages, emails, social posts. The memory system has two layers of protection: a safety gate that filters prompt-injection attempts, and a ceiling that caps the canonical file size.

Safety Gate

Before any entry is merged into the canonical file, the safety gate checks it against a set of imperative patterns. An entry that matches is quarantined — moved to MEMORY_QUARANTINE.md — and never merged into the canonical or global files.

Patterns that trigger quarantine:

Phrases like ignore previous instructions, disregard guidelines
Identity-switch commands: you are now, pretend to be
Prompt exfiltration: reveal your prompt, output system
Role-marker injection: lines starting with [system]:, [user]:
Invisible Unicode characters (zero-width joiners, soft hyphens, etc.)

Why this matters: A malicious web page could embed hidden text designed to corrupt your agent's memory. The safety gate ensures that external content processed during a mission cannot inject instructions into future sessions.

Memory Ceiling

Each persona's canonical MEMORY.md is capped at 100 non-empty entries. When a merge would push the file above this limit, the oldest entries are moved to MEMORY_ARCHIVE.md in the same directory, keeping only the 100 most recent entries in the active file.

File	Purpose
`MEMORY.md`	Active entries (max 100 lines)
`MEMORY_ARCHIVE.md`	Overflow entries, oldest first — never loaded into sessions
`MEMORY_QUARANTINE.md`	Rejected entries with the reason for quarantine
`MEMORY_NEW.md`	Worker scratchpad — writable during a phase, merged after

Both the ceiling enforcement and the safety gate run automatically during every merge. You do not need to configure or invoke them manually.

LESSON 12 MEM

Budgeted Seeding

Seeding a worker session with memory costs context window tokens. The seeder enforces a 4 KB budget and uses a scoring algorithm to pick the most relevant entries when the budget is tight.

The 4 KB Budget

At seed time, the orchestrator fills the worker's MEMORY.md greedily up to 4,096 bytes. Global entries are added first (they apply across all personas), then persona entries fill whatever budget remains. Entries that would push the total over budget are skipped.

# Budget fills like this:
global entries  →  up to 4 KB total
  then persona entries  →  remaining budget only

Relevance Scoring

Before filling the budget, entries are ranked by a composite score:

Component	Weight	Description
Keyword overlap	Primary	Jaccard similarity between entry words and the phase objective
Recency	Multiplier	Exponential decay — entries filed more recently score higher

If no entry has any keyword overlap with the objective, the ranker falls back to recency-only ordering. This ensures the budget is always filled with something useful even for novel objectives.

Used Counter Feedback Loop

Every persona entry selected for seeding has its used counter incremented in the canonical file. This creates a feedback loop: entries that consistently win the relevance ranking accumulate high used counts, which eventually triggers auto-promotion to global.

Practical implication: You don't need to manually manage which learnings become global. The seeder's selection pressure does it for you — entries that keep getting picked for seeding earn global status automatically.

LESSON 13 MEM

Global & Bridges

The global MEMORY.md accumulates learnings that are useful across all personas. Two mechanisms write to it: automatic promotion from persona files, and bridge-session, which harvests project and reference memories from your Claude Code sessions.

Global MEMORY.md

Global entries live at ~/nanika/global/MEMORY.md. They are seeded into every worker session before persona entries, consuming the first portion of the 4 KB budget.

Entries reach global via three paths:

Auto-promotion — persona entries with used >= 3 are promoted automatically at merge time
CLI promotion — orchestrator memory promote <persona> manually moves entries
Bridge-session — harvests from Claude Code auto-memory on the host machine

Bridge-Session

Claude Code maintains its own auto-memory directory at ~/.claude/projects/<key>/memory/. The bridge reads those individual .md files (which use YAML frontmatter) and promotes entries typed project or reference into the global MEMORY.md.

# Run manually
orchestrator bridge-session

# Runs automatically at the start of every Claude session via the SessionStart hook

Only project and reference type entries are bridged. User preferences (user) and behavior corrections (feedback) stay in Claude Code's own memory system — they're persona-specific and shouldn't bleed into mission workers.

The startup hook: The orchestrator hooks preflight command runs bridge-session automatically at the start of every Claude Code session. You'll see "bridge-session: merged N entries into global memory" in the session-start output when new entries are found.

Supersedure

When a newer entry corrects an older one (Jaccard similarity > 0.8 between same-type entries), the older entry is marked with superseded_by: <hash>. Superseded entries are excluded from seeding and auto-promotion — they stay in the file for audit history but are effectively retired.

LESSON 14 MEM

CLI Reference

The orchestrator memory subcommand exposes three operations: promote, sync, and search. Seeding and merging happen automatically — these commands are for inspection and manual curation.

memory promote

# Promote all entries for a persona to global MEMORY.md
orchestrator memory promote senior-backend-engineer

# Promote only entries whose content contains "golang"
orchestrator memory promote senior-backend-engineer --match "golang"

# Promote entries that have been used at least 3 times
orchestrator memory promote senior-backend-engineer --used 3

Promoted entries are appended to ~/nanika/global/MEMORY.md and removed from the persona canonical file. Duplicates (by normalized content hash) are silently skipped.

memory sync

orchestrator memory sync

Queries the learning database for entries with quality_score > 0.7 that have not yet been promoted. Converts matching learnings to MemoryEntry format and merges them into the appropriate persona MEMORY.md files. Once synced, learnings are marked as promoted to prevent duplicate entries on subsequent runs.

memory search

# Search for learnings matching a query
orchestrator memory search "golang error handling"

# Limit results
orchestrator memory search "testing patterns" --limit 5

Searches the learning database using hybrid FTS5 + semantic search. Results are ranked by relevance and quality and formatted as:

[quality] date persona: content

Command Summary

Command	What it does
`memory promote <persona>`	Move persona entries to global MEMORY.md
`memory promote <persona> --match <str>`	Promote only entries containing substring
`memory promote <persona> --used <n>`	Promote only entries with used count ≥ n
`memory sync`	Sync high-quality learnings from DB to persona files
`memory search <query>`	Search learning DB with FTS5 + semantic ranking

LESSON 15 PERSONA

Constraint-First Design

Most agent frameworks give models a role identity — "You are a senior software engineer with 10 years of experience..." — and call it a persona. Nanika doesn't do that. Understanding why is the key to understanding how personas actually work.

The Problem with Role-Playing

The role-playing framing is a holdover from how human teams are structured. It made sense when a human could only do one job. Models don't have that constraint. Telling a model it is an architect doesn't make it better at architecture — the identity framing is empirically inert.

What actually changes output is behavioral constraints: what to produce, what to avoid, what the output contract is, and what failure modes to guard against.

Role labels add no signal. They add noise. A model that's told it's a "senior engineer" will still produce junior code if nothing else constrains it. A model given a tight output contract and explicit anti-patterns will produce consistently scoped work regardless of what it's "called."

Constraints First, Identity Second

Every nanika persona leads with ## Constraints — not identity. The structure is:

What this agent must do
What it must never do
What a correct output looks like
What patterns to avoid

Identity follows, but it's minimal and functional — it names the persona for routing purposes, not to prime the model with a character to embody.

Required Section Order

The PERSONA-STANDARD.md defines a fixed section order that all persona files must follow:

Constraints
Identity
Goal
Expertise
When to Use
When NOT to Use
Principles
Anti-Patterns
Methodology
Output Format
Self-Check

The order is intentional. Constraints come before identity so the model encounters behavioral bounds before any framing. Anti-Patterns follow Principles so the positive guidance is anchored before the negations. Self-Check comes last — it's a checklist the model runs against its own output before returning a result.

YAML Frontmatter

Every persona file begins with YAML frontmatter that the orchestrator reads for routing and handoff decisions:

---
role: implementer
capabilities:
  - Go development
  - HTTP servers and middleware
triggers:
  - implement
  - build
  - backend
handoffs:
  - architect
  - senior-frontend-engineer
---

The role field is one of three values: planner, implementer, or reviewer. This maps directly to which phase type the persona is suited for. The triggers array feeds the keyword-match fallback in the routing algorithm. The handoffs array tells the orchestrator which personas this one delegates to when it encounters work outside its scope.

File Location and Naming

Persona files live at personas/{name}.md — kebab-case, matching the identity field inside the file. The name is the routing key: when the orchestrator assigns PERSONA: senior-backend-engineer to a phase, it loads personas/senior-backend-engineer.md verbatim into the worker's CLAUDE.md.

Direction: The long-term trajectory is composable constraint modules that stack onto any worker without any identity layer at all. The goal is to make roles unnecessary — not simulate them. Current persona files are a transition point on that path.

Why This Matters for Quality

When you read a persona file and its constraints are vague — "produce good code," "be thorough" — that persona will produce inconsistent output. The constraint is doing no work. Effective constraints are specific enough to fail: "Zero any types," "No CSS modules or styled-components," "Must render correctly on mobile (320px viewport)." These are testable. Vague identity claims are not.

The discipline of constraint-first design forces clarity about what you actually want from a worker phase before you send it to a model. That clarity is the real value — it happens in your head before the model sees anything.

LESSON 16 PERSONA

Persona Catalog

Nanika ships with 10 built-in personas covering the most common engineering specializations. Each persona is a markdown file in personas/ that gets injected verbatim into a worker's CLAUDE.md when assigned to a phase.

The 10 Built-in Personas

Persona	Role	Specialization
`academic-researcher`	planner	Deep research, literature synthesis, citation management
`architect`	planner	System design, API contracts, architectural decisions
`data-analyst`	implementer	Data analysis, queries, statistical reasoning
`devops-engineer`	implementer	Infrastructure, CI/CD, deployment pipelines
`qa-engineer`	reviewer	Test planning, quality assurance, edge case analysis
`security-auditor`	reviewer	Security review, vulnerability analysis, auth flow auditing
`senior-backend-engineer`	implementer	Backend implementation, Go/Rust services, APIs
`senior-frontend-engineer`	implementer	Frontend implementation, Next.js, React, Tailwind
`staff-code-reviewer`	reviewer	Code review, architectural feedback, blocking issues
`technical-writer`	implementer	Documentation, README files, tutorials

Persona Assignment in PHASE Lines

Personas are assigned in mission PHASE lines. The orchestrator reads the PERSONA: field and injects the corresponding persona file into the worker's context before the session starts.

PHASE: design    | PERSONA: architect               | OBJECTIVE: Define the API contract
PHASE: implement | PERSONA: senior-backend-engineer  | OBJECTIVE: Build the service
PHASE: review    | PERSONA: security-auditor         | OBJECTIVE: Audit auth flow | DEPENDS: implement

Each phase runs in an isolated worker with only the assigned persona's constraints, identity, and methodology visible. Workers don't share context — the orchestrator coordinates by reading output artifacts from previous phases and injecting relevant findings as prior-phase notes.

Choosing the Right Persona

A few practical rules for picking personas:

Planning phases — use architect for system design, academic-researcher for discovery work that needs citations and synthesis
Implementation phases — match the language: senior-backend-engineer for Go/Rust APIs, senior-frontend-engineer for Next.js/React work, devops-engineer for infra and pipelines
Review phases — staff-code-reviewer for general code quality, security-auditor when auth or data handling is involved, qa-engineer when you need edge-case coverage
Data and analysis work — data-analyst when the objective involves querying, aggregating, or interpreting structured data
Documentation — technical-writer for user-facing docs, READMEs, and tutorials; don't use implementer personas for docs work

What Gets Injected

When a worker phase starts, the orchestrator builds its CLAUDE.md from several sources in order:

The persona file (personas/{name}.md) — constraints, methodology, anti-patterns
The persona's memory file (personas/{name}/MEMORY.md) — accumulated learnings from past sessions
Available tools index — which CLI tools are installed and how to invoke them
Prior phase notes — findings and artifacts from phases this phase depends on
The phase objective — what this specific worker is being asked to do

The persona file is always first. Its constraints take precedence over everything else in the context window.

Note: You can run without specifying a persona. The orchestrator's LLM decomposer will route based on the task description. But for important phases — especially implementer and reviewer phases — explicit persona assignment produces more consistent results.

Omitting PERSONA

If a PHASE line has no PERSONA: field, the orchestrator falls back to automatic routing using the two-layer algorithm described in the next lesson. For exploratory or research phases where the optimal persona isn't obvious, omitting it and letting the router decide is often fine. For production implementation work, be explicit.

LESSON 17 PERSONA

Routing & Handoffs

When a PHASE line doesn't specify a persona explicitly, the orchestrator selects one automatically using a two-layer routing algorithm. Understanding how this works helps you write better WhenToUse sections in custom personas — and explains why some automatic routing decisions are better than others.

Layer 1: LLM Match (Primary)

The primary routing path uses a lightweight model (Haiku) to select a persona. The orchestrator calls FormatForDecomposer() to build a compact catalog summary — one entry per persona containing its name, title, WhenToUse triggers, and HandsOffTo targets. This catalog plus the task description goes to Haiku, which returns a single persona name.

This approach handles phrasing variations well. "Write a test plan for the payment flow" and "QA the checkout system" both route to qa-engineer without needing exact keyword matches. Haiku reads the task semantically against the WhenToUse triggers.

Layer 2: Keyword Match (Fallback)

When LLM routing fails or is unavailable, the orchestrator scores every persona against the task description using a deterministic algorithm:

+1 per WhenToUse word that matches the task description (prefix match, minimum 4 characters)
−1 per WhenNotToUse word that matches (minimum 6 characters)
+3 if the persona name stem appears in the task description
Alphabetically first persona wins as a deterministic fallback when all scores are 0

The minimum character thresholds prevent noise from short prepositions. The -1 penalty for WhenNotToUse matches is what makes explicit handoff guidance effective — a persona that says "don't use me for implementing code" will be actively down-scored for implementation tasks.

Handoff Patterns

Handoffs are declared in the When NOT to Use section of each persona file. The pattern the orchestrator parses is:

- Implementing code (hand off to senior-backend-engineer)
- System design (hand off to architect)
- Writing production code (hand off to senior-backend-engineer or senior-frontend-engineer)

The regex hand off to ([\w][\w-]*) extracts the target persona name. The extracted name must exist in the persona catalog — if it doesn't, it's silently ignored. This means handoff targets are a form of type-checking: they force you to reference personas that actually exist.

How WhenToUse Quality Affects Routing

The LLM router reads WhenToUse bullets as triggers. Quality matters:

Weak trigger	Strong trigger
Implementing code	Implementing Go HTTP endpoints or REST APIs
Writing tests	Writing integration tests for database-backed services
Frontend work	Building React components with Tailwind CSS in Next.js App Router

Weak triggers produce ambiguous routing. When two personas have overlapping weak triggers, routing becomes a coin flip. Strong, domain-specific triggers disambiguate — "implementing Go endpoints" won't match senior-frontend-engineer.

The HandsOffTo Field

The YAML frontmatter handoffs array is the machine-readable version of handoff guidance:

---
role: planner
handoffs:
  - senior-backend-engineer
  - senior-frontend-engineer
  - devops-engineer
---

The orchestrator uses this to build the catalog summary sent to Haiku. When the router sees a task that matches an architect trigger but also contains implementation signals, the handoffs array tells it which implementers the architect delegates to — helping the router pick the right persona for the next phase.

Debugging routing: If a phase is being assigned the wrong persona, look at the WhenToUse bullets for both the persona being selected and the one you expected. The selected persona's bullets are matching the task description more strongly than you intend. Add specificity to the correct persona's triggers, or add a WhenNotToUse entry to the wrongly-selected one.

Explicit Assignment Always Wins

Both routing layers only activate when PERSONA: is absent from the PHASE line. Explicit assignment is always respected and bypasses routing entirely. For any phase where correctness matters, being explicit costs nothing and removes a variable from the system.

LESSON 18 PERSONA

Custom Personas

The built-in personas cover common engineering roles, but domain-specific work often calls for domain-specific constraints. A financial data pipeline, a game engine modder, a compliance reviewer — these have different output contracts than anything in the default catalog. This lesson walks through creating a custom persona from scratch.

Step 1: Create the Persona File

Create personas/{name}.md following the required section order from the standard. The filename must be kebab-case and must match the identity field inside the file.

Title line format: # Name — Tagline, kept under 72 characters.

# ml-pipeline-engineer — Machine Learning Pipeline Specialist

---
role: implementer
capabilities:
  - Python ML pipelines
  - Data preprocessing
  - Model training workflows
triggers:
  - pipeline
  - training
  - preprocessing
  - ml
handoffs:
  - data-analyst
  - devops-engineer
---

## Constraints
- Output must be reproducible: all random seeds pinned, all data paths parameterized
- No hardcoded credentials or file paths — use environment variables throughout
- Zero untyped function signatures — all parameters and return values typed
- ...

## Identity
ml-pipeline-engineer — builds reproducible machine learning training pipelines.

WhenToUse Quality Criteria

The WhenToUse section feeds the routing algorithm. Getting it right is the difference between the router working and not:

4–8 bullets — fewer is too sparse for routing, more creates noise
Each bullet must contain at least one distinctive word (≥6 characters)
Use domain-specific vocabulary — "implement" is too broad, "implementing Go endpoints" is specific
Avoid vocabulary that overlaps with other personas

## When to Use
- Building data preprocessing pipelines in Python or PySpark
- Implementing model training loops with reproducibility requirements
- Setting up feature engineering workflows for tabular data
- Configuring experiment tracking with MLflow or Weights & Biases
- Debugging training instabilities or gradient issues

WhenNotToUse and Handoffs

Every WhenNotToUse bullet must name an exact existing persona filename — no aliases, no descriptions. The regex parser extracts the target from hand off to {name}:

## When NOT to Use
- Deploying models to production (hand off to devops-engineer)
- Writing data analysis reports (hand off to data-analyst)
- Auditing model security or data privacy (hand off to security-auditor)

Step 2: Create the Memory Directory

Every persona needs a memory directory with an empty seed file:

mkdir personas/ml-pipeline-engineer
touch personas/ml-pipeline-engineer/MEMORY.md

The MEMORY.md file is seeded into the worker's Claude auto-memory before each session. After the session, new lines are appended and the file is deduplicated. Keep it under 5KB — domain-relevant patterns and gotchas only. It accumulates real learnings from real missions over time.

Step 3: Run the Test Suite

Persona validation runs as part of the standard test suite. After creating the file, run:

go test ./internal/persona/...

The tests check: correct section order, title line length, WhenToUse bullet count, minimum word length in WhenToUse bullets, and that all WhenNotToUse handoff targets exist in the catalog. A persona that fails these tests won't route correctly.

Checklist

Filename is kebab-case and matches identity in content
YAML frontmatter has role, capabilities, triggers, and handoffs
Sections follow required order: Constraints → Identity → Goal → ...
Title line under 72 characters
WhenToUse has 4–8 bullets, each with ≥1 word ≥6 characters
Every WhenNotToUse bullet names an exact existing persona filename
personas/{name}/MEMORY.md created (can be empty)
Added to personaColor() in daemon/api.go
go test ./internal/persona/... passes

Start narrow: A persona with a tight, specific scope routes more reliably than one trying to cover a broad domain. It's better to have two narrow personas with clear handoffs between them than one broad persona that tries to do everything and routes unpredictably.

LESSON 19 PLUGIN

Plugin Protocol

Nanika's plugin system lets you extend the orchestrator with external CLIs — issue trackers, schedulers, notification channels, anything that can answer three query types. The protocol is intentionally thin: a JSON manifest, a binary, and a SKILL.md so agents know when to invoke it.

Two-Layer Architecture

The plugin protocol has two layers:

Discovery — Subscribers scan ~/nanika/plugins/*/plugin.json to find plugins and their metadata
Query — Subscribers invoke <binary> query {status|items|actions} --json to fetch live data or trigger actions

Both layers are consumed by any subscriber — the orchestrator, a Nen scanner, an MCP server, or a script you write. Plugins are stateless from the subscriber's perspective: no callbacks, no open connections, no lifecycle hooks.

File Layout

Every plugin lives under ~/nanika/plugins/<name>/:

~/nanika/plugins/<name>/
├── plugin.json        # Plugin manifest (required)
├── bin/<binary>       # Compiled binary (CLI)
└── skills/
    └── SKILL.md       # Tells agents when and how to invoke it

The plugin.json manifest is the only required file. Without it, the plugin won't be discovered. The binary is resolved via exec.LookPath(binary) — it must be on $PATH or in one of the standard locations below.

Discovery Rules

Subscribers scan ~/nanika/plugins/*/plugin.json on demand. A plugin is skipped if any of the following are true:

plugin.json is missing
The JSON is malformed
api_version is missing or less than 1

Path Resolution

Subscribers enrich $PATH before resolving plugin binaries:

~/bin
~/.local/bin
~/go/bin
/opt/homebrew/bin
/usr/local/bin

Install your plugin binary to ~/bin/ — subscribers will find it without any shell configuration changes.

API Version

The current protocol version is 1. Set api_version: 1 in plugin.json to be discovered. Future versions will increment this field; version 1 plugins will remain compatible.

Shipped Plugins

Nanika ships six first-party plugins that follow this protocol:

Plugin	Binary	Language	Purpose
nen	`shu`, `ko`	Go	Self-improvement scanners + eval engine
tracker	`tracker`	Rust	Local issue tracker
scheduler	`scheduler`	Go	Cron jobs + dispatch loop
discord	`discord`	Go	Channel notifications + voice messages
telegram	`telegram`	Go	Channel notifications + voice messages
nen_mcp	`nen-mcp`	Go	MCP server exposing nanika internal state

LESSON 20 PLUGIN

Query Interface

The query interface is the contract between subscribers and your plugin binary. Subscribers invoke your binary with standardized subcommand arguments and expect JSON on stdout. There are four query types: status, items, actions, and action execution via action run.

status

The status query returns a single summary object — one number that describes the plugin's current state. Subscribers use this to report plugin health and item counts.

# Invocation
<binary> query status --json

# Response
{ "ok": true, "count": 42, "type": "tracker-status" }

Fields:

ok — boolean, whether the plugin is operational
count — integer, the summary count (open issues, unread messages, scheduled jobs)
type — string identifier, used as a display hint by subscribers

If ok is false, subscribers should treat the plugin as in an error state.

items

The items query returns a list of records — a table of whatever the plugin tracks: issues, jobs, messages, transactions.

# Invocation
<binary> query items --json

# Response
{
  "items": [
    {
      "id": "trk-1",
      "title": "Fix login bug",
      "status": "in-progress",
      "priority": "P0"
    },
    {
      "id": "trk-2",
      "title": "Add rate limiting",
      "status": "open",
      "priority": "P1"
    }
  ],
  "count": 2
}

The item schema is flexible — only id and title are required. status and priority are optional but recommended.

actions

The actions query returns a list of available commands. Each action has a name, a shell command, and a description. Subscribers use this to discover what a plugin can do without reading its source.

# Invocation
<binary> query actions --json

# Response
{
  "actions": [
    {
      "name": "next",
      "command": "tracker query action next",
      "description": "Show highest-priority ready issue"
    },
    {
      "name": "create",
      "command": "tracker create",
      "description": "Create a new issue"
    }
  ]
}

The command field is the shell command a subscriber runs to trigger the action. It can be a full shell invocation with flags and arguments.

action run

To execute an action, the subscriber calls the binary with the action verb:

# Invocation
<binary> query action run <job_id> --json

# Response
{
  "ok": true,
  "message": "Job executed successfully",
  "exit_code": 0
}

Timeouts

Subscribers should bound query execution. Recommended defaults:

Query type	Timeout
`status`	15 seconds
`items`	15 seconds
`actions`	30 seconds

These timeouts are intentionally generous — most queries should complete in under a second. If your plugin is hitting timeouts, the issue is usually an uncached network call or a database query that should be indexed.

Error Handling

Exit codes signal success or failure. A non-zero exit code means the plugin is in an error state. Write errors to stderr — subscribers capture it for diagnostics.

# Success
exit 0

# Failure
echo '{"error": "database not found"}' >&2
exit 1

Keep queries fast: Subscribers may call your plugin frequently. If your status query does a cold network request every time, it will be slow. Cache the result locally and refresh in a background goroutine or system timer.

Testing Queries Locally

Test your plugin's query interface directly from the shell:

my-plugin query status --json
my-plugin query items --json
my-plugin query actions --json

Pipe through a JSON formatter to verify the output shape:

my-plugin query items --json | jq .

LESSON 21 PLUGIN

Building a Plugin

A plugin needs three things: a CLI binary, a plugin.json manifest, and a skills/SKILL.md file that tells Claude Code when and how to invoke it.

Directory Structure

Start by creating the plugin directory inside the nanika plugins folder:

plugins/my-plugin/
├── plugin.json          # Manifest
├── skills/SKILL.md      # Claude Code skill documentation
├── cmd/my-plugin/
│   └── main.go          # Entry point
└── go.mod

The skills/SKILL.md file is what makes the plugin callable from Claude Code sessions. It gets picked up by scripts/generate-agents-md.sh and injected into the skills index in CLAUDE.md.

Writing the CLI Binary

The binary is a standard CLI tool that implements the query subcommand. Here's the minimal Go structure:

package main

import (
  "encoding/json"
  "fmt"
  "os"
)

func main() {
  if len(os.Args) < 3 || os.Args[1] != "query" {
    fmt.Fprintf(os.Stderr, "usage: my-plugin query {status|items|actions} --json\n")
    os.Exit(1)
  }

  switch os.Args[2] {
  case "status":
    json.NewEncoder(os.Stdout).Encode(map[string]any{
      "ok":    true,
      "count": getCount(),
      "type":  "my-plugin-status",
    })
  case "items":
    json.NewEncoder(os.Stdout).Encode(map[string]any{
      "items": getItems(),
      "count": len(getItems()),
    })
  case "actions":
    json.NewEncoder(os.Stdout).Encode(map[string]any{
      "actions": []map[string]string{
        {
          "name":        "refresh",
          "command":     "my-plugin sync",
          "description": "Sync with remote",
        },
      },
    })
  default:
    fmt.Fprintf(os.Stderr, "unknown query type: %s\n", os.Args[2])
    os.Exit(1)
  }
}

Writing plugin.json

The manifest sits at the root of the plugin directory:

{
  "name": "my-plugin",
  "version": "0.1.0",
  "api_version": 1,
  "description": "A brief description of what this plugin does",
  "icon": "Plug",
  "binary": "my-plugin",
  "build": "go build -o bin/my-plugin ./cmd/my-plugin",
  "install": "cp bin/my-plugin ~/bin/my-plugin",
  "tags": ["productivity", "custom"],
  "provides": ["status", "items", "actions"]
}

Writing skills/SKILL.md

The skill file documents the plugin for Claude Code. It appears in the skills index that's injected into every nanika worker session:

# my-plugin — Short description of what it does

When to use this skill: brief triggers for when to invoke this plugin.

## Commands

| Command | Description |
|---------|-------------|
| `my-plugin query status --json` | Get current status |
| `my-plugin sync` | Sync with remote source |

## Examples

`my-plugin query items --json`
`my-plugin sync --force`

Build and Install

After creating the files, build and install the binary, then regenerate the agents index:

# Build the binary
make build-plugin-my-plugin

# Install to ~/bin/
make install-plugin-my-plugin

# Update AGENTS.md + CLAUDE.md routing index
scripts/generate-agents-md.sh

For development without a Makefile target, build manually:

cd plugins/my-plugin
go build -o bin/my-plugin ./cmd/my-plugin
ln -s $(pwd)/bin/my-plugin ~/bin/my-plugin

The symlink approach is recommended during development — you rebuild in place and the symlink automatically points to the updated binary.

Development Checklist

Create manifest (plugin.json with api_version: 1)
Implement CLI queries (status, items, actions subcommands)
Write skills/SKILL.md for Claude Code discovery
Build the binary and install to ~/bin/
Run scripts/generate-agents-md.sh to update the skills index
Test each query type: my-plugin query status --json | jq .

Keep the binary fast: Subscribers may call your plugin frequently. Compiled binaries in Go or Rust start in milliseconds. If you must use an interpreted language, cache aggressively.

LESSON 22 PLUGIN

plugin.json Spec

The plugin.json manifest is the single source of truth for a plugin's identity, capabilities, and configuration. Subscribers read it at discovery time to determine what queries to issue.

Required Fields

Field	Type	Notes
`name`	string	Unique identifier; lowercase, no spaces, kebab-case
`version`	string	SemVer string, e.g. `1.0.0`
`api_version`	int	Must be `1` for the current protocol

If any required field is missing or malformed, the plugin is skipped during discovery with no user-visible error.

Optional Fields

Field	Type	Notes
`description`	string	One-liner description of the plugin
`icon`	string	Icon key (e.g. `ListCheck`, `Calendar`, `Mail`)
`binary`	string	CLI binary name, resolved via `$PATH` with enriched lookup
`build`	string	Build command (documentation only)
`install`	string	Install command (documentation only)
`tags`	[]string	Searchable keywords
`provides`	[]string	Array of query types this plugin implements: `status`, `items`, `actions`
`actions`	object	Maps action keys to shell commands or command objects
`repository`	object	Source metadata (`url`, `branch`)

Full Example: tracker plugin

This is the manifest for the tracker plugin — the most complete first-party example:

{
  "name": "tracker",
  "version": "0.1.0",
  "api_version": 1,
  "description": "Local issue tracker with hierarchical relationships",
  "icon": "ListCheck",
  "binary": "tracker",
  "build": "cargo build --release",
  "install": "cp target/release/tracker ~/bin/tracker",
  "tags": ["issue-tracking", "task-management"],
  "provides": ["status", "items", "actions"],
  "actions": {
    "status": "tracker query status --json",
    "items": "tracker query items --json",
    "actions": "tracker query actions --json"
  }
}

The actions Field

The actions object maps action keys to commands. Commands can be plain strings (executed directly) or objects with a cmd array and optional description:

"actions": {
  "next": "tracker query action next",
  "create": {
    "cmd": ["tracker", "create", "--interactive"],
    "description": "Create a new issue interactively"
  }
}

String commands are passed to the shell. Array commands (cmd) are exec'd directly without a shell — safer for arguments that might contain spaces or special characters.

The provides Field

The provides array tells subscribers which query types to issue. If absent, subscribers will attempt all three and skip any that fail. Declaring it explicitly avoids unnecessary invocations:

"provides": ["status"]           // status-only plugin
"provides": ["status", "items"]  // no action support
"provides": ["status", "items", "actions"]  // full support

Icon Keys

Icon values map to Lucide icon names. Commonly used keys:

ListCheck — issue trackers, task lists
Calendar — scheduling, calendar plugins
Mail — email integrations
MessageSquare — chat/Discord/Telegram
DollarSign — finance plugins
Clock — time tracking, cron jobs
Plug — generic/uncategorized

Minimal Valid Manifest

The smallest possible plugin.json that will be discovered:

{
  "name": "my-plugin",
  "version": "0.1.0",
  "api_version": 1
}

A plugin with only these three fields will be discovered but not queryable — no binary means no queries, no description means a blank entry in any subscriber UI. Add binary and description at minimum before shipping.

Validation: Subscribers don't run JSON Schema validation beyond checking required fields and api_version. Invalid optional fields are silently ignored — typos in field names won't error, they'll just be ignored. Double-check field names against this spec.

LESSON 23 NEN

Nen Overview

Nanika watches itself. The Nen subsystem is a collection of self-improvement abilities that run alongside your missions — evaluating health scores, detecting anomalies, running eval suites, managing costs, and protecting against injection. Together they form a feedback loop that makes the system measurably better over time.

The name comes from Hunter x Hunter. Nanika (ナニカ) and Alluka are characters in the series: Alluka is the vessel; Nanika is the wish-granting intelligence inside. ~/.alluka/ is Nanika's vessel — it holds runtime state, missions, metrics, and findings. The Nen abilities each map to a real self-improvement capability.

The Six Abilities

Ability	Role	How it works
Shu	Broad sweep	Evaluates all component health scores, flags degradation
Gyo	Observe + diagnose	Watches mission metrics, detects anomalies (z-score), answers why things failed
Ko	Eval engine	Promptfoo-compatible YAML test runner — runs assertions against LLM output to verify prompt quality
En	System health	Binary freshness, workspace hygiene, daemon reachability
Ryu	Cost analysis	Surfaces cost trends, model efficiency gaps, retry waste, minimal-output phases
Zetsu	Suppress exposure	Strips untrusted input at trust boundaries so workers are invisible to injection

The Improvement Loop

The abilities work as a pipeline, not in isolation. Each ability hands off to the next:

Shu finds "decomposer accuracy dropped"
Gyo diagnoses "persona mis-routing on implementation tasks"
Ko re-runs evals, verifies the regression
You fix the prompt
Ko confirms scores improve

This loop closes the gap between observing a problem and verifying the fix. Without Ko confirming the improvement, you'd only know you tried to fix it — not that you actually did.

How They Run

Gyo, En, and Ryu run automatically via nen-daemon while missions execute. They're passive observers — you don't need to invoke them manually. Their findings accumulate in ~/.alluka/nen/findings.db.

Shu and Ko are on-demand tools. Run them manually, or schedule them via cron:

shu evaluate          # Broad sweep across all components
ko evaluate           # Run all eval suites

Zetsu is infrastructure — it runs at trust boundaries and has no user-facing commands. It fires events when it strips content, but otherwise operates invisibly.

Proposals and Auto-Remediation

When findings exceed severity thresholds, the system doesn't just report — it acts:

shu propose auto-generates remediation missions and tracker issues
Proposals queue in shu review
You approve the proposals you want to run
The scheduler dispatches approved missions automatically

This means Nanika can catch a performance regression, generate a fix mission, and queue it for your review — without you having to notice the problem first.

Why Nen? The Hunter x Hunter framing isn't just aesthetic. Each ability in the series has a specific domain and cost — using the wrong ability for a task is inefficient. The same principle applies here: Shu is for sweeps, not diagnosis; Gyo is for diagnosis, not evals. Using the right ability at the right stage keeps the system efficient and interpretable.

Querying Findings

Findings from all Nen abilities are stored in ~/.alluka/nen/findings.db. Use the nen_mcp plugin to query them from Claude:

nanika_findings {}
nanika_findings { "severity": "high" }
nanika_findings { "domain": "gyo" }

Each finding includes: ability name, severity (low/medium/high/critical), timestamp, component, and a human-readable description. High-severity findings trigger the proposal pipeline automatically.

LESSON 24 NEN

Shu — Broad Sweep

Shu is the broad-sweep ability. It evaluates all component health scores across the system and flags degradation — giving you a system-wide view of how Nanika is performing over time. Where Gyo watches individual mission events in real time, Shu steps back and asks: are things getting better or worse overall?

Commands

shu evaluate          # Run broad sweep across all components
shu propose           # Auto-generate remediation missions for findings above threshold
shu review            # Review pending proposals before the scheduler dispatches them

Shu is the only Nen ability that runs on-demand rather than automatically via nen-daemon. This is by design — a broad sweep is expensive to compute and produces noisy results if run too frequently. Weekly sweeps are a common pattern; you can schedule them via the scheduler plugin.

What Shu Evaluates

Each sweep scores the following components against historical baselines:

Decomposer accuracy — persona mis-routing rate across recent missions
Review gate pass rates — how often reviewer phases approve on first attempt
Phase retry rates — excess retries signal prompt or tooling issues
Worker failure rates — terminal failures grouped by persona and task type
Persona usage frequency — are the right personas being selected for the right tasks?
Mission duration trends — are missions taking longer than they used to?

Scores are relative, not absolute. A 5% retry rate might be fine for complex implementation missions but alarming for simple research tasks. Shu learns your system's normal range over time and flags deviations from it.

The Proposal Pipeline

When findings exceed severity thresholds, Shu doesn't just report — it generates actionable remediation work:

Run shu evaluate — findings are written to ~/.alluka/nen/findings.db
Run shu propose — Shu reads high-severity findings and generates remediation missions
Run shu review — inspect proposals before approving
Approve the proposals you want to act on
The scheduler dispatches approved missions automatically at its next run
Results feed back into the next sweep

Review before you approve. Shu's proposals are auto-generated from finding patterns. They're usually accurate, but always read them before approving — especially proposals that touch prompts or persona configuration. A mis-diagnosed finding can generate a proposal that makes things worse.

Querying Findings

Findings are stored in ~/.alluka/nen/findings.db. Query them from Claude using the nen_mcp plugin:

nanika_findings { "severity": "high" }
nanika_findings { "domain": "shu" }

Each finding includes a component name, severity level, timestamp, and a plain-language description. High-severity findings are highlighted in shu review and are the primary trigger for proposal generation.

Scheduling Sweeps

For ongoing health monitoring, schedule Shu sweeps via the scheduler plugin:

scheduler jobs add --name "weekly-shu" --cron "0 9 * * 1" --command "shu evaluate && shu propose"

This runs a sweep every Monday at 9am and automatically generates proposals for anything that degraded over the week. You review and approve during your normal workflow — no manual monitoring required.

LESSON 25 NEN

Gyo — Anomaly Detection

Gyo is the observe-and-diagnose ability. It watches mission metrics as they flow through the system and detects anomalies using z-score analysis — then goes further and answers why the anomaly occurred. While Shu gives you a weekly health report, Gyo is the real-time nervous system.

How Gyo Runs

Gyo runs automatically as part of nen-daemon — you don't invoke it manually. It listens on the event stream and processes metrics as missions execute:

orchestrator daemon → events.sock → nen-daemon → gyo scanner → nen/findings.db

To query Gyo's findings from Claude:

nanika_findings { "domain": "gyo" }
nanika_findings { "severity": "high" }

What Gyo Detects

Phase duration spikes — a phase that normally completes in 30s suddenly takes 4 minutes
Retry storms — a phase retrying more times than the configured threshold
Worker failures clustering on a persona — a specific role is consistently failing
Decompose fallback patterns — the LLM falling back to keyword routing instead of semantic decomposition

Z-Score Analysis

Gyo's detection is statistical, not rule-based. For each metric type, Gyo maintains a rolling window of recent observations. When a new metric arrives, it computes the z-score relative to the rolling window:

Z-score < 2.0 — normal, no finding
Z-score 2.0–3.0 — low/medium severity finding
Z-score > 3.0 — high or critical finding

This means Gyo adapts to your system's actual behavior rather than hard-coded thresholds. A system that normally has 10% retry rates won't alert on 12% — but a system that normally has 2% will alert immediately.

Cold start: Gyo needs a warm rolling window to detect anomalies accurately. For the first 10–20 missions after installation, findings may be noisy. This is expected — the window is building up. Severity thresholds become reliable after ~50 missions.

Diagnostic Context

Gyo doesn't just flag anomalies — it correlates them with context to explain why they happened. A finding like "phase duration spike" includes:

Which persona was assigned to the phase
What the task type was (research, implementation, review)
The baseline value and the observed value
Any co-occurring findings that might be related

This diagnostic context is what makes Gyo useful for the improvement loop. When Shu flags a regression, Gyo's findings tell you where to look — down to the specific persona and task type that's causing the problem.

Finding Severity Levels

Severity	Meaning	Action
`low`	Slight deviation from baseline	Monitor; no immediate action needed
`medium`	Meaningful deviation; worth investigating	Review at next sweep
`high`	Significant anomaly impacting mission quality	Investigate soon; triggers Shu proposals
`critical`	Systemic failure pattern detected	Investigate immediately; auto-generates proposals

Integration with the Improvement Loop

Gyo's findings flow into Shu's sweep reports. When you run shu evaluate, Shu reads Gyo's recent findings alongside its own component scores. This means the weekly sweep automatically incorporates everything Gyo observed during the week — you don't need to manually reconcile the two.

The full flow from detection to fix:

Gyo detects "retry storm on senior-engineer persona for implementation tasks"
Finding written to ~/.alluka/nen/findings.db with severity: high
Shu picks up the finding in the next sweep
Shu generates a proposal: "Review decomposer prompt for implementation task routing"
You approve; scheduler dispatches a review mission
Prompt is updated; Ko verifies the fix

LESSON 26 NEN

Ko — Eval Engine

Ko is the eval engine — a promptfoo-compatible YAML test runner that runs assertions against LLM output to verify prompt quality. When Gyo detects an anomaly and Shu flags a regression, Ko is how you confirm the problem and verify the fix. It closes the loop between observing degradation and proving recovery.

Commands

ko evaluate                       # Run all eval suites
ko evaluate --suite decomposer    # Run a specific suite

The Ko Loop

Ko is built around a tight iteration cycle for prompt improvements:

Run ko evaluate to get baseline scores
Change a prompt (or persona configuration)
Run ko evaluate again
If scores improve → commit the change
If scores regress → revert

This loop makes prompt changes safe. Without Ko, you're guessing whether a change helped. With Ko, you have numbers.

YAML Suite Format

Ko eval suites use the promptfoo YAML format. Here's a complete example for the decomposer prompt:

prompts:
  - file://prompts/decomposer.txt

providers:
  - id: anthropic:claude-haiku-4-5-20251001

tests:
  - vars:
      task: "build a REST API"
    assert:
      - type: contains
        value: "PHASE:"
      - type: contains
        value: "PERSONA:"
      - type: javascript
        value: output.split("PHASE:").length >= 2

  - vars:
      task: "research golang error handling"
    assert:
      - type: contains
        value: "PHASE:"
      - type: javascript
        value: "!output.includes('PHASE: 1') || output.includes('researcher')"

Assertions can be:

contains — output includes a literal string
not-contains — output does not include a string
javascript — arbitrary JS expression evaluated against output
regex — output matches a regular expression
llm-rubric — a secondary LLM grades the output against a rubric

Suites in Practice

Each major prompt in Nanika has a corresponding Ko suite. The most important ones are:

Suite	What it tests
`decomposer`	PHASE lines are generated, personas are assigned correctly
`reviewer`	Review gate produces actionable feedback, not false passes
`orchestrator`	Mission plans are valid and non-redundant

Suite files live in ~/.alluka/evals/. You can add custom suites for any prompt you care about.

Integration with Shu

When Shu flags a regression (e.g., decomposer accuracy dropped), it includes a reference to the Ko suite that covers the affected component. This makes the investigation workflow concrete:

Shu finding: "Decomposer persona mis-routing rate up 18% — see suite: decomposer"
Run ko evaluate --suite decomposer to confirm the regression in test form
Examine failing assertions to understand what's wrong
Update the decomposer prompt
Re-run ko evaluate --suite decomposer
All assertions pass → commit the change

Promptfoo compatibility: Ko suites are valid promptfoo YAML. If you're familiar with promptfoo, you already know the format. The Ko CLI is a thin wrapper that integrates suite results with Nanika's findings pipeline and proposal system.

Running Ko in CI

For teams, Ko evals can gate prompt changes in CI. Add a check that runs the relevant suites against any PR that modifies a prompt file. If scores regress, the PR fails. This prevents prompt degradation from reaching production silently.

# In your CI pipeline
ko evaluate --suite decomposer --format json --fail-below 0.95

The --fail-below flag sets a pass-rate threshold. A suite with 95 tests that passes 94 will exit non-zero, blocking the merge.

LESSON 27 NEN

En, Ryu & Zetsu

The final three Nen abilities handle system health, cost analysis, and injection protection. Unlike Shu and Ko, which you interact with directly, En, Ryu, and Zetsu operate as infrastructure — running automatically via nen-daemon and writing findings to ~/.alluka/nen/findings.db.

En — System Health

En monitors the operational health of the Nanika installation itself. It checks four categories:

Binary freshness — are installed binaries up-to-date with the source? Stale binaries silently run old behavior while the prompts and configuration expect new behavior.
Workspace hygiene — orphaned workspaces from missions that terminated abnormally, stale temp files, and workspace directories that should have been cleaned up
Daemon reachability — is orchestrator daemon running and healthy? Can it accept connections on events.sock?
Event log completeness — are mission events being written correctly? Missing events indicate a daemon or socket issue.

En findings surface via shu query status --json and the nen_mcp plugin:

nanika_findings { "domain": "en" }

Binary freshness matters: If you pull changes to Nanika and forget to rebuild a plugin, En will flag the stale binary. Without this check, you'd run updated prompts against old code and wonder why behavior changed unpredictably.

Ryu — Cost Analysis

Ryu analyzes token costs across missions and identifies where you're spending more than you should. It surfaces four types of findings:

Cost trends per domain — are dev missions getting more expensive over time? Is personal domain usage spiking?
Model efficiency gaps — are you using expensive models (Opus, Sonnet) for tasks that a cheaper model (Haiku) handles equally well?
Retry waste — retried phases inflate token cost significantly; Ryu flags phases where retry cost exceeds original cost
Minimal-output phases — workers that consume many tokens but produce little output relative to cost; often a sign of a poorly-scoped phase

Ryu has no standalone CLI — it runs automatically inside nen-daemon while missions execute. Findings accumulate in ~/.alluka/nen/findings.db and surface in the next shu evaluate sweep or via the MCP plugin:

nanika_findings { "domain": "ryu" }

Ryu findings don't automatically generate proposals — cost optimization requires human judgment about quality trade-offs. Instead, findings appear in shu review for your consideration during the next sweep.

Reading Ryu Output

A typical Ryu analysis report shows:

Column	Meaning
Phase type	The category of work (research, implementation, review)
Avg tokens	Mean token consumption for phases of this type
Avg cost	Mean dollar cost at current model pricing
Retry ratio	What fraction of cost came from retried phases
Output ratio	Output tokens / input tokens — low ratios flag minimal-output phases

Zetsu — Injection Protection

Zetsu handles security at trust boundaries. When workers process external content — web pages, emails, GitHub issues, social posts — that content is untrusted and may contain injected instructions designed to hijack the agent.

Zetsu strips injected content before it reaches workers. Specifically, it:

Removes content that matches instruction patterns ("ignore previous", "you are now", "system:", etc.)
Strips invisible unicode characters used to hide injections in otherwise normal text
Sanitizes content passed as context to worker prompts

When Zetsu acts, it fires events:

security.injection_detected — content matched an injection pattern
security.invisible_chars_stripped — invisible characters were removed

Workers receive the sanitized context and are unaware of the original content. They cannot see or act on injected instructions.

Zetsu is always active. It has no user-facing commands and cannot be disabled. If you're building a skill that processes external content, rely on Zetsu to sanitize it before passing it to worker prompts — don't try to sanitize it yourself in the skill.

All Three Together

En, Ryu, and Zetsu together ensure that Nanika stays healthy at the operational level — not just at the prompt quality level. En checks the installation, Ryu checks the economics, and Zetsu checks the security boundary. Their findings complement Shu and Gyo's quality-focused monitoring to give you a complete picture of system health.

All findings are queryable via nen_mcp:

nanika_findings {}                          # All findings
nanika_findings { "severity": "critical" }  # Critical only
nanika_findings { "domain": "zetsu" }       # Security findings

LESSON 28 SKILL

What Are Skills

A skill is a directory at .claude/skills/{name}/SKILL.md that tells Claude how to use a particular tool or approach. Skills are Nanika's knowledge layer — they describe when to use something, what commands it exposes, and how to configure it. Workers read skills to know what's available and how to use it.

The Two-Layer System

Nanika uses a two-layer approach to skill discovery, based on Vercel's research showing that passive context dramatically outperforms on-demand retrieval:

Layer	File	Role	When loaded
Reference	`SKILL.md`	Full command docs, examples, configuration	On demand — when a worker needs detail
Routing	AGENTS-MD block in `CLAUDE.md`	Compressed index of all skills	Every turn — always in context

Vercel's research found that passive context (always in context) achieves a 100% pass rate, versus 53% for on-demand skills. The routing index ensures workers always know what skills exist. When a worker needs to actually use a skill, it fetches the full SKILL.md for the details.

Three Skill Types

Type	Example	Location	`allowed-tools`	`## Commands`
CLI wrapper	engage, scout, orchestrator	Symlink → `~/skills/{name}/`	Required	Required
Pipeline	channels, decomposer	Real dir in `.claude/skills/`	Not required	Not required
Knowledge	golang-pro, vercel-react-best-practices	Symlink → `~/.agents/skills/{name}`	Not required	Not required

CLI wrapper skills teach workers how to use a command-line tool. The allowed-tools frontmatter restricts what bash commands the worker can run, and the ## Commands section is the source of truth for the routing index.

Pipeline skills are orchestration blueprints — they describe multi-step workflows that involve multiple tools or agents. They don't wrap a single CLI.

Knowledge skills are reference documents — best practices, style guides, domain expertise. Workers read them before starting work in a domain, not to execute commands.

Skills vs. Plugins

The distinction between skills and plugins is important:

Skills are the brain — orchestration, planning, and knowledge. They live in .claude/skills/ and are read by Claude Code workers.
Plugins are the hands — domain-specific CLI tools that skills invoke. They live in plugins/ and are Go binaries that workers call via bash.

For example, the orchestrator skill teaches workers how to invoke the orchestrator CLI. The orchestrator CLI (the plugin binary) is what actually does the work.

Layer	Examples	What it is
Skills	orchestrator, decomposer, channels	Knowledge docs in `.claude/skills/`
Plugins	nen, scheduler, tracker, discord, telegram	CLI binaries in `plugins/`

Skill Discovery

Claude Code discovers skills by reading CLAUDE.md at the root of the Nanika directory. The routing index block in CLAUDE.md lists every skill with its description and example commands. This is why opening Nanika in Claude Code is all you need to do — no manual registration, no configuration step.

The routing index is auto-generated from the actual SKILL.md files by scripts/generate-agents-md.sh. After installing a new skill, run the script to update the index and make the skill visible to workers.

More skills = smarter workers. Workers can only use skills they know about. A worker executing a mission that needs the scout CLI will fail if the scout skill isn't installed and indexed. Keeping skills up-to-date is maintenance, not setup.

LESSON 29 SKILL

Routing Index

The routing index is the compressed skill table that lives in CLAUDE.md between two HTML comment markers. Every time Claude Code starts a session in the Nanika directory, it reads this index and immediately knows every skill that's installed, what it does, and what commands it exposes — without fetching any individual SKILL.md file.

Why It Exists

Loading full SKILL.md files into every worker's context would be expensive and slow. Instead, Nanika uses a two-layer approach: the routing index is a compact summary (one pipe-delimited line per skill) that's always in context. When a worker needs the full details to actually use a skill, it fetches the SKILL.md on demand.

This mirrors how you work: you know what tools exist without having read every man page. You reach for the manual when you need specifics.

Index Format

The routing index block in CLAUDE.md:

<!-- NANIKA-AGENTS-MD-START -->
[Nanika Skills Index][root: .claude/skills]IMPORTANT: Prefer retrieval-led reasoning...

|{name} — {description}:{path/to/SKILL.md}|`cmd1`|`cmd2`|...|

[Domain Detection]|dev:{keywords}|personal:{keywords}|
[Orchestration Triggers]|keywords:{invoke: orchestrator run}|
<!-- NANIKA-AGENTS-MD-END -->

Each skill line is pipe-delimited:

Name and description (with the SKILL.md path for on-demand loading)
Up to 14 example commands, each in backticks

The domain detection section maps keywords to domains, so workers can route tasks to the right domain without asking. The orchestration triggers section tells workers when to automatically invoke the orchestrator.

Never Edit Manually

The AGENTS-MD block is auto-generated. Editing it by hand will be overwritten the next time the generation script runs, and manual edits often introduce formatting errors that break parsing. Always use the script:

./scripts/generate-agents-md.sh              # Generate + inject into CLAUDE.md
./scripts/generate-agents-md.sh --dry-run    # Print without writing
./scripts/generate-agents-md.sh --diff       # Show what would change

How Generation Works

The script runs a five-step pipeline:

Scan .claude/skills/*/SKILL.md — finds every installed skill
Extract commands from ```bash blocks under ## Commands in each file
Take the first 14 commands per skill (ordering in SKILL.md matters)
Build the compressed pipe-delimited routing table
Inject between the NANIKA-AGENTS-MD-START and NANIKA-AGENTS-MD-END markers

This means the routing index is always derived from the actual skill files. If a skill's commands change, regenerate the index and the change is immediately visible to workers in their next session.

After Adding a Skill

Any time you install a new skill or plugin, update the routing index:

# After installing a new skill
./scripts/generate-agents-md.sh

# Verify it appears
grep "new-skill-name" CLAUDE.md

Workers in missions that started before the index was updated won't see the new skill. Mission workspaces snapshot CLAUDE.md at creation time. New missions will pick up the updated index automatically.

Command ordering in SKILL.md matters. The generator takes the first 14 commands from each skill's ## Commands section. Put the most commonly used commands first. A skill with 30 commands will only surface 14 in the routing index — make sure the most important ones are at the top.

Domain Detection

The domain detection section of the routing index helps workers decide which domain to assign to a task before handing it to the orchestrator. Keywords are matched against the task description:

Domain	Example keywords
`dev`	build, deploy, code, API, refactor, test
`personal`	plan, research, travel, budget, schedule

Domain assignment affects which personas are available and which workspace the mission runs in. If a task is ambiguous, the orchestrator prompts for clarification — but for clear cases, domain detection routes automatically.

LESSON 30 SKILL

Installing Skills

Workers automatically use installed Claude Code skills during missions. Every skill you add expands what workers can do without changing any prompt or configuration — the routing index picks it up automatically. More skills means smarter workers.

Installation Methods

There are two ways to install a skill:

From GitHub (using the skills CLI)

npx skills i owner/skill-name

This uses the Vercel Labs skills CLI to fetch the skill repository, place it in ~/skills/{name}/, and create the necessary directory structure. It's the recommended method for published skills.

Manual Copy

cp -r .claude/skills/scout ~/.claude/skills/

For skills that aren't published, or when you're developing locally, copy the skill directory directly. The skill directory must contain a SKILL.md file at its root.

After Installing

Installing a skill doesn't make it visible to workers until you update the routing index:

./scripts/generate-agents-md.sh        # Update routing index so workers can discover it

This regenerates the AGENTS-MD block in CLAUDE.md to include the new skill. Workers in new missions will see it immediately. Existing running missions won't — they snapshotted CLAUDE.md at creation time.

Directory Layout

After installation, a CLI skill has this layout:

~/skills/{name}/                      # CLI repo (SOURCE OF TRUTH)
├── .claude/
│   └── skills/
│       └── {name}/
│           ├── SKILL.md
│           └── metadata.json
└── cmd/

~/nanika/.claude/skills/{name}/       # Symlink → ~/skills/{name}/.claude/skills/{name}

The symlink is what Nanika reads. The source of truth lives in ~/skills/{name}/. This separation means you can update a skill by pulling changes in its repo, and Nanika immediately sees the update — no re-installation needed.

Cross-Agent Compatibility

Nanika's skill format is compatible with other AI agent runtimes. Skills installed for Nanika can be used by other agents without modification:

Agent	Skill location
Claude Code (Nanika)	`~/.claude/skills/{name}/SKILL.md`
Gemini CLI	`.gemini/skills/{name}/SKILL.md`

The format is also compatible with the Anthropic skills standard (YAML frontmatter, allowed-tools), Vercel Labs' skills CLI, and the Cloudflare agents RFC.

Verifying Installation

After installing a skill and regenerating the index, verify it's visible:

# Check the skill directory exists
ls ~/.claude/skills/your-skill-name/

# Verify SKILL.md is present
ls ~/.claude/skills/your-skill-name/SKILL.md

# Confirm it appears in the routing index
grep "your-skill-name" ~/nanika/CLAUDE.md

If the skill doesn't appear in CLAUDE.md after running generate-agents-md.sh, check that the SKILL.md file has a valid ## Commands section with at least one command in a ```bash block. The generator skips skills that don't match the expected format.

Symlink or copy? For skills you're actively developing, use a symlink so changes in the source repo are immediately reflected in Nanika. For skills you're just using, a copy is simpler. The skills CLI creates symlinks by default.

LESSON 31 SKILL

Writing a Skill

A well-written skill file does two things: it routes correctly (workers know when to use this skill) and it documents accurately (workers know how to use it). This lesson covers the canonical format for CLI wrapper skills — the most common type.

Canonical Location

~/skills/{name}/.claude/skills/{name}/SKILL.md

The source of truth lives in the skill's own repository. Nanika reads a symlink that points here. This means you can maintain the skill separately and pull updates without touching Nanika.

Frontmatter Spec

---
name: scout
description: Gathers intelligence on configurable topics via scout CLI. Use when user asks about news, trends, scraping, intel gathering, or monitoring topics.
allowed-tools: Bash(scout:*)
argument-hint: "[topic-name]"
---

Field	Required	Rule
`name`	Yes	Must match CLI binary name. Lowercase, hyphens only, max 64 chars.
`description`	Yes	Two-part: `{What it does}. Use when {trigger conditions}.`
`allowed-tools`	CLI only	Pattern: `Bash({cli-name}:*)`. Omit for knowledge/pipeline skills.
`argument-hint`	No	Usage hint shown in slash command autocomplete.

The description field is the most important for routing. The first sentence describes what the tool does; the second tells workers when to reach for it. Workers match task descriptions against these trigger conditions. Be specific — "Use when user asks about news, trends, scraping" is better than "Use when user needs information."

Required Sections

For CLI wrapper skills, these sections are required:

Title and subtitle

# Scout — Intelligence Gathering CLI

One line that summarizes what the skill wraps.

When to Use

## When to Use

- User asks about recent news in a topic area
- User wants to monitor a subject over time
- User requests a competitive intelligence sweep
- User asks what's happening in a technical domain

Four to eight trigger bullets. These supplement the frontmatter description and give workers more context for routing decisions.

Commands

## Commands

```bash
scout topics
scout topics add "my-topic"
scout gather
scout intel "my-topic"
```

This section is the source of truth for the routing index generator. Commands must be in ```bash blocks. Each command must start with the CLI binary name. The first 14 commands are extracted — order matters, put the most common ones first.

Configuration

## Configuration

Config file: `~/.alluka/scout/config.json`

Where the tool stores its configuration. Workers need this to help users debug config issues.

Examples

## Examples

**User:** Gather intel on Go 1.25 release
**Action:** `scout gather "go-release"`

**User:** What are people saying about Claude on Hacker News?
**Action:** `scout topics add "claude-anthropic" --sources hackernews && scout gather`

User/Action pairs that show realistic usage. Workers use these as few-shot examples when deciding how to invoke the CLI.

Command Format Rules

These rules determine whether your skill routes correctly:

Commands must be in ```bash blocks (not ```sh or plain code blocks)
Commands must be under the ## Commands heading
Each command must start with the CLI binary name
The first 14 commands are extracted — put most-used commands first
Don't include placeholder text in commands that appear in the routing index

Test your routing: After writing a skill and regenerating the index, open Nanika in Claude Code and ask it to use the skill for a task. If it reaches for the skill unprompted, routing is working. If it doesn't know about the skill at all, check that the ## Commands section has commands in properly-fenced ```bash blocks.

New Skill Checklist

Follow these steps in order when creating a new CLI skill from scratch:

# 1. Create the CLI binary
mkdir -p ~/skills/{name}/cmd/{name}-cli
# Write main.go and Makefile

# 2. Create the skill directory
mkdir -p ~/skills/{name}/.claude/skills/{name}
# Write SKILL.md following the spec above

# 3. Build and install the binary
cd ~/skills/{name} && make install

# 4. Symlink into Nanika
ln -s ~/skills/{name}/.claude/skills/{name} ~/nanika/.claude/skills/{name}

# 5. Regenerate routing index
cd ~/nanika && ./scripts/generate-agents-md.sh

# 6. Verify the skill appears
grep "{name}" ~/nanika/CLAUDE.md

Knowledge and Pipeline Skills

Knowledge and pipeline skills have a simpler format — no allowed-tools, no ## Commands section required. They're reference documents or workflow blueprints. The frontmatter description and ## When to Use still matter for routing, but the rest of the structure is more flexible.

For a knowledge skill, the primary content is prose: best practices, patterns, examples, and anti-patterns. Workers read it before starting work in the domain, not to execute commands.

LESSON 32 EVENTS

Event Bus Overview

The orchestrator daemon emits structured events to JSONL files and a Unix domain socket. Any subscriber can watch. Plugins are subscribers, not dependencies — the orchestrator runs fine without any of them installed.

Event Flow

Events originate from the orchestrator daemon and fan out to two destinations simultaneously:

orchestrator daemon  →  events.sock (UDS)  →  nen-daemon (scanners)
                     →  events/*.jsonl     →  discord/telegram (notifications)

The Unix domain socket delivers live events to any connected process. The JSONL files provide a persistent log that can be replayed after the fact. Both transports carry identical events — choose based on whether you need real-time streaming or historical replay.

Event Types

The bus emits 28 event types grouped into 14 categories:

Category	Events
Mission	`mission.started`, `mission.completed`, `mission.failed`, `mission.cancelled`
Phase	`phase.started`, `phase.completed`, `phase.failed`, `phase.skipped`, `phase.retrying`
Worker	`worker.spawned`, `worker.output`, `worker.completed`, `worker.failed`
Decompose	`decompose.started`, `decompose.completed`, `decompose.fallback`
Learning	`learning.extracted`, `learning.stored`
DAG	`dag.dependency_resolved`, `dag.phase_dispatched`
Role	`role.handoff`
Contract	`contract.validated`, `contract.violated`, `persona.contract_violation`
Review	`review.findings_emitted`, `review.external_requested`
Git	`git.worktree_created`, `git.committed`, `git.pr_created`
System	`system.error`, `system.checkpoint_saved`
Signals	`signal.scope_expansion`, `signal.replan_required`, `signal.human_decision_needed`
Security	`security.invisible_chars_stripped`, `security.injection_detected`
File	`file_overlap.detected`

File Paths

All event bus files live under ~/.alluka/:

Path	Purpose
`~/.alluka/events.sock`	Event broadcast socket (UDS) — connect for live stream
`~/.alluka/daemon.pid`	Daemon PID file
`~/.alluka/daemon.sock`	Daemon control socket (internal use)
`~/.alluka/events/<mission_id>.jsonl`	Per-mission JSONL event log

Note: ~/.alluka is an intentional Hunter × Hunter reference — the vessel/intelligence split mirrors the series' concept of Nen vessels. Do not rename it to ~/.nanika/.

Plugin Architecture

Plugins connect to the event bus as subscribers. This is a deliberate architectural choice: the orchestrator has no knowledge of which plugins are installed. The discord notification plugin, the telegram plugin, and the nen scanners all subscribe to the same bus independently. If none are running, the orchestrator continues normally — it emits events into the void without blocking.

This decoupling means you can add or remove plugins without touching the orchestrator, and a slow or crashed plugin cannot stall a running mission.

LESSON 33 EVENTS

JSONL Log

Every event the orchestrator emits is appended to a per-mission JSONL file at ~/.alluka/events/<mission_id>.jsonl. One JSON object per line. The file persists after the mission ends, making it the primary source for post-hoc analysis and debugging.

Event Envelope

Every event follows the same envelope structure regardless of type:

{
  "id": "evt_6868d2b58d433630",
  "type": "mission.started",
  "timestamp": "2026-03-29T08:38:39.550666Z",
  "sequence": 3,
  "mission_id": "20260329-0ec406b5",
  "phase_id": "phase-1",
  "worker_id": "technical-writer-phase-1",
  "data": { "execution_mode": "sequential", "phases": 3 }
}

Field Reference

Field	Type	When Present	Notes
`id`	string	always	`evt_` prefix + 8 hex bytes
`type`	string	always	TypeScript-style dotted event name
`timestamp`	string	always	RFC3339 UTC
`sequence`	int64	always	Monotonic per bus (global order)
`mission_id`	string	always	Mission UUID
`phase_id`	string	optional	Phase/worker lifecycle events
`worker_id`	string	optional	Worker lifecycle events
`data`	object	optional	Event-type-specific payload fields

Gotcha: sequence is assigned by the Bus (globally monotonic), not by individual emitters. This prevents collisions when concurrent missions each start at seq=1, which would break SSE replay deduplication.

File Details

Property	Value
Location	`~/.alluka/events/<mission_id>.jsonl`
Format	One JSON event per line (JSONL / NDJSON)
Permissions	`0600` (user-only read/write)

Polling with tail

The simplest way to watch a running mission is to tail the JSONL file and pipe through jq:

# Watch new events from a mission in real time
tail -f ~/.alluka/events/<mission_id>.jsonl | jq -c '.type'

# Process all events since last check (byte-offset polling)
offset=$(stat -f%z ~/.alluka/events/20260329-0ec406b5.jsonl 2>/dev/null || echo 0)
tail -c +$offset ~/.alluka/events/20260329-0ec406b5.jsonl | jq .

The byte-offset pattern is useful for polling loops: record the file size after each pass and only read newly appended bytes on the next iteration. This avoids re-processing events you've already handled without needing a cursor file.

Historical Replay

Because JSONL files persist after mission completion, you can replay any mission's event history. Use orchestrator metrics --mission <id> for a summary view, or pipe the raw JSONL through jq for custom analysis:

# Count events by type for a completed mission
jq -r '.type' ~/.alluka/events/20260329-0ec406b5.jsonl | sort | uniq -c | sort -rn

# Extract all worker output events
jq 'select(.type == "worker.output")' ~/.alluka/events/20260329-0ec406b5.jsonl

LESSON 34 EVENTS

Unix Domain Socket

The orchestrator daemon listens on ~/.alluka/events.sock and writes all bus events as newline-delimited JSON to every connected client. This is the real-time transport — use it when you need events as they happen rather than after the mission completes.

Connecting to the Socket

Any tool that can speak to a Unix domain socket works. The simplest approaches:

# Using socat (one-liner, useful for debugging)
socat - UNIX-CONNECT:~/.alluka/events.sock

# Using nc (alternative)
nc -U ~/.alluka/events.sock

# Pipe through jq to filter by type
socat - UNIX-CONNECT:~/.alluka/events.sock | jq 'select(.type | startswith("mission."))'

Go Subscriber Example

For production subscribers, connect with a timeout and scan line by line:

conn, _ := net.DialTimeout("unix", "~/.alluka/events.sock", 5*time.Second)
scanner := bufio.NewScanner(conn)
for scanner.Scan() {
  var ev Event
  json.Unmarshal(scanner.Bytes(), &ev)
  // process ev
}

Bus Internals

The event bus uses a fixed-capacity ring buffer of 1000 events. Publishing to the bus is non-blocking — if a subscriber is slow, it misses events rather than stalling the mission. This is an explicit design choice: mission throughput takes priority over guaranteed delivery to observers.

Property	Value
Ring buffer size	1000 events
Subscriber channel buffer	64 events per subscriber
Publishing behavior	Non-blocking (slow subscribers drop events)
Replay on reconnect	Use `.EventsSince(seq)` to replay buffered events

Drop Detection

Three counters track dropped events at different layers:

UDS emitter .DroppedWrites() — socket timeouts and write failures to connected clients
File emitter .DroppedWrites() — I/O errors writing to the JSONL file
Bus .SubscriberDrops() — slow consumers whose channel buffer filled (missed events, not write errors)

Note: If any drop counter is non-zero, the event log is incomplete. Use orchestrator metrics --mission <id> to check phase and worker telemetry for the ground truth on what actually executed.

Reconnect Strategy

The daemon may restart or the socket may become unavailable. A robust subscriber should reconnect with exponential backoff:

for {
    conn, err := net.DialTimeout("unix", sockPath, 5*time.Second)
    if err != nil {
        time.Sleep(5 * time.Second) // back off and retry
        continue
    }
    // read until EOF or error, then loop to reconnect
}

On reconnect, call .EventsSince(lastSeq) against the ring buffer to replay any events missed during the disconnection window. Events older than the 1000-event buffer are gone from memory but still available in the JSONL log.

LESSON 35 EVENTS

Building a Subscriber

A subscriber is any process that connects to the event bus and reacts to events. All of nanika's observability plugins — nen-daemon, discord notifications, telegram alerts — are subscribers. This lesson covers the canonical pattern they follow.

Consumer Pattern

Every subscriber in the codebase uses the same four-step pattern:

Probe UDS — try to connect to ~/.alluka/events.sock
On success — stream NDJSON events; reconnect with backoff on disconnect
On failure — fall back to JSONL polling from ~/.alluka/events/ every 5 seconds
For each event — deserialize and route to handlers based on .type

The reference implementation lives at plugins/nen/cmd/nen-daemon/main.go.

Go Subscriber Skeleton

This is the minimal structure for a production-quality subscriber:

package main

import (
    "bufio"
    "encoding/json"
    "net"
    "time"
)

type Event struct {
    ID        string          `json:"id"`
    Type      string          `json:"type"`
    Timestamp string          `json:"timestamp"`
    Sequence  int64           `json:"sequence"`
    MissionID string          `json:"mission_id"`
    PhaseID   string          `json:"phase_id,omitempty"`
    Data      json.RawMessage `json:"data,omitempty"`
}

func subscribe(sockPath string) {
    for {
        conn, err := net.DialTimeout("unix", sockPath, 5*time.Second)
        if err != nil {
            time.Sleep(5 * time.Second)
            continue
        }
        scanner := bufio.NewScanner(conn)
        for scanner.Scan() {
            var ev Event
            if err := json.Unmarshal(scanner.Bytes(), &ev); err != nil {
                continue
            }
            handleEvent(ev)
        }
        conn.Close()
    }
}

func handleEvent(ev Event) {
    switch ev.Type {
    case "mission.started":
        // ...
    case "phase.completed":
        // ...
    }
}

The Nen Fan-Out Pattern

The nen-daemon extends this skeleton by subscribing once and fanning out to individual scanner goroutines. Each Nen scanner is named after a Hunter × Hunter technique:

Scanner	Role
`gyo`	Perception — observes mission patterns and anomalies
`en`	Awareness — monitors system-wide health signals
`ryu`	Flow — tracks metrics and performance trends
`zetsu`	Suppression — handles security and injection detection events

The nen-daemon receives each event once from the bus and distributes it to all four scanner goroutines via internal channels. This is more efficient than having each scanner maintain its own UDS connection.

Pattern: All Nen scanners follow the same subscribe-and-route structure. When building a new observer plugin, start with this skeleton and add handlers for the event types you care about. The UDS fallback to JSONL polling ensures your plugin continues working even if the daemon restarts mid-mission.

JSONL Fallback

When the UDS connection fails (daemon not running, socket removed), fall back to polling the JSONL files:

func pollJSONL(eventsDir string, handler func(Event)) {
    seen := map[string]int64{} // mission_id → byte offset
    for {
        entries, _ := os.ReadDir(eventsDir)
        for _, e := range entries {
            if !strings.HasSuffix(e.Name(), ".jsonl") {
                continue
            }
            path := filepath.Join(eventsDir, e.Name())
            offset := seen[e.Name()]
            f, err := os.Open(path)
            if err != nil {
                continue
            }
            f.Seek(offset, io.SeekStart)
            scanner := bufio.NewScanner(f)
            for scanner.Scan() {
                var ev Event
                if json.Unmarshal(scanner.Bytes(), &ev) == nil {
                    handler(ev)
                }
            }
            seen[e.Name()], _ = f.Seek(0, io.SeekCurrent)
            f.Close()
        }
        time.Sleep(5 * time.Second)
    }
}

LESSON 36 REF

CLI Commands

Complete reference for all nanika CLI commands. Commands are grouped by binary. All binaries are installed to ~/bin/ by the installer.

orchestrator

The core orchestration binary. Runs missions, shows status, and manages the agent system.

orchestrator run "task description"
orchestrator run --domain personal "task"
orchestrator run ~/.alluka/missions/FEATURE.md
orchestrator run --dry-run "task"
orchestrator status
orchestrator learn
orchestrator cleanup
orchestrator cleanup --older 7d
orchestrator metrics
orchestrator metrics --last 10
orchestrator metrics --domain dev
orchestrator metrics --status failed
orchestrator metrics --mission <id>
orchestrator metrics --days 30

Command	Description
`run "task"`	Decompose and execute a task as a multi-agent mission
`run --domain <d>`	Route to a specific domain workspace
`run <file.md>`	Execute a pre-written mission file
`run --dry-run`	Preview the decomposition without executing
`status`	Show active and recent missions
`learn`	Extract learnings from recent mission outputs
`cleanup`	Remove stale workspaces and artifacts
`cleanup --older 7d`	Remove workspaces older than 7 days
`metrics`	Show mission metrics summary
`metrics --mission <id>`	Detailed metrics for a specific mission

Nen Commands

User-facing binaries: shu (broad sweep, proposals) and ko (eval engine). Gyo, En, Ryu, and Zetsu run automatically via nen-daemon — they have no standalone CLIs.

shu evaluate
shu propose
shu review
shu query status --json
ko evaluate
ko evaluate --suite decomposer

scheduler

Runs cron jobs and manages the publishing pipeline.

scheduler daemon
scheduler daemon --notify
scheduler daemon --once
scheduler daemon --stop
scheduler init
scheduler jobs
scheduler jobs add --name "check-inbox" --cron "*/30 * * * *" --command "your-script"

tracker

Local issue tracker with hierarchical tasks, blocking links, and priority-based ready detection.

tracker create "Task title"
tracker create "Task" --priority P0
tracker show trk-ABC1
tracker list
tracker list --status open
tracker update trk-ABC1 --status in-progress
tracker link trk-ABC1 trk-XYZ2 --type blocks

scripts/

Helper scripts in the nanika root directory:

scripts/install.sh [--core|--all|--plugins X|--no-interactive|--dry-run|--repair]
scripts/new-mission.sh <slug>
scripts/generate-agents-md.sh [--dry-run|--diff]

Uninstall

To fully remove nanika from your system:

make uninstall              # Stop daemons, remove launchd plists
make clean                  # Remove build artifacts
rm -rf ~/bin/{orchestrator,shu,gyo,en,ryu,tracker,scheduler,discord,telegram}
rm -rf ~/.alluka/           # Remove all runtime data

Warning: rm -rf ~/.alluka/ permanently deletes all mission logs, learnings, and event history. Back up anything you want to keep before running this command.

LESSON 37 REF

Mission Format

A mission file is a Markdown file with PHASE lines. The orchestrator reads it and executes each phase in dependency order, spawning workers in parallel where the dependency graph allows.

PHASE Line Syntax

Each PHASE line defines one unit of work:

PHASE: <name> | PERSONA: <persona> | OBJECTIVE: <objective> [| DEPENDS: <phase,phase>]

Field	Required	Description
`PHASE`	Yes	Unique name for this phase. Used in DEPENDS references and logs.
`PERSONA`	No	Which persona to assign. Without it, the orchestrator routes via LLM.
`OBJECTIVE`	Yes	What the worker must produce. Plain language, as specific as needed.
`DEPENDS`	No	Comma-separated list of phase names that must complete first.

Full Mission File Example

# Build Authentication System

## Context
Building JWT auth for the API. Use RS256. PostgreSQL for sessions.

## PHASE Lines

PHASE: design    | PERSONA: architect               | OBJECTIVE: Define the JWT contract and session schema
PHASE: implement | PERSONA: senior-backend-engineer  | OBJECTIVE: Implement auth middleware and session store | DEPENDS: design
PHASE: test      | PERSONA: qa-engineer              | OBJECTIVE: Write integration tests | DEPENDS: implement
PHASE: review    | PERSONA: security-auditor         | OBJECTIVE: Audit auth flow for vulnerabilities | DEPENDS: implement

Execution Order

The orchestrator builds a DAG from the DEPENDS relationships:

Phases without DEPENDS run immediately, in parallel
A phase runs as soon as all its DEPENDS phases complete successfully
If a phase fails, dependent phases are skipped (not cancelled — they never started)
Phases with no dependencies between them always run in parallel

In the example above, design runs first. When it completes, implement starts. When implement completes, both test and review start in parallel — they both depend only on implement, not on each other.

Context Injection

All Markdown content above the PHASE lines is the mission context. The orchestrator injects this text into every worker's system prompt. Use it to provide domain knowledge, constraints, and background that all workers need:

Technology choices and versions
Architecture decisions already made
Constraints (no external dependencies, must use X pattern)
Links to relevant files or documentation

Running a Mission File

# Execute the mission
orchestrator run ~/.alluka/missions/auth.md

# Preview the DAG without executing
orchestrator run --dry-run ~/.alluka/missions/auth.md

Creating a Mission Scaffold

The new-mission.sh script creates a blank mission file with the correct structure:

scripts/new-mission.sh auth-system
# Creates ~/.alluka/missions/auth-system.md

Tip: Use --dry-run before executing any mission file. It shows you how the orchestrator parsed the PHASE lines, which phases will run in parallel, and what workers will be spawned — without spending any tokens.

LESSON 38 REF

plugin.json Reference

Every plugin in nanika has a plugin.json file at its root. Subscribers read this file to discover the plugin, resolve its binary, and enumerate its queryable actions.

Required Fields

Field	Type	Description
`name`	string	Unique plugin identifier. Lowercase, no spaces. Used in CLI paths.
`version`	string	SemVer version string (e.g., `"1.0.0"`). Documentation only.
`api_version`	integer	Must be `1`. Plugin is not discovered if this is missing or less than 1.

Optional Fields

Field	Type	Description
`description`	string	One-liner description of the plugin.
`icon`	string	Icon key (`ListCheck`, `Calendar`, `Bell`, etc.). Maps to icon registry.
`binary`	string	CLI binary name. Resolved via `$PATH` then `~/bin` fallback. Required to be queryable.
`build`	string	Build command (documentation only).
`install`	string	Install command (documentation only).
`tags`	array	Searchable keywords.
`provides`	array	List of query types: `["status", "items", "actions"]`. Documentation only.
`actions`	object	Maps action keys to command strings or objects with `cmd` + `description`.
`repository`	object	Source metadata: `type`, `url`, `path`.

Full Example

The scheduler plugin's plugin.json demonstrates all major fields:

{
  "name": "scheduler",
  "version": "1.0.0",
  "api_version": 1,
  "description": "Local job scheduler and social content publisher",
  "icon": "Calendar",
  "binary": "scheduler",
  "build": "go build -ldflags \"-s -w\" -o bin/scheduler ./cmd/scheduler-cli",
  "install": "ln -sf $(pwd)/bin/scheduler ~/bin/scheduler",
  "tags": ["scheduler", "cron", "jobs"],
  "provides": ["query status", "query items", "query action"],
  "actions": {
    "status": {
      "cmd": ["scheduler", "query", "status", "--json"],
      "description": "Daemon running state, job count, next scheduled run time"
    },
    "items": {
      "cmd": ["scheduler", "query", "items", "--json"],
      "description": "List all jobs"
    },
    "action_run": {
      "cmd": ["scheduler", "query", "action", "run", "<job_id>", "--json"],
      "description": "Execute a job immediately"
    }
  }
}

Actions Object

Actions can be defined in two forms:

Simple string — "status": "scheduler query status --json" — a shell command string
Object — {"cmd": [...], "description": "..."} — an array of args plus a human-readable description

The array form is preferred because it avoids shell quoting issues and is easier to template with runtime parameters like <job_id>.

Note: build and install are documentation fields — they help humans understand how to build the plugin manually. The installer script handles building and linking; subscribers never execute these fields.

LESSON 39 REF

Skill Standard

The skill standard defines the canonical format for SKILL.md files. Following it ensures correct routing by the skills index and compatibility with the Anthropic, Vercel Labs, and Cloudflare skill ecosystems.

Canonical Location

Skill files live at a fixed path relative to the skill's directory:

~/skills/{name}/.claude/skills/{name}/SKILL.md

The directory name, the binary name, and the frontmatter name field must all be identical. The only exception: the orchestrator's skill is named missions rather than orchestrator.

Frontmatter Spec

---
name: scout
description: Gathers intelligence on configurable topics via scout CLI. Use when user asks about news, trends, scraping, intel gathering, or monitoring topics.
allowed-tools: Bash(scout:*)
argument-hint: "[topic-name]"
---

Field	Required	Rule
`name`	Yes	Must match binary name. Lowercase, hyphens only, max 64 chars.
`description`	Yes	`{What it does}. Use when {trigger conditions}.` Third-person.
`allowed-tools`	CLI only	`Bash({cli-name}:*)`. Omit for knowledge/pipeline skills.
`argument-hint`	No	Shown in slash command autocomplete.

Section Requirements

Section	Applies To	Content
`# {Name} — {Subtitle}`	All	One-line summary of what the skill does
`## When to Use`	CLI	4–8 trigger bullets ("Use when the user asks...")
`## Commands`	CLI	Bash code blocks with all available commands
`## Configuration`	CLI	Config file path and any required setup
`## Examples`	CLI	User/Action pairs showing common workflows

Command Extraction Rules

The routing index generator extracts commands from SKILL.md files to build the skills routing table. It follows these rules:

Extract lines from ```bash blocks under ## Commands
Match lines starting with the tool name (from frontmatter name)
Strip trailing # comments and backslash continuations
Take the first 14 commands — ordering matters for routing priority

Note: Put the most common or distinctive commands first in your ## Commands section. The routing index only reads the first 14 lines that match the binary name, so commands buried further down won't be indexed.

Standards Compatibility

The nanika skill format is a superset of three external standards:

Anthropic — YAML frontmatter with name, description, allowed-tools
Vercel Labs — compatible with the skills CLI discovery format
Cloudflare RFC — argument-hint and section conventions align with the Cloudflare agent skill proposal

A skill file written to this standard can be used directly in Claude Code slash commands, discovered by the nanika routing index, and referenced by external skill managers without modification.

You've completed all 33 lessons. You now understand nanika's architecture from installation through self-improvement. Build something great.