Installation
Nanika is a multi-agent orchestration layer that runs on top of Claude Code. The installer is interactive — it checks prerequisites, lets you choose which plugins to include, builds the binaries, and runs doctor checks before finishing.
Prerequisites
Before running the installer, make sure the following are available on your system:
| Dependency | Version | Required for |
|---|---|---|
Go |
>= 1.25 | Skills and most plugins |
Claude Code |
latest | Agent integration |
Rust / Cargo |
latest | tracker plugin only |
Rust is only needed if you plan to install the tracker plugin. The core system runs entirely on Go and Claude Code.
Clone and Install
Clone the repository and run the interactive installer:
git clone https://github.com/joeyhipolito/nanika
cd nanika
scripts/install.sh
The installer will walk you through each step, showing you what it's about to do before doing it. If a prerequisite is missing, it will tell you exactly what to install and how.
Install Flags
For automated environments or when you know exactly what you want, the installer accepts flags that skip the interactive prompts:
| Flag | Behavior |
|---|---|
scripts/install.sh |
Interactive: pick what to install |
scripts/install.sh --core |
Core only (orchestrator + nen + tracker + scheduler) |
scripts/install.sh --all |
Core + discord + telegram |
scripts/install.sh --plugins discord |
Core + specific plugins |
scripts/install.sh --no-interactive |
CI: core only, no prompts |
scripts/install.sh --dry-run |
Show what would be installed |
scripts/install.sh --repair |
Re-check prereqs, rebuild broken plugins |
After Installation
Once the installer finishes, open the nanika directory inside Claude Code:
cd nanika
claude
Claude Code reads the CLAUDE.md file at the root of the repository and automatically discovers all skills. You don't need to register anything manually — the skills index is built into the project structure.
scripts/install.sh --dry-run first if you want to see exactly what the installer will do before committing to it. This is especially useful in shared environments.
Repair Mode
If something breaks after installation — a plugin fails to build, a prerequisite was updated — run the installer in repair mode:
scripts/install.sh --repair
Repair mode re-checks all prerequisites and rebuilds any plugin binaries that are missing or outdated. It won't touch plugins that are already working.
Quick Start
From zero to your first agent mission in under five minutes. This page covers the essential steps to get nanika running and explains what's happening under the hood.
The Three Steps
The entire setup is three commands:
git clone https://github.com/joeyhipolito/nanika
cd nanika
scripts/install.sh
Once installation completes, open the directory in Claude Code:
claude
Claude Code reads CLAUDE.md at startup and discovers all installed skills automatically. No manual registration, no config files to edit.
Your First Prompt
With Claude Code open inside the nanika directory, try this prompt:
research golang error handling best practices and write a report
Nanika decomposes the task into phases, spawns specialized worker agents, and coordinates results between them. The output is a finished report — not just a single response, but a structured artifact produced by multiple agents working in sequence.
Key Scripts
The scripts/ directory contains utilities you'll use regularly:
| Script | What it does |
|---|---|
scripts/install.sh |
Interactive installer — checks prereqs, picks plugins, builds, and runs doctor |
scripts/new-mission.sh <slug> |
Creates a new mission file at ~/.alluka/missions/<slug>.md |
scripts/generate-agents-md.sh |
Regenerates the AGENTS.md routing index from the current persona set |
How Skill Discovery Works
When you open nanika in Claude Code, the agent reads CLAUDE.md. That file contains a skills index — a table mapping skill names to their SKILL.md files. Claude Code loads each skill file, which teaches the agent what CLI commands are available, how to invoke them, and what each does.
This means skills are self-documenting. Adding a new plugin is as simple as dropping a SKILL.md into the right directory and running scripts/generate-agents-md.sh to update the index.
How Missions Flow
Every task you give nanika goes through the same pipeline:
- Decompose — The task is broken into PHASE lines, each with a persona and objective.
- Spawn — Workers execute in parallel where dependencies allow.
- Collect — Artifacts from each phase flow into dependent phases.
- Review — Quality gates check the output before the mission completes.
The next lesson walks through a real mission step by step, showing exactly what each phase produces and how results flow between them.
First Mission
A mission is a task decomposed into phases, with each phase handled by a specialized agent. This lesson walks through what happens when you send nanika its first real task — from the moment you type your prompt to the moment the output lands.
The Mission Pipeline
Every mission follows the same high-level flow:
task → decompose → plan → spawn workers → collect artifacts → review → done
None of this is visible as individual commands — it all happens inside the orchestrator when you submit a prompt. Understanding the pipeline helps you predict what the system will do, write better prompts, and debug when something doesn't go as expected.
A Real Example
Here's what happens when you send:
research AI agent memory systems and write a report
The orchestrator decomposes this into three phases:
PHASE: research | PERSONA: architect | OBJECTIVE: Compare 5 agent memory approaches
PHASE: write | PERSONA: technical-writer | OBJECTIVE: Draft the report | DEPENDS: research
PHASE: review | PERSONA: staff-code-reviewer | OBJECTIVE: Review for accuracy | DEPENDS: write
Three specialized workers execute in dependency order. Each worker is a full Claude Code session loaded with a persona prompt that shapes its behavior and priorities.
DEPENDS: field controls execution order. Phases with no dependencies run in parallel. Phases that depend on earlier phases wait for those artifacts to be written before starting.
The Five Stages
1. Decompose
The task is broken into PHASE lines. Each phase has a name, a persona, and an objective. Dependencies between phases are declared explicitly — nothing is inferred at runtime. Decomposition can be done by the orchestrator's LLM (for open-ended prompts) or from a pre-written mission file (deterministic, good for repeatable workflows).
2. Route
Each phase is assigned a model tier: think for complex reasoning, work for standard implementation, quick for fast, cheap tasks. The orchestrator also picks the runtime — Claude Code for agentic tasks that need file access and tool use, or Codex for code-only tasks.
3. Spawn
Workers launch in parallel wherever the dependency graph allows. Each worker gets:
- A persona CLAUDE.md that defines its role, constraints, and output contract
- Access to the skills its role needs
- A private workspace at
~/.alluka/workspaces/<mission-id>/workers/<persona-phase>/ - The artifact outputs from any phases it depends on
4. Gate
Review phases are special — they block the mission from completing until quality criteria are met. If a review phase fails, the orchestrator injects a fix phase followed by a re-review cycle. Gates exist to prevent bad output from being marked as done.
5. Learn
When the mission finishes, metrics are recorded: duration, retry count, which phases failed and why. Nen observers scan these metrics for patterns. Anomalies surface as findings. Findings feed back into improved decomposition prompts and persona configurations over time.
Where Artifacts Live
Each worker writes its output to its workspace directory. Dependent phases read from those directories. The mission report and final artifacts are collected at:
~/.alluka/workspaces/<mission-id>/
You can inspect a running mission's partial output at any time by reading from that directory. Nothing is buffered in memory — everything is written to disk as it's produced.
Doctor Check
Before running missions, confirm your installation is healthy. There's no single nanika doctor command — health checks happen at two levels: the installer (full stack) and individual plugins (per-plugin dependencies).
Full Stack Check
To re-verify prerequisites and rebuild anything broken, run the installer in repair mode:
scripts/install.sh --repair
Repair mode re-checks Go, Claude Code, and Rust/Cargo, then rebuilds any plugin binaries that are missing or outdated. It skips plugins that are already healthy, so it's safe to run after any system update.
Plugin Doctor Commands
Every plugin exposes a doctor subcommand that checks its specific dependencies — API keys, browser cookies, external service connectivity:
scheduler doctor
tracker doctor
discord doctor
telegram doctor
# Machine-readable output for scripting
discord doctor --json
Run the doctor command for any plugin you plan to use before relying on it in a mission. These checks catch issues the installer can't see, like expired OAuth tokens or missing credentials.
Nen Health Score
For a broader view of system health, query the Shu scanner:
shu query status --json
This returns an overall health score (0–100), a count of critical findings, and whether the Nen daemon is running. It's the fastest way to check if anything has degraded since your last session.
Runtime Data Directory
Nanika stores all runtime data under ~/.alluka/:
| Path | Contents |
|---|---|
~/.alluka/missions/ |
Mission definition files (.md format) |
~/.alluka/workspaces/ |
Per-mission worker workspaces and artifacts |
~/.alluka/metrics.db |
Mission metrics: duration, retries, failures |
~/.alluka/nen/findings.db |
Nen observer findings and anomaly records |
Diagnosing and Repairing Issues
If a boot check fails or a plugin stops working after a system update, run the installer in repair mode:
scripts/install.sh --repair
Repair mode re-checks all prerequisites and rebuilds any plugin binaries that are broken or missing. It skips plugins that are already healthy, so it's safe to run even if only one thing is broken.
Plugin-Level Doctor Commands
Individual plugins also expose their own doctor commands. These check the plugin's specific dependencies — API keys, browser cookies, external services — and report whether the plugin is ready to use:
# Example: check the discord plugin
discord doctor
# Check with JSON output for scripting
discord doctor --json
Run the doctor command for any plugin you plan to use before relying on it in a mission. Plugin-level checks catch issues that the global boot sequence doesn't cover, like missing OAuth tokens or expired session cookies.
scripts/install.sh --repair and shu query status --json. Catching a broken binary before a mission is much less disruptive than mid-run.
How Missions Work
The orchestrator is the engine that runs nanika. It takes a task, decomposes it into phases, routes each phase to a specialized agent, coordinates execution, enforces quality gates, and records what it learned. This lesson explains the full pipeline in detail.
The Full Pipeline
task → decompose → plan → spawn workers → collect artifacts → review → done
This isn't just a conceptual model — it's the literal execution path every mission takes. Understanding each step lets you write better missions, predict behavior, and debug failures.
System Architecture
Before the pipeline, it helps to see how the layers fit together:
┌────────────────────────────────────────────────────────┐
│ Claude Code (reads CLAUDE.md → discovers skills) │
├────────────────────────────────────────────────────────┤
│ Orchestrator │
│ ┌─────────────┐ decomposes task into phases │
│ │ decomposer │ assigns personas + dependencies │
│ └─────────────┘ spawns workers (Claude Code / Codex) │
│ │ │
│ ▼ workers call plugins via SKILL.md │
├────────────────────────────────────────────────────────┤
│ Plugins (CLIs in ~/bin, via plugin.json) │
│ nen / tracker / scheduler / discord / telegram │
│ ▲ │
│ │ subscribe to events │
├────────────────────────────────────────────────────────┤
│ Event Bus (JSONL files + UDS socket) │
├────────────────────────────────────────────────────────┤
│ ~/.alluka/ │
│ missions/ · workspaces/ · metrics.db · nen/findings.db │
└────────────────────────────────────────────────────────┘
Claude Code sits at the top as the human-facing interface. The orchestrator runs beneath it as the coordination engine. Plugins are subscribers on the event bus — they observe and react, but the orchestrator doesn't depend on them being present.
Step 1: Decompose
The orchestrator receives a task and converts it into a set of PHASE lines. Each phase has a name, a persona, an objective, and an optional list of dependencies. Two decomposition modes are available:
- LLM decomposition — for open-ended prompts, the orchestrator uses a model to produce PHASE lines. The resulting plan is reviewed before spawning starts.
- Pre-decomposed — for repeatable workflows, you write the PHASE lines yourself in a mission file. Deterministic and faster to start.
Step 2: Route
Each phase is assigned:
- A model tier:
thinkfor complex multi-step reasoning,workfor standard tasks,quickfor cheap fast responses - A runtime: Claude Code for agentic tasks (file access, tool use, multi-turn), or Codex for pure code generation
Routing decisions are made once at plan time, not dynamically during execution.
Step 3: Spawn
Workers launch in parallel wherever the dependency graph permits. A worker that depends on phase A and phase B won't start until both have written their artifact outputs. Each worker receives:
- Its persona CLAUDE.md (defines role, constraints, output contract, methodology)
- Skill access appropriate to its role
- A private workspace at
~/.alluka/workspaces/<mission-id>/workers/<persona-phase>/ - The artifact files from phases it depends on, injected as prior context
Step 4: Gate
Review phases are quality gates. A review phase reads the output of a preceding phase and evaluates it against defined criteria. If the review fails:
- The orchestrator injects a fix phase targeting the specific failures
- The fix phase runs and produces a corrected artifact
- The review phase runs again against the corrected output
- This cycle repeats until the review passes or a retry limit is reached
Gates prevent missions from completing with known-bad output. They're especially important for code generation, security reviews, and documentation that needs to meet specific standards.
Step 5: Learn
When a mission completes (successfully or not), the orchestrator records metrics to ~/.alluka/metrics.db:
- Total duration and per-phase duration
- Retry counts per phase
- Which phases failed and what the failure reason was
- Model tier and runtime used per phase
Nen observers scan these metrics continuously for patterns. When a pattern is significant enough, it surfaces as a finding in ~/.alluka/nen/findings.db. Findings accumulate over time and feed back into better defaults for decomposition and routing.
Phase Lines
A PHASE line is the atomic unit of a mission plan. Each line declares one unit of work: who does it, what they should accomplish, and what they need before they can start. Understanding the syntax lets you write precise, predictable mission files.
Syntax
PHASE: <name> | PERSONA: <persona> | OBJECTIVE: <objective> [| DEPENDS: <phase1,phase2>]
Fields are separated by pipe characters (|). The DEPENDS field is optional — phases with no declared dependencies can start immediately and run in parallel with other independent phases.
A Three-Phase Mission
The simplest useful pattern is research → write → review:
PHASE: research | PERSONA: architect | OBJECTIVE: Compare 5 agent memory approaches
PHASE: write | PERSONA: technical-writer | OBJECTIVE: Draft the report | DEPENDS: research
PHASE: review | PERSONA: staff-code-reviewer | OBJECTIVE: Review for accuracy | DEPENDS: write
Here, research starts immediately. write waits for research to finish and then receives its output as context. review waits for write. This produces a linear chain where each phase builds on the previous one.
Parallel Phases
Phases without dependencies run simultaneously. A more complex mission might separate concerns and implement in parallel:
PHASE: design | PERSONA: architect | OBJECTIVE: Define the API contract
PHASE: implement | PERSONA: senior-backend-engineer | OBJECTIVE: Build the service
PHASE: review | PERSONA: security-auditor | OBJECTIVE: Audit auth flow | DEPENDS: implement
In this plan, design and implement could potentially run in parallel (neither declares a dependency on the other). review waits for implement. In practice you'd want implement to depend on design, but the syntax gives you full control over which phases must be sequential and which can overlap.
Available Personas
Each persona has a CLAUDE.md file that defines its role, constraints, methodology, and output contract. Pick the persona whose specialization matches the work of the phase:
| Persona | Best for |
|---|---|
academic-researcher | Literature reviews, comparative analysis, citations |
architect | System design, API contracts, architectural decisions |
data-analyst | Data processing, statistical analysis, visualization |
devops-engineer | Infrastructure, CI/CD, deployment configuration |
qa-engineer | Test planning, test writing, coverage analysis |
security-auditor | Security review, threat modeling, vulnerability analysis |
senior-backend-engineer | Server-side implementation, APIs, database work |
senior-frontend-engineer | UI implementation, accessibility, client-side logic |
staff-code-reviewer | Code review, quality gates, standards enforcement |
technical-writer | Documentation, reports, structured prose |
Writing Mission Files
For repeatable workflows, write PHASE lines in a mission file rather than prompting the orchestrator to decompose on the fly. Create a new mission file with:
scripts/new-mission.sh my-feature
This creates ~/.alluka/missions/my-feature.md with a template structure. Edit the PHASE lines to match your workflow, then run it:
orchestrator run ~/.alluka/missions/my-feature.md
Objective Writing Tips
The OBJECTIVE field is what the worker agent reads as its primary instruction. Write it like a clear deliverable, not a vague description:
- Good:
Compare 5 agent memory approaches — output a markdown table with pros/cons and a recommended approach - Too vague:
Research memory systems - Good:
Audit the authentication flow in auth/middleware.go for OWASP Top 10 vulnerabilities — output findings as a numbered list with severity ratings - Too vague:
Review the auth code
Specific objectives produce specific outputs. Vague objectives produce vague outputs that will fail review gates.
Dry Run
Before committing to a full mission execution, you can preview exactly what the orchestrator plans to do. Dry run mode shows the decomposed phases, persona assignments, and dependency graph — without spawning a single worker.
Running a Dry Run
orchestrator run --dry-run "task description"
Pass the same prompt you'd use for a real mission. The orchestrator decomposes it fully and prints the plan: every phase, its assigned persona, its objective, and what it depends on. Nothing executes.
Dry run is also available for mission files:
orchestrator run --dry-run ~/.alluka/missions/my-feature.md
Full Orchestrator Command Reference
Dry run is one flag among many. Here's the full set of orchestrator commands:
Running Missions
# Run from a natural language prompt
orchestrator run "research golang error handling best practices"
# Run with a domain context (changes default persona routing)
orchestrator run --domain personal "plan my Japan trip"
# Run a pre-written mission file
orchestrator run ~/.alluka/missions/FEATURE.md
# Preview without executing
orchestrator run --dry-run "task description"
Checking Status
# Show all active missions
orchestrator status
Cleanup
# Remove completed mission workspaces
orchestrator cleanup
# Remove workspaces older than 7 days
orchestrator cleanup --older 7d
Metrics
# Show recent mission metrics
orchestrator metrics
# Last 10 missions
orchestrator metrics --last 10
# Filter by domain
orchestrator metrics --domain dev
# Filter by status
orchestrator metrics --status failed
# Show a specific mission
orchestrator metrics --mission <id>
# Missions in the last 30 days
orchestrator metrics --days 30
Using Metrics to Improve Missions
The metrics subcommand is more useful than it looks at first. Filtering by --status failed shows which missions didn't complete and which phase caused the failure. Patterns here often reveal:
- Objectives that are too vague for the assigned persona to interpret correctly
- Phases that consistently need more retries, suggesting a model tier upgrade
- Review gates that always fail on the first pass, indicating the preceding phase needs a more specific output contract
Cross-referencing failed missions with the findings in ~/.alluka/nen/findings.db (surfaced by Nen) gives a fuller picture of what's going wrong and why.
Domain Flag
The --domain flag changes the default context the orchestrator uses when routing phases to personas. Without a domain, it defaults to dev. Setting --domain personal adjusts routing toward personas suited to planning, research, and writing rather than engineering.
Domains are a lightweight way to shift the orchestrator's defaults without rewriting PHASE lines. They're most useful for recurring mission types that differ significantly in their persona requirements.
Daemon & Events
The orchestrator daemon is the long-running background process that emits events as missions progress. Nen scanners, notification channels, and other plugins subscribe to these events — giving you real-time visibility into what the system is doing without polling or manual status checks.
Starting the Daemon
orchestrator daemon
The daemon starts in the foreground. To run it persistently in the background, redirect its output to a log file:
orchestrator daemon >> ~/.alluka/logs/scheduler.log 2>&1 &
Once running, the daemon listens for mission commands, coordinates phase execution, and emits events to both a Unix domain socket and per-mission JSONL log files.
Event Flow
Events travel two paths simultaneously:
orchestrator daemon → events.sock (UDS) → nen-daemon (scanners)
→ events/*.jsonl → discord/telegram (notifications)
The Unix domain socket (events.sock) is for low-latency subscribers like the Nen daemon, which needs to react to events in near real-time. The JSONL files are for durability — they persist after the daemon exits and can be replayed or analyzed after the fact.
Event Locations
| Path | Contents |
|---|---|
~/.alluka/events.sock |
Unix domain socket — live event stream for subscribers |
~/.alluka/events/<mission_id>.jsonl |
Per-mission event log — one JSON object per line |
Plugins Are Subscribers, Not Dependencies
This is a critical architectural point: the orchestrator does not depend on any plugin being installed. Plugins are subscribers — they watch the event bus and react, but the orchestrator emits events regardless of whether anyone is listening.
This means:
- You can run nanika without Discord, Telegram, or any notification plugin installed
- Adding a new notification channel doesn't require changing the orchestrator
- A plugin crashing doesn't affect mission execution — events keep flowing
- You can add subscribers retroactively and they'll process future events without any core changes
What Events Look Like
Each event in the JSONL log is a single line of JSON. Events carry a type, a mission ID, a timestamp, and a payload appropriate to the event type. Common event types include phase started, phase completed, phase failed, review passed, review failed, and mission done.
You can tail a mission's event log to watch it progress in real time:
tail -f ~/.alluka/events/<mission_id>.jsonl
Nen and the Daemon
The Nen daemon subscribes to the event socket and runs anomaly scanners against the event stream. When a scanner detects something significant — a phase taking unusually long, repeated retries on the same phase, a review gate looping — it records a finding in ~/.alluka/nen/findings.db.
These findings are passive: Nen observes and records, but doesn't intervene in running missions. The findings accumulate over time and surface patterns that inform how you configure personas, write objectives, and structure mission files.
Notification Plugins
If you have Discord or Telegram configured, their plugins subscribe to the JSONL event files and send you messages when notable events occur — mission started, mission completed, phase failed, review passed. Configuration for what triggers a notification lives in each plugin's config, not in the orchestrator.
To set up notifications, configure the relevant plugin and let it subscribe to events. The orchestrator doesn't need to know the plugin exists.
Constraint-First Design
Most agent frameworks give models a role identity — "You are a senior software engineer with 10 years of experience..." — and call it a persona. Nanika doesn't do that. Understanding why is the key to understanding how personas actually work.
The Problem with Role-Playing
The role-playing framing is a holdover from how human teams are structured. It made sense when a human could only do one job. Models don't have that constraint. Telling a model it is an architect doesn't make it better at architecture — the identity framing is empirically inert.
What actually changes output is behavioral constraints: what to produce, what to avoid, what the output contract is, and what failure modes to guard against.
Role labels add no signal. They add noise. A model that's told it's a "senior engineer" will still produce junior code if nothing else constrains it. A model given a tight output contract and explicit anti-patterns will produce consistently scoped work regardless of what it's "called."
Constraints First, Identity Second
Every nanika persona leads with ## Constraints — not identity. The structure is:
- What this agent must do
- What it must never do
- What a correct output looks like
- What patterns to avoid
Identity follows, but it's minimal and functional — it names the persona for routing purposes, not to prime the model with a character to embody.
Required Section Order
The PERSONA-STANDARD.md defines a fixed section order that all persona files must follow:
- Constraints
- Identity
- Goal
- Expertise
- When to Use
- When NOT to Use
- Principles
- Anti-Patterns
- Methodology
- Output Format
- Self-Check
The order is intentional. Constraints come before identity so the model encounters behavioral bounds before any framing. Anti-Patterns follow Principles so the positive guidance is anchored before the negations. Self-Check comes last — it's a checklist the model runs against its own output before returning a result.
YAML Frontmatter
Every persona file begins with YAML frontmatter that the orchestrator reads for routing and handoff decisions:
---
role: implementer
capabilities:
- Go development
- HTTP servers and middleware
triggers:
- implement
- build
- backend
handoffs:
- architect
- senior-frontend-engineer
---
The role field is one of three values: planner, implementer, or reviewer. This maps directly to which phase type the persona is suited for. The triggers array feeds the keyword-match fallback in the routing algorithm. The handoffs array tells the orchestrator which personas this one delegates to when it encounters work outside its scope.
File Location and Naming
Persona files live at personas/{name}.md — kebab-case, matching the identity field inside the file. The name is the routing key: when the orchestrator assigns PERSONA: senior-backend-engineer to a phase, it loads personas/senior-backend-engineer.md verbatim into the worker's CLAUDE.md.
Why This Matters for Quality
When you read a persona file and its constraints are vague — "produce good code," "be thorough" — that persona will produce inconsistent output. The constraint is doing no work. Effective constraints are specific enough to fail: "Zero any types," "No CSS modules or styled-components," "Must render correctly on mobile (320px viewport)." These are testable. Vague identity claims are not.
The discipline of constraint-first design forces clarity about what you actually want from a worker phase before you send it to a model. That clarity is the real value — it happens in your head before the model sees anything.
Persona Catalog
Nanika ships with 10 built-in personas covering the most common engineering specializations. Each persona is a markdown file in personas/ that gets injected verbatim into a worker's CLAUDE.md when assigned to a phase.
The 10 Built-in Personas
| Persona | Role | Specialization |
|---|---|---|
academic-researcher |
planner | Deep research, literature synthesis, citation management |
architect |
planner | System design, API contracts, architectural decisions |
data-analyst |
implementer | Data analysis, queries, statistical reasoning |
devops-engineer |
implementer | Infrastructure, CI/CD, deployment pipelines |
qa-engineer |
reviewer | Test planning, quality assurance, edge case analysis |
security-auditor |
reviewer | Security review, vulnerability analysis, auth flow auditing |
senior-backend-engineer |
implementer | Backend implementation, Go/Rust services, APIs |
senior-frontend-engineer |
implementer | Frontend implementation, Next.js, React, Tailwind |
staff-code-reviewer |
reviewer | Code review, architectural feedback, blocking issues |
technical-writer |
implementer | Documentation, README files, tutorials |
Persona Assignment in PHASE Lines
Personas are assigned in mission PHASE lines. The orchestrator reads the PERSONA: field and injects the corresponding persona file into the worker's context before the session starts.
PHASE: design | PERSONA: architect | OBJECTIVE: Define the API contract
PHASE: implement | PERSONA: senior-backend-engineer | OBJECTIVE: Build the service
PHASE: review | PERSONA: security-auditor | OBJECTIVE: Audit auth flow | DEPENDS: implement
Each phase runs in an isolated worker with only the assigned persona's constraints, identity, and methodology visible. Workers don't share context — the orchestrator coordinates by reading output artifacts from previous phases and injecting relevant findings as prior-phase notes.
Choosing the Right Persona
A few practical rules for picking personas:
- Planning phases — use
architectfor system design,academic-researcherfor discovery work that needs citations and synthesis - Implementation phases — match the language:
senior-backend-engineerfor Go/Rust APIs,senior-frontend-engineerfor Next.js/React work,devops-engineerfor infra and pipelines - Review phases —
staff-code-reviewerfor general code quality,security-auditorwhen auth or data handling is involved,qa-engineerwhen you need edge-case coverage - Data and analysis work —
data-analystwhen the objective involves querying, aggregating, or interpreting structured data - Documentation —
technical-writerfor user-facing docs, READMEs, and tutorials; don't use implementer personas for docs work
What Gets Injected
When a worker phase starts, the orchestrator builds its CLAUDE.md from several sources in order:
- The persona file (
personas/{name}.md) — constraints, methodology, anti-patterns - The persona's memory file (
personas/{name}/MEMORY.md) — accumulated learnings from past sessions - Available tools index — which CLI tools are installed and how to invoke them
- Prior phase notes — findings and artifacts from phases this phase depends on
- The phase objective — what this specific worker is being asked to do
The persona file is always first. Its constraints take precedence over everything else in the context window.
Omitting PERSONA
If a PHASE line has no PERSONA: field, the orchestrator falls back to automatic routing using the two-layer algorithm described in the next lesson. For exploratory or research phases where the optimal persona isn't obvious, omitting it and letting the router decide is often fine. For production implementation work, be explicit.
Routing & Handoffs
When a PHASE line doesn't specify a persona explicitly, the orchestrator selects one automatically using a two-layer routing algorithm. Understanding how this works helps you write better WhenToUse sections in custom personas — and explains why some automatic routing decisions are better than others.
Layer 1: LLM Match (Primary)
The primary routing path uses a lightweight model (Haiku) to select a persona. The orchestrator calls FormatForDecomposer() to build a compact catalog summary — one entry per persona containing its name, title, WhenToUse triggers, and HandsOffTo targets. This catalog plus the task description goes to Haiku, which returns a single persona name.
This approach handles phrasing variations well. "Write a test plan for the payment flow" and "QA the checkout system" both route to qa-engineer without needing exact keyword matches. Haiku reads the task semantically against the WhenToUse triggers.
Layer 2: Keyword Match (Fallback)
When LLM routing fails or is unavailable, the orchestrator scores every persona against the task description using a deterministic algorithm:
- +1 per
WhenToUseword that matches the task description (prefix match, minimum 4 characters) - −1 per
WhenNotToUseword that matches (minimum 6 characters) - +3 if the persona name stem appears in the task description
- Alphabetically first persona wins as a deterministic fallback when all scores are 0
The minimum character thresholds prevent noise from short prepositions. The -1 penalty for WhenNotToUse matches is what makes explicit handoff guidance effective — a persona that says "don't use me for implementing code" will be actively down-scored for implementation tasks.
Handoff Patterns
Handoffs are declared in the When NOT to Use section of each persona file. The pattern the orchestrator parses is:
- Implementing code (hand off to senior-backend-engineer)
- System design (hand off to architect)
- Writing production code (hand off to senior-backend-engineer or senior-frontend-engineer)
The regex hand off to ([\w][\w-]*) extracts the target persona name. The extracted name must exist in the persona catalog — if it doesn't, it's silently ignored. This means handoff targets are a form of type-checking: they force you to reference personas that actually exist.
How WhenToUse Quality Affects Routing
The LLM router reads WhenToUse bullets as triggers. Quality matters:
| Weak trigger | Strong trigger |
|---|---|
| Implementing code | Implementing Go HTTP endpoints or REST APIs |
| Writing tests | Writing integration tests for database-backed services |
| Frontend work | Building React components with Tailwind CSS in Next.js App Router |
Weak triggers produce ambiguous routing. When two personas have overlapping weak triggers, routing becomes a coin flip. Strong, domain-specific triggers disambiguate — "implementing Go endpoints" won't match senior-frontend-engineer.
The HandsOffTo Field
The YAML frontmatter handoffs array is the machine-readable version of handoff guidance:
---
role: planner
handoffs:
- senior-backend-engineer
- senior-frontend-engineer
- devops-engineer
---
The orchestrator uses this to build the catalog summary sent to Haiku. When the router sees a task that matches an architect trigger but also contains implementation signals, the handoffs array tells it which implementers the architect delegates to — helping the router pick the right persona for the next phase.
WhenToUse bullets for both the persona being selected and the one you expected. The selected persona's bullets are matching the task description more strongly than you intend. Add specificity to the correct persona's triggers, or add a WhenNotToUse entry to the wrongly-selected one.
Explicit Assignment Always Wins
Both routing layers only activate when PERSONA: is absent from the PHASE line. Explicit assignment is always respected and bypasses routing entirely. For any phase where correctness matters, being explicit costs nothing and removes a variable from the system.
Custom Personas
The built-in personas cover common engineering roles, but domain-specific work often calls for domain-specific constraints. A financial data pipeline, a game engine modder, a compliance reviewer — these have different output contracts than anything in the default catalog. This lesson walks through creating a custom persona from scratch.
Step 1: Create the Persona File
Create personas/{name}.md following the required section order from the standard. The filename must be kebab-case and must match the identity field inside the file.
Title line format: # Name — Tagline, kept under 72 characters.
# ml-pipeline-engineer — Machine Learning Pipeline Specialist
---
role: implementer
capabilities:
- Python ML pipelines
- Data preprocessing
- Model training workflows
triggers:
- pipeline
- training
- preprocessing
- ml
handoffs:
- data-analyst
- devops-engineer
---
## Constraints
- Output must be reproducible: all random seeds pinned, all data paths parameterized
- No hardcoded credentials or file paths — use environment variables throughout
- Zero untyped function signatures — all parameters and return values typed
- ...
## Identity
ml-pipeline-engineer — builds reproducible machine learning training pipelines.
WhenToUse Quality Criteria
The WhenToUse section feeds the routing algorithm. Getting it right is the difference between the router working and not:
- 4–8 bullets — fewer is too sparse for routing, more creates noise
- Each bullet must contain at least one distinctive word (≥6 characters)
- Use domain-specific vocabulary — "implement" is too broad, "implementing Go endpoints" is specific
- Avoid vocabulary that overlaps with other personas
## When to Use
- Building data preprocessing pipelines in Python or PySpark
- Implementing model training loops with reproducibility requirements
- Setting up feature engineering workflows for tabular data
- Configuring experiment tracking with MLflow or Weights & Biases
- Debugging training instabilities or gradient issues
WhenNotToUse and Handoffs
Every WhenNotToUse bullet must name an exact existing persona filename — no aliases, no descriptions. The regex parser extracts the target from hand off to {name}:
## When NOT to Use
- Deploying models to production (hand off to devops-engineer)
- Writing data analysis reports (hand off to data-analyst)
- Auditing model security or data privacy (hand off to security-auditor)
Step 2: Create the Memory Directory
Every persona needs a memory directory with an empty seed file:
mkdir personas/ml-pipeline-engineer
touch personas/ml-pipeline-engineer/MEMORY.md
The MEMORY.md file is seeded into the worker's Claude auto-memory before each session. After the session, new lines are appended and the file is deduplicated. Keep it under 5KB — domain-relevant patterns and gotchas only. It accumulates real learnings from real missions over time.
Step 3: Run the Test Suite
Persona validation runs as part of the standard test suite. After creating the file, run:
go test ./internal/persona/...
The tests check: correct section order, title line length, WhenToUse bullet count, minimum word length in WhenToUse bullets, and that all WhenNotToUse handoff targets exist in the catalog. A persona that fails these tests won't route correctly.
Checklist
- Filename is kebab-case and matches identity in content
- YAML frontmatter has
role,capabilities,triggers, andhandoffs - Sections follow required order: Constraints → Identity → Goal → ...
- Title line under 72 characters
WhenToUsehas 4–8 bullets, each with ≥1 word ≥6 characters- Every
WhenNotToUsebullet names an exact existing persona filename personas/{name}/MEMORY.mdcreated (can be empty)- Added to
personaColor()indaemon/api.go go test ./internal/persona/...passes
Plugin Protocol
Nanika's plugin system lets you extend the orchestrator with external CLIs — issue trackers, schedulers, notification channels, anything that can answer three query types. The protocol is intentionally thin: a JSON manifest, a binary, and a SKILL.md so agents know when to invoke it.
Two-Layer Architecture
The plugin protocol has two layers:
- Discovery — Subscribers scan
~/nanika/plugins/*/plugin.jsonto find plugins and their metadata - Query — Subscribers invoke
<binary> query {status|items|actions} --jsonto fetch live data or trigger actions
Both layers are consumed by any subscriber — the orchestrator, a Nen scanner, an MCP server, or a script you write. Plugins are stateless from the subscriber's perspective: no callbacks, no open connections, no lifecycle hooks.
File Layout
Every plugin lives under ~/nanika/plugins/<name>/:
~/nanika/plugins/<name>/
├── plugin.json # Plugin manifest (required)
├── bin/<binary> # Compiled binary (CLI)
└── skills/
└── SKILL.md # Tells agents when and how to invoke it
The plugin.json manifest is the only required file. Without it, the plugin won't be discovered. The binary is resolved via exec.LookPath(binary) — it must be on $PATH or in one of the standard locations below.
Discovery Rules
Subscribers scan ~/nanika/plugins/*/plugin.json on demand. A plugin is skipped if any of the following are true:
plugin.jsonis missing- The JSON is malformed
api_versionis missing or less than 1
Path Resolution
Subscribers enrich $PATH before resolving plugin binaries:
~/bin~/.local/bin~/go/bin/opt/homebrew/bin/usr/local/bin
Install your plugin binary to ~/bin/ — subscribers will find it without any shell configuration changes.
API Version
The current protocol version is 1. Set api_version: 1 in plugin.json to be discovered. Future versions will increment this field; version 1 plugins will remain compatible.
Shipped Plugins
Nanika ships six first-party plugins that follow this protocol:
| Plugin | Binary | Language | Purpose |
|---|---|---|---|
| nen | shu, ko |
Go | Self-improvement scanners + eval engine |
| tracker | tracker |
Rust | Local issue tracker |
| scheduler | scheduler |
Go | Cron jobs + dispatch loop |
| discord | discord |
Go | Channel notifications + voice messages |
| telegram | telegram |
Go | Channel notifications + voice messages |
| nen_mcp | nen-mcp |
Go | MCP server exposing nanika internal state |
Query Interface
The query interface is the contract between subscribers and your plugin binary. Subscribers invoke your binary with standardized subcommand arguments and expect JSON on stdout. There are four query types: status, items, actions, and action execution via action run.
status
The status query returns a single summary object — one number that describes the plugin's current state. Subscribers use this to report plugin health and item counts.
# Invocation
<binary> query status --json
# Response
{ "ok": true, "count": 42, "type": "tracker-status" }
Fields:
ok— boolean, whether the plugin is operationalcount— integer, the summary count (open issues, unread messages, scheduled jobs)type— string identifier, used as a display hint by subscribers
If ok is false, subscribers should treat the plugin as in an error state.
items
The items query returns a list of records — a table of whatever the plugin tracks: issues, jobs, messages, transactions.
# Invocation
<binary> query items --json
# Response
{
"items": [
{
"id": "trk-1",
"title": "Fix login bug",
"status": "in-progress",
"priority": "P0"
},
{
"id": "trk-2",
"title": "Add rate limiting",
"status": "open",
"priority": "P1"
}
],
"count": 2
}
The item schema is flexible — only id and title are required. status and priority are optional but recommended.
actions
The actions query returns a list of available commands. Each action has a name, a shell command, and a description. Subscribers use this to discover what a plugin can do without reading its source.
# Invocation
<binary> query actions --json
# Response
{
"actions": [
{
"name": "next",
"command": "tracker query action next",
"description": "Show highest-priority ready issue"
},
{
"name": "create",
"command": "tracker create",
"description": "Create a new issue"
}
]
}
The command field is the shell command a subscriber runs to trigger the action. It can be a full shell invocation with flags and arguments.
action run
To execute an action, the subscriber calls the binary with the action verb:
# Invocation
<binary> query action run <job_id> --json
# Response
{
"ok": true,
"message": "Job executed successfully",
"exit_code": 0
}
Timeouts
Subscribers should bound query execution. Recommended defaults:
| Query type | Timeout |
|---|---|
status |
15 seconds |
items |
15 seconds |
actions |
30 seconds |
These timeouts are intentionally generous — most queries should complete in under a second. If your plugin is hitting timeouts, the issue is usually an uncached network call or a database query that should be indexed.
Error Handling
Exit codes signal success or failure. A non-zero exit code means the plugin is in an error state. Write errors to stderr — subscribers capture it for diagnostics.
# Success
exit 0
# Failure
echo '{"error": "database not found"}' >&2
exit 1
Testing Queries Locally
Test your plugin's query interface directly from the shell:
my-plugin query status --json
my-plugin query items --json
my-plugin query actions --json
Pipe through a JSON formatter to verify the output shape:
my-plugin query items --json | jq .
Building a Plugin
A plugin needs three things: a CLI binary, a plugin.json manifest, and a skills/SKILL.md file that tells Claude Code when and how to invoke it.
Directory Structure
Start by creating the plugin directory inside the nanika plugins folder:
plugins/my-plugin/
├── plugin.json # Manifest
├── skills/SKILL.md # Claude Code skill documentation
├── cmd/my-plugin/
│ └── main.go # Entry point
└── go.mod
The skills/SKILL.md file is what makes the plugin callable from Claude Code sessions. It gets picked up by scripts/generate-agents-md.sh and injected into the skills index in CLAUDE.md.
Writing the CLI Binary
The binary is a standard CLI tool that implements the query subcommand. Here's the minimal Go structure:
package main
import (
"encoding/json"
"fmt"
"os"
)
func main() {
if len(os.Args) < 3 || os.Args[1] != "query" {
fmt.Fprintf(os.Stderr, "usage: my-plugin query {status|items|actions} --json\n")
os.Exit(1)
}
switch os.Args[2] {
case "status":
json.NewEncoder(os.Stdout).Encode(map[string]any{
"ok": true,
"count": getCount(),
"type": "my-plugin-status",
})
case "items":
json.NewEncoder(os.Stdout).Encode(map[string]any{
"items": getItems(),
"count": len(getItems()),
})
case "actions":
json.NewEncoder(os.Stdout).Encode(map[string]any{
"actions": []map[string]string{
{
"name": "refresh",
"command": "my-plugin sync",
"description": "Sync with remote",
},
},
})
default:
fmt.Fprintf(os.Stderr, "unknown query type: %s\n", os.Args[2])
os.Exit(1)
}
}
Writing plugin.json
The manifest sits at the root of the plugin directory:
{
"name": "my-plugin",
"version": "0.1.0",
"api_version": 1,
"description": "A brief description of what this plugin does",
"icon": "Plug",
"binary": "my-plugin",
"build": "go build -o bin/my-plugin ./cmd/my-plugin",
"install": "cp bin/my-plugin ~/bin/my-plugin",
"tags": ["productivity", "custom"],
"provides": ["status", "items", "actions"]
}
Writing skills/SKILL.md
The skill file documents the plugin for Claude Code. It appears in the skills index that's injected into every nanika worker session:
# my-plugin — Short description of what it does
When to use this skill: brief triggers for when to invoke this plugin.
## Commands
| Command | Description |
|---------|-------------|
| `my-plugin query status --json` | Get current status |
| `my-plugin sync` | Sync with remote source |
## Examples
`my-plugin query items --json`
`my-plugin sync --force`
Build and Install
After creating the files, build and install the binary, then regenerate the agents index:
# Build the binary
make build-plugin-my-plugin
# Install to ~/bin/
make install-plugin-my-plugin
# Update AGENTS.md + CLAUDE.md routing index
scripts/generate-agents-md.sh
For development without a Makefile target, build manually:
cd plugins/my-plugin
go build -o bin/my-plugin ./cmd/my-plugin
ln -s $(pwd)/bin/my-plugin ~/bin/my-plugin
The symlink approach is recommended during development — you rebuild in place and the symlink automatically points to the updated binary.
Development Checklist
- Create manifest (
plugin.jsonwithapi_version: 1) - Implement CLI queries (
status,items,actionssubcommands) - Write
skills/SKILL.mdfor Claude Code discovery - Build the binary and install to
~/bin/ - Run
scripts/generate-agents-md.shto update the skills index - Test each query type:
my-plugin query status --json | jq .
plugin.json Spec
The plugin.json manifest is the single source of truth for a plugin's identity, capabilities, and configuration. Subscribers read it at discovery time to determine what queries to issue.
Required Fields
| Field | Type | Notes |
|---|---|---|
name |
string | Unique identifier; lowercase, no spaces, kebab-case |
version |
string | SemVer string, e.g. 1.0.0 |
api_version |
int | Must be 1 for the current protocol |
If any required field is missing or malformed, the plugin is skipped during discovery with no user-visible error.
Optional Fields
| Field | Type | Notes |
|---|---|---|
description |
string | One-liner description of the plugin |
icon |
string | Icon key (e.g. ListCheck, Calendar, Mail) |
binary |
string | CLI binary name, resolved via $PATH with enriched lookup |
build |
string | Build command (documentation only) |
install |
string | Install command (documentation only) |
tags |
[]string | Searchable keywords |
provides |
[]string | Array of query types this plugin implements: status, items, actions |
actions |
object | Maps action keys to shell commands or command objects |
repository |
object | Source metadata (url, branch) |
Full Example: tracker plugin
This is the manifest for the tracker plugin — the most complete first-party example:
{
"name": "tracker",
"version": "0.1.0",
"api_version": 1,
"description": "Local issue tracker with hierarchical relationships",
"icon": "ListCheck",
"binary": "tracker",
"build": "cargo build --release",
"install": "cp target/release/tracker ~/bin/tracker",
"tags": ["issue-tracking", "task-management"],
"provides": ["status", "items", "actions"],
"actions": {
"status": "tracker query status --json",
"items": "tracker query items --json",
"actions": "tracker query actions --json"
}
}
The actions Field
The actions object maps action keys to commands. Commands can be plain strings (executed directly) or objects with a cmd array and optional description:
"actions": {
"next": "tracker query action next",
"create": {
"cmd": ["tracker", "create", "--interactive"],
"description": "Create a new issue interactively"
}
}
String commands are passed to the shell. Array commands (cmd) are exec'd directly without a shell — safer for arguments that might contain spaces or special characters.
The provides Field
The provides array tells subscribers which query types to issue. If absent, subscribers will attempt all three and skip any that fail. Declaring it explicitly avoids unnecessary invocations:
"provides": ["status"] // status-only plugin
"provides": ["status", "items"] // no action support
"provides": ["status", "items", "actions"] // full support
Icon Keys
Icon values map to Lucide icon names. Commonly used keys:
ListCheck— issue trackers, task listsCalendar— scheduling, calendar pluginsMail— email integrationsMessageSquare— chat/Discord/TelegramDollarSign— finance pluginsClock— time tracking, cron jobsPlug— generic/uncategorized
Minimal Valid Manifest
The smallest possible plugin.json that will be discovered:
{
"name": "my-plugin",
"version": "0.1.0",
"api_version": 1
}
A plugin with only these three fields will be discovered but not queryable — no binary means no queries, no description means a blank entry in any subscriber UI. Add binary and description at minimum before shipping.
api_version. Invalid optional fields are silently ignored — typos in field names won't error, they'll just be ignored. Double-check field names against this spec.
Nen Overview
Nanika watches itself. The Nen subsystem is a collection of self-improvement abilities that run alongside your missions — evaluating health scores, detecting anomalies, running eval suites, managing costs, and protecting against injection. Together they form a feedback loop that makes the system measurably better over time.
The name comes from Hunter x Hunter. Nanika (ナニカ) and Alluka are characters in the series: Alluka is the vessel; Nanika is the wish-granting intelligence inside. ~/.alluka/ is Nanika's vessel — it holds runtime state, missions, metrics, and findings. The Nen abilities each map to a real self-improvement capability.
The Six Abilities
| Ability | Role | How it works |
|---|---|---|
| Shu | Broad sweep | Evaluates all component health scores, flags degradation |
| Gyo | Observe + diagnose | Watches mission metrics, detects anomalies (z-score), answers why things failed |
| Ko | Eval engine | Promptfoo-compatible YAML test runner — runs assertions against LLM output to verify prompt quality |
| En | System health | Binary freshness, workspace hygiene, daemon reachability |
| Ryu | Cost analysis | Surfaces cost trends, model efficiency gaps, retry waste, minimal-output phases |
| Zetsu | Suppress exposure | Strips untrusted input at trust boundaries so workers are invisible to injection |
The Improvement Loop
The abilities work as a pipeline, not in isolation. Each ability hands off to the next:
- Shu finds "decomposer accuracy dropped"
- Gyo diagnoses "persona mis-routing on implementation tasks"
- Ko re-runs evals, verifies the regression
- You fix the prompt
- Ko confirms scores improve
This loop closes the gap between observing a problem and verifying the fix. Without Ko confirming the improvement, you'd only know you tried to fix it — not that you actually did.
How They Run
Gyo, En, and Ryu run automatically via nen-daemon while missions execute. They're passive observers — you don't need to invoke them manually. Their findings accumulate in ~/.alluka/nen/findings.db.
Shu and Ko are on-demand tools. Run them manually, or schedule them via cron:
shu evaluate # Broad sweep across all components
ko evaluate # Run all eval suites
Zetsu is infrastructure — it runs at trust boundaries and has no user-facing commands. It fires events when it strips content, but otherwise operates invisibly.
Proposals and Auto-Remediation
When findings exceed severity thresholds, the system doesn't just report — it acts:
shu proposeauto-generates remediation missions and tracker issues- Proposals queue in
shu review - You approve the proposals you want to run
- The
schedulerdispatches approved missions automatically
This means Nanika can catch a performance regression, generate a fix mission, and queue it for your review — without you having to notice the problem first.
Querying Findings
Findings from all Nen abilities are stored in ~/.alluka/nen/findings.db. Use the nen_mcp plugin to query them from Claude:
nanika_findings {}
nanika_findings { "severity": "high" }
nanika_findings { "domain": "gyo" }
Each finding includes: ability name, severity (low/medium/high/critical), timestamp, component, and a human-readable description. High-severity findings trigger the proposal pipeline automatically.
Shu — Broad Sweep
Shu is the broad-sweep ability. It evaluates all component health scores across the system and flags degradation — giving you a system-wide view of how Nanika is performing over time. Where Gyo watches individual mission events in real time, Shu steps back and asks: are things getting better or worse overall?
Commands
shu evaluate # Run broad sweep across all components
shu propose # Auto-generate remediation missions for findings above threshold
shu review # Review pending proposals before the scheduler dispatches them
Shu is the only Nen ability that runs on-demand rather than automatically via nen-daemon. This is by design — a broad sweep is expensive to compute and produces noisy results if run too frequently. Weekly sweeps are a common pattern; you can schedule them via the scheduler plugin.
What Shu Evaluates
Each sweep scores the following components against historical baselines:
- Decomposer accuracy — persona mis-routing rate across recent missions
- Review gate pass rates — how often reviewer phases approve on first attempt
- Phase retry rates — excess retries signal prompt or tooling issues
- Worker failure rates — terminal failures grouped by persona and task type
- Persona usage frequency — are the right personas being selected for the right tasks?
- Mission duration trends — are missions taking longer than they used to?
Scores are relative, not absolute. A 5% retry rate might be fine for complex implementation missions but alarming for simple research tasks. Shu learns your system's normal range over time and flags deviations from it.
The Proposal Pipeline
When findings exceed severity thresholds, Shu doesn't just report — it generates actionable remediation work:
- Run
shu evaluate— findings are written to~/.alluka/nen/findings.db - Run
shu propose— Shu reads high-severity findings and generates remediation missions - Run
shu review— inspect proposals before approving - Approve the proposals you want to act on
- The
schedulerdispatches approved missions automatically at its next run - Results feed back into the next sweep
Querying Findings
Findings are stored in ~/.alluka/nen/findings.db. Query them from Claude using the nen_mcp plugin:
nanika_findings { "severity": "high" }
nanika_findings { "domain": "shu" }
Each finding includes a component name, severity level, timestamp, and a plain-language description. High-severity findings are highlighted in shu review and are the primary trigger for proposal generation.
Scheduling Sweeps
For ongoing health monitoring, schedule Shu sweeps via the scheduler plugin:
scheduler jobs add --name "weekly-shu" --cron "0 9 * * 1" --command "shu evaluate && shu propose"
This runs a sweep every Monday at 9am and automatically generates proposals for anything that degraded over the week. You review and approve during your normal workflow — no manual monitoring required.
Gyo — Anomaly Detection
Gyo is the observe-and-diagnose ability. It watches mission metrics as they flow through the system and detects anomalies using z-score analysis — then goes further and answers why the anomaly occurred. While Shu gives you a weekly health report, Gyo is the real-time nervous system.
How Gyo Runs
Gyo runs automatically as part of nen-daemon — you don't invoke it manually. It listens on the event stream and processes metrics as missions execute:
orchestrator daemon → events.sock → nen-daemon → gyo scanner → nen/findings.db
To query Gyo's findings from Claude:
nanika_findings { "domain": "gyo" }
nanika_findings { "severity": "high" }
What Gyo Detects
- Phase duration spikes — a phase that normally completes in 30s suddenly takes 4 minutes
- Retry storms — a phase retrying more times than the configured threshold
- Worker failures clustering on a persona — a specific role is consistently failing
- Decompose fallback patterns — the LLM falling back to keyword routing instead of semantic decomposition
Z-Score Analysis
Gyo's detection is statistical, not rule-based. For each metric type, Gyo maintains a rolling window of recent observations. When a new metric arrives, it computes the z-score relative to the rolling window:
- Z-score < 2.0 — normal, no finding
- Z-score 2.0–3.0 — low/medium severity finding
- Z-score > 3.0 — high or critical finding
This means Gyo adapts to your system's actual behavior rather than hard-coded thresholds. A system that normally has 10% retry rates won't alert on 12% — but a system that normally has 2% will alert immediately.
Diagnostic Context
Gyo doesn't just flag anomalies — it correlates them with context to explain why they happened. A finding like "phase duration spike" includes:
- Which persona was assigned to the phase
- What the task type was (research, implementation, review)
- The baseline value and the observed value
- Any co-occurring findings that might be related
This diagnostic context is what makes Gyo useful for the improvement loop. When Shu flags a regression, Gyo's findings tell you where to look — down to the specific persona and task type that's causing the problem.
Finding Severity Levels
| Severity | Meaning | Action |
|---|---|---|
low |
Slight deviation from baseline | Monitor; no immediate action needed |
medium |
Meaningful deviation; worth investigating | Review at next sweep |
high |
Significant anomaly impacting mission quality | Investigate soon; triggers Shu proposals |
critical |
Systemic failure pattern detected | Investigate immediately; auto-generates proposals |
Integration with the Improvement Loop
Gyo's findings flow into Shu's sweep reports. When you run shu evaluate, Shu reads Gyo's recent findings alongside its own component scores. This means the weekly sweep automatically incorporates everything Gyo observed during the week — you don't need to manually reconcile the two.
The full flow from detection to fix:
- Gyo detects "retry storm on senior-engineer persona for implementation tasks"
- Finding written to
~/.alluka/nen/findings.dbwith severity: high - Shu picks up the finding in the next sweep
- Shu generates a proposal: "Review decomposer prompt for implementation task routing"
- You approve; scheduler dispatches a review mission
- Prompt is updated; Ko verifies the fix
Ko — Eval Engine
Ko is the eval engine — a promptfoo-compatible YAML test runner that runs assertions against LLM output to verify prompt quality. When Gyo detects an anomaly and Shu flags a regression, Ko is how you confirm the problem and verify the fix. It closes the loop between observing degradation and proving recovery.
Commands
ko evaluate # Run all eval suites
ko evaluate --suite decomposer # Run a specific suite
The Ko Loop
Ko is built around a tight iteration cycle for prompt improvements:
- Run
ko evaluateto get baseline scores - Change a prompt (or persona configuration)
- Run
ko evaluateagain - If scores improve → commit the change
- If scores regress → revert
This loop makes prompt changes safe. Without Ko, you're guessing whether a change helped. With Ko, you have numbers.
YAML Suite Format
Ko eval suites use the promptfoo YAML format. Here's a complete example for the decomposer prompt:
prompts:
- file://prompts/decomposer.txt
providers:
- id: anthropic:claude-haiku-4-5-20251001
tests:
- vars:
task: "build a REST API"
assert:
- type: contains
value: "PHASE:"
- type: contains
value: "PERSONA:"
- type: javascript
value: output.split("PHASE:").length >= 2
- vars:
task: "research golang error handling"
assert:
- type: contains
value: "PHASE:"
- type: javascript
value: "!output.includes('PHASE: 1') || output.includes('researcher')"
Assertions can be:
contains— output includes a literal stringnot-contains— output does not include a stringjavascript— arbitrary JS expression evaluated againstoutputregex— output matches a regular expressionllm-rubric— a secondary LLM grades the output against a rubric
Suites in Practice
Each major prompt in Nanika has a corresponding Ko suite. The most important ones are:
| Suite | What it tests |
|---|---|
decomposer |
PHASE lines are generated, personas are assigned correctly |
reviewer |
Review gate produces actionable feedback, not false passes |
orchestrator |
Mission plans are valid and non-redundant |
Suite files live in ~/.alluka/evals/. You can add custom suites for any prompt you care about.
Integration with Shu
When Shu flags a regression (e.g., decomposer accuracy dropped), it includes a reference to the Ko suite that covers the affected component. This makes the investigation workflow concrete:
- Shu finding: "Decomposer persona mis-routing rate up 18% — see suite: decomposer"
- Run
ko evaluate --suite decomposerto confirm the regression in test form - Examine failing assertions to understand what's wrong
- Update the decomposer prompt
- Re-run
ko evaluate --suite decomposer - All assertions pass → commit the change
Running Ko in CI
For teams, Ko evals can gate prompt changes in CI. Add a check that runs the relevant suites against any PR that modifies a prompt file. If scores regress, the PR fails. This prevents prompt degradation from reaching production silently.
# In your CI pipeline
ko evaluate --suite decomposer --format json --fail-below 0.95
The --fail-below flag sets a pass-rate threshold. A suite with 95 tests that passes 94 will exit non-zero, blocking the merge.
En, Ryu & Zetsu
The final three Nen abilities handle system health, cost analysis, and injection protection. Unlike Shu and Ko, which you interact with directly, En, Ryu, and Zetsu operate as infrastructure — running automatically via nen-daemon and writing findings to ~/.alluka/nen/findings.db.
En — System Health
En monitors the operational health of the Nanika installation itself. It checks four categories:
- Binary freshness — are installed binaries up-to-date with the source? Stale binaries silently run old behavior while the prompts and configuration expect new behavior.
- Workspace hygiene — orphaned workspaces from missions that terminated abnormally, stale temp files, and workspace directories that should have been cleaned up
- Daemon reachability — is
orchestrator daemonrunning and healthy? Can it accept connections onevents.sock? - Event log completeness — are mission events being written correctly? Missing events indicate a daemon or socket issue.
En findings surface via shu query status --json and the nen_mcp plugin:
nanika_findings { "domain": "en" }
Ryu — Cost Analysis
Ryu analyzes token costs across missions and identifies where you're spending more than you should. It surfaces four types of findings:
- Cost trends per domain — are dev missions getting more expensive over time? Is personal domain usage spiking?
- Model efficiency gaps — are you using expensive models (Opus, Sonnet) for tasks that a cheaper model (Haiku) handles equally well?
- Retry waste — retried phases inflate token cost significantly; Ryu flags phases where retry cost exceeds original cost
- Minimal-output phases — workers that consume many tokens but produce little output relative to cost; often a sign of a poorly-scoped phase
Ryu has no standalone CLI — it runs automatically inside nen-daemon while missions execute. Findings accumulate in ~/.alluka/nen/findings.db and surface in the next shu evaluate sweep or via the MCP plugin:
nanika_findings { "domain": "ryu" }
Ryu findings don't automatically generate proposals — cost optimization requires human judgment about quality trade-offs. Instead, findings appear in shu review for your consideration during the next sweep.
Reading Ryu Output
A typical Ryu analysis report shows:
| Column | Meaning |
|---|---|
| Phase type | The category of work (research, implementation, review) |
| Avg tokens | Mean token consumption for phases of this type |
| Avg cost | Mean dollar cost at current model pricing |
| Retry ratio | What fraction of cost came from retried phases |
| Output ratio | Output tokens / input tokens — low ratios flag minimal-output phases |
Zetsu — Injection Protection
Zetsu handles security at trust boundaries. When workers process external content — web pages, emails, GitHub issues, social posts — that content is untrusted and may contain injected instructions designed to hijack the agent.
Zetsu strips injected content before it reaches workers. Specifically, it:
- Removes content that matches instruction patterns ("ignore previous", "you are now", "system:", etc.)
- Strips invisible unicode characters used to hide injections in otherwise normal text
- Sanitizes content passed as context to worker prompts
When Zetsu acts, it fires events:
security.injection_detected— content matched an injection patternsecurity.invisible_chars_stripped— invisible characters were removed
Workers receive the sanitized context and are unaware of the original content. They cannot see or act on injected instructions.
All Three Together
En, Ryu, and Zetsu together ensure that Nanika stays healthy at the operational level — not just at the prompt quality level. En checks the installation, Ryu checks the economics, and Zetsu checks the security boundary. Their findings complement Shu and Gyo's quality-focused monitoring to give you a complete picture of system health.
All findings are queryable via nen_mcp:
nanika_findings {} # All findings
nanika_findings { "severity": "critical" } # Critical only
nanika_findings { "domain": "zetsu" } # Security findings
What Are Skills
A skill is a directory at .claude/skills/{name}/SKILL.md that tells Claude how to use a particular tool or approach. Skills are Nanika's knowledge layer — they describe when to use something, what commands it exposes, and how to configure it. Workers read skills to know what's available and how to use it.
The Two-Layer System
Nanika uses a two-layer approach to skill discovery, based on Vercel's research showing that passive context dramatically outperforms on-demand retrieval:
| Layer | File | Role | When loaded |
|---|---|---|---|
| Reference | SKILL.md |
Full command docs, examples, configuration | On demand — when a worker needs detail |
| Routing | AGENTS-MD block in CLAUDE.md |
Compressed index of all skills | Every turn — always in context |
Vercel's research found that passive context (always in context) achieves a 100% pass rate, versus 53% for on-demand skills. The routing index ensures workers always know what skills exist. When a worker needs to actually use a skill, it fetches the full SKILL.md for the details.
Three Skill Types
| Type | Example | Location | allowed-tools |
## Commands |
|---|---|---|---|---|
| CLI wrapper | engage, scout, orchestrator | Symlink → ~/skills/{name}/ |
Required | Required |
| Pipeline | channels, decomposer | Real dir in .claude/skills/ |
Not required | Not required |
| Knowledge | golang-pro, vercel-react-best-practices | Symlink → ~/.agents/skills/{name} |
Not required | Not required |
CLI wrapper skills teach workers how to use a command-line tool. The allowed-tools frontmatter restricts what bash commands the worker can run, and the ## Commands section is the source of truth for the routing index.
Pipeline skills are orchestration blueprints — they describe multi-step workflows that involve multiple tools or agents. They don't wrap a single CLI.
Knowledge skills are reference documents — best practices, style guides, domain expertise. Workers read them before starting work in a domain, not to execute commands.
Skills vs. Plugins
The distinction between skills and plugins is important:
- Skills are the brain — orchestration, planning, and knowledge. They live in
.claude/skills/and are read by Claude Code workers. - Plugins are the hands — domain-specific CLI tools that skills invoke. They live in
plugins/and are Go binaries that workers call via bash.
For example, the orchestrator skill teaches workers how to invoke the orchestrator CLI. The orchestrator CLI (the plugin binary) is what actually does the work.
| Layer | Examples | What it is |
|---|---|---|
| Skills | orchestrator, decomposer, channels | Knowledge docs in .claude/skills/ |
| Plugins | nen, scheduler, tracker, discord, telegram | CLI binaries in plugins/ |
Skill Discovery
Claude Code discovers skills by reading CLAUDE.md at the root of the Nanika directory. The routing index block in CLAUDE.md lists every skill with its description and example commands. This is why opening Nanika in Claude Code is all you need to do — no manual registration, no configuration step.
The routing index is auto-generated from the actual SKILL.md files by scripts/generate-agents-md.sh. After installing a new skill, run the script to update the index and make the skill visible to workers.
scout CLI will fail if the scout skill isn't installed and indexed. Keeping skills up-to-date is maintenance, not setup.
Routing Index
The routing index is the compressed skill table that lives in CLAUDE.md between two HTML comment markers. Every time Claude Code starts a session in the Nanika directory, it reads this index and immediately knows every skill that's installed, what it does, and what commands it exposes — without fetching any individual SKILL.md file.
Why It Exists
Loading full SKILL.md files into every worker's context would be expensive and slow. Instead, Nanika uses a two-layer approach: the routing index is a compact summary (one pipe-delimited line per skill) that's always in context. When a worker needs the full details to actually use a skill, it fetches the SKILL.md on demand.
This mirrors how you work: you know what tools exist without having read every man page. You reach for the manual when you need specifics.
Index Format
The routing index block in CLAUDE.md:
<!-- NANIKA-AGENTS-MD-START -->
[Nanika Skills Index][root: .claude/skills]IMPORTANT: Prefer retrieval-led reasoning...
|{name} — {description}:{path/to/SKILL.md}|`cmd1`|`cmd2`|...|
[Domain Detection]|dev:{keywords}|personal:{keywords}|
[Orchestration Triggers]|keywords:{invoke: orchestrator run}|
<!-- NANIKA-AGENTS-MD-END -->
Each skill line is pipe-delimited:
- Name and description (with the
SKILL.mdpath for on-demand loading) - Up to 14 example commands, each in backticks
The domain detection section maps keywords to domains, so workers can route tasks to the right domain without asking. The orchestration triggers section tells workers when to automatically invoke the orchestrator.
Never Edit Manually
The AGENTS-MD block is auto-generated. Editing it by hand will be overwritten the next time the generation script runs, and manual edits often introduce formatting errors that break parsing. Always use the script:
./scripts/generate-agents-md.sh # Generate + inject into CLAUDE.md
./scripts/generate-agents-md.sh --dry-run # Print without writing
./scripts/generate-agents-md.sh --diff # Show what would change
How Generation Works
The script runs a five-step pipeline:
- Scan
.claude/skills/*/SKILL.md— finds every installed skill - Extract commands from
```bashblocks under## Commandsin each file - Take the first 14 commands per skill (ordering in SKILL.md matters)
- Build the compressed pipe-delimited routing table
- Inject between the
NANIKA-AGENTS-MD-STARTandNANIKA-AGENTS-MD-ENDmarkers
This means the routing index is always derived from the actual skill files. If a skill's commands change, regenerate the index and the change is immediately visible to workers in their next session.
After Adding a Skill
Any time you install a new skill or plugin, update the routing index:
# After installing a new skill
./scripts/generate-agents-md.sh
# Verify it appears
grep "new-skill-name" CLAUDE.md
Workers in missions that started before the index was updated won't see the new skill. Mission workspaces snapshot CLAUDE.md at creation time. New missions will pick up the updated index automatically.
## Commands section. Put the most commonly used commands first. A skill with 30 commands will only surface 14 in the routing index — make sure the most important ones are at the top.
Domain Detection
The domain detection section of the routing index helps workers decide which domain to assign to a task before handing it to the orchestrator. Keywords are matched against the task description:
| Domain | Example keywords |
|---|---|
dev |
build, deploy, code, API, refactor, test |
personal |
plan, research, travel, budget, schedule |
Domain assignment affects which personas are available and which workspace the mission runs in. If a task is ambiguous, the orchestrator prompts for clarification — but for clear cases, domain detection routes automatically.
Installing Skills
Workers automatically use installed Claude Code skills during missions. Every skill you add expands what workers can do without changing any prompt or configuration — the routing index picks it up automatically. More skills means smarter workers.
Installation Methods
There are two ways to install a skill:
From GitHub (using the skills CLI)
npx skills i owner/skill-name
This uses the Vercel Labs skills CLI to fetch the skill repository, place it in ~/skills/{name}/, and create the necessary directory structure. It's the recommended method for published skills.
Manual Copy
cp -r .claude/skills/scout ~/.claude/skills/
For skills that aren't published, or when you're developing locally, copy the skill directory directly. The skill directory must contain a SKILL.md file at its root.
After Installing
Installing a skill doesn't make it visible to workers until you update the routing index:
./scripts/generate-agents-md.sh # Update routing index so workers can discover it
This regenerates the AGENTS-MD block in CLAUDE.md to include the new skill. Workers in new missions will see it immediately. Existing running missions won't — they snapshotted CLAUDE.md at creation time.
Directory Layout
After installation, a CLI skill has this layout:
~/skills/{name}/ # CLI repo (SOURCE OF TRUTH)
├── .claude/
│ └── skills/
│ └── {name}/
│ ├── SKILL.md
│ └── metadata.json
└── cmd/
~/nanika/.claude/skills/{name}/ # Symlink → ~/skills/{name}/.claude/skills/{name}
The symlink is what Nanika reads. The source of truth lives in ~/skills/{name}/. This separation means you can update a skill by pulling changes in its repo, and Nanika immediately sees the update — no re-installation needed.
Cross-Agent Compatibility
Nanika's skill format is compatible with other AI agent runtimes. Skills installed for Nanika can be used by other agents without modification:
| Agent | Skill location |
|---|---|
| Claude Code (Nanika) | ~/.claude/skills/{name}/SKILL.md |
| Gemini CLI | .gemini/skills/{name}/SKILL.md |
The format is also compatible with the Anthropic skills standard (YAML frontmatter, allowed-tools), Vercel Labs' skills CLI, and the Cloudflare agents RFC.
Verifying Installation
After installing a skill and regenerating the index, verify it's visible:
# Check the skill directory exists
ls ~/.claude/skills/your-skill-name/
# Verify SKILL.md is present
ls ~/.claude/skills/your-skill-name/SKILL.md
# Confirm it appears in the routing index
grep "your-skill-name" ~/nanika/CLAUDE.md
If the skill doesn't appear in CLAUDE.md after running generate-agents-md.sh, check that the SKILL.md file has a valid ## Commands section with at least one command in a ```bash block. The generator skips skills that don't match the expected format.
Writing a Skill
A well-written skill file does two things: it routes correctly (workers know when to use this skill) and it documents accurately (workers know how to use it). This lesson covers the canonical format for CLI wrapper skills — the most common type.
Canonical Location
~/skills/{name}/.claude/skills/{name}/SKILL.md
The source of truth lives in the skill's own repository. Nanika reads a symlink that points here. This means you can maintain the skill separately and pull updates without touching Nanika.
Frontmatter Spec
---
name: scout
description: Gathers intelligence on configurable topics via scout CLI. Use when user asks about news, trends, scraping, intel gathering, or monitoring topics.
allowed-tools: Bash(scout:*)
argument-hint: "[topic-name]"
---
| Field | Required | Rule |
|---|---|---|
name |
Yes | Must match CLI binary name. Lowercase, hyphens only, max 64 chars. |
description |
Yes | Two-part: {What it does}. Use when {trigger conditions}. |
allowed-tools |
CLI only | Pattern: Bash({cli-name}:*). Omit for knowledge/pipeline skills. |
argument-hint |
No | Usage hint shown in slash command autocomplete. |
The description field is the most important for routing. The first sentence describes what the tool does; the second tells workers when to reach for it. Workers match task descriptions against these trigger conditions. Be specific — "Use when user asks about news, trends, scraping" is better than "Use when user needs information."
Required Sections
For CLI wrapper skills, these sections are required:
Title and subtitle
# Scout — Intelligence Gathering CLI
One line that summarizes what the skill wraps.
When to Use
## When to Use
- User asks about recent news in a topic area
- User wants to monitor a subject over time
- User requests a competitive intelligence sweep
- User asks what's happening in a technical domain
Four to eight trigger bullets. These supplement the frontmatter description and give workers more context for routing decisions.
Commands
## Commands
```bash
scout topics
scout topics add "my-topic"
scout gather
scout intel "my-topic"
```
This section is the source of truth for the routing index generator. Commands must be in ```bash blocks. Each command must start with the CLI binary name. The first 14 commands are extracted — order matters, put the most common ones first.
Configuration
## Configuration
Config file: `~/.alluka/scout/config.json`
Where the tool stores its configuration. Workers need this to help users debug config issues.
Examples
## Examples
**User:** Gather intel on Go 1.25 release
**Action:** `scout gather "go-release"`
**User:** What are people saying about Claude on Hacker News?
**Action:** `scout topics add "claude-anthropic" --sources hackernews && scout gather`
User/Action pairs that show realistic usage. Workers use these as few-shot examples when deciding how to invoke the CLI.
Command Format Rules
These rules determine whether your skill routes correctly:
- Commands must be in
```bashblocks (not```shor plain code blocks) - Commands must be under the
## Commandsheading - Each command must start with the CLI binary name
- The first 14 commands are extracted — put most-used commands first
- Don't include placeholder text in commands that appear in the routing index
## Commands section has commands in properly-fenced ```bash blocks.
New Skill Checklist
Follow these steps in order when creating a new CLI skill from scratch:
# 1. Create the CLI binary
mkdir -p ~/skills/{name}/cmd/{name}-cli
# Write main.go and Makefile
# 2. Create the skill directory
mkdir -p ~/skills/{name}/.claude/skills/{name}
# Write SKILL.md following the spec above
# 3. Build and install the binary
cd ~/skills/{name} && make install
# 4. Symlink into Nanika
ln -s ~/skills/{name}/.claude/skills/{name} ~/nanika/.claude/skills/{name}
# 5. Regenerate routing index
cd ~/nanika && ./scripts/generate-agents-md.sh
# 6. Verify the skill appears
grep "{name}" ~/nanika/CLAUDE.md
Knowledge and Pipeline Skills
Knowledge and pipeline skills have a simpler format — no allowed-tools, no ## Commands section required. They're reference documents or workflow blueprints. The frontmatter description and ## When to Use still matter for routing, but the rest of the structure is more flexible.
For a knowledge skill, the primary content is prose: best practices, patterns, examples, and anti-patterns. Workers read it before starting work in the domain, not to execute commands.
Event Bus Overview
The orchestrator daemon emits structured events to JSONL files and a Unix domain socket. Any subscriber can watch. Plugins are subscribers, not dependencies — the orchestrator runs fine without any of them installed.
Event Flow
Events originate from the orchestrator daemon and fan out to two destinations simultaneously:
orchestrator daemon → events.sock (UDS) → nen-daemon (scanners)
→ events/*.jsonl → discord/telegram (notifications)
The Unix domain socket delivers live events to any connected process. The JSONL files provide a persistent log that can be replayed after the fact. Both transports carry identical events — choose based on whether you need real-time streaming or historical replay.
Event Types
The bus emits 28 event types grouped into 14 categories:
| Category | Events |
|---|---|
| Mission | mission.started, mission.completed, mission.failed, mission.cancelled |
| Phase | phase.started, phase.completed, phase.failed, phase.skipped, phase.retrying |
| Worker | worker.spawned, worker.output, worker.completed, worker.failed |
| Decompose | decompose.started, decompose.completed, decompose.fallback |
| Learning | learning.extracted, learning.stored |
| DAG | dag.dependency_resolved, dag.phase_dispatched |
| Role | role.handoff |
| Contract | contract.validated, contract.violated, persona.contract_violation |
| Review | review.findings_emitted, review.external_requested |
| Git | git.worktree_created, git.committed, git.pr_created |
| System | system.error, system.checkpoint_saved |
| Signals | signal.scope_expansion, signal.replan_required, signal.human_decision_needed |
| Security | security.invisible_chars_stripped, security.injection_detected |
| File | file_overlap.detected |
File Paths
All event bus files live under ~/.alluka/:
| Path | Purpose |
|---|---|
~/.alluka/events.sock |
Event broadcast socket (UDS) — connect for live stream |
~/.alluka/daemon.pid |
Daemon PID file |
~/.alluka/daemon.sock |
Daemon control socket (internal use) |
~/.alluka/events/<mission_id>.jsonl |
Per-mission JSONL event log |
~/.alluka is an intentional Hunter × Hunter reference — the vessel/intelligence split mirrors the series' concept of Nen vessels. Do not rename it to ~/.nanika/.
Plugin Architecture
Plugins connect to the event bus as subscribers. This is a deliberate architectural choice: the orchestrator has no knowledge of which plugins are installed. The discord notification plugin, the telegram plugin, and the nen scanners all subscribe to the same bus independently. If none are running, the orchestrator continues normally — it emits events into the void without blocking.
This decoupling means you can add or remove plugins without touching the orchestrator, and a slow or crashed plugin cannot stall a running mission.
JSONL Log
Every event the orchestrator emits is appended to a per-mission JSONL file at ~/.alluka/events/<mission_id>.jsonl. One JSON object per line. The file persists after the mission ends, making it the primary source for post-hoc analysis and debugging.
Event Envelope
Every event follows the same envelope structure regardless of type:
{
"id": "evt_6868d2b58d433630",
"type": "mission.started",
"timestamp": "2026-03-29T08:38:39.550666Z",
"sequence": 3,
"mission_id": "20260329-0ec406b5",
"phase_id": "phase-1",
"worker_id": "technical-writer-phase-1",
"data": { "execution_mode": "sequential", "phases": 3 }
}
Field Reference
| Field | Type | When Present | Notes |
|---|---|---|---|
id |
string | always | evt_ prefix + 8 hex bytes |
type |
string | always | TypeScript-style dotted event name |
timestamp |
string | always | RFC3339 UTC |
sequence |
int64 | always | Monotonic per bus (global order) |
mission_id |
string | always | Mission UUID |
phase_id |
string | optional | Phase/worker lifecycle events |
worker_id |
string | optional | Worker lifecycle events |
data |
object | optional | Event-type-specific payload fields |
sequence is assigned by the Bus (globally monotonic), not by individual emitters. This prevents collisions when concurrent missions each start at seq=1, which would break SSE replay deduplication.
File Details
| Property | Value |
|---|---|
| Location | ~/.alluka/events/<mission_id>.jsonl |
| Format | One JSON event per line (JSONL / NDJSON) |
| Permissions | 0600 (user-only read/write) |
Polling with tail
The simplest way to watch a running mission is to tail the JSONL file and pipe through jq:
# Watch new events from a mission in real time
tail -f ~/.alluka/events/<mission_id>.jsonl | jq -c '.type'
# Process all events since last check (byte-offset polling)
offset=$(stat -f%z ~/.alluka/events/20260329-0ec406b5.jsonl 2>/dev/null || echo 0)
tail -c +$offset ~/.alluka/events/20260329-0ec406b5.jsonl | jq .
The byte-offset pattern is useful for polling loops: record the file size after each pass and only read newly appended bytes on the next iteration. This avoids re-processing events you've already handled without needing a cursor file.
Historical Replay
Because JSONL files persist after mission completion, you can replay any mission's event history. Use orchestrator metrics --mission <id> for a summary view, or pipe the raw JSONL through jq for custom analysis:
# Count events by type for a completed mission
jq -r '.type' ~/.alluka/events/20260329-0ec406b5.jsonl | sort | uniq -c | sort -rn
# Extract all worker output events
jq 'select(.type == "worker.output")' ~/.alluka/events/20260329-0ec406b5.jsonl
Unix Domain Socket
The orchestrator daemon listens on ~/.alluka/events.sock and writes all bus events as newline-delimited JSON to every connected client. This is the real-time transport — use it when you need events as they happen rather than after the mission completes.
Connecting to the Socket
Any tool that can speak to a Unix domain socket works. The simplest approaches:
# Using socat (one-liner, useful for debugging)
socat - UNIX-CONNECT:~/.alluka/events.sock
# Using nc (alternative)
nc -U ~/.alluka/events.sock
# Pipe through jq to filter by type
socat - UNIX-CONNECT:~/.alluka/events.sock | jq 'select(.type | startswith("mission."))'
Go Subscriber Example
For production subscribers, connect with a timeout and scan line by line:
conn, _ := net.DialTimeout("unix", "~/.alluka/events.sock", 5*time.Second)
scanner := bufio.NewScanner(conn)
for scanner.Scan() {
var ev Event
json.Unmarshal(scanner.Bytes(), &ev)
// process ev
}
Bus Internals
The event bus uses a fixed-capacity ring buffer of 1000 events. Publishing to the bus is non-blocking — if a subscriber is slow, it misses events rather than stalling the mission. This is an explicit design choice: mission throughput takes priority over guaranteed delivery to observers.
| Property | Value |
|---|---|
| Ring buffer size | 1000 events |
| Subscriber channel buffer | 64 events per subscriber |
| Publishing behavior | Non-blocking (slow subscribers drop events) |
| Replay on reconnect | Use .EventsSince(seq) to replay buffered events |
Drop Detection
Three counters track dropped events at different layers:
- UDS emitter
.DroppedWrites()— socket timeouts and write failures to connected clients - File emitter
.DroppedWrites()— I/O errors writing to the JSONL file - Bus
.SubscriberDrops()— slow consumers whose channel buffer filled (missed events, not write errors)
orchestrator metrics --mission <id> to check phase and worker telemetry for the ground truth on what actually executed.
Reconnect Strategy
The daemon may restart or the socket may become unavailable. A robust subscriber should reconnect with exponential backoff:
for {
conn, err := net.DialTimeout("unix", sockPath, 5*time.Second)
if err != nil {
time.Sleep(5 * time.Second) // back off and retry
continue
}
// read until EOF or error, then loop to reconnect
}
On reconnect, call .EventsSince(lastSeq) against the ring buffer to replay any events missed during the disconnection window. Events older than the 1000-event buffer are gone from memory but still available in the JSONL log.
Building a Subscriber
A subscriber is any process that connects to the event bus and reacts to events. All of nanika's observability plugins — nen-daemon, discord notifications, telegram alerts — are subscribers. This lesson covers the canonical pattern they follow.
Consumer Pattern
Every subscriber in the codebase uses the same four-step pattern:
- Probe UDS — try to connect to
~/.alluka/events.sock - On success — stream NDJSON events; reconnect with backoff on disconnect
- On failure — fall back to JSONL polling from
~/.alluka/events/every 5 seconds - For each event — deserialize and route to handlers based on
.type
The reference implementation lives at plugins/nen/cmd/nen-daemon/main.go.
Go Subscriber Skeleton
This is the minimal structure for a production-quality subscriber:
package main
import (
"bufio"
"encoding/json"
"net"
"time"
)
type Event struct {
ID string `json:"id"`
Type string `json:"type"`
Timestamp string `json:"timestamp"`
Sequence int64 `json:"sequence"`
MissionID string `json:"mission_id"`
PhaseID string `json:"phase_id,omitempty"`
Data json.RawMessage `json:"data,omitempty"`
}
func subscribe(sockPath string) {
for {
conn, err := net.DialTimeout("unix", sockPath, 5*time.Second)
if err != nil {
time.Sleep(5 * time.Second)
continue
}
scanner := bufio.NewScanner(conn)
for scanner.Scan() {
var ev Event
if err := json.Unmarshal(scanner.Bytes(), &ev); err != nil {
continue
}
handleEvent(ev)
}
conn.Close()
}
}
func handleEvent(ev Event) {
switch ev.Type {
case "mission.started":
// ...
case "phase.completed":
// ...
}
}
The Nen Fan-Out Pattern
The nen-daemon extends this skeleton by subscribing once and fanning out to individual scanner goroutines. Each Nen scanner is named after a Hunter × Hunter technique:
| Scanner | Role |
|---|---|
gyo |
Perception — observes mission patterns and anomalies |
en |
Awareness — monitors system-wide health signals |
ryu |
Flow — tracks metrics and performance trends |
zetsu |
Suppression — handles security and injection detection events |
The nen-daemon receives each event once from the bus and distributes it to all four scanner goroutines via internal channels. This is more efficient than having each scanner maintain its own UDS connection.
JSONL Fallback
When the UDS connection fails (daemon not running, socket removed), fall back to polling the JSONL files:
func pollJSONL(eventsDir string, handler func(Event)) {
seen := map[string]int64{} // mission_id → byte offset
for {
entries, _ := os.ReadDir(eventsDir)
for _, e := range entries {
if !strings.HasSuffix(e.Name(), ".jsonl") {
continue
}
path := filepath.Join(eventsDir, e.Name())
offset := seen[e.Name()]
f, err := os.Open(path)
if err != nil {
continue
}
f.Seek(offset, io.SeekStart)
scanner := bufio.NewScanner(f)
for scanner.Scan() {
var ev Event
if json.Unmarshal(scanner.Bytes(), &ev) == nil {
handler(ev)
}
}
seen[e.Name()], _ = f.Seek(0, io.SeekCurrent)
f.Close()
}
time.Sleep(5 * time.Second)
}
}
CLI Commands
Complete reference for all nanika CLI commands. Commands are grouped by binary. All binaries are installed to ~/bin/ by the installer.
orchestrator
The core orchestration binary. Runs missions, shows status, and manages the agent system.
orchestrator run "task description"
orchestrator run --domain personal "task"
orchestrator run ~/.alluka/missions/FEATURE.md
orchestrator run --dry-run "task"
orchestrator status
orchestrator learn
orchestrator cleanup
orchestrator cleanup --older 7d
orchestrator metrics
orchestrator metrics --last 10
orchestrator metrics --domain dev
orchestrator metrics --status failed
orchestrator metrics --mission <id>
orchestrator metrics --days 30
| Command | Description |
|---|---|
run "task" |
Decompose and execute a task as a multi-agent mission |
run --domain <d> |
Route to a specific domain workspace |
run <file.md> |
Execute a pre-written mission file |
run --dry-run |
Preview the decomposition without executing |
status |
Show active and recent missions |
learn |
Extract learnings from recent mission outputs |
cleanup |
Remove stale workspaces and artifacts |
cleanup --older 7d |
Remove workspaces older than 7 days |
metrics |
Show mission metrics summary |
metrics --mission <id> |
Detailed metrics for a specific mission |
Nen Commands
User-facing binaries: shu (broad sweep, proposals) and ko (eval engine). Gyo, En, Ryu, and Zetsu run automatically via nen-daemon — they have no standalone CLIs.
shu evaluate
shu propose
shu review
shu query status --json
ko evaluate
ko evaluate --suite decomposer
scheduler
Runs cron jobs and manages the publishing pipeline.
scheduler daemon
scheduler daemon --notify
scheduler daemon --once
scheduler daemon --stop
scheduler init
scheduler jobs
scheduler jobs add --name "check-inbox" --cron "*/30 * * * *" --command "your-script"
tracker
Local issue tracker with hierarchical tasks, blocking links, and priority-based ready detection.
tracker create "Task title"
tracker create "Task" --priority P0
tracker show trk-ABC1
tracker list
tracker list --status open
tracker update trk-ABC1 --status in-progress
tracker link trk-ABC1 trk-XYZ2 --type blocks
scripts/
Helper scripts in the nanika root directory:
scripts/install.sh [--core|--all|--plugins X|--no-interactive|--dry-run|--repair]
scripts/new-mission.sh <slug>
scripts/generate-agents-md.sh [--dry-run|--diff]
Uninstall
To fully remove nanika from your system:
make uninstall # Stop daemons, remove launchd plists
make clean # Remove build artifacts
rm -rf ~/bin/{orchestrator,shu,gyo,en,ryu,tracker,scheduler,discord,telegram}
rm -rf ~/.alluka/ # Remove all runtime data
rm -rf ~/.alluka/ permanently deletes all mission logs, learnings, and event history. Back up anything you want to keep before running this command.
Mission Format
A mission file is a Markdown file with PHASE lines. The orchestrator reads it and executes each phase in dependency order, spawning workers in parallel where the dependency graph allows.
PHASE Line Syntax
Each PHASE line defines one unit of work:
PHASE: <name> | PERSONA: <persona> | OBJECTIVE: <objective> [| DEPENDS: <phase,phase>]
| Field | Required | Description |
|---|---|---|
PHASE |
Yes | Unique name for this phase. Used in DEPENDS references and logs. |
PERSONA |
No | Which persona to assign. Without it, the orchestrator routes via LLM. |
OBJECTIVE |
Yes | What the worker must produce. Plain language, as specific as needed. |
DEPENDS |
No | Comma-separated list of phase names that must complete first. |
Full Mission File Example
# Build Authentication System
## Context
Building JWT auth for the API. Use RS256. PostgreSQL for sessions.
## PHASE Lines
PHASE: design | PERSONA: architect | OBJECTIVE: Define the JWT contract and session schema
PHASE: implement | PERSONA: senior-backend-engineer | OBJECTIVE: Implement auth middleware and session store | DEPENDS: design
PHASE: test | PERSONA: qa-engineer | OBJECTIVE: Write integration tests | DEPENDS: implement
PHASE: review | PERSONA: security-auditor | OBJECTIVE: Audit auth flow for vulnerabilities | DEPENDS: implement
Execution Order
The orchestrator builds a DAG from the DEPENDS relationships:
- Phases without DEPENDS run immediately, in parallel
- A phase runs as soon as all its DEPENDS phases complete successfully
- If a phase fails, dependent phases are skipped (not cancelled — they never started)
- Phases with no dependencies between them always run in parallel
In the example above, design runs first. When it completes, implement starts. When implement completes, both test and review start in parallel — they both depend only on implement, not on each other.
Context Injection
All Markdown content above the PHASE lines is the mission context. The orchestrator injects this text into every worker's system prompt. Use it to provide domain knowledge, constraints, and background that all workers need:
- Technology choices and versions
- Architecture decisions already made
- Constraints (no external dependencies, must use X pattern)
- Links to relevant files or documentation
Running a Mission File
# Execute the mission
orchestrator run ~/.alluka/missions/auth.md
# Preview the DAG without executing
orchestrator run --dry-run ~/.alluka/missions/auth.md
Creating a Mission Scaffold
The new-mission.sh script creates a blank mission file with the correct structure:
scripts/new-mission.sh auth-system
# Creates ~/.alluka/missions/auth-system.md
--dry-run before executing any mission file. It shows you how the orchestrator parsed the PHASE lines, which phases will run in parallel, and what workers will be spawned — without spending any tokens.
plugin.json Reference
Every plugin in nanika has a plugin.json file at its root. Subscribers read this file to discover the plugin, resolve its binary, and enumerate its queryable actions.
Required Fields
| Field | Type | Description |
|---|---|---|
name |
string | Unique plugin identifier. Lowercase, no spaces. Used in CLI paths. |
version |
string | SemVer version string (e.g., "1.0.0"). Documentation only. |
api_version |
integer | Must be 1. Plugin is not discovered if this is missing or less than 1. |
Optional Fields
| Field | Type | Description |
|---|---|---|
description |
string | One-liner description of the plugin. |
icon |
string | Icon key (ListCheck, Calendar, Bell, etc.). Maps to icon registry. |
binary |
string | CLI binary name. Resolved via $PATH then ~/bin fallback. Required to be queryable. |
build |
string | Build command (documentation only). |
install |
string | Install command (documentation only). |
tags |
array | Searchable keywords. |
provides |
array | List of query types: ["status", "items", "actions"]. Documentation only. |
actions |
object | Maps action keys to command strings or objects with cmd + description. |
repository |
object | Source metadata: type, url, path. |
Full Example
The scheduler plugin's plugin.json demonstrates all major fields:
{
"name": "scheduler",
"version": "1.0.0",
"api_version": 1,
"description": "Local job scheduler and social content publisher",
"icon": "Calendar",
"binary": "scheduler",
"build": "go build -ldflags \"-s -w\" -o bin/scheduler ./cmd/scheduler-cli",
"install": "ln -sf $(pwd)/bin/scheduler ~/bin/scheduler",
"tags": ["scheduler", "cron", "jobs"],
"provides": ["query status", "query items", "query action"],
"actions": {
"status": {
"cmd": ["scheduler", "query", "status", "--json"],
"description": "Daemon running state, job count, next scheduled run time"
},
"items": {
"cmd": ["scheduler", "query", "items", "--json"],
"description": "List all jobs"
},
"action_run": {
"cmd": ["scheduler", "query", "action", "run", "<job_id>", "--json"],
"description": "Execute a job immediately"
}
}
}
Actions Object
Actions can be defined in two forms:
- Simple string —
"status": "scheduler query status --json"— a shell command string - Object —
{"cmd": [...], "description": "..."}— an array of args plus a human-readable description
The array form is preferred because it avoids shell quoting issues and is easier to template with runtime parameters like <job_id>.
build and install are documentation fields — they help humans understand how to build the plugin manually. The installer script handles building and linking; subscribers never execute these fields.
Skill Standard
The skill standard defines the canonical format for SKILL.md files. Following it ensures correct routing by the skills index and compatibility with the Anthropic, Vercel Labs, and Cloudflare skill ecosystems.
Canonical Location
Skill files live at a fixed path relative to the skill's directory:
~/skills/{name}/.claude/skills/{name}/SKILL.md
The directory name, the binary name, and the frontmatter name field must all be identical. The only exception: the orchestrator's skill is named missions rather than orchestrator.
Frontmatter Spec
---
name: scout
description: Gathers intelligence on configurable topics via scout CLI. Use when user asks about news, trends, scraping, intel gathering, or monitoring topics.
allowed-tools: Bash(scout:*)
argument-hint: "[topic-name]"
---
| Field | Required | Rule |
|---|---|---|
name |
Yes | Must match binary name. Lowercase, hyphens only, max 64 chars. |
description |
Yes | {What it does}. Use when {trigger conditions}. Third-person. |
allowed-tools |
CLI only | Bash({cli-name}:*). Omit for knowledge/pipeline skills. |
argument-hint |
No | Shown in slash command autocomplete. |
Section Requirements
| Section | Applies To | Content |
|---|---|---|
# {Name} — {Subtitle} |
All | One-line summary of what the skill does |
## When to Use |
CLI | 4–8 trigger bullets ("Use when the user asks...") |
## Commands |
CLI | Bash code blocks with all available commands |
## Configuration |
CLI | Config file path and any required setup |
## Examples |
CLI | User/Action pairs showing common workflows |
Command Extraction Rules
The routing index generator extracts commands from SKILL.md files to build the skills routing table. It follows these rules:
- Extract lines from
```bashblocks under## Commands - Match lines starting with the tool name (from frontmatter
name) - Strip trailing
# commentsand backslash continuations - Take the first 14 commands — ordering matters for routing priority
## Commands section. The routing index only reads the first 14 lines that match the binary name, so commands buried further down won't be indexed.
Standards Compatibility
The nanika skill format is a superset of three external standards:
- Anthropic — YAML frontmatter with
name,description,allowed-tools - Vercel Labs — compatible with the
skillsCLI discovery format - Cloudflare RFC —
argument-hintand section conventions align with the Cloudflare agent skill proposal
A skill file written to this standard can be used directly in Claude Code slash commands, discovered by the nanika routing index, and referenced by external skill managers without modification.