Help
If you understand the five concepts below, you understand how the factory works. The rest is implementation detail.
What works today vs. what’s coming
You’re looking at v0.1 — scaffolding, auth, and the database. Most action buttons in the dashboard (“Submit a job”, “Register an app”) don’t do anything yet because the machinery behind them lands in later phases. Here’s the honest map:
- P1 — done Sign in (GitHub OAuth + org-membership), navigate the dashboard, hit /api/health, deploy via Upsun.
- P2 — next Submit a job → factory dispatches claude-job.yml via GitHub Actions, agent opens a PR, telemetry recorded (see the dispatch sketch after this list).
- P3 — later Cookiecutter template generates new apps; per-app GitHub repo + Upsun project bootstrapped.
- P3b — later Per-app weekly anonymization agent (a Claude Code job that updates anonymization-rules.sql via PR).
- P4 — later Manifesto compliance engine; sticky comment + manifesto/score status check on PRs.
- P5 — later Per-PR Upsun preview envs; factory-evals harness; weekly harness scorecard routine.
- P6 — last Empathic dashboard polish: glossary keystrokes, onboarding tour, principle attribution chips, prompt examples.
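To make the P2 dispatch concrete, here is a minimal sketch using Octokit, assuming claude-job.yml has a workflow_dispatch trigger. The org/repo names and the prompt input are illustrative assumptions, not the final implementation.

```ts
// Sketch of the P2 job-dispatch step. Repo name and the `prompt`
// input are hypothetical; only the workflow file comes from the plan.
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

await octokit.rest.actions.createWorkflowDispatch({
  owner: "example-org",          // hypothetical org
  repo: "factory",               // hypothetical repo
  workflow_id: "claude-job.yml", // the workflow named in the roadmap
  ref: "main",
  inputs: { prompt: "Fix the flaky health check" }, // assumed input name
});
```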
Full plan with phase-by-phase deliverables: PLAN.md on GitHub.
The five concepts
The factory is a PR-opening machine
Every factory action ends in a pull request. The agent never pushes directly to main. The PR is the unit of review, rollback, and audit.
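In code terms, the last step of every factory job looks roughly like this sketch, assuming Octokit and an agent branch that has already been pushed. The branch-naming scheme and repo names are assumptions.

```ts
// Minimal sketch: every factory job ends by opening a PR from the
// agent's branch. Repo and branch names are hypothetical.
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

await octokit.rest.pulls.create({
  owner: "example-org",
  repo: "some-app",
  head: "agent/job-1234",   // the agent's working branch (assumed naming)
  base: "main",             // never pushed to directly
  title: "feat: agent job #1234",
  body: "Summary of the change, for review, rollback, and audit.",
});
```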
The harness is what makes the agent reliable
The agent (Claude Code) is the brain. The harness is everything around it — hooks, evals, manifesto checks, prompts. You don't make the agent better by tweaking the agent; you make it better by improving the harness.
Each generated app is a fully isolated sibling
Own GitHub repo, own Upsun project, own DB. The factory only opens PRs against an app's repo — it never reaches into the app's runtime.
Mechanical verification is the only way this scales
Manifesto checks, evals, layered-architecture tests, telemetry. Humans review what mechanical checks couldn't catch — judgment calls, novel cases.
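As one concrete shape for a mechanical check: the P4 manifesto engine could report its verdict as a commit status that branch protection requires. A sketch, assuming the engine produces a numeric score; the threshold and field names are illustrative.

```ts
// Sketch of a mechanical check reporting its verdict as a commit
// status. The score source and threshold are assumptions.
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

const score = 0.87;      // pretend output of the manifesto engine
const THRESHOLD = 0.8;   // assumed merge threshold

await octokit.rest.repos.createCommitStatus({
  owner: "example-org",
  repo: "some-app",
  sha: process.env.COMMIT_SHA!,  // the PR head commit
  context: "manifesto/score",    // the status check named in the roadmap
  state: score >= THRESHOLD ? "success" : "failure",
  description: `score ${score} (threshold ${THRESHOLD})`,
});
```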
The dashboard is a teacher
Every page answers what is this, why does it matter, what should I do. Press ? for the glossary. The harness scorecard at /harness gives you a weekly read on which principles are slipping.
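The ? shortcut is ordinary DOM wiring. A minimal sketch; openGlossary is a placeholder, not the dashboard's actual handler.

```ts
// Sketch of the `?` glossary shortcut. `openGlossary` is hypothetical.
declare function openGlossary(): void;

document.addEventListener("keydown", (e) => {
  // Don't hijack the key while the user is typing in a field.
  const target = e.target as HTMLElement;
  if (target.tagName === "INPUT" || target.tagName === "TEXTAREA") return;
  if (e.key === "?") openGlossary();
});
```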
The five harness principles
- Hooks, allowed tools, branch protection, and per-app isolation cap blast radius. If an agent shouldn't be able to do X, the hook denies it (see the hook sketch after this list).
- CLAUDE.md, skills/, docs/, and prompt templates. Information lives in artifacts the agent reads at job time so it doesn't have to re-derive context.
- Manifesto checks, evals, the layered-architecture test, telemetry. If only humans can tell whether the agent did the right thing, you have a bottleneck, not a factory.
- @claude PR review iterations and ADRs that capture rationale. Verification finds problems; correction primitives let the agent fix them without a fresh job.
- All factory actions land as PRs. Sticky comments summarize deltas. Branch protection blocks merge below threshold — humans review the right things at the right time.
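To make "the hook denies it" concrete: Claude Code PreToolUse hooks receive the pending tool call as JSON on stdin and can block it by exiting with code 2. Here is a sketch of a hook script that denies direct pushes to main; how this factory's harness actually wires its hooks is an assumption.

```ts
// Sketch of a PreToolUse hook that denies `git push` to main.
// Claude Code passes the pending tool call as JSON on stdin; exiting
// with code 2 blocks the call and feeds stderr back to the agent.
// The wiring in this factory's harness is an assumption.
import { stdin, stderr, exit } from "node:process";

let raw = "";
for await (const chunk of stdin) raw += chunk;
const event = JSON.parse(raw);

const command: string = event.tool_input?.command ?? "";
if (event.tool_name === "Bash" && /git push.*\b(main|master)\b/.test(command)) {
  stderr.write("Denied: open a PR instead of pushing to main.\n");
  exit(2); // exit code 2 = block the tool call
}
```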
When you’re stuck
- Check the relevant page; read “Why this matters.”
- Press ? for the glossary.
- Read the relevant sections of HARNESS.md and LEARNING.md.
- Check RUNBOOK.md for known incidents.
- Open an issue with the prompt + outcome + your guess at what went wrong.
- Update LEARNING.md with what confused you. Future-you will thank present-you.