Help
If you understand the five concepts below, you understand how the factory works. The rest is implementation detail.
What works today vs. what’s coming
You’re looking at v0.1 — scaffolding, auth, and the database. Most action buttons in the dashboard (“Submit a job”, “Register an app”) don’t do anything yet because the machinery behind them lands in later phases. Here’s the honest map:
- P1 — done Sign in (GitHub OAuth + org-membership), navigate the dashboard, hit /api/health, deploy via Upsun.
- P2 — next Submit a job → factory dispatches claude-job.yml via GitHub Actions, agent opens a PR, telemetry recorded (see the dispatch sketch after this list).
- P3 — later Cookiecutter template generates new apps; per-app GitHub repo + Upsun project bootstrapped.
- P3b — later Per-app weekly anonymization agent (a Claude Code job that updates anonymization-rules.sql via PR).
- P4 — later Manifesto compliance engine; sticky comment + manifesto/score status check on PRs.
- P5 — later Per-PR Upsun preview envs; factory-evals harness; weekly harness scorecard routine.
- P6 — last Empathic dashboard polish: glossary keystrokes, onboarding tour, principle attribution chips, prompt examples.
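To make the P2 dispatch concrete, here is a minimal sketch using Octokit, assuming claude-job.yml has a workflow_dispatch trigger. The org/repo names and the prompt input are illustrative assumptions, not the final implementation.

```ts
// Sketch of the P2 job-dispatch step. Repo name and the `prompt`
// input are hypothetical; only the workflow file comes from the plan.
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

await octokit.rest.actions.createWorkflowDispatch({
  owner: "example-org",          // hypothetical org
  repo: "factory",               // hypothetical repo
  workflow_id: "claude-job.yml", // the workflow named in the roadmap
  ref: "main",
  inputs: { prompt: "Fix the flaky health check" }, // assumed input name
});
```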
Full plan with phase-by-phase deliverables: PLAN.md on GitHub.
The five concepts
The factory is a PR-opening machine
Every factory action ends in a pull request. The agent never pushes directly to main. The PR is the unit of review, rollback, and audit.
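In code terms, the last step of every factory job looks roughly like this sketch, assuming Octokit and an agent branch that has already been pushed. The branch-naming scheme and repo names are assumptions.

```ts
// Minimal sketch: every factory job ends by opening a PR from the
// agent's branch. Repo and branch names are hypothetical.
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

await octokit.rest.pulls.create({
  owner: "example-org",
  repo: "some-app",
  head: "agent/job-1234",   // the agent's working branch (assumed naming)
  base: "main",             // never pushed to directly
  title: "feat: agent job #1234",
  body: "Summary of the change, for review, rollback, and audit.",
});
```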
The harness is what makes the agent reliable
The agent (Claude Code) is the brain. The harness is everything around it — hooks, evals, manifesto checks, prompts. You don't make the agent better by tweaking the agent; you make it better by improving the harness.
Each generated app is a fully isolated sibling
Own GitHub repo, own Upsun project, own DB. The factory only opens PRs against an app's repo — it never reaches into the app's runtime.
Mechanical verification is the only way this scales
Manifesto checks, evals, layered-architecture tests, telemetry. Humans review what mechanical checks couldn't catch — judgment calls, novel cases.
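As one concrete shape for a mechanical check: the P4 manifesto engine could report its verdict as a commit status that branch protection requires. A sketch, assuming the engine produces a numeric score; the threshold and field names are illustrative.

```ts
// Sketch of a mechanical check reporting its verdict as a commit
// status. The score source and threshold are assumptions.
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

const score = 0.87;      // pretend output of the manifesto engine
const THRESHOLD = 0.8;   // assumed merge threshold

await octokit.rest.repos.createCommitStatus({
  owner: "example-org",
  repo: "some-app",
  sha: process.env.COMMIT_SHA!,  // the PR head commit
  context: "manifesto/score",    // the status check named in the roadmap
  state: score >= THRESHOLD ? "success" : "failure",
  description: `score ${score} (threshold ${THRESHOLD})`,
});
```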
The dashboard is a teacher
Every page answers what is this, why does it matter, what should I do. Press ? for the glossary. The harness scorecard at /harness gives you a weekly read on which principles are slipping.
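The ? shortcut is ordinary DOM wiring. A minimal sketch; openGlossary is a placeholder, not the dashboard's actual handler.

```ts
// Sketch of the `?` glossary shortcut. `openGlossary` is hypothetical.
declare function openGlossary(): void;

document.addEventListener("keydown", (e) => {
  // Don't hijack the key while the user is typing in a field.
  const target = e.target as HTMLElement;
  if (target.tagName === "INPUT" || target.tagName === "TEXTAREA") return;
  if (e.key === "?") openGlossary();
});
```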
The five harness principles
- Hooks, allowed tools, branch protection, and per-app isolation cap blast radius. If an agent shouldn't be able to do X, the hook denies it (see the hook sketch after this list).
- CLAUDE.md, skills/, docs/, and prompt templates. Information lives in artifacts the agent reads at job time so it doesn't have to re-derive context.
- Manifesto checks, evals, the layered-architecture test, telemetry. If only humans can tell whether the agent did the right thing, you have a bottleneck, not a factory.
- @claude PR review iterations and ADRs that capture rationale. Verification finds problems; correction primitives let the agent fix them without a fresh job.
- All factory actions land as PRs. Sticky comments summarize deltas. Branch protection blocks merge below threshold — humans review the right things at the right time.
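To make "the hook denies it" concrete: Claude Code PreToolUse hooks receive the pending tool call as JSON on stdin and can block it by exiting with code 2. Here is a sketch of a hook script that denies direct pushes to main; how this factory's harness actually wires its hooks is an assumption.

```ts
// Sketch of a PreToolUse hook that denies `git push` to main.
// Claude Code passes the pending tool call as JSON on stdin; exiting
// with code 2 blocks the call and feeds stderr back to the agent.
// The wiring in this factory's harness is an assumption.
import { stdin, stderr, exit } from "node:process";

let raw = "";
for await (const chunk of stdin) raw += chunk;
const event = JSON.parse(raw);

const command: string = event.tool_input?.command ?? "";
if (event.tool_name === "Bash" && /git push.*\b(main|master)\b/.test(command)) {
  stderr.write("Denied: open a PR instead of pushing to main.\n");
  exit(2); // exit code 2 = block the tool call
}
```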
When you’re stuck
- Check the relevant page; read “Why this matters.”
- Press ? for the glossary.
- Read the relevant sections of HARNESS.md and LEARNING.md.
- Check RUNBOOK.md for known incidents.
- Open an issue with the prompt + outcome + your guess at what went wrong.
- Update LEARNING.md with what confused you. Future-you will thank present-you.