Help

If you understand the five concepts below, you understand how the factory works. The rest is implementation detail.

What works today vs. what’s coming

You’re looking at v0.1 — scaffolding, auth, and the database. Most action buttons in the dashboard (“Submit a job”, “Register an app”) don’t do anything yet because the machinery behind them lands in later phases. Here’s the honest map:

  • P1 — done Sign in (GitHub OAuth with an org-membership check), navigate the dashboard, hit /api/health, deploy via Upsun.
  • P2 — next Submit a job → factory dispatches claude-job.yml via GitHub Actions, agent opens a PR, telemetry recorded.
  • P3 — later Cookiecutter template generates new apps; per-app GitHub repo + Upsun project bootstrapped.
  • P3b — later Per-app weekly anonymization agent (Claude Code job that updates anonymization-rules.sql via PR).
  • P4 — later Manifesto compliance engine; sticky comment + manifesto/score status check on PRs.
  • P5 — later Per-PR Upsun preview envs; factory-evals harness; weekly harness scorecard routine.
  • P6 — last Empathic dashboard polish: glossary keystrokes, onboarding tour, principle attribution chips, prompt examples.
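The P1 health endpoint can be exercised from a script. A minimal sketch, assuming a base URL for your deployment and that /api/health returns JSON with a `status` field (both of which are assumptions, not documented behavior):

```python
import json
from urllib import request


def fetch_health(base_url: str) -> dict:
    """GET <base_url>/api/health and parse the JSON body.

    The endpoint ships in P1; the JSON response shape is an assumption.
    """
    with request.urlopen(f"{base_url.rstrip('/')}/api/health", timeout=5) as resp:
        return json.load(resp)


def is_healthy(payload: dict) -> bool:
    # Hypothetical convention: a healthy response carries {"status": "ok"}.
    return payload.get("status") == "ok"
```

Usage would look like `is_healthy(fetch_health("https://example.upsun.app"))`, where the host is a placeholder for your Upsun deployment.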

Full plan with phase-by-phase deliverables: PLAN.md on GitHub.

The five concepts

1. The factory is a PR-opening machine

Every factory action ends in a pull request. The agent never pushes directly to main. The PR is the unit of review, rollback, and audit.
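Concretely, "ends in a pull request" means the last step of any factory job is a PR-creation call rather than a push. A sketch using the real GitHub CLI flags; the repo and branch values are hypothetical, and this is an illustration of the pattern, not the factory's actual implementation:

```python
def pr_create_command(repo: str, head: str, title: str, body: str) -> list[str]:
    """Build the `gh pr create` invocation a factory job's final step would run.

    The flags are real GitHub CLI flags; repo/branch values are placeholders.
    """
    return [
        "gh", "pr", "create",
        "--repo", repo,    # e.g. "org/some-app" (hypothetical)
        "--base", "main",  # main is the review target; never pushed to directly
        "--head", head,    # the agent's working branch
        "--title", title,
        "--body", body,
    ]
```

Because the PR is the only write path, review, rollback, and audit all reduce to operations on pull requests.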

2. The harness is what makes the agent reliable

The agent (Claude Code) is the brain. The harness is everything around it — hooks, evals, manifesto checks, prompts. You don't make the agent better by tweaking the agent; you make it better by improving the harness.

3. Each generated app is a fully isolated sibling

Own GitHub repo, own Upsun project, own DB. The factory only opens PRs against an app's repo — it never reaches into the app's runtime.

4. Mechanical verification is the only way this scales

Manifesto checks, evals, layered-architecture tests, telemetry. Machines verify everything that can be verified mechanically; humans review only what the checks can't catch: judgment calls and novel cases.
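The shape of such a check is simple: run the rules, count the failures, and emit pass/fail for the status check. A hypothetical sketch; the rule granularity, threshold, and scoring scheme are all invented for illustration, not the factory's actual manifesto scoring:

```python
def score_manifesto(results: dict[str, bool], threshold: float = 1.0) -> tuple[float, bool]:
    """Score a PR against manifesto rules and decide the status check.

    `results` maps rule name -> passed. The all-or-nothing default
    threshold is a hypothetical convention.
    """
    if not results:
        return 0.0, False  # no rules evaluated: fail closed
    score = sum(results.values()) / len(results)
    return score, score >= threshold
```

A status-check step would then report `score` and set the check's conclusion from the boolean, leaving only the failures for a human to read.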

5. The dashboard is a teacher

Every page answers what is this, why does it matter, what should I do. Press ? for the glossary. The harness scorecard at /harness gives you a weekly read on which principles are slipping.

The five harness principles

When you’re stuck

  1. Check the relevant page; read “Why this matters.”
  2. Press ? for the glossary.
  3. Read the relevant section of HARNESS.md and LEARNING.md.
  4. Check RUNBOOK.md for known incidents.
  5. Open an issue with the prompt + outcome + your guess at what went wrong.
  6. Update LEARNING.md with what confused you. Future-you will thank present-you.