Dark Factory turns rough feature ideas into structured implementation artifacts, review workflows, and reproducible engineering outputs, built the way infrastructure should be.
Most teams have already wired a chat model into their IDE. What they haven't built is the orchestration layer above it. Without one, the same problems compound across every feature.
Decisions live in chat threads. Two weeks later, no one remembers which version of "the plan" actually shipped.
Every prompt re-explains the problem. Architectural constraints, ADRs, and prior trade-offs are continually re-discovered.
The same request, an hour later, returns a different design. There is no run ID, no seed, no trace of what produced what.
Reviewers see code, not reasoning. There is no diff at the spec layer, no verdict beyond LGTM, no way to compare alternatives.
Dark Factory treats every feature as a pipeline run. Inputs are structured. Outputs are versioned. Agents are specialised. Reviewers get a first-class surface, not a comment thread.
Every run produces SPEC.md, ARCH.md, QA.md, TASKS.md, tasks.json, RUN_REPORT.md. The set is fixed; deviations are flagged as quality failures, not silent skips.
Specialised agent roles such as PM, ARCH, BE, FE, QA, SEC, and UXW. Each has explicit responsibilities, non-goals, and synthesis variants, and the roster is configurable per template.
Iterative quality modes critique their own output and revise. Each cycle is preserved with its feedback, score, and verdict.
Human review is a first-class state, with ACCEPTED / NEEDS_CHANGES verdicts that feed back into the next revise loop.
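Because the artifact set is fixed, a missing file can be treated as a gate failure rather than a silent skip. The sketch below is a minimal illustration, assuming each run writes its artifacts into a single directory; `check_artifacts` and its message format are hypothetical, not part of Dark Factory's documented interface.

```python
from pathlib import Path

# The fixed artifact set every run is expected to produce.
REQUIRED_ARTIFACTS = (
    "SPEC.md",
    "ARCH.md",
    "QA.md",
    "TASKS.md",
    "tasks.json",
    "RUN_REPORT.md",
)


def check_artifacts(run_dir: Path) -> list[str]:
    """Return one quality-failure message per missing or empty artifact.

    Hypothetical helper: an empty list means the run passed this gate.
    Missing files are reported as failures, never skipped.
    """
    failures = []
    for name in REQUIRED_ARTIFACTS:
        path = run_dir / name
        if not path.is_file():
            failures.append(f"missing artifact: {name}")
        elif path.stat().st_size == 0:
            failures.append(f"empty artifact: {name}")
    return failures
```

A pipeline built this way would mark the run as a quality failure whenever the returned list is non-empty.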
Every run is a state machine. State transitions are recorded, scored, and replayable. Failed gates surface as actionable diffs, not free-form error text.
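One way to model a run as a replayable state machine is an explicit transition table plus an append-only history. This is a sketch under assumptions: apart from ACCEPTED and NEEDS_CHANGES, which mirror the review verdicts, the state names and the `Run`, `advance`, and `replay` names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class RunState(Enum):
    # Hypothetical states; ACCEPTED and NEEDS_CHANGES mirror the review verdicts.
    PENDING = "PENDING"
    GENERATING = "GENERATING"
    IN_REVIEW = "IN_REVIEW"
    NEEDS_CHANGES = "NEEDS_CHANGES"
    ACCEPTED = "ACCEPTED"
    FAILED = "FAILED"


# Allowed transitions; anything not listed is rejected.
TRANSITIONS = {
    RunState.PENDING: {RunState.GENERATING},
    RunState.GENERATING: {RunState.IN_REVIEW, RunState.FAILED},
    RunState.IN_REVIEW: {RunState.ACCEPTED, RunState.NEEDS_CHANGES},
    RunState.NEEDS_CHANGES: {RunState.GENERATING},
    RunState.ACCEPTED: set(),
    RunState.FAILED: set(),
}


@dataclass
class Transition:
    source: RunState
    target: RunState
    score: Optional[float]
    reason: str


@dataclass
class Run:
    run_id: str
    state: RunState = RunState.PENDING
    history: list[Transition] = field(default_factory=list)

    def advance(self, target: RunState, score: Optional[float] = None, reason: str = "") -> None:
        """Move to `target` if the table allows it, recording the transition."""
        if target not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.run_id}: illegal transition {self.state.value} -> {target.value}"
            )
        self.history.append(Transition(self.state, target, score, reason))
        self.state = target

    def replay(self) -> RunState:
        """Re-derive the final state from the recorded history alone."""
        state = RunState.PENDING
        for t in self.history:
            if t.source is not state or t.target not in TRANSITIONS[state]:
                raise ValueError(f"{self.run_id}: corrupt history at {t}")
            state = t.target
        return state
```

Rejecting an illegal transition at the moment it is attempted, rather than logging free-form errors afterwards, is what keeps the history trustworthy enough to replay.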
The surface is designed around the same primitives an SRE expects: runs, traces, quality gates, replay, audit, budget.
14 configurable agent roles with explicit responsibilities and synthesis variants.
Fast, balanced, or thorough quality modes. Thorough adds critique rounds with explicit cycle limits.
First-class ACCEPTED / NEEDS_CHANGES states that feed the next revise cycle.
Swap providers per agent: LM Studio, Ollama, OpenAI, Anthropic, or a deterministic mock.
Run the full pipeline against a local LLM. Air-gapped reproducibility on a workstation.
Run artifacts become PRs with linked issues. Rate-limit aware, branch-anchored.
Dispatch run completions, request reviews, and monitor inbound events from a visibility panel.
Two-up diff at the artifact level. Deltas in score, tokens, calls, and verdicts.
Quality and efficiency over time. Template performance. Top failing checks.
Seeded runs, deterministic mock mode, full provider-call trace per stage.
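Seeded runs and a deterministic mock mode make reproducibility something you can test, not just claim. A minimal sketch, assuming a mock provider whose response is a pure function of the seed, stage, and prompt; `MockProvider`, `run_pipeline`, and the three-stage order are hypothetical illustrations, not Dark Factory's provider interface.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class MockProvider:
    """Hypothetical deterministic stand-in for an LLM provider.

    The response depends only on (seed, stage, prompt), so re-running with
    the same seed reproduces every output exactly.
    """

    seed: int
    trace: list[dict] = field(default_factory=list)

    def complete(self, stage: str, prompt: str) -> str:
        key = json.dumps({"seed": self.seed, "stage": stage, "prompt": prompt}, sort_keys=True)
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        response = f"[mock {stage}] {digest[:12]}"
        # Record every provider call so each stage can be inspected or replayed.
        self.trace.append({"stage": stage, "prompt": prompt, "response": response})
        return response


def run_pipeline(provider: MockProvider, idea: str) -> list[str]:
    """Hypothetical three-stage pipeline; each stage consumes the previous output."""
    outputs = []
    context = idea
    for stage in ("PM", "ARCH", "QA"):
        context = provider.complete(stage, context)
        outputs.append(context)
    return outputs
```

Two providers built with the same seed yield identical outputs and identical traces, so a diff between them should be empty; any non-empty diff points at a source of nondeterminism.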
A real run from the console. Every cell — score, verdict, agents, cost — is anchored to a run ID and can be diffed, replayed, or published.
Reviewer feedback is a typed input that drives the next cycle. Every cycle is preserved with its score, verdict, and the feedback that produced the change — not just the change itself.
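The review-and-revise cycle, bounded by the explicit cycle limits of thorough mode, can be sketched as a loop that keeps every cycle rather than only the final draft. This assumes the scorer, the reviewer (human or agent), and the reviser are supplied as plain callables; `Feedback`, `Cycle`, and `revise_loop` are hypothetical names, while the ACCEPTED / NEEDS_CHANGES verdicts come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ACCEPTED = "ACCEPTED"
    NEEDS_CHANGES = "NEEDS_CHANGES"


@dataclass(frozen=True)
class Feedback:
    """Typed reviewer input (hypothetical shape)."""

    verdict: Verdict
    comments: tuple[str, ...] = ()


@dataclass(frozen=True)
class Cycle:
    index: int
    draft: str
    score: float
    feedback: Feedback


def revise_loop(
    draft: str,
    score: Callable[[str], float],
    review: Callable[[str, float], Feedback],
    revise: Callable[[str, Feedback], str],
    max_cycles: int,
) -> list[Cycle]:
    """Run review/revise cycles until ACCEPTED or the cycle limit is reached.

    Every cycle is preserved with its draft, score, and the feedback it got.
    """
    history: list[Cycle] = []
    for index in range(1, max_cycles + 1):
        s = score(draft)
        fb = review(draft, s)
        history.append(Cycle(index, draft, s, fb))
        if fb.verdict is Verdict.ACCEPTED or index == max_cycles:
            break
        draft = revise(draft, fb)  # the feedback is the input to the next cycle
    return history
```

Storing `feedback` on each `Cycle` records not only what changed between drafts but the reviewer input that caused the change.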
Dark Factory is applied AI systems engineering — not a chatbot productivity app, not a prompt library, not a copilot wrapper.
It is the orchestration layer for teams who treat AI like infrastructure.