
The Autonomous Factory

A multi-LLM orchestrator that builds production software with zero human intervention. Not a wrapper around ChatGPT — a full development pipeline with role-based delegation, structured communication, TDD enforcement, and cross-project learning.

Claude · Kimi · Gemini · Codex · phi4 · Qwen Coder · Python · DaC Protocol · TDD

# Architecture


USER: "Build X"
  |
  v
BLUEPRINT CREATION
  |-- Load User DNA -> skip known decisions
  |-- Domain research -> auto-apply best practices
  `-- Generate modular blueprint (CORE + PROJECT)
  |
  v
BLUEPRINT REVIEW (parallel DaC)
  |-- Kimi QC + Gemini Architecture
  |-- Auto-patch via SEARCH/REPLACE
  `-- Target: both >= 95/100
  |
  v
HUMAN GATE (Codex as Owner's Twin)
  `-- Decides using owner's DNA -> ACCEPT/CHANGE/REJECT
  |
  v
WAVE EXECUTION (TDD per task)
  |-- AC -> RED -> GREEN -> REFACTOR
  |-- GATES: Bug Capture + Schema + Security
  `-- Circuit breaker: 2 TRAPs -> rollback
  |
  v
CODE QC (Kimi reviews all code)
  |-- Score 0-100, must pass >= 85%
  `-- Dead code check, contract alignment
  |
  v
FINAL VERIFICATION (Codex as Human Twin)
  |-- Tries to break the app
  |-- Tests edge cases, bad data, missing links
  `-- Score >= 90% to ship
  |
  v
LEARNING -> Save DNA + domain traps + new rules
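The score gates and the circuit breaker in the pipeline can be sketched in Python as below. This is a minimal illustration, assuming every score is compared on the 0-100 scale shown in the diagram; the names `blueprint_review_passes`, `can_ship`, and `CircuitBreaker` are hypothetical, and what counts as a TRAP is left to the caller.

```python
# Hypothetical encoding of the pipeline's gates; scores are on a 0-100 scale.
BLUEPRINT_REVIEW_MIN = 95    # Kimi QC and Gemini Architecture must both reach this
CODE_QC_MIN = 85             # Kimi's review of all generated code
FINAL_VERIFICATION_MIN = 90  # Codex "Human Twin" verification before shipping
MAX_TRAPS = 2                # circuit breaker: this many TRAPs forces a rollback


def blueprint_review_passes(kimi_score: int, gemini_score: int) -> bool:
    """Both parallel reviewers must clear the bar; a high score cannot offset a low one."""
    return min(kimi_score, gemini_score) >= BLUEPRINT_REVIEW_MIN


def can_ship(code_qc_score: int, final_score: int) -> bool:
    """Shipping requires passing both the code QC gate and final verification."""
    return code_qc_score >= CODE_QC_MIN and final_score >= FINAL_VERIFICATION_MIN


class CircuitBreaker:
    """Counts TRAP events during wave execution and trips once the limit is hit."""

    def __init__(self, max_traps: int = MAX_TRAPS) -> None:
        self.max_traps = max_traps
        self.traps = 0

    def record_trap(self) -> bool:
        """Record one TRAP; return True when the wave should be rolled back."""
        self.traps += 1
        return self.traps >= self.max_traps
```

In use, an orchestrator would call `record_trap()` each time a task hits a TRAP and roll the wave back as soon as it returns `True`. Taking the minimum of the two reviewer scores makes the "both >= 95" rule explicit.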


OWNER
  |  Direction, final authority
  v
CLAUDE (CEO)
  |
  |-- KIMI (QC Director)
  |   Bug capture, quality audits
  |   Score-gated: >= 95% to proceed
  |
  |-- GEMINI (Architecture Director)
  |   Structure, security, contracts
  |   Cross-reviewer (different failures)
  |
  |-- CODEX (Human Twin)
  |   Loaded with the owner's DNA:
  |   decision logic, heuristics, redlines.
  |   Decides like the owner would.
  |   Read-only. NEVER writes files.
  |
  |-- PHI4 (Local Assistant)
  |   DaC parsing, routing, summaries
  |
  `-- QWEN CODER (Junior Dev)
      Boilerplate, scaffolding
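Two rules from the org chart and the Evolution notes can be enforced mechanically: Codex never writes files, and no agent approves its own work. A minimal Python sketch, assuming agents are identified by lowercase name strings; `READ_ONLY_AGENTS`, `check_write_allowed`, and `check_independent_review` are hypothetical names, not the factory's actual API.

```python
# Agents that may never write files. The org chart states this for Codex;
# the set and the agent naming scheme are illustrative assumptions.
READ_ONLY_AGENTS = frozenset({"codex"})


def check_write_allowed(agent: str) -> None:
    """Raise if a read-only agent tries to write to the workspace."""
    if agent in READ_ONLY_AGENTS:
        raise PermissionError(f"{agent} is read-only and must not write files")


def check_independent_review(author: str, reviewer: str) -> None:
    """Reject self-review: the agent that produced an artifact cannot approve it."""
    if author == reviewer:
        raise ValueError(f"{author} cannot review its own work")
```

Raising instead of returning a flag means a forgotten check fails loudly rather than silently letting a write or a self-review through.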

# Watch It Build

A factory run in real time — from blueprint to production.

Example run: the "PPF Workshop Monitoring" blueprint.

Each step runs autonomously. The factory decides, reviews, tests, and ships — no human input between start and finish.

# Three Modes

CREATE

Full lifecycle. Greenfield project from requirements to deployed app. Blueprint, review, TDD build, gate swarm, ship.

Trigger: "Build X", "Create X"

AUDIT

Forensic review of existing code. No code generation. Recon, self-audit, external QC, save learnings, report.

Trigger: "Audit X", "Review X"

UPDATE

Delta blueprint for existing projects. Only change what is needed. Targeted fixes, new features, cleanup.

Trigger: "Update X", "Add Y to X"
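Because each mode has fixed trigger phrases, the dispatch step can be illustrated as keyword matching on the first word of the request. This is a hypothetical sketch (`Mode`, `route`, and the verb table are assumptions); the org chart assigns routing to phi4, which would presumably handle far looser phrasing than this.

```python
import re
from enum import Enum


class Mode(Enum):
    CREATE = "CREATE"
    AUDIT = "AUDIT"
    UPDATE = "UPDATE"


# Trigger verbs taken from the mode descriptions; matching is on the first word only.
_TRIGGERS = {
    Mode.CREATE: ("build", "create"),
    Mode.AUDIT: ("audit", "review"),
    Mode.UPDATE: ("update", "add"),
}


def route(request: str) -> Mode:
    """Map a request such as 'Build X' or 'Add Y to X' to an operating mode."""
    words = re.findall(r"[a-z]+", request.lower())
    if not words:
        raise ValueError("empty request")
    first = words[0]
    for mode, verbs in _TRIGGERS.items():
        if first in verbs:
            return mode
    raise ValueError(f"no mode matches request: {request!r}")
```

For example, `route("Add MQTT to GridSense")` selects UPDATE, while an unrecognized verb raises instead of guessing a mode.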

# By the Numbers

Projects Built

1,700+

Tests Passing

172

Factory Learnings

Cost Per Project

Production Systems

Operating Modes

3

# What It Shipped

Production systems built by the factory, running in the real world.

PPF Monitoring

Production

IoT SaaS for automotive workshops. ESP32 sensors, MQTT telemetry, live video streaming, customer tracking portal. Real workshops, real customers.

Mushroom Ki Mandi

Production

Smart mushroom farming platform. Climate monitoring, relay automation, growth stage tracking, grower-to-buyer marketplace. Real farms, real hardware.

io-gita

Live

A physics engine for the mind. Ancient philosophy meets computational dynamics. Describe a dilemma, get deterministic guidance. Zero hallucination.


# Evolution

How the factory got smarter, project by project.

P1–P3 · Foundation

Core Pipeline Established

Blueprint → build → test. Raw SQL + SQLite. First learnings captured. Discovered the orchestrator was rubber-stamping its own work — introduced independent review.

P4–P6 · Maturation

Quality Scores + Multi-LLM Review

Kimi + Gemini scoring introduced. JWT/RBAC solidified. First healthcare project (MedVault). Multi-model review catches failures single models miss.

P7–P9 · Real-time

WebSocket + Search + Geolocation

LivePulse (real-time chat), RecipeForge (full-text search), FleetTracker (GPS geofencing). The factory learned to handle multiple protocol types.

P10–P12 · Scale

DLQ + E2E Testing + First Frontends

Dead letter queues, Playwright E2E, vanilla JS frontends. TeamForge: full project management with WebSocket. Test coverage grew significantly.

P13–P15 · Advanced

React + gRPC + PostgreSQL + Redis + C++

FleetCore (hot state, collision avoidance), WareFlow (full warehouse lifecycle), FleetBridge (gRPC + C++17 robot simulator). Multi-language, multi-protocol.

P16–P18 · Multi-protocol

MQTT + gRPC + WebSocket Combined

GridSense: 4 protocols in one project (REST + gRPC + MQTT + WebSocket). PostgreSQL + asyncpg. Energy metering with billing precision.

P19–P20 · Self-Audit

The Factory Audits Itself

AUDIT mode found dead code and untested modules. UPDATE mode cleaned it all up. The factory improved the factory.

P21 · Production

PPF Monitoring Goes Live

First production IoT SaaS. Real workshops, real ESP32 sensors, real MQTT, real customers. Hardware + software from one prompt.

P22–P24 · Research + Product

Semantic Gravity → io-gita Goes Live

Non-token reasoning engine built from Bhagavad Gita concepts. Hopfield attractor networks, ODE dynamics. Then shipped as a live product at io-gita.com.

# The Factory Updates the Factory

Self-healing — it audits its own code and fixes what it finds.

P19 · Self-Audit

The factory ran its own AUDIT mode on itself. Found dead code, untested modules, and critical issues. The factory that builds software couldn't pass its own quality gate.

Mode: AUDIT
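A dead-code check of the kind this self-audit relied on can be approximated, for a single Python module, by comparing the functions a file defines with the names it references. This is a crude, hypothetical sketch built on the standard `ast` module, not the factory's actual checker: it reports functions used only by other modules as dead, ignores dynamic access via `getattr`, and will not flag an unused function that calls itself.

```python
import ast


def unreferenced_functions(source: str) -> set[str]:
    """Return top-level function names never referenced by name or attribute in this module."""
    tree = ast.parse(source)
    defined = {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    # Plain names (e.g. helper()) count as references.
    referenced = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    # Attribute access (e.g. module.helper) also counts as a reference.
    referenced |= {node.attr for node in ast.walk(tree) if isinstance(node, ast.Attribute)}
    return defined - referenced
```

Run over every module at each gate, a non-empty result is a prompt for review rather than proof of dead code, given the blind spots above.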

P20 · Self-Update

The factory ran UPDATE mode on itself using P19's findings as input. Cleaned dead code, added new tests. The factory fixed the factory.

Mode: UPDATE

This is the moment it stopped being a tool and started being a system. A factory that can audit itself, find its own bugs, and ship its own fixes — autonomously.

# What It Learned to Never Do Again

R01: Never skip review scores — checkpoints enforce quality before proceeding

R02: Never trust "tests pass" = "app works" — run real server verification

R03: Never claim "fixed" without showing output — CLAIMED != VERIFIED

R04: Never build services without wiring them — dead code check at every gate

R05: Never let Claude review its own work — Codex replaces self-review

R06: Never use float/double for money — string-encoded decimals for billing
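R06 in practice: binary floats cannot represent most decimal fractions exactly, so billing code should parse string-encoded amounts into exact decimals. A minimal sketch using Python's standard `decimal` module; `parse_amount` and `line_total` are hypothetical helpers, and rounding half-up to cents is an assumed policy, not one the source specifies.

```python
from decimal import ROUND_HALF_UP, Decimal


def parse_amount(encoded: str) -> Decimal:
    """Parse a string-encoded amount; reject floats so binary rounding cannot sneak in."""
    if not isinstance(encoded, str):
        raise TypeError("amounts must be string-encoded, e.g. '12.34'")
    return Decimal(encoded)


def line_total(unit_price: str, quantity: str) -> str:
    """Multiply exactly, round to cents (assumed half-up policy), and re-encode as a string."""
    total = parse_amount(unit_price) * parse_amount(quantity)
    return str(total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
```

With these helpers, `line_total("0.10", "3")` returns `"0.30"`, whereas the float expression `0.1 * 3` evaluates to `0.30000000000000004`. Keeping amounts as strings at the edges (JSON, database, MQTT payloads) means no layer ever sees a float.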