DESIGNVAULT

OVERVIEW

AI GAME DEVELOPMENT PIPELINE

Agent Harness Engineering for Game Production

DesignVault is an AI agent workflow system for game development. It is more than a documentation vault: it is a shared working surface where human designers, coding agents, and project knowledge collaborate around a single source of current truth.

In AI-assisted production, short tasks are often fast, but long-running projects introduce drift: context changes across threads, design and implementation diverge, and agents may redesign while executing. DesignVault turns that instability into a maintainable workflow: design converges first, execution has boundaries, validation produces evidence, and confirmed changes are written back.

Knowledge Layer

A retrieval-friendly Wiki Truth layer adapted from Karpathy's LLM Wiki idea.

Agent Harness

Skills, phase packets, execution logs, handoffs, acceptance, and repair loops.

Production Workflow

Three lanes: /design, /execute, and /bug.

GitHub Repository

GENRE: AI Workflow

YEAR: 2026

AI Agent, LLM Wiki, Agent Harness, Context Engineering

MY ROLE: Technical Designer / Workflow Designer (Individual)

COLLABORATORS: Individual Project

PLATFORM: Codex, Obsidian, GitHub, Unity MCP-compatible workflow

DESIGN DETAILS

1. LLM Wiki + Spec Coding

DesignVault is directly inspired by Karpathy's LLM Wiki concept: instead of relying only on chat history, the LLM works with a searchable, curated, and rule-bound knowledge system.

WIKI TRUTH

Stable rules, system definitions, UI responsibilities, design boundaries, and terminology.

LONGFORM

Design reasoning, tradeoffs, open questions, and convergence before implementation.

EXECUTION PLAN

Implementation phases, required context, validation methods, and stop conditions.

Core principle: Longform shapes the design, Wiki stores the current truth, and Execution Plan drives implementation.

2. Agent Harness Engineering

The core of DesignVault is not a longer prompt, but an Agent harness that makes AI work repeatable, recoverable, and verifiable. The harness turns human intent into executable context, constrains agent autonomy within the right boundaries, and preserves state at key checkpoints.

The harness design also references ideas from the OpenAI Agents SDK: agents, tools / handoffs, guardrails, sessions, human-in-the-loop control, tracing, and MCP tool calling. DesignVault adapts these agent-engineering ideas into a file-based workflow for game development.

Skill Layering

Reusable skills such as designvault-design, designvault-execute, designvault-bug, designvault-ui-handoff, and designvault-wiki-maintain. Each skill owns a workflow lane, references, scripts, and handoff expectations.

Execution State Machine

/execute is modeled as preflight -> phase -> acceptance -> complete. The parent harness decides the next action, while child agents only execute bounded phase tasks.
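
The transition rule can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: the State enum, the next_state function, and the extra STOPPED state are assumptions made for the example, and repair handling is left out so that any failed step simply halts the run.

```python
from enum import Enum


class State(Enum):
    PREFLIGHT = "preflight"
    PHASE = "phase"
    ACCEPTANCE = "acceptance"
    COMPLETE = "complete"
    STOPPED = "stopped"  # hypothetical extra state for a halted run


def next_state(state: State, ok: bool, phases_left: int = 0) -> State:
    """Parent-harness decision: choose the next state from the last result."""
    if not ok:
        return State.STOPPED  # assumption: any failed step halts the run
    if state is State.PREFLIGHT:
        return State.PHASE
    if state is State.PHASE:
        # Stay in PHASE until every planned phase has been executed.
        return State.PHASE if phases_left > 0 else State.ACCEPTANCE
    if state is State.ACCEPTANCE:
        return State.COMPLETE
    return state  # COMPLETE and STOPPED are terminal
```

Keeping the decision in one pure function mirrors the split described above: child agents report a result, and only the parent chooses what happens next.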

Context Packet Generation

Phase packets package required Wiki pages, code entry points, success criteria, non-goals, validation methods, and stop conditions before agent execution.
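
One way to picture a phase packet is as a small record with one list per section. The sketch below is an assumption about shape only: the PhasePacket class, its field names, and the plain-text rendering are hypothetical and are not taken from the DesignVault scripts.

```python
from dataclasses import dataclass, field


@dataclass
class PhasePacket:
    """Hypothetical container for the context handed to one phase executor."""
    phase: str
    wiki_pages: list[str] = field(default_factory=list)
    code_entry_points: list[str] = field(default_factory=list)
    success_criteria: list[str] = field(default_factory=list)
    non_goals: list[str] = field(default_factory=list)
    validation_methods: list[str] = field(default_factory=list)
    stop_conditions: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        return [
            name for name, value in vars(self).items()
            if name != "phase" and not value
        ]

    def render(self) -> str:
        """Render the packet as plain text for the executing agent."""
        lines = [f"PHASE: {self.phase}"]
        for name, value in vars(self).items():
            if name == "phase":
                continue
            lines.append(name.replace("_", " ").upper())
            lines.extend(f"  - {item}" for item in value)
        return "\n".join(lines)
```

A harness built this way could refuse to start a phase while missing_sections() is non-empty, so an agent never runs without success criteria or stop conditions.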

Structured Output Normalization

Agent output is normalized into completed work, changed files, verification evidence, risks, Wiki writeback needs, and whether execution should stop.
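
As a rough sketch of that normalization step, the function below maps a loosely structured agent reply onto the six fields named above. The PhaseReport class, the dictionary keys, and the normalize function are illustrative assumptions; the real output format may differ.

```python
from dataclasses import dataclass, field


@dataclass
class PhaseReport:
    """Hypothetical normalized form of one agent's phase output."""
    completed_work: list[str] = field(default_factory=list)
    changed_files: list[str] = field(default_factory=list)
    verification_evidence: list[str] = field(default_factory=list)
    risks: list[str] = field(default_factory=list)
    wiki_writeback: list[str] = field(default_factory=list)
    should_stop: bool = False


def _as_list(value) -> list[str]:
    """Coerce None, a single string, or an iterable into a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value] if value.strip() else []
    return [str(item) for item in value]


def normalize(raw: dict) -> PhaseReport:
    """Fill every field, so downstream steps never see a missing key."""
    return PhaseReport(
        completed_work=_as_list(raw.get("completed_work")),
        changed_files=_as_list(raw.get("changed_files")),
        verification_evidence=_as_list(raw.get("verification_evidence")),
        risks=_as_list(raw.get("risks")),
        wiki_writeback=_as_list(raw.get("wiki_writeback")),
        should_stop=bool(raw.get("should_stop", False)),
    )
```

Because every field always exists after normalization, the parent harness can compare reports across phases without special-casing agents that omit a section.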

Machine Evidence Adapter

The harness separates agent claims from machine evidence. Completion requires acceptance evidence such as compile results, console status, targeted tests, or Unity Editor evidence.
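
The separation between claims and evidence can be illustrated with a small acceptance gate. Everything here is a sketch under stated assumptions: the Evidence record, the source labels such as "compile" and "test", and the accept function are hypothetical names, not part of DesignVault or Unity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One machine-produced result, such as a compile run or a test run."""
    source: str        # hypothetical labels: "compile", "console", "test"
    passed: bool
    detail: str = ""


def accept(
    agent_claims_done: bool,
    evidence: list[Evidence],
    required: set[str],
) -> tuple[bool, list[str]]:
    """Accept only when every required evidence source exists and passed."""
    problems: list[str] = []
    by_source = {item.source: item for item in evidence}
    for source in sorted(required):
        item = by_source.get(source)
        if item is None:
            problems.append(f"missing evidence: {source}")
        elif not item.passed:
            problems.append(f"failed evidence: {source}")
    # The agent's own claim is necessary but never sufficient.
    if not agent_claims_done:
        problems.append("agent did not report completion")
    return (not problems, problems)
```

For example, accept(True, [Evidence("compile", True)], {"compile", "test"}) rejects the phase because no test evidence was supplied, even though the agent reported success.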

Stop Protocol + Repair Loop

When plan, Wiki, or implementation context conflicts, the workflow returns a Decision Packet. When acceptance finds a mismatch, it enters a bounded repair loop instead of silently redesigning.
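
A bounded repair loop can be sketched as follows. The function name, the callback signatures, the default limit of two attempts, and the "decision_needed" status are assumptions for illustration; the point is only that repair stops after a fixed number of attempts and hands the open mismatches back to a human.

```python
from typing import Callable


def run_with_repair(
    accept: Callable[[], list[str]],
    repair: Callable[[list[str]], None],
    max_attempts: int = 2,
) -> dict:
    """Re-run acceptance after each repair, up to max_attempts repairs."""
    mismatches = accept()
    attempts = 0
    while mismatches and attempts < max_attempts:
        repair(mismatches)      # narrow fix for the reported mismatches only
        attempts += 1
        mismatches = accept()   # judge again with fresh evidence
    if mismatches:
        # Out of budget: return the open items instead of redesigning.
        return {
            "status": "decision_needed",
            "attempts": attempts,
            "open_mismatches": mismatches,
        }
    return {"status": "accepted", "attempts": attempts, "open_mismatches": []}
```

The limit is what makes the loop bounded: once the budget is spent, the workflow reports what is still wrong rather than continuing to change the implementation.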

3. Why It Improves Production

DesignVault is designed around one production reality: most studios already have some form of wiki or design documentation, but those documents often receive the most attention during pre-production. Once implementation starts, maintaining the wiki becomes expensive, designers do not want to constantly rewrite tool docs, and programmers rarely have time to read every design page in full. This is exactly the kind of coordination work an AI agent can absorb.

Design: faster idea clarification

The /design lane uses a Socratic questioning style: instead of asking the designer to write a complete spec upfront, the agent interviews them about edge cases, player feedback, system boundaries, UI surfaces, and failure cases. A rough idea becomes a clearer design contract faster.

Documentation: lower human maintenance cost

DesignVault lets agents read, summarize, index, and update workflow documents. Humans still make design decisions, but agents take over the repetitive burden of keeping Wiki Truth, tool notes, API-style docs, and execution traces searchable.

Implementation: not vibe coding

AI-assisted development is becoming a default choice for many developers, but DesignVault avoids unbounded vibe coding. It follows the same direction recommended in agentic coding practice: explore first, plan before implementation, keep context specific, and give the agent verification criteria.

Validation: separate doing from judging

Execution and acceptance are separated. A phase executor implements within a bounded context, while an acceptance pass checks the result against Wiki Truth, the plan, tests, console output, or Unity evidence. This reduces the risk that the same agent both creates and over-trusts its own work.

This maps directly to current agent engineering practice: OpenAI's Agents SDK emphasizes orchestration, state, approvals, guardrails, handoffs, tracing, and evaluation; Anthropic's Claude Code guidance recommends verification, separating exploration/planning from coding, aggressive context management, and using subagents for investigation.

4. Three Workflow Lanes

/design

Used when the design is not yet stable. The agent reads minimal Wiki truth, asks clarifying questions, then produces a longform draft, Wiki updates, and an executable implementation plan.

/execute

Used when the design is stable and ready to implement. It reads the implementation plan, runs preflight, executes phase by phase, performs acceptance, and writes confirmed changes back to the Wiki.

/bug

Used for concrete observed issues. It starts from the symptom, reads minimal truth, performs a narrow fix and validation, and stops if the root cause is design ambiguity.
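
The choice between the three lanes can be summarized as a small routing rule. The sketch below is an assumption about priority (an observed issue is checked first, then design stability); the choose_lane function is hypothetical, while the skill names are the ones listed under Skill Layering.

```python
# Lane-to-skill mapping; skill names come from the Skill Layering list.
LANE_SKILLS = {
    "/design": "designvault-design",
    "/execute": "designvault-execute",
    "/bug": "designvault-bug",
}


def choose_lane(observed_issue: bool, design_stable: bool) -> str:
    """Pick a lane from two yes/no questions (hypothetical routing rule)."""
    if observed_issue:
        return "/bug"       # start from the concrete symptom
    if design_stable:
        return "/execute"   # the plan is ready to implement
    return "/design"        # converge the design first
```

A /bug run whose root cause turns out to be design ambiguity would stop and be re-routed to /design, which corresponds to calling choose_lane(False, False).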

5. References

Karpathy, LLM Wiki: raw sources / Wiki / Schema pattern.

OpenAI Agents SDK: agent loops, handoffs, guardrails, sessions, tracing, and MCP tool calling.

Claude Code Best Practices: verification, explore-plan-code workflow, context management, and subagent investigation.

DesignVault Repository: public reusable workflow shell with skills, starter vault, and execution scripts.