Project Jarvis — The Playbook

Under the Hood

How Jarvis Works

Every message you send travels through a precise chain of services — from your phone to an AI brain and back in under 10 seconds.

📱 WhatsApp (Ted's phone)
→ 📡 Twilio (webhook bridge)
→ ⚙️ Railway (server.js, running 24/7)
→ 🧠 Claude (Sonnet 4.6)
→ 🔧 Tool Loop (19 tools, up to 15 iterations)
→ 🗄️ SQLite (last 20 turns)
→ 💬 Reply (2–10 seconds)

Identity Layer

soul.md v3.3

Cached at boot — personality, protocol, file paths, 16,487 chars

Working Memory

context.md

Evolving memory on Railway Volume, rewrites every 50 messages

Short-Term Memory

SQLite

Last 20 conversation turns, persisted on Railway Volume

Task Intelligence

Task Classifier

7 classes, class-aware loop limits, multi-keyword triggers

Model Router

4 Models

Sonnet (default), Opus ("go deep"), GPT-4o, GPT-4o-mini (/^quick\b/i)

Source of Truth

trophy-registry.json

19 live tools tracked and auto-deployed via Tool Deployment Protocol

// The full message path — Phase 9
WhatsApp message arrives
  → $ commands check → cost cap check → model selection
  → Task Classifier (7 classes, multi-class, loop limits assigned)
  → soul.md v3.3 (cached at boot — identity + protocol)
  → context.md (fresh read — evolving memory)
  → SQLite last 20 turns loaded
  → Claude API called with 19 tools defined
  → Tool Loop (up to 15 iterations for tool_build)
  → Step 5.5: read-back verification after every commit
  → Error alerting: crash handler + 90s timeout guard
  → Save to SQLite → log tokens → Twilio reply sent
  → Total: 2–10 seconds
      

April 19 – April 26, 2026

8 Days. 10 Phases.

From a blank GitHub repo to a fully autonomous AI agent — built entirely over WhatsApp and late-night sessions.

Phase 0

Infrastructure Day — The Foundation

April 19, 2026

Before a single message could be sent, the entire scaffolding had to exist. GitHub repo created. Netlify connected. Railway spun up. The skeleton server.js written from scratch. This was zero-to-alive in a single day.

GitHub Netlify Railway server.js skeleton

The Move Build the whole pipeline first — even if it does nothing. A working deployment loop (write → push → live) is the foundation everything else runs on.

Phase 1

First Contact — WhatsApp Loop Working

April 19, 2026

The first time Ted texted Jarvis and got a reply. WhatsApp → Twilio → Railway webhook → Claude API → back. The loop closed. This is the moment Jarvis became real.

WhatsApp Twilio Webhook First AI Reply

The Move Don't add features until the basic loop works. A broken hello-world teaches you nothing. A working one teaches you everything.

Phase 2

The Soul — Memory, Identity, Intelligence Routing

April 19, 2026

Jarvis got a personality (soul.md v2.0), persistent memory (SQLite), a multi-model router so different questions route to different AI models, and cost tracking so it never goes over budget.

soul.md v2.0 SQLite memory Multi-model router Cost tracking

The Move An agent without persistent memory is just a chatbot. Memory + identity is the line between a toy and a tool.

Phase 3

The Arms — GitHub Tools Born

April 19, 2026

Jarvis got its hands. GitHub PAT connected. Four site manager tools deployed: read, write, list, and commit. For the first time, Jarvis could reach into the codebase and make real changes — not just talk about them.

GitHub PAT read_file write_file list_files

The Move Intelligence without action is just advice. The moment Jarvis could write code, it became a builder, not a chatbot.

Phase 4

Cost Engineering — 74% Cheaper Overnight

April 20, 2026

Prompt caching implemented: soul.md is loaded once at boot and sent as a cacheable prefix, so it is no longer billed at the full input rate on every message. Context auto-rewrite added. Five new tools deployed. Monthly costs dropped 74% without touching capabilities.

Prompt caching 74% cost reduction Context auto-rewrite 5 new tools

The Move soul.md is big and never changes between messages. Cache it. The math was obvious once someone asked the question: $7/month instead of $27/month for identical output.

Phase 5

The Showcase — Jarvis Builds Its Own Page

April 20, 2026

jarvis.html deployed as the public showcase page. Then came an autonomous index.html patch: Jarvis read the homepage, wrote the card, and committed the change without Ted touching a single file. Jarvis built its own portfolio entry.

jarvis.html Autonomous patch Self-deployed

The Move If you have to tell your agent what it built, you've already lost. Jarvis adding itself to the homepage was the first proof it could operate end-to-end without handholding.

Phase 6

The System — Classifier, Protocol, Trophy Wall

April 21, 2026

Jarvis became a system, not just a script. Task classifier added — every message gets categorized before a single tool fires. Tool Deployment Protocol written: a mandatory 7-step process for adding new tools. Trophy Wall launched to track what's live.

Task classifier 7 classes Tool Deployment Protocol Trophy Wall

The Move Systems beat willpower. The Protocol meant Ted never had to babysit a tool deployment again — Jarvis had its own checklist and followed it.

Phase 7

Infrastructure Maturity — Code Split, SMS Ready

April 22, 2026

server.js split into server.js + tool-handlers.js — 600+ lines refactored for maintainability. Reconnaissance cap added to prevent Jarvis from reading 10 files before writing one. Real Twilio phone number purchased (+1 949 239 1134). A2P 10DLC registration submitted.

server.js split Recon cap SMS number A2P submitted

The Move A 600-line file that works is technical debt waiting to hurt you. Split before it becomes surgery.

Phase 8

Stability — Loop Limits, Alerting, Locked Paths

April 24, 2026

Class-aware loop limits added — tool_build gets 15 iterations, a simple question gets 6. Loop runaway protection. Ping tool built for health checks. Error alerting added — if Jarvis crashes or times out, Ted gets a text. File paths locked permanently in context.md.

LOOP_LIMITS Class-aware limits ping tool Error alerting Locked file paths

The Move An agent that can loop forever will, eventually. Loop limits are the seatbelt — you never notice it until you need it.

Phase 9 — Current

The Munger — Pre-Mortem Engine + System Hardening

April 26, 2026

Tool #16 deployed: the Munger — a dual-tone pre-mortem inversion engine named after Charlie Munger's "invert, always invert" principle. Asks: what would have to be true for this to fail spectacularly? Quick scope (5 failure modes) or full scope (10 modes + prevention plan). Model router regex anchored to prevent keyword collisions. Step 5.5 added to Tool Deployment Protocol — mandatory read-back verification after every commit, catching the ghost commit failure mode. soul.md updated to v3.3. Netlify MCP connected.

munger tool (#16) Dual-tone pre-mortem Model router fix Step 5.5 Ghost commit protection soul.md v3.3 Netlify MCP

The Move Jarvis self-diagnosed two failures this session and asked permission to fix them. The accountability loop — built into the system via Step 5.5 — is now producing real error detection without Ted having to go looking for it.

The Arsenal

16 Live Tools

Every tool Jarvis can use autonomously. Built, tested, and verified — each one deployed through the Tool Deployment Protocol.

📖

read_file

Read any file in the GitHub repository — source code, config, or memory files.

Phase 3

✍️

write_file

Create or fully overwrite files. The starting point for any new build.

Phase 3

🔪

patch_file

Surgical find-and-replace within a file. Precise edits without touching what works.

Phase 3

📂

list_files

List any directory in the repo. Reconnaissance before building.

Phase 3

🔄

update_context

Rewrite context.md — Jarvis's evolving working memory on Railway Volume.

Phase 4

💾

checkpoint_context

Push a context snapshot to GitHub. Permanent memory backup.

Phase 4

⚡

multi_patch

Batch file edits in a single call. One operation, multiple changes.

Phase 4

🌐

fetch_url

Fetch any webpage up to 15K characters. Live data for context-aware decisions.

Phase 4

🔍

search_files

Search across all repo content. Find patterns, references, or any string.

Phase 4

🗑️

delete_file

Remove files from the repo. Cleanup is a first-class operation.

Phase 4

🏷️

read_site_meta

Read live site metadata from the deployed Netlify site — verify deployments without screenshots.

Phase 5

🔎

web_search

Search the web for real-time information. Documentation, news, pricing — anything public.

Phase 6

🚀

get_deploy_status

Check Netlify deploy status. Confirm the site is live before reporting success.

Phase 6

🏓

ping

Protocol smoke test — returns pong. Verify the entire pipeline is breathing.

Phase 8

🚨

error_alerting

Crash handler + 90-second timeout guard. If Jarvis goes down, Ted gets a text — immediately.

Phase 8

🧠

munger

Pre-mortem inversion engine. Inverts any plan to surface failure modes before they happen. Quick scope (5 modes) or full scope (10 modes + prevention). Dual-tone format.

Phase 9 — Latest

Key Discoveries

What Got Learned the Hard Way

Every innovation here came from something breaking, or from asking "why aren't we doing this yet?"

👻

The Ghost Commit Failure Mode

Jarvis would describe code changes in markdown — looking like a successful commit — without actually calling patch_file. The code appeared to be written. GitHub said otherwise. Step 5.5 was born: after every commit, read the file back and confirm the change is actually there before reporting success. Fake commit hashes (like abc1234def) are now a known tell.
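The read-back check in Step 5.5 can be sketched in a few lines. This is a minimal illustration, not the actual protocol code: `verifyCommit` and the `readFile` parameter are hypothetical names, and the real read_file tool is presumably asynchronous and talks to GitHub rather than to an in-memory object.

```javascript
// Step 5.5 sketch: never report success until the change is read back.
// `readFile` is a hypothetical stand-in for the read_file tool; its real
// signature is not shown in this document.
function verifyCommit(readFile, path, expectedSnippet) {
  const content = readFile(path);
  if (typeof content !== "string") {
    return { ok: false, reason: `could not read ${path}` };
  }
  if (!content.includes(expectedSnippet)) {
    return { ok: false, reason: `expected change not found in ${path}` };
  }
  return { ok: true, reason: "change confirmed by read-back" };
}

// Usage with an in-memory fake repo standing in for GitHub.
const fakeRepo = { "server.js": "const LOOP_LIMITS = { tool_build: 15 };" };
const fakeRead = (path) => fakeRepo[path];

console.log(verifyCommit(fakeRead, "server.js", "tool_build: 15").ok); // true
console.log(verifyCommit(fakeRead, "server.js", "munger").ok);         // false
```

The point of the design is that success is decided by what the repository returns, not by what the model said it did.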

🔧

Model Router Regex Anchoring

The model router used a simple substring match for "quick", so any message containing "Quick scope" (a munger command) was routed to GPT-4o-mini instead of Sonnet. Fixed by anchoring: /^quick\b/i. Now "quick" must be the first word of the message, with a word boundary after it. Classifier keywords can silently conflict with tool command syntax unless you're surgical about regex.
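To see the collision and the fix side by side, here is a small sketch. Only the /^quick\b/i pattern and the "go deep" trigger come from this page. The function names, the returned labels (they are not real API model IDs), the way "go deep" is matched, and the sample messages are all assumptions.

```javascript
const QUICK = /^quick\b/i; // anchored pattern quoted in the text
const GO_DEEP = /go deep/i; // assumption: the real "go deep" match is not shown

// Anchored router: "quick" only counts as the first word of the message.
function selectModel(message) {
  const text = message.trim(); // trimming is an added convenience, not from the source
  if (QUICK.test(text)) return "gpt-4o-mini";
  if (GO_DEEP.test(text)) return "opus";
  return "sonnet";
}

// The unanchored version that caused the collision, for comparison.
function selectModelNaive(message) {
  return /quick/i.test(message) ? "gpt-4o-mini" : "sonnet";
}

const mungerRequest = "Run munger, quick scope, on the launch plan";
console.log(selectModelNaive(mungerRequest)); // gpt-4o-mini (the bug)
console.log(selectModel(mungerRequest));      // sonnet (the fix)
console.log(selectModel("quick what time is it in Tokyo")); // gpt-4o-mini
```

Note the limit of the fix: a message that begins with the word "quick" still routes to GPT-4o-mini, so the anchor only protects commands where "quick" appears later in the message.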

💰

Prompt Caching — 74% Cost Reduction

soul.md is 16,487 characters and never changes between messages. Paying full input price for it on every call was burning money. Loading it once at boot and marking it cacheable with Anthropic's cache_control content blocks cut monthly costs from ~$27 to ~$7, a reduction of about 74% (20 ÷ 27). Same output, same capabilities, same speed. The optimization cost two hours. The savings are permanent.
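A request shaped for prompt caching might look like the sketch below. The cache_control block follows Anthropic's documented format for system content blocks. Everything else is an assumption for illustration: the function names, the placeholder model ID, the max_tokens value, and the choice to put context.md in a second, uncached block.

```javascript
// Sketch: mark the large, unchanging soul.md as a cacheable system block.
function buildRequest(soulText, contextText, history, userMessage) {
  return {
    model: "MODEL_ID_PLACEHOLDER", // placeholder, not a real model ID
    max_tokens: 1024,              // arbitrary value for the sketch
    system: [
      // Stable prefix: identical on every call, so it can be served from cache.
      { type: "text", text: soulText, cache_control: { type: "ephemeral" } },
      // Changing content goes after the cached block so it does not alter the prefix.
      { type: "text", text: contextText },
    ],
    messages: [...history, { role: "user", content: userMessage }],
  };
}

// The savings figure, recomputed from the two monthly costs quoted on this page.
function percentReduction(before, after) {
  return Math.round(((before - after) / before) * 100);
}

const request = buildRequest("(soul.md text)", "(context.md text)", [], "ping");
console.log(request.system[0].cache_control); // { type: 'ephemeral' }
console.log(percentReduction(27, 7));         // 74
```

Ordering matters here: caching works on the prefix of the prompt, so the stable text has to come before anything that changes from message to message.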

🧠

The Munger Protocol

Named after Charlie Munger's "invert, always invert" principle. Before building anything significant, ask: what would have to be true for this to fail spectacularly? The munger tool does this systematically — surfaces failure modes you wouldn't have thought to look for. Quick scope (5 modes) for fast sanity checks, full scope (10 modes + prevention plan) before major builds.

👴

The Boomer-Proof Design Principle

Engineer for the least technical user first — always. If the system confuses a non-technical person, that's a design failure, not a user failure. This principle — coined during the Agent 2 design session — applies to everything: onboarding flows, error messages, command syntax. If a pitcher's wife can use it without a tutorial, it's ready. If she can't, it isn't.

📐

The Agent Template Pattern

One Railway server. One codebase. N agents — each with a unique soul.md. Phone number routing (the From field in the Twilio webhook) determines which agent's identity loads. No duplicate infrastructure. Agent 2 (wife's Jarvis) requires zero new servers — just a new soul.md and a route entry. This is the architecture for a scalable agent business.
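The routing idea is small enough to sketch. Twilio puts the sender in the webhook's From field, prefixed with "whatsapp:" for WhatsApp messages. The phone numbers, agent names, file paths, and the `resolveAgent` helper below are invented for illustration and are not the project's actual layout.

```javascript
// Sketch: pick an agent's soul.md from the Twilio webhook's From field.
// All numbers and paths here are made up for illustration.
const AGENT_ROUTES = new Map([
  ["whatsapp:+15550000001", { name: "jarvis", soulPath: "souls/jarvis/soul.md" }],
  ["whatsapp:+15550000002", { name: "agent2", soulPath: "souls/agent2/soul.md" }],
]);

function resolveAgent(webhookBody) {
  const from = String(webhookBody.From || "").trim();
  // Unknown senders get no agent rather than falling through to a default identity.
  return AGENT_ROUTES.get(from) ?? null;
}

console.log(resolveAgent({ From: "whatsapp:+15550000002", Body: "hi" }).soulPath);
// souls/agent2/soul.md
console.log(resolveAgent({ From: "whatsapp:+15559999999", Body: "hi" })); // null
```

Adding an agent in this sketch is one new map entry plus one new soul.md, which is the "zero new servers" claim in concrete form.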

🏗️

Cowork vs. Jarvis — Never Mix Layers

Cowork (Claude Code / planning layer) handles strategy, architecture, and design decisions. Jarvis handles execution — writing code, patching files, deploying. Mixing them creates confusion: you'd be asking the builder to critique the blueprint mid-pour. Two layers with clean separation means faster decisions and cleaner builds.

🔄

Class-Aware Loop Limits

Not all tasks need the same number of tool calls. A simple question needs 6 iterations max. Building a new tool needs 15. Giving every task the same ceiling wastes resources on easy tasks and caps complex ones. The task classifier assigns the right limit before the first tool fires — the system is efficient by design, not by luck.
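A rough sketch of how a class-aware ceiling could be wired into the tool loop follows. Only two numbers come from this page: 15 iterations for tool_build and 6 for a simple question. The class name "simple_question", the default limit, the rule that a multi-class message takes the highest limit, and both function names are assumptions.

```javascript
// Known values: tool_build = 15, simple question = 6. The rest is assumed.
const LOOP_LIMITS = new Map([
  ["tool_build", 15],
  ["simple_question", 6], // hypothetical class name
]);
const DEFAULT_LIMIT = 6; // assumption: unknown classes get the smaller ceiling

// Assumption: a message tagged with several classes takes the most generous limit.
function limitFor(classes) {
  const limits = classes.map((c) => LOOP_LIMITS.get(c) ?? DEFAULT_LIMIT);
  return limits.length ? Math.max(...limits) : DEFAULT_LIMIT;
}

// Runs `step` until it reports done or the class limit is reached.
function runToolLoop(classes, step) {
  const limit = limitFor(classes);
  for (let i = 1; i <= limit; i++) {
    if (step(i).done) return { iterations: i, stoppedByLimit: false };
  }
  return { iterations: limit, stoppedByLimit: true };
}

// A step that never finishes is cut off at the ceiling for its class.
console.log(runToolLoop(["simple_question"], () => ({ done: false })));
// { iterations: 6, stoppedByLimit: true }
console.log(runToolLoop(["tool_build"], (i) => ({ done: i === 3 })));
// { iterations: 3, stoppedByLimit: false }
```

Returning `stoppedByLimit` separately from the iteration count lets the caller tell a finished task apart from a runaway one that was cut off.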

What This Becomes

The Business Case

Jarvis proved the model works. Now the model becomes a product.

"I build AI agents for your business.
You get a number customers can text.
It answers questions, books appointments, qualifies leads, never calls in sick."

Setup in 1 week. Running for $50–200/month. Built on the same stack that powers Jarvis.

Tier 1

WhatsApp FAQ Bot

Answers common customer questions, routes to the right person, never misses a message. Perfect for any business with repetitive inbound questions.

$500–1K setup

+ $50/mo retainer

Tier 2

WhatsApp Booking Agent

Full conversational booking flow. Qualifies leads, books appointments, sends confirmations, follows up. Runs while you sleep.

$2–3.5K setup

+ $150/mo retainer

Tier 3

Custom AI Agent

Full-stack agent with custom tools, CRM integration, memory, and personality. Built to spec. Unique soul.md. Scales to your business.

$4–6K setup

+ $200/mo retainer

Phase 10 — What's Next In Progress

→ Multi-user routing — Route by phone number. One server, N agents, each with their own soul.md. First test: wife's Jarvis (Agent 2).
→ 12-question soul intake — New agent onboarding flow. Answer 12 questions, get a custom soul.md generated. Boomer-proof by design.
→ Shared context pool — /data/shared/context.md. Both agents read and write it — they feel like they "know each other."
→ Admin commands — $ users lists all active agents. $ user [name] shows their context. Ted stays in control.
→ Observability tools — get_railway_logs, get_netlify_deploy, verify_commit. Jarvis can self-diagnose infrastructure issues without Ted's screenshots.
→ A2P approval — Flip TWILIO_PHONE_NUMBER to +19492391134. Real SMS instead of sandbox. Full production deployment.
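The admin commands in the list above are still on the roadmap, so the following is only a guess at how they might be parsed. The two commands, "$ users" and "$ user [name]", come from the list. The `parseAdminCommand` name and the shape of the returned object are assumptions.

```javascript
// Sketch of parsing the planned "$" admin commands.
function parseAdminCommand(message) {
  const text = message.trim();
  // "$", optional spaces, a command word, then an optional argument.
  const match = /^\$\s*(\w+)(?:\s+(.+))?$/.exec(text);
  if (!match) return null; // not an admin command, continue to the normal pipeline
  const [, command, arg] = match;
  if (command === "users" && arg === undefined) return { command: "users" };
  if (command === "user" && arg !== undefined) return { command: "user", name: arg.trim() };
  return { command: "unknown", raw: text };
}

console.log(parseAdminCommand("$ users"));         // { command: 'users' }
console.log(parseAdminCommand("$ user agent2"));   // { command: 'user', name: 'agent2' }
console.log(parseAdminCommand("what's the plan?")); // null
```

Returning null for ordinary messages keeps the check cheap enough to run first, before the cost cap and model selection steps.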