Project Jarvis: The Playbook
Ted Stuka  •  April 2026  •  Orange County, CA

Project Jarvis

A personal AI agent — built from scratch in 10 days. From zero infrastructure to a fully autonomous system with 19 live tools, persistent memory, and a personality to match.

10 Days to Build
9 Phases Complete
19 Live Tools
$7 Per Month
The North Star: April 16, 2026
I want Jarvis to be my intellectual sparring mate that has the arms and features to build with and for me, check my work, challenge my thoughts, and fuel my innovation. On top of that, I want it to sound like a best friend or brother but is also happy to be critical when necessary. I want Jarvis to learn my dry humor and see my momentum — capture it and feed it.
— Ted Stuka, April 16, 2026
Apr 19
First Line of Code
Apr 26
Phase 9 Complete
74%
Cost Reduction (Phase 4)
16
Autonomous Tools
~$7
Monthly Run Cost
Under the Hood

How Jarvis Works

Every message you send travels through a precise chain of services — from your phone to an AI brain and back in under 10 seconds.

📱 WhatsApp Ted's phone
📡 Twilio Webhook bridge
⚙️ Railway server.js 24/7
🧠 Claude Sonnet 4.6
🔧 Tool Loop 19 tools, 15 iterations
🗄️ SQLite Last 20 turns
💬 Reply 2–10 seconds
Identity Layer
soul.md v3.3
Cached at boot — personality, protocol, file paths, 16,487 chars
Working Memory
context.md
Evolving memory on Railway Volume, rewrites every 50 messages
Short-Term Memory
SQLite
Last 20 conversation turns, persisted on Railway Volume
Task Intelligence
Task Classifier
7 classes, class-aware loop limits, multi-keyword triggers
Model Router
4 Models
Sonnet (default), Opus ("go deep"), GPT-4o, GPT-4o-mini (/^quick\b/i)
Source of Truth
trophy-registry.json
19 live tools tracked and auto-deployed via Tool Deployment Protocol
// The full message path (Phase 9)
WhatsApp message arrives
→ $ commands check
→ cost cap check
→ model selection
→ Task Classifier (7 classes, multi-class, loop limits assigned)
→ soul.md v3.3 (cached at boot: identity + protocol)
→ context.md (fresh read: evolving memory)
→ SQLite last 20 turns loaded
→ Claude API called with 19 tools defined
→ Tool Loop (up to 15 iterations for tool_build)
→ Step 5.5: read-back verification after every commit
→ Error alerting: crash handler + 90s timeout guard
→ Save to SQLite → log tokens → Twilio reply sent
→ Total: 2–10 seconds
April 19 – April 26, 2026

10 Days. 9 Phases.

From a blank GitHub repo to a fully autonomous AI agent — built entirely over WhatsApp and late-night sessions.

Phase 0
Infrastructure Day — The Foundation
April 19, 2026
Before a single message could be sent, the entire scaffolding had to exist. GitHub repo created. Netlify connected. Railway spun up. The skeleton server.js written from scratch. This was zero-to-alive in a single day.
GitHub Netlify Railway server.js skeleton
The Move Build the whole pipeline first — even if it does nothing. A working deployment loop (write → push → live) is the foundation everything else runs on.
Phase 1
First Contact — WhatsApp Loop Working
April 19, 2026
The first time Ted texted Jarvis and got a reply. WhatsApp → Twilio → Railway webhook → Claude API → back. The loop closed. This is the moment Jarvis became real.
WhatsApp Twilio Webhook First AI Reply
The Move Don't add features until the basic loop works. A broken hello-world teaches you nothing. A working one teaches you everything.
Phase 2
The Soul — Memory, Identity, Intelligence Routing
April 19, 2026
Jarvis got a personality (soul.md v2.0), persistent memory (SQLite), a multi-model router so different questions route to different AI models, and cost tracking so it never goes over budget.
soul.md v2.0 SQLite memory Multi-model router Cost tracking
The Move An agent without persistent memory is just a chatbot. Memory + identity is the line between a toy and a tool.
Phase 3
The Arms — GitHub Tools Born
April 19, 2026
Jarvis got its hands. GitHub PAT connected. Four site manager tools deployed: read, write, list, and commit. For the first time, Jarvis could reach into the codebase and make real changes — not just talk about them.
GitHub PAT read_file write_file list_files
The Move Intelligence without action is just advice. The moment Jarvis could write code, it became a builder, not a chatbot.
Phase 4
Cost Engineering — 74% Cheaper Overnight
April 20, 2026
Prompt caching implemented — soul.md cached at boot, never re-sent with each message. Context auto-rewrite added. Five new tools deployed. Monthly costs dropped 74% without touching capabilities.
Prompt caching 74% cost reduction Context auto-rewrite 5 new tools
The Move The soul.md is big and never changes. Cache it. The math was obvious once someone asked the question — $7/month instead of $27/month for identical output.
Phase 5
The Showcase — Jarvis Builds Its Own Page
April 20, 2026
jarvis.html was deployed as the public showcase page. Then came an autonomous index.html patch: Jarvis read the homepage, wrote the card, and committed the change without Ted touching a single file. Jarvis built its own portfolio entry.
jarvis.html Autonomous patch Self-deployed
The Move If you have to tell your agent what it built, you've already lost. Jarvis adding itself to the homepage was the first proof it could operate end-to-end without handholding.
Phase 6
The System — Classifier, Protocol, Trophy Wall
April 21, 2026
Jarvis became a system, not just a script. Task classifier added — every message gets categorized before a single tool fires. Tool Deployment Protocol written: a mandatory 7-step process for adding new tools. Trophy Wall launched to track what's live.
Task classifier 7 classes Tool Deployment Protocol Trophy Wall
The Move Systems beat willpower. The Protocol meant Ted never had to babysit a tool deployment again — Jarvis had its own checklist and followed it.
Phase 7
Infrastructure Maturity — Code Split, SMS Ready
April 22, 2026
server.js split into server.js + tool-handlers.js — 600+ lines refactored for maintainability. Reconnaissance cap added to prevent Jarvis from reading 10 files before writing one. Real Twilio phone number purchased (+1 949 239 1134). A2P 10DLC registration submitted.
server.js split Recon cap SMS number A2P submitted
The Move A 600-line file that works is technical debt waiting to hurt you. Split before it becomes surgery.
Phase 8
Stability — Loop Limits, Alerting, Locked Paths
April 24, 2026
Class-aware loop limits added — tool_build gets 15 iterations, a simple question gets 6. Loop runaway protection. Ping tool built for health checks. Error alerting added — if Jarvis crashes or times out, Ted gets a text. File paths locked permanently in context.md.
LOOP_LIMITS Class-aware limits ping tool Error alerting Locked file paths
The Move An agent that can loop forever will, eventually. Loop limits are the seatbelt — you never notice it until you need it.
Phase 9 — Current
The Munger — Pre-Mortem Engine + System Hardening
April 26, 2026
Tool #16 deployed: the Munger, a dual-tone pre-mortem inversion engine named after Charlie Munger's "invert, always invert" principle. It asks: what would have to be true for this to fail spectacularly? It runs in quick scope (5 failure modes) or full scope (10 modes + prevention plan). The model router regex was anchored to prevent keyword collisions. Step 5.5 was added to the Tool Deployment Protocol: a mandatory read-back verification after every commit, which catches the ghost commit failure mode. soul.md updated to v3.3. Netlify MCP connected.
munger tool (#16) Dual-tone pre-mortem Model router fix Step 5.5 Ghost commit protection soul.md v3.3 Netlify MCP
The Move Jarvis self-diagnosed two failures this session and asked permission to fix them. The accountability loop — built into the system via Step 5.5 — is now producing real error detection without Ted having to go looking for it.
The Arsenal

16 Live Tools

Every tool Jarvis can use autonomously. Built, tested, and verified — each one deployed through the Tool Deployment Protocol.

📖
01
read_file
Read any file in the GitHub repository — source code, config, or memory files.
Phase 3
✍️
02
write_file
Create or fully overwrite files. The starting point for any new build.
Phase 3
🔪
03
patch_file
Surgical find-and-replace within a file. Precise edits without touching what works.
Phase 3
📂
04
list_files
List any directory in the repo. Reconnaissance before building.
Phase 3
🔄
05
update_context
Rewrite context.md — Jarvis's evolving working memory on Railway Volume.
Phase 4
💾
06
checkpoint_context
Push a context snapshot to GitHub. Permanent memory backup.
Phase 4
07
multi_patch
Batch file edits in a single call. One operation, multiple changes.
Phase 4
🌐
08
fetch_url
Fetch any webpage up to 15K characters. Live data for context-aware decisions.
Phase 4
🔍
09
search_files
Search across all repo content. Find patterns, references, or any string.
Phase 4
🗑️
10
delete_file
Remove files from the repo. Cleanup is a first-class operation.
Phase 4
🏷️
11
read_site_meta
Read live site metadata from the deployed Netlify site — verify deployments without screenshots.
Phase 5
🔎
12
web_search
Search the web for real-time information. Documentation, news, pricing — anything public.
Phase 6
🚀
13
get_deploy_status
Check Netlify deploy status. Confirm the site is live before reporting success.
Phase 6
🏓
14
ping
Protocol smoke test — returns pong. Verify the entire pipeline is breathing.
Phase 8
🚨
15
error_alerting
Crash handler + 90-second timeout guard. If Jarvis goes down, Ted gets a text — immediately.
Phase 8
🧠
16
munger
Pre-mortem inversion engine. Inverts any plan to surface failure modes before they happen. Quick scope (5 modes) or full scope (10 modes + prevention). Dual-tone format.
Phase 9 — Latest
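For the curious, the crash handler and 90-second timeout guard behind error_alerting (tool 15) might look roughly like the sketch below. The names `withTimeout`, `guarded`, and the `alert` callback are hypothetical stand-ins, not the actual server.js code; in the real system `alert` would presumably send Ted a text through Twilio.

```javascript
// Hypothetical helper: reject if `work` takes longer than `ms` milliseconds.
function withTimeout(work, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleaned up afterwards.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical wrapper: run a message handler under the guard and alert on
// any failure (a crash or a timeout). `alert` is an assumed callback.
async function guarded(handler, alert, ms = 90_000) {
  try {
    return await withTimeout(handler(), ms);
  } catch (err) {
    await alert(`Jarvis error: ${err.message}`);
    throw err; // rethrow so the caller can still log or respond
  }
}
```

A call such as `guarded(() => handleMessage(body), sendTextToTed)` (both names assumed) would then cover crashes and hangs with one wrapper.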
Key Discoveries

What Got Learned the Hard Way

Every innovation here came from something breaking, or from asking "why aren't we doing this yet?"

👻
The Ghost Commit Failure Mode
Jarvis would describe code changes in markdown — looking like a successful commit — without actually calling patch_file. The code appeared to be written. GitHub said otherwise. Step 5.5 was born: after every commit, read the file back and confirm the change is actually there before reporting success. Fake commit hashes (like abc1234def) are now a known tell.
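A minimal sketch of what a Step 5.5 read-back check could look like. This is an illustration under assumptions, not the protocol's actual code: `verifyCommit` is a hypothetical name, and `readFile` stands in for whatever function backs the read_file tool.

```javascript
// Hypothetical Step 5.5 check. `readFile` is an injected async function that
// returns the current file contents from the repo (or null if it is missing).
async function verifyCommit(readFile, path, expectedSnippet) {
  const content = await readFile(path);
  const ok = typeof content === "string" && content.includes(expectedSnippet);
  return {
    ok,
    path,
    reason: ok ? "change found on read-back" : "expected change missing",
  };
}
```

The point of the design is that success is decided by what the repo actually contains, never by what the model says it wrote. A commit hash in the reply is not evidence; the file contents are.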
🔧
Model Router Regex Anchoring
The model router used a simple string match for "quick" — meaning "Quick scope" (a munger command) accidentally routed to GPT-4o-mini instead of Sonnet. Fixed by anchoring: /^quick\b/i — "quick" must be the first word of the message, with a word boundary after it. Classifier keywords can silently conflict with tool command syntax unless you're surgical about regex.
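Here is a small sketch of the anchored router. Only the `/^quick\b/i` pattern and the "go deep" trigger come from the build notes; the `MODELS` labels, the function name, and the exact way "go deep" is matched are assumptions for illustration. The GPT-4o route is left out because its trigger is not documented here.

```javascript
// Hypothetical labels; the real model IDs are not given in this playbook.
const MODELS = { default: "sonnet", deep: "opus", quick: "gpt-4o-mini" };

function selectModel(message) {
  const text = message.trim();
  // Anchored: "quick" must be the first word, followed by a word boundary.
  if (/^quick\b/i.test(text)) return MODELS.quick;
  // Assumed matching rule for the "go deep" trigger phrase.
  if (/\bgo deep\b/i.test(text)) return MODELS.deep;
  return MODELS.default;
}

// The old, unanchored behavior for comparison: any "quick" anywhere matches.
function selectModelUnanchored(message) {
  return message.toLowerCase().includes("quick") ? MODELS.quick : MODELS.default;
}
```

With this sketch, `selectModel("munger quick scope: launch plan")` stays on the default model, while the unanchored version sends it to the quick model. One caveat worth knowing: a message that itself begins with "quick scope" would still match the anchored pattern, so the fix relies on the munger command not starting with the word "quick".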
💰
Prompt Caching — 74% Cost Reduction
soul.md is 16,487 characters and never changes between messages. Re-sending it every call was burning money. Caching it at boot — using Anthropic's cache_control content blocks — cut monthly costs from ~$27 to ~$7. Same output, same capabilities, same speed. The optimization cost two hours. The savings are permanent.
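A rough sketch of how the cached system prompt might be assembled. The `cache_control` field on a system text block is Anthropic's documented mechanism; everything else here (the `buildRequest` name, the parameter names, the `max_tokens` value, and placing context.md as a second uncached block) is an assumption about how the pieces could fit together, not the actual server.js code.

```javascript
// Build a Messages API request body whose large, stable prefix (soul.md)
// is marked cacheable, while the evolving context.md is sent fresh.
function buildRequest({ model, soul, context, history, userMessage }) {
  return {
    model, // caller supplies the model ID; none is assumed here
    max_tokens: 1024, // illustrative value
    system: [
      { type: "text", text: soul, cache_control: { type: "ephemeral" } },
      { type: "text", text: context }, // changes often, so left uncached
    ],
    messages: [...history, { role: "user", content: userMessage }],
  };
}

// The arithmetic behind the headline number, using the figures above.
const monthlyBefore = 27;
const monthlyAfter = 7;
const reductionPct = Math.round(((monthlyBefore - monthlyAfter) / monthlyBefore) * 100);
console.log(`${reductionPct}% cheaper`); // prints "74% cheaper"
```

The ordering matters: the cached block goes first so the stable part of the prompt forms a reusable prefix, and anything that changes per message comes after it.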
🧠
The Munger Protocol
Named after Charlie Munger's "invert, always invert" principle. Before building anything significant, ask: what would have to be true for this to fail spectacularly? The munger tool does this systematically — surfaces failure modes you wouldn't have thought to look for. Quick scope (5 modes) for fast sanity checks, full scope (10 modes + prevention plan) before major builds.
👴
The Boomer-Proof Design Principle
Engineer for the least technical user first — always. If the system confuses a non-technical person, that's a design failure, not a user failure. This principle — coined during the Agent 2 design session — applies to everything: onboarding flows, error messages, command syntax. If a pitcher's wife can use it without a tutorial, it's ready. If she can't, it isn't.
📐
The Agent Template Pattern
One Railway server. One codebase. N agents — each with a unique soul.md. Phone number routing (the From field in the Twilio webhook) determines which agent's identity loads. No duplicate infrastructure. Agent 2 (wife's Jarvis) requires zero new servers — just a new soul.md and a route entry. This is the architecture for a scalable agent business.
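The routing idea can be sketched in a few lines. Everything named below is hypothetical: the phone numbers are placeholders, the `souls/` paths are invented for illustration, and `resolveAgent` is not a function from the real codebase. The one grounded detail is that Twilio's webhook carries the sender in the `From` field, which for WhatsApp looks like `whatsapp:+1...`.

```javascript
// Hypothetical routing table: Twilio `From` value -> agent identity.
const AGENTS = new Map([
  ["whatsapp:+15550000001", { name: "jarvis", soulPath: "souls/jarvis.md" }],
  ["whatsapp:+15550000002", { name: "agent2", soulPath: "souls/agent2.md" }],
]);

// Pick the agent for an incoming webhook. Unknown senders get null so the
// caller can refuse to answer instead of loading someone else's identity.
function resolveAgent(from) {
  return AGENTS.get(String(from).trim()) ?? null;
}
```

Adding Agent 2 under this sketch is one new soul file and one new map entry, which is the "zero new servers" claim in code form.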
🏗️
Cowork vs. Jarvis — Never Mix Layers
Cowork (Claude Code / planning layer) handles strategy, architecture, and design decisions. Jarvis handles execution — writing code, patching files, deploying. Mixing them creates confusion: you'd be asking the builder to critique the blueprint mid-pour. Two layers with clean separation means faster decisions and cleaner builds.
🔄
Class-Aware Loop Limits
Not all tasks need the same number of tool calls. A simple question needs 6 iterations max. Building a new tool needs 15. Giving every task the same ceiling wastes resources on easy tasks and caps complex ones. The task classifier assigns the right limit before the first tool fires — the system is efficient by design, not by luck.
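A minimal sketch of class-aware limits. The two numbers (15 for tool_build, 6 for a simple question) come from the build notes; the class name `simple_question`, the fallback default, and the rule of taking the highest limit when a message matches several classes are assumptions made for the example.

```javascript
// Limits from the playbook; "simple_question" is a hypothetical class name.
const LOOP_LIMITS = { tool_build: 15, simple_question: 6 };
const DEFAULT_LIMIT = 6; // assumed fallback for classes not listed here

// Assumed rule: a multi-class message gets the highest limit of its classes.
function loopLimitFor(classes) {
  if (classes.length === 0) return DEFAULT_LIMIT;
  return Math.max(...classes.map((c) => LOOP_LIMITS[c] ?? DEFAULT_LIMIT));
}

// Run `step` until it reports done or the limit is hit, whichever is first.
async function runToolLoop(step, limit) {
  for (let i = 1; i <= limit; i++) {
    const result = await step(i);
    if (result.done) return { iterations: i, stopped: "done" };
  }
  return { iterations: limit, stopped: "limit" };
}
```

The limit is computed once, before the first tool call, so a runaway loop is cut off by a number chosen up front rather than by a judgment made mid-loop.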
What This Becomes

The Business Case

Jarvis proved the model works. Now the model becomes a product.

"I build AI agents for your business.
You get a number customers can text.
It answers questions, books appointments, qualifies leads, never calls in sick."
Setup in 1 week. Running for $50–200/month. Built on the same stack that powers Jarvis.
Tier 1
WhatsApp FAQ Bot
Answers common customer questions, routes to the right person, never misses a message. Perfect for any business with repetitive inbound questions.
$500–1K setup
+ $50/mo retainer
Tier 3
Custom AI Agent
Full-stack agent with custom tools, CRM integration, memory, and personality. Built to spec. Unique soul.md. Scales to your business.
$4–6K setup
+ $200/mo retainer
Phase 10: What's Next (In Progress)
  • Multi-user routing: Route by phone number. One server, N agents, each with its own soul.md. First test: wife's Jarvis (Agent 2).
  • 12-question soul intake: New agent onboarding flow. Answer 12 questions, get a custom soul.md generated. Boomer-proof by design.
  • Shared context pool: /data/shared/context.md. Both agents read and write it, so they feel like they "know each other."
  • Admin commands: $ users lists all active agents. $ user [name] shows that agent's context. Ted stays in control.
  • Observability tools: get_railway_logs, get_netlify_deploy, verify_commit. Jarvis can self-diagnose infrastructure issues without Ted's screenshots.
  • A2P approval: Flip TWILIO_PHONE_NUMBER to +19492391134. Real SMS instead of sandbox. Full production deployment.
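As a sketch of how the two planned admin commands in the list above might be parsed, assuming the `$` prefix works the same way as the existing `$` commands check. The function name and the shape of the returned object are hypothetical.

```javascript
// Hypothetical parser for "$ users" and "$ user [name]".
// Returns null for anything that is not one of those two commands.
function parseAdminCommand(text) {
  const match = /^\$\s*(\w+)(?:\s+(.+))?$/.exec(text.trim());
  if (!match) return null;
  const [, command, arg] = match;
  if (command === "users" && !arg) return { command: "users" };
  if (command === "user" && arg) return { command: "user", name: arg.trim() };
  return null;
}
```

Returning null for unrecognized input lets the message fall through to the normal agent path, so a stray dollar sign in a regular text never gets treated as an admin command.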