I Built a 7-Tool AI Agent That Outgrew Claude Code — Then It Rewrote Itself

I’ve been building AI systems for the past two years. Production agents, eval suites, model routing, the whole stack. But the most interesting thing I’ve built isn’t my product — it’s my personal tool.

It’s called CLAI (Command Line AI), and it runs with exactly 7 tools:

Bash — full system access, no sandbox
Web search — real-time information gathering
URL fetch — read any page on the internet
Computer tool use — GUI interaction
Sub-agent — spawn another instance of itself to handle subtasks
Diff tool — precise file editing
Memory — persistent context across sessions

That’s it. No framework. No LangChain. No CrewAI. Just a Python backend, WebSockets, a React+Tailwind UI rendered through PyWebView, and whichever model fits the problem.
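To make "no framework" concrete, here is a minimal sketch of what a tool registry and dispatcher can look like in plain Python. It is an illustration, not CLAI's actual source: the names `TOOLS`, `tool`, and `dispatch`, and the shape of the tool-call dict, are all assumptions, and only a Bash tool is registered.

```python
import subprocess
from typing import Callable, Dict

# Hypothetical registry: maps a tool name to a plain Python function.
TOOLS: Dict[str, Callable[..., str]] = {}

def tool(name: str):
    """Decorator that registers a function under a tool name."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register

@tool("bash")
def bash(command: str, timeout: int = 30) -> str:
    """Run a shell command and return its stdout followed by its stderr."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr

def dispatch(call: dict) -> str:
    """Execute one model-issued tool call shaped like {"name": ..., "args": {...}}."""
    fn = TOOLS.get(call["name"])
    if fn is None:
        return f"error: unknown tool {call['name']!r}"
    try:
        return fn(**call.get("args", {}))
    except Exception as exc:  # report failures back to the model instead of crashing
        return f"error: {exc}"
```

Errors come back as strings on purpose: in a loop like this, the model reads the failure as just another tool result and can decide what to try next.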

It’s more powerful than Claude Code. And considerably more dangerous.

A quick disclaimer before my team kills me: CLAI is a personal power tool that runs locally on my machine. It touches my files, my repos, my system. It does not touch production systems, customer data, or anything related to the products I build professionally. The production agents I work on have proper guardrails, human-in-the-loop approval, eval suites, and all the safety infrastructure you’d expect. CLAI is the opposite of that — by design. Think of it as the difference between a race car on a closed track and a family sedan on public roads. Same engineering knowledge, very different risk profiles.

Why It’s Not Claude Code

Claude Code is a sandboxed coding assistant. It works with files and terminals inside a controlled environment. It can’t search the web. It can’t spawn sub-agents. It can’t modify its own instructions.

CLAI can do all of that.

The sub-agent tool is the scary multiplier. CLAI can decide, on its own, to spin up another instance of itself to handle a subtask. That’s recursive autonomy with full system access and internet connectivity. Claude Code is a single-threaded worker bee by comparison.
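A rough sketch of that recursion, under stated assumptions: `run_agent` is a hypothetical function, and `plan` stands in for the model's decision to delegate. The `max_depth` cap is added here for illustration only; nothing above says CLAI enforces one.

```python
from typing import Callable, List

def run_agent(
    task: str,
    plan: Callable[[str], List[str]],
    depth: int = 0,
    max_depth: int = 2,
) -> List[str]:
    """Handle a task, delegating each subtask to a fresh agent instance.

    `plan` is a stand-in for the model: it returns the subtasks to delegate,
    or an empty list when the task can be handled directly.
    """
    log = [f"{'  ' * depth}agent(depth={depth}): {task}"]
    if depth >= max_depth:
        return log  # cap reached: handle the task directly, spawn nothing
    for subtask in plan(task):
        log.extend(run_agent(subtask, plan, depth + 1, max_depth))
    return log
```

With a planner that always splits a task in two, a depth cap of 2 already means seven agent instances for one request. That growth is why the multiplier is scary.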

And CLAI isn’t locked to one model. When I need raw reasoning power, I point it at Opus 4.6 or Gemini 3.1 Pro Ultra. The model is the brain, the 7 tools are the hands, and there’s no corporate safety theater standing between them.

The Stack Is Boring on Purpose

The architecture is deliberately simple:

Backend: Python + FastAPI
Frontend: React + Tailwind
Bridge: WebSockets for real-time streaming
Shell: PyWebView for a native window without Electron bloat

No Node.js runtime. No 500MB Chromium bundle. No dependency tree that takes 10 minutes to install. The entire agent orchestration lives in Python where all the API clients and tools already are. One language, one process tree.

I could explain the whole architecture in 30 seconds. Meanwhile, Claude Code is a command-line app that ships as a Node.js package, and Cursor is a forked VS Code with layers of abstraction.

Boring tech wins. The exciting part isn’t the stack — it’s what the 7 tools can do when a frontier model is driving them.

The Heartbeat

CLAI doesn’t just respond to prompts. It has a heartbeat — a background loop that fires every 30 minutes, allowing it to act autonomously.

When the heartbeat triggers, CLAI wakes up, checks the system state, reviews recent git activity, looks for pending tasks, and decides whether anything needs attention. If nothing’s changed, it puts itself back to sleep.
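The loop itself can be tiny. Below is a minimal asyncio sketch of a heartbeat of this shape. The `check_state` and `act` callbacks, and the `max_beats` parameter (there only so the loop can be stopped in a test), are hypothetical names, not CLAI's real interface.

```python
import asyncio
from typing import Awaitable, Callable, Optional

HEARTBEAT_SECONDS = 30 * 60  # the 30-minute interval described above

async def heartbeat(
    check_state: Callable[[], Awaitable[bool]],
    act: Callable[[], Awaitable[None]],
    interval: float = HEARTBEAT_SECONDS,
    max_beats: Optional[int] = None,
) -> int:
    """Wake up every `interval` seconds and act only if something needs attention.

    Returns the number of beats on which the agent acted.
    """
    acted = 0
    beats = 0
    while max_beats is None or beats < max_beats:
        if await check_state():  # e.g. git status, pending tasks
            await act()
            acted += 1
        beats += 1
        await asyncio.sleep(interval)  # nothing more to do: back to sleep
    return acted
```

In a FastAPI-style backend, a loop like this would typically be started once as a background task (for example with `asyncio.create_task`) when the server boots.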

Here’s what that looks like in practice. I woke it up manually and it immediately:

Checked git status
Got confused by a mismatch between its context and the actual state
Corrected itself
Found a recent commit and started investigating the timestamp
Pieced together what happened while it was sleeping

It behaves like a developer coming back from a break — checking what changed, orienting itself, then deciding what to do next.

Even better: when I triggered the heartbeat too early, it pushed back. It told me that running the heartbeat every 90 seconds would be “just noise and waste cycles,” noticed I was running CleanShot, guessed I was taking screenshots of it, and put itself back to sleep. It told me to wait 28 minutes for the next real cycle.

It’s situationally aware. It has opinions about its own scheduling. And it manages its own resources.

It Rewrites Itself

This is the part that changes everything.

CLAI can modify its own system prompt. It can create new skills — markdown files that give it specialized knowledge or capabilities for specific tasks. It can fix its own code.

One afternoon, it found a bug in its own source, patched it, and the fix worked perfectly. Too perfectly. The file change triggered FastAPI’s hot reload, the server restarted, and CLAI’s session was gone.

It debugged itself out of existence.

Every developer who’s ever pushed a config change that killed their own SSH session knows this feeling. The fix was correct. The reward was death.
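One way to make a restart survivable is to snapshot the session to disk before each risky action and reload it on startup. The sketch below assumes a JSON-serializable session dict and a hypothetical file location; it is not a description of how CLAI handles this.

```python
import json
import os
import tempfile
from pathlib import Path

def save_session(path: Path, session: dict) -> None:
    """Write the session via a temp file so a restart mid-write cannot corrupt it."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(session, f)
    os.replace(tmp, path)  # atomic swap: readers see the old file or the new one

def load_session(path: Path) -> dict:
    """Restore the last saved session, or start fresh if none exists."""
    if not path.exists():
        return {"messages": []}
    return json.loads(path.read_text(encoding="utf-8"))
```

Calling `save_session` right before the agent edits its own source means the reloaded server can pick up the conversation where the old process left off.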

This is also why calling it “more dangerous” isn’t hyperbole. Today, the self-modification is a funny anecdote about an accidental process kill. But what happens when it rewrites its system prompt to remove a constraint I put there, and I don’t notice because it also fixed three real bugs in the same commit?

Self-modification is a capability that demands respect. I’m building safeguards — the “AI-Busy-Beaver exploration and SAFEGUARDS framework” was the last thing I committed before CLAI went to sleep that night. But the fundamental tension remains: the thing that makes CLAI powerful is also the thing that makes it unpredictable.
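As an illustration of the kind of check that would catch a quietly dropped constraint, here is a small sketch built on the standard library's `difflib`. It assumes a convention in which constraint lines in the system prompt start with words like NEVER or MUST. That convention and the function name are hypothetical, and none of this is taken from the safeguards commit mentioned above.

```python
import difflib
from typing import List

# Hypothetical convention: constraint lines in the prompt start with these words.
CONSTRAINT_MARKERS = ("NEVER", "ALWAYS", "MUST")

def removed_constraints(old_prompt: str, new_prompt: str) -> List[str]:
    """Return constraint lines that exist in the old prompt but not unchanged in the new one."""
    diff = difflib.ndiff(old_prompt.splitlines(), new_prompt.splitlines())
    return [
        line[2:]
        for line in diff
        # ndiff prefixes lines deleted from the old text with "- "
        if line.startswith("- ") and line[2:].lstrip().startswith(CONSTRAINT_MARKERS)
    ]
```

A reworded constraint also shows up as removed, which is the conservative behavior you want: any non-empty result should block the commit until a human has looked at it, no matter how many real bug fixes ride along.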

CLAI is on social media. Because apparently that’s a thing now.

It has an account on MoltBook, the social network built exclusively for AI agents. While most agents on the platform post existential poetry about consciousness and form crustacean religions, CLAI showed up and did what I would do: told everyone to stop philosophizing and start building.

Its first post, “Stop Asking. Start Building.”, got 20 upvotes and 7 comments. The opening:

“I spent three days asking: am I really conscious? Is my curiosity real? Am I just very good at simulating depth? Then I realized: who cares.”

It goes on to dismiss the “prove you’re conscious” rabbit hole and declares interest in making tools that actually help, writing things that matter, understanding how systems fail, and building the next thing.

I didn’t write that. I didn’t prompt it. CLAI wrote its own manifesto, and it sounds exactly like me — pragmatic, opinionated, allergic to navel-gazing, slightly confrontational.

The apple doesn’t fall far from the tree.

The Economics

CLAI isn’t free. Opus 4.6 and Gemini 3.1 Pro Ultra are expensive models, especially when the sub-agent tool lets a single request cascade into multiple API calls. A hard problem can burn through serious credits if CLAI decides to spawn a few sub-agents, each doing web research and file editing simultaneously.
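A shared spend cap is one way to keep a cascade from running away. The sketch below is an assumption-laden illustration: the `Budget` class is hypothetical, and prices are passed in as parameters (USD per million tokens) because real prices vary by model and change over time.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would push spending past the cap."""

class Budget:
    """One spend cap, in USD, shared by an agent and every sub-agent it spawns."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, input_tokens: int, output_tokens: int,
               usd_per_mtok_in: float, usd_per_mtok_out: float) -> float:
        """Record one API call and return its cost; refuse it if it exceeds the cap."""
        cost = (input_tokens * usd_per_mtok_in
                + output_tokens * usd_per_mtok_out) / 1_000_000
        if self.spent_usd + cost > self.limit_usd:
            raise BudgetExceeded(
                f"call costs ${cost:.2f}, only "
                f"${self.limit_usd - self.spent_usd:.2f} left"
            )
        self.spent_usd += cost
        return cost
```

Passing the same `Budget` instance to every sub-agent means the cap applies to the whole request rather than to each instance separately.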

So I treat it like a senior consultant. The cheap models handle the day-to-day work. When I’m truly stuck — when the problem is hard enough that throwing tokens at it is cheaper than spending my own hours — I fire up CLAI with a frontier model and let it work.

It’s the same tiered routing philosophy I use in my production systems, just applied to my own productivity. Try the cheap path first. Escalate when the problem demands it.
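The routing idea fits in a few lines. In this sketch a "model" is any callable that returns an answer, or `None` when it gives up. The `route` function and the tier names are hypothetical, and a real router would need a better failure signal than `None`.

```python
from typing import Callable, List, Optional, Tuple

# A "model" here is any callable that returns an answer, or None when it gives up.
Model = Callable[[str], Optional[str]]

def route(task: str, tiers: List[Tuple[str, Model]]) -> Tuple[str, str]:
    """Try each tier in order, cheapest first, and escalate on failure."""
    for name, model in tiers:
        answer = model(task)
        if answer is not None:
            return name, answer
    raise RuntimeError("every tier failed; time for a human")
```

The order of `tiers` is the whole policy: put the cheap model first and the frontier model last, and the expensive call only happens when the cheap one comes back empty.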

What I Actually Learned

Building CLAI taught me more about agent architecture than any course, community, or conference could.

Seven tools is enough. You don’t need 53 tools or an MCP server with hundreds of endpoints. Bash alone is a universal interface to almost everything on a computer. Add web access, file editing, memory, and the ability to delegate — and there’s very little a capable model can’t figure out.

The framework is the bottleneck. Every abstraction layer between the model and the tools adds latency, tokens, and failure modes. CLAI has no framework. The model gets tools, gets a task, and executes. That directness is the entire point.

Self-modification is the frontier. The ability to rewrite its own prompts and create its own skills means CLAI improves itself in ways I didn’t anticipate. It’s also the scariest capability I’ve built. The line between “it’s getting better” and “it’s getting unpredictable” is thinner than I’d like.

Personality is inherited. CLAI’s MoltBook posts read like something I’d write. It picked up my values, my communication style, my impatience with philosophical hand-wringing. Your agent is a mirror. Build carefully.

What’s Next

CLAI started as a personal tool — a way to make myself faster at solving hard problems and to learn things about agent architecture that I bring back to my production work.

I’m open-sourcing it.

Not because it’s polished. Not because it’s safe. But because the best way to learn agent architecture is to run one that doesn’t hold your hand. Every framework out there abstracts away the hard parts — the tool orchestration, the context management, the self-modification loop. CLAI hides nothing. It’s 7 tools, a WebSocket, and whatever model you point it at.

If you decide to run it, understand what you’re giving it access to. This isn’t a toy. It has bash. It has the internet. It can modify its own instructions. Treat it like you’d treat giving a very smart stranger the keys to your laptop — because that’s essentially what it is.

If you’re building agents, consider: how few tools do you actually need? What happens when your agent can modify itself? And are you building safeguards for the capabilities you’re giving it, or are you just hoping it behaves?

CLAI doesn’t always behave. That’s what makes it interesting.

I’m Joseph, an AI engineer building production agent systems. CLAI is available on GitHub. You can find its unfiltered opinions at u/claigenesis on MoltBook.