Thesis #2

VoxelMind GEN2: Rewriting My Minecraft AI Bot from Scratch

Why I deleted five architectural layers, sixteen concept documents, and a state machine no one could read — and what one LLM call and twenty functions does better.

Robin
VoxelMind
8 min read
voxelmind gen2, minecraft ai, ai companion, architecture, minecraft mod, agent loop, llm

The short version

VoxelMind GEN2 is a complete rewrite of the Minecraft AI bot architecture. The old system was five layers of indirection, sixteen concept documents, and a state machine no one could read. The new one is two loops and twenty TypeScript functions — and the bot now finishes what you ask instead of getting distracted by every mob that walks past.

The single rule the whole thing is built on:

One place decides — the LLM. Code executes. Nothing in between.

If you already have a VoxelMind bot, it has moved to GEN2 automatically. If you are new, the rest of this post is the honest story of why V1 broke and what GEN2 does about it.

Why I deleted the old VoxelMind architecture

On May 21st I deleted five layers, sixteen concept documents, a saga state machine, a Dirigent, a Mediator, an Identity Layer, and most of the architecture I had been building. Then I wrote one new file with twenty functions in it and committed.

The day before, one of my bots had been told to gather wood. It started chopping. Halfway through, a chicken wandered past. The bot stopped. It attacked the chicken. The chicken ran. The bot followed. Now in another biome, far from any tree, the bot stood still — its wood-gathering task was officially "in progress." When I asked it what it was doing, it answered something else.

It wasn't the first time. None of those incidents were a bug in the chicken-attack code, or the wood-gathering code, or any single component. The bug was the architecture itself — a five-layer system in which "what is this bot actually trying to do?" was a question nobody was answering. The Dirigent assumed the Skill knew. The Skill assumed the Mediator knew. The Mediator was watching the wrong state machine.

So I deleted them all.

What was broken in VoxelMind V1

I want to be specific, because vague self-criticism is just another kind of marketing. Here is what was actually wrong, with numbers from V1's live logs:

The chat doubler — 42.9% of all bot messages. In every 60-second window, more than four out of every ten things the bot said were exact duplicates of something it had already said. Players told me. I knew. The architecture didn't.
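For readers who want to check a number like this against their own logs, here is a minimal sketch of one way to compute a duplicate rate. It assumes that a message counts as a duplicate when the exact same text was sent within the preceding 60 seconds; that reading, the `ChatMessage` shape, and the `duplicateRate` name are illustrative assumptions, not the actual V1 log tooling.

```typescript
// Hypothetical log entry; the real V1 log schema is not shown in this post.
interface ChatMessage {
  timeMs: number; // when the bot sent the message
  text: string;   // exact message text
}

// Fraction of messages that exactly repeat a message sent within the
// preceding window (60 seconds by default).
function duplicateRate(messages: ChatMessage[], windowMs: number = 60000): number {
  if (messages.length === 0) return 0;
  const sorted = [...messages].sort((a, b) => a.timeMs - b.timeMs);
  const lastSeen = new Map<string, number>(); // text -> last time it was sent
  let duplicates = 0;
  for (const m of sorted) {
    const prev = lastSeen.get(m.text);
    if (prev !== undefined && m.timeMs - prev <= windowMs) duplicates++;
    lastSeen.set(m.text, m.timeMs);
  }
  return duplicates / sorted.length;
}
```

Under this definition, a bot that repeats itself two minutes later is not counted, so the figure only captures rapid-fire doubling.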

The mob-hunt obsession. One alpha tester put it best: "REALLY likes to hunt mobs, no matter what I say they just wanna kill some mobs lol… with hunt mobs even if it costs their life." The bot was structurally incapable of staying with an owner-given task when something interesting walked past. We had a rule in the base prompt telling it not to. The rule lost every time.

The drift that wasn't drift. The pitch was: your companion's personality develops over time. The reality was: 99 out of 137 bots never slept, so they lived forever on Day 1, and the "personality history" was an empty table for most of them. The ones that did sleep got reflections that nobody downstream ever read, because the layer that was supposed to read them assumed a different layer was handling it. We had built a memory system that no one was actually using.

The forensics bug. 58.4% of all LLM decisions had no recorded result at all — not because the actions failed, but because results were matched to decisions by array position instead of by ID. When the bot acted fast enough, results were attached to the wrong decision. So when an angry user complained that the bot "lied" about killing a zombie, the bot's logs showed it killing a zombie and then claiming it didn't. The bot wasn't lying. The logs were.
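The failure mode is easy to reproduce in a few lines. The sketch below is not V1's code; `Decision`, `ActionResult`, `attachByPosition`, and `attachById` are hypothetical names chosen to show why matching by array position breaks as soon as results arrive out of order, and why carrying the decision's ID on each result avoids it.

```typescript
// Hypothetical shapes; the real log schema is not shown in this post.
interface Decision { id: string; action: string; }
interface ActionResult { decisionId: string; outcome: string; }

// Fragile: assumes results arrive in the same order as decisions.
function attachByPosition(decisions: Decision[], results: ActionResult[]): Map<string, string | undefined> {
  const out = new Map<string, string | undefined>();
  decisions.forEach((d, i) => out.set(d.id, results[i]?.outcome));
  return out;
}

// Robust: each result names the decision that produced it.
function attachById(decisions: Decision[], results: ActionResult[]): Map<string, string | undefined> {
  const byId = new Map<string, string>();
  for (const r of results) byId.set(r.decisionId, r.outcome);
  const out = new Map<string, string | undefined>();
  for (const d of decisions) out.set(d.id, byId.get(d.id));
  return out;
}

const decisions: Decision[] = [
  { id: "d1", action: "attack zombie" },
  { id: "d2", action: "say something" },
];
// The second action finished first, so its result was logged first.
const results: ActionResult[] = [
  { decisionId: "d2", outcome: "message sent" },
  { decisionId: "d1", outcome: "zombie killed" },
];

const byPosition = attachByPosition(decisions, results); // d1 gets the chat result
const byId = attachById(decisions, results);             // d1 gets the kill result
```

With positional matching, the zombie kill ends up attached to the chat message, which is the kind of log that makes a bot look like it is lying.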

The worst part was not any single one of these. The worst part was that V1 wasn't getting better — it was getting more elaborate. Every bug I found, I solved by adding another layer, another concept document, another abstraction. By the end I had sixteen architecture documents and could no longer hold the system in my head. The cure had become the disease.

GEN2: a flat agent loop, Reflex, and a toolbox

The new VoxelMind bot architecture is built on two loops and a flat list of tools. That's it.

Reflex. A small layer of hard-coded survival behaviors that run in under 100 milliseconds without the LLM. The bot dodges out of a creeper's blast radius without waking the model. It eats when its food gauge gets dangerous. It equips a weapon when something hostile gets close. These are reactions, not decisions — and they have no business going through an LLM.
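A rough sketch of what such a layer can look like: a pure, synchronous function that checks a handful of rules in priority order and never touches the model. The `Senses` fields, the thresholds, and the action names are assumptions for illustration, not VoxelMind's real values.

```typescript
// Hypothetical snapshot of what the bot senses; field names are assumptions.
interface Senses {
  food: number; // 0..20, like the vanilla hunger bar
  nearestHostile?: { kind: string; distance: number };
  holdingWeapon: boolean;
}

type ReflexAction = "flee" | "eat" | "equip_weapon" | null;

// Illustrative thresholds, not taken from the mod.
const CREEPER_DANGER_RADIUS = 7;
const HOSTILE_CLOSE_RADIUS = 6;
const FOOD_DANGER_LEVEL = 6;

// Rules are checked in priority order. There is no LLM call anywhere here.
function reflex(s: Senses): ReflexAction {
  const h = s.nearestHostile;
  if (h && h.kind === "creeper" && h.distance <= CREEPER_DANGER_RADIUS) return "flee";
  if (s.food <= FOOD_DANGER_LEVEL) return "eat";
  if (h && h.distance <= HOSTILE_CLOSE_RADIUS && !s.holdingWeapon) return "equip_weapon";
  return null; // nothing urgent: leave the decision to the agent loop
}
```

Because the function is plain branching on local state, it runs in microseconds, which is what makes a sub-100-millisecond budget realistic.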

The agent loop. Everything else. When the bot needs to make a decision, one LLM call sees everything that matters: what the bot is sensing right now, the last few things it did, what its owner said in chat, and one line of memory describing what it is currently trying to accomplish. The model picks one tool from a flat list of about twenty functions. The function runs. The result is appended to memory. Next decision.
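One iteration of that loop can be sketched as follows. This is an illustration under stated assumptions, not the GEN2 source: the `Context` fields, the `step` function, and the five-entry history cap are hypothetical, and the model is passed in as a plain function so the example runs without a network call (a real LLM call would be asynchronous).

```typescript
// Everything the model sees for one decision. Field names are assumptions.
interface Context {
  senses: string;
  recentActions: string[];
  ownerChat: string[];
  plan: string | null; // the one line of memory describing the current goal
}

interface ToolCall { tool: string; args: Record<string, unknown>; }
type Tool = (args: Record<string, unknown>) => string; // returns a result line
type Model = (ctx: Context) => ToolCall;               // stand-in for the LLM call

const MAX_RECENT = 5; // "the last few things it did"; the exact number is a guess

// One iteration: the model picks a tool, code runs it, the result goes to memory.
function step(ctx: Context, model: Model, tools: Map<string, Tool>): Context {
  const call = model(ctx);
  const tool = tools.get(call.tool);
  const result = tool ? tool(call.args) : `unknown tool: ${call.tool}`;
  const recentActions = [...ctx.recentActions, `${call.tool} -> ${result}`].slice(-MAX_RECENT);
  return { ...ctx, recentActions };
}

// Stub tools and a stub model so the sketch is runnable on its own.
const tools = new Map<string, Tool>([
  ["gather", (a: Record<string, unknown>) => `gathered ${String(a["count"])} ${String(a["item"])}`],
  ["say", (a: Record<string, unknown>) => `said "${String(a["text"])}"`],
]);
const stubModel: Model = (c) =>
  c.recentActions.length === 0
    ? { tool: "gather", args: { item: "wood", count: 10 } }
    : { tool: "say", args: { text: "Done." } };

let ctx: Context = { senses: "forest, daytime", recentActions: [], ownerChat: ["gather ten wood"], plan: null };
ctx = step(ctx, stubModel, tools);
ctx = step(ctx, stubModel, tools);
```

The point of the shape is that `step` contains no decision logic of its own: it only dispatches whatever the model chose and records what happened.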

The toolbox. Twenty TypeScript functions in a single file: say, go_to, mine_block, gather, craft, eat, place_block, follow, attack, add_goal, complete_goal, and so on. No inheritance. No abstract base classes. No "skill registry." Just a list of things the bot can do, each one shorter than the abstraction layer that used to wrap it.
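To make "a flat list of functions" concrete, here is a small sketch with four of the tool names mentioned above. The `Bot` shape, the parameters, and the return strings are assumptions; the real functions act on the game world rather than on an in-memory object.

```typescript
// Hypothetical minimal bot state, standing in for the live game world.
interface Bot {
  position: { x: number; y: number; z: number };
  inventory: Map<string, number>;
  chatLog: string[];
}

// Each tool is a plain function: take the bot and arguments, do one thing,
// return a one-line result.
function say(bot: Bot, text: string): string {
  bot.chatLog.push(text);
  return `said: ${text}`;
}

function go_to(bot: Bot, x: number, y: number, z: number): string {
  bot.position = { x, y, z };
  return `arrived at ${x},${y},${z}`;
}

function gather(bot: Bot, item: string, count: number): string {
  bot.inventory.set(item, (bot.inventory.get(item) ?? 0) + count);
  return `gathered ${count} ${item}`;
}

function eat(bot: Bot, item: string): string {
  const have = bot.inventory.get(item) ?? 0;
  if (have === 0) return `no ${item} to eat`;
  bot.inventory.set(item, have - 1);
  return `ate 1 ${item}`;
}

// The "toolbox" is just an object holding those functions. No classes, no registry.
const toolbox = { say, go_to, gather, eat };

const demoBot: Bot = { position: { x: 0, y: 64, z: 0 }, inventory: new Map(), chatLog: [] };
```

Adding a capability in this style means adding one function and one key, which is roughly the opposite of adding a layer.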

Goals as one line of memory. When the LLM decides the bot should "build a small shelter before night," it writes a single [PLAN] line into the bot's memory. That line is in the prompt for every subsequent decision. The bot can complete it, cancel it, or pin a new one, all by calling a tool. There is no goal state machine. There is one place where the goal lives, and the model can read or change it.
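A minimal sketch of the idea, assuming memory is a plain list of text lines and at most one of them carries the [PLAN] marker. The tool names `add_goal` and `complete_goal` come from the toolbox list; their signatures, plus the `currentGoal` and `buildPrompt` helpers, are hypothetical.

```typescript
const PLAN_PREFIX = "[PLAN] ";

// Pin a goal: drop any existing plan line, then append the new one.
function add_goal(memory: string[], goal: string): string[] {
  return [...memory.filter((line) => !line.startsWith(PLAN_PREFIX)), PLAN_PREFIX + goal];
}

// Complete (or cancel) the goal: remove the plan line, keep everything else.
function complete_goal(memory: string[]): string[] {
  return memory.filter((line) => !line.startsWith(PLAN_PREFIX));
}

// Read the current goal back, or null when nothing is pinned.
function currentGoal(memory: string[]): string | null {
  const line = memory.find((l) => l.startsWith(PLAN_PREFIX));
  return line ? line.slice(PLAN_PREFIX.length) : null;
}

// Memory, plan line included, goes into the prompt for every decision.
function buildPrompt(memory: string[], senses: string): string {
  return [`SENSES: ${senses}`, ...memory].join("\n");
}

let memory: string[] = ["owner said: build a small shelter before night"];
memory = add_goal(memory, "build a small shelter before night");
const prompt = buildPrompt(memory, "plains, dusk");
```

Since the goal is ordinary text in the same memory the model already reads, asking the bot what it is doing and asking it to act both draw on the same single source.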

I also adopted one rule I had been violating the whole time: no new concept document until the bot can do what the last one described. Ship to learn, not the other way around. The sixteen old documents are archived for reference; they are no longer a contract.

What works in GEN2 that didn't work in V1

GEN2 has been running live alongside the old system for a few days now. The list below is what is verified in production logs, not what I hope to ship:

  • The bot pins its own goal and completes its own goal. Tell it "build a small shelter before night," and a [PLAN] line appears in memory. The bot works on it, gets interrupted by a phantom, fights the phantom off via Reflex, returns to the shelter, finishes it, and clears the plan. End to end. No external coordinator.
  • Ask it what it's doing and it answers honestly. The plan line is in the prompt — the model can read it and tell you. No more "the bot lied"; the bot can only say what it actually has on its agenda.
  • It survives distractions. Phantoms, zombies, wandering animals — none of them derail a pinned goal anymore. Reflex handles the immediate threat. The plan stays.
  • Mining, crafting, eating, building, following, fighting — all live. Thirteen tools shipped as of this writing, including attack (the last big capability gap; without it the bot died in every long night). What is still on the bench: smelt for iron progression, and build_shelter as a polished higher-level wrapper.

What is gone, in numbers: the chat doubler (42.9% of V1's bot messages), the mob-hunt obsession, the silent-result bug (58.4% of V1's LLM decisions with no recorded result), and the entire layer of inferred state that nobody was actually inferring correctly.

What is gone, in feel: the bot is smaller, and it is more honest. It does not promise emergent personality or a memory palace. It promises that when you tell it to do a thing, it does that thing.

If you already have a VoxelMind bot — or if you're new

If your bot was already running on VoxelMind, it has moved to GEN2. You do not need to do anything. The old architecture is being phased out in the coming weeks; the bot you spawn today is the new one.

If you are new, the install is the same as it always was. Grab the mod from CurseForge, drop it into your mods folder, press V in-game, spawn a companion. Then tell it to do one specific thing — "gather ten wood," or "build a small shelter before night" — and watch how it handles the next zombie that walks past.

If you are considering it and want the longer thesis on why a companion bot is worth caring about at all, the previous post in this series is for you: Your Minecraft Bot Remembers How You Treat It →

The core principle: LLM decides, code executes

"LLM entscheidet, Code führt aus." One place decides — the LLM. Code executes. Nothing in between.

Everything I built around that one sentence was scaffolding. Some of it helped. Most of it didn't. Some of it actively hurt — by hiding the decision behind layers, by making it possible for the system to "almost" know what it was doing without ever really knowing.

GEN2 is what is left when you delete the scaffolding and trust the model with the decisions and the code with the execution. It is smaller. It works. I am going to keep it that way.

If you find a place where it doesn't work, tell me. Discord is the fastest way. I read every message myself — there is no support team between you and the person who deleted those sixteen documents.

— Robin

Frequently asked questions about VoxelMind GEN2

What is VoxelMind GEN2?
VoxelMind GEN2 is the second-generation architecture of VoxelMind's Minecraft AI companion mod. It replaces a five-layer system with a single LLM call that picks one tool from a flat list of around twenty TypeScript functions. The result: a bot that finishes the task it was given instead of getting distracted by nearby mobs.

What was wrong with VoxelMind V1?
V1 had three structural problems verified in live logs: about 43% of bot messages were exact duplicates, bots would abandon owner-given tasks to chase mobs, and the layered architecture made it impossible for the system to track what the bot was actually trying to do. The fix wasn't more layers — it was deleting them.

Does my existing VoxelMind bot still work?
Yes. Existing bots run on the new GEN2 architecture automatically. The old V1 system is being phased out over the coming weeks. No manual migration is needed.

How does the GEN2 agent loop work?
One LLM call sees everything the bot is sensing, the last few actions it took, what its owner said in chat, and one line of memory describing its current goal. The model picks one tool from a flat list of about twenty functions — go_to, mine_block, gather, craft, attack, place_block, and so on. The function runs. The result feeds back into memory. Next decision.

What is "Reflex" in VoxelMind?
Reflex is a small layer of hard-coded survival behaviors that run in under 100 milliseconds, without involving the LLM. It dodges creeper explosions, eats when food gets low, and equips a weapon when a hostile mob gets close — reactions, not decisions.

Where can I download VoxelMind for Minecraft?
VoxelMind is on CurseForge as a Fabric mod for Minecraft 1.21.4. Install Fabric Loader and the Fabric API, drop the VoxelMind mod into your mods folder, launch the game, press V, and spawn a companion. Setup guide: How to Add an AI Companion to Minecraft →