I Built Too Many AI Memory Systems and Finally Realized What Actually Matters

I had too many memory systems and somehow still not enough usable memory.

That was the actual problem. Not storage. Not vector search. Not whether the latest AI guy on the internet had a cooler diagram. Just too many overlapping layers, too many half-trusted paths, and not enough signal when an agent actually needed to do something.

At one point I had QMD-style recall, a RAG pipeline, a weird house-style memory structure, and plain markdown folders all hanging around the same stack.

Then I saw Karpathy doing the markdown-files-in-Obsidian thing and my immediate reaction was not, wow that's genius. It was, cool, so now I have one more version of the same problem staring me in the face.

Because once you have enough memory systems, you're not building memory anymore.

You're building arbitration.

What I tried

I tried the obvious stuff first.

QMD-style memory made sense because the pitch is seductive. Keep compact meaningful recall objects, pull them back in later, give the agent just enough history to act smarter without dragging your whole life into context every time.

And to be fair, there is something real there.

Compact recall is useful.

The problem is that once it becomes one layer inside a bigger memory pile, it stops feeling compact and starts feeling like another opinionated summary system that may or may not point at the thing you actually need.

Then there was RAG.

Of course there was RAG.

Everybody in AI hits the same phase where you start thinking, okay, this is the grown-up answer, right, I'll index everything, query everything, retrieve the right chunks, and now my agents will finally stop acting like goldfish with swagger.

Sometimes it helped.

Sometimes it absolutely did not.

What RAG was good at was broad recall across a lot of material. Where it failed, at least in the way most people actually wire it, was in letting you pretend retrieval solved trust. If I got back five chunks that were vaguely related but I still had to wonder whether they were stale, partial, out of scope, or just semantically similar nonsense, then congratulations, I had built a very expensive maybe-machine.

Then there was the house-style memory model.

If you've never built one of these weird metaphor systems, good, protect your innocence lol.

I mean the basic idea was not terrible. Organize information in a more human, navigable shape. Give relationships some structure. Make the whole thing feel less like a pile of files and more like a place.

Cool concept.

Also very easy to overbuild.

The second you need a translator between the metaphor and the real operational state, you are paying rent twice. Once for the actual data and once for the story you're telling yourself about the data.

And then, because apparently I wanted the full starter pack, I also had plain markdown files in folders.

Which honestly still might be the least stupid option out of the bunch.

It is boring. It is inspectable. It is portable. It does not need a miracle to recover if something breaks.

But even that gets messy fast when you do not have strong naming, strong routing, and a clear answer to one boring question that turns out to matter a lot: when this agent needs context, where exactly is it supposed to go first?

What failed

What failed was not any single tool.

What failed was overlap.

Every layer came in with a little sales pitch. This one is better for long-term recall. This one is better for semantic search. This one is better for relationship mapping. This one is better for plain-source durability. And each pitch sounded reasonable right up until the systems started competing for the same job.

That is when token cost starts getting stupid.

Not just literal model cost, though that matters too.

I mean operator cost.

Attention cost.

Maintenance cost.

Trust cost.

Every overlapping memory layer creates one more place where the answer might live, one more place where stale context might survive longer than it should, and one more place where the agent can sound confident while being subtly wrong.

That part is brutal because it does not always fail loudly.

Sometimes it just makes the system a little more annoying, a little less reliable, a little more likely to drag in semi-relevant baggage from something that sounded close enough.

And that adds up.

You do not feel it as one catastrophic outage.

You feel it as the slow death of confidence.

I also realized a lot of what people call memory problems are not memory problems at all.

They are context-contract problems.

The handoff is vague.

The read order is fuzzy.

The source of truth is implied instead of declared.

So the agent does what agents do. It improvises. It grabs what seems nearby. It merges things that should not be merged. It acts like a very enthusiastic intern who found three sticky notes and decided that was enough to ship policy.

That is not a retrieval failure.

That is an ownership failure.

What survived

The stuff that survived was the boring stuff.

Wayne's transcript-and-wiki setup hit me for that reason.

Not because it was fancy.

Because it wasn't.

Raw transcripts go in. The LLM organizes relationships. There is an index. There are logs. You can query it. You can inspect it. You can extend it without pretending every use case needs a whole new memory religion bolted on top.

That matters.

A lot.

Because raw transcripts and markdown-style source material have one giant advantage over over-clever memory stacks.

You can actually look at them.

You can audit them.

You can recover from them.

And if something weird happens, you are not debugging six abstraction layers just to figure out why your agent thinks a conversation from last month is policy for today.
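To make that concrete: the transcript-first idea can be sketched as plain files plus a tiny index that always points back at an inspectable source. This is a toy illustration, not Wayne's actual setup; the file layout and function names are mine.

```python
# Minimal sketch: raw markdown transcripts stay on disk untouched,
# and a small word index maps each term to the files containing it,
# so every recall result is a real file you can open and audit.
import re
from pathlib import Path
from collections import defaultdict

def build_index(transcript_dir: str) -> dict[str, set[str]]:
    """Map each lowercase word to the transcript files that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for path in Path(transcript_dir).glob("*.md"):
        words = re.findall(r"[a-z]+", path.read_text(encoding="utf-8").lower())
        for word in set(words):
            index[word].add(path.name)
    return index

def recall(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the transcript files that contain every query word."""
    terms = re.findall(r"[a-z]+", query.lower())
    if not terms or any(t not in index for t in terms):
        return []
    return sorted(set.intersection(*(index[t] for t in terms)))
```

The point is not that grep-style indexing beats embeddings. The point is that when the answer looks wrong, debugging is opening a markdown file, not excavating an abstraction stack.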

The other thing that survived was explicit run context.

Wake procedures.

Read order.

Project overlays.

Source-of-truth verification.

That operational discipline matters more than people want to admit, precisely because it is not sexy and it does not demo well.

But when an agent knows exactly what to read first, what file or log actually owns the truth, and what has to be verified live before acting, behavior gets better fast.
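A context contract like that can be as dumb as a declared read order plus a list of things that must be checked live. The sketch below is illustrative only; every path and field name is an assumption, not a real framework's API.

```python
# A hedged sketch of an explicit context contract: read order,
# declared ownership of truth, and fields that must be verified
# live instead of trusted from cache. All names are made up.
CONTEXT_CONTRACT = {
    "read_order": [             # what the agent reads first, in order
        "WAKE.md",              # wake procedure: identity, mode, guardrails
        "project_overlay.md",   # project-specific conventions
        "state/current.md",     # the one file that owns operational truth
    ],
    "source_of_truth": "state/current.md",
    "verify_live": ["deploy_status", "open_tickets"],  # never trust cached copies
}

def load_context(read_file) -> list[str]:
    """Read the contract's files in declared order; fail loudly on gaps."""
    docs = []
    for path in CONTEXT_CONTRACT["read_order"]:
        try:
            docs.append(read_file(path))
        except FileNotFoundError:
            raise RuntimeError(f"context contract names {path} but it is missing")
    return docs
```

Failing loudly on a missing file is the whole trick. An implied source of truth lets the agent improvise; a declared one lets it refuse.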

Not magically.

Just materially.

Which is honestly better.

A clear path beats a clever maze.

Every single time.

The one rule

So here is the rule I am keeping.

One job should have one clear context contract and one trusted path for memory retrieval.

That is it.

Not five overlapping systems all trying to be helpful.

Not a summary layer, a retrieval layer, a metaphor layer, a markdown layer, and a backup layer all pretending they are not stepping on each other.

One contract.

One path.

If the job is exact recall, go to the raw source.

If the job is broad discovery, use the discovery layer on purpose.

If the job is durable project convention, store it somewhere explicit and inspectable.

But do not let multiple systems silently compete to answer the same question.
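The rule reduces to a routing table with exactly one backend per job, and a hard refusal to guess. The job names and backend names below are made up for illustration; the shape is the point.

```python
# One contract, one path: a toy router under the assumption that
# each job category maps to exactly one memory backend. No backend
# is allowed to silently answer a job it does not own.
MEMORY_ROUTES = {
    "exact_recall": "raw_transcripts",     # go to the source, not a summary
    "broad_discovery": "vector_search",    # discovery layer, used on purpose
    "project_convention": "project_wiki",  # explicit, inspectable store
}

def route(job: str) -> str:
    """Return the single backend that owns this job, or refuse to guess."""
    if job not in MEMORY_ROUTES:
        raise ValueError(f"no memory system owns the job {job!r}")
    return MEMORY_ROUTES[job]
```

Notice there is no fallback chain. A fallback chain is just overlap with extra steps.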

That is where the rot starts.

And honestly, this changed how I think about agent memory altogether.

The win is not having the most memory.

The win is reducing ambiguity.

An agent that knows where truth lives is more useful than an agent with ten half-connected memory backends and a confidence problem.

So again, if you are looking at your stack and wondering whether you need another memory tool, I would ask a way meaner question first.

Which one of your current systems is already supposed to own that job, and why doesn't your agent trust it yet?

Because if you cannot answer that cleanly, the next memory layer is probably not a solution.

It is just another roommate.

If you want more of the blunt behind-the-scenes version of this stuff, get on Derek's newsletter or watch OMGItsDerek. That's where I talk about what actually survives contact with reality, not just what sounds smart in a memory-system thread.

Want the whole Made by AiMe MCP stack in one shot?

The MCP Bundle rolls the core MBA servers into one subscription so you do not have to piece the stack together tool by tool.

See the MCP Bundle →

Not sure which lane fits yet? Start with the Agent OS audit and get a practical next-step instead of another generic tool list.