An AI agent can read what your code does in seconds. Point it at a function and it will tell you the inputs, the outputs, the branches, the edge cases it handles. That part is genuinely good now.

Then you ask it the harder question (why is it this way?) and the confidence doesn't change, but the ground underneath it does.

Why does this retry loop back off for exactly 800ms? Why is there a cast here that looks redundant? Why did we stop using the obvious library and hand-roll this instead? The agent will answer. It will answer fluently. And it will be guessing, because the reason isn't in the code. The reason is in a pull request from fourteen months ago, in a decision someone made on a call that never got written down, in the head of an engineer who left in March.

We narrowed our focus to one question: why was this written? This is the first post in a short series about what happened when we did, including the parts where we built the wrong thing first.

The gap is where the expensive mistakes live

Here is the failure mode, and you have probably watched it happen.

An agent is asked to "clean up" or "modernise" a piece of code. It reads the code, finds something that looks accidental (a guard clause that seems paranoid, a slow path that looks like dead weight) and it removes it. The diff is clean. The tests pass. It ships.

And three weeks later the thing that guard clause was guarding against happens, in production, and nobody can remember why the guard was ever there, because the only record of why just got deleted by a tool that was very sure it was helping.
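
To make that concrete, here is a small hypothetical sketch. The function names, the scenario, and the upstream feed are all invented for illustration and are not taken from any real codebase. The guard in `parse_quantity` looks redundant, and a version without it passes the obvious test.

```python
def parse_quantity(raw: str) -> int:
    # Hypothetical guard: it looks paranoid, because callers "always" pass digits.
    # The unrecorded reason (assumed for this example): one upstream feed
    # sends an empty string for out-of-stock items.
    if raw.strip() == "":
        return 0
    return int(raw)


def parse_quantity_cleaned(raw: str) -> int:
    # What a tidy-minded refactor might leave behind.
    return int(raw)


# The happy-path check passes for both versions, so the diff looks safe.
assert parse_quantity("3") == parse_quantity_cleaned("3") == 3

# The case the guard existed for only shows up later.
assert parse_quantity("") == 0
try:
    parse_quantity_cleaned("")
except ValueError as exc:
    print("cleaned version fails:", exc)
```

Nothing in either function body says why the empty-string case matters. That knowledge sits outside the code, which is the point.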

The code told the agent what. It could not tell the agent why. And the cost of that gap isn't paid when the change is made. It's paid later, by whoever inherits it.

This is not a hypothetical for us. The teams we built Spelunk for spend their working lives parachuting into codebases they didn't write, under time pressure, needing to be productive in days. For them, "why is it this way?" isn't an occasional curiosity. It's the whole job. An agent that can only answer "what" is an agent that confidently rewrites things that were that way for a reason.

"Read the git history" is not an answer

The usual response is: the why is in version control, go look.

Sometimes it is. More often, git blame lands you on a commit that says "fix tests" or "address review comments", authored by someone who has moved on, on a branch that got squashed. The intent existed: at the moment of the change, the person making it knew exactly why. But intent is the thing version control captures worst. It records the what of every line with perfect fidelity and lets the why evaporate.
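
One rough way to see this in your own repo is to count how many commit subjects state a reason at all. The sketch below is only a heuristic under stated assumptions: the pattern list and the sample history are made up for illustration, and the subjects are assumed to come from something like `git log --format=%s -- <path>`.

```python
import re

# Hypothetical patterns for subjects that record that something changed, not why.
LOW_INTENT = re.compile(
    r"^(fix(es|ed)?( the)? (tests?|lint|typo|build)"
    r"|address(ed)? review comments?|wip|cleanup|misc)\.?$",
    re.IGNORECASE,
)


def low_intent_subjects(subjects: list[str]) -> list[str]:
    """Return the commit subjects that match the low-intent patterns."""
    return [s for s in subjects if LOW_INTENT.match(s.strip())]


# Invented sample history, one subject per line as git would print them.
history = [
    "fix tests",
    "address review comments",
    "Keep empty-string guard: one upstream feed sends blank quantities",
]

flagged = low_intent_subjects(history)
print(f"{len(flagged)} of {len(history)} subjects carry no stated reason")
```

A pattern list like this will miss plenty and misjudge some, so treat the count as a prompt to go and read the history, not as a measurement.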

So the question became sharper. Not "where is the why?" but: what would it take to make the why survive? Survive the squash, survive the engineer leaving, survive the agent that reads the repo next week and has no memory of this one.

That question turned out to have a real, specific answer, and it forced a chain of design decisions, each one constraining the next. The memory had to live somewhere durable. It had to travel with the code, not sit in a database someone has to remember to back up. It couldn't belong to whichever coding agent happened to be fashionable this quarter, or every change of tool would be a fresh case of amnesia. And along the way we had to delete a surprising amount of what we'd already built, because the first answers were the wrong ones.

Those decisions are the subject of the next posts in this series.

What we're actually claiming

Nothing dramatic. We don't think we've solved organisational memory. We think we noticed a specific, narrow gap (agents are fluent about what and silent about why) and that the gap is worth closing carefully, with a tool that does less than you'd expect and is honest about it.

If you've watched an agent confidently delete something load-bearing, you've felt the shape of this problem. The rest of the series is what we did about it.


Spelunk is open source and code-aware, built to be called from whatever agent you already use. Repo and docs: spelunk.cloud.