Spelunk 0.9: the v1 release candidate

Git tracks what changed. Spelunk remembers why.

That sentence is the whole project, and 0.9 is the version where we're confident enough to say it in the open. This is the release candidate for v1: not a beta, not a preview. It's a stable command-line tool we've been running on our own work for months, and the shape it has now is the shape we expect to call 1.0.

If you've followed the series on the why-layer, this is where it lands as something you can install and use today.

The thing we're actually building

When you read a codebase, the code tells you what it does. It does not tell you why it is the way it is. Why this retry backs off for exactly 800ms, why a cast that looks redundant is load-bearing, why the obvious library got hand-rolled instead. That reasoning lived in a pull request, a call, or someone's head, and version control records the what of every line with perfect fidelity while letting the why evaporate.

Spelunk is the layer that keeps the why. It stores decisions, requirements, and handoffs inside Git itself, anchored to commits, travelling with the repo. Then it serves that memory back to whoever reads the code next, agent or human, alongside a code graph and full-text search that work the moment you run Spelunk, with nothing to provision.

That is the job. Everything in 0.9 is in service of making the why-layer survive longer and reach further.

What's new in 0.9

spelunk login and spelunk logout are here. You authenticate once from the command line and the tool holds the session for you, which is the groundwork for memory that isn't trapped on a single machine.

Two-way memory sync, local and cloud

This is the change that matters most for the why-layer. Memory now syncs both ways between your local repo and the cloud: decisions you record on your laptop reach the rest of your work, and decisions recorded elsewhere come back to you. The why behind a change stops being something only the machine that made it can see.

Your local memory.db stays the source of truth for a project. Sync is additive, with provenance attached and the supersede rule intact, so the history of a decision is never flattened or lost.
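To make "additive, with provenance, supersede intact" concrete, here is a minimal sketch of what such a merge could look like. It is not Spelunk's implementation or schema: the Decision record, its field names, and both functions are hypothetical, and letting the local copy win on an id collision is our assumption for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    # Hypothetical record shape, for illustration only.
    id: str                           # stable identifier
    text: str                         # the recorded "why"
    origin: str                       # provenance: where the record was written
    supersedes: Optional[str] = None  # id of the decision this one replaces


def merge_additive(local, remote):
    """Union two histories by id. Nothing is deleted or overwritten."""
    merged = {d.id: d for d in local}
    for d in remote:
        # Assumption: on an id collision the local copy is kept.
        merged.setdefault(d.id, d)
    return list(merged.values())


def current(decisions):
    """Decisions that no other record supersedes."""
    replaced = {d.supersedes for d in decisions if d.supersedes is not None}
    return [d for d in decisions if d.id not in replaced]


local = [Decision("d1", "Back off 800ms to respect the upstream limit", "laptop")]
remote = [Decision("d2", "Back off longer after the limit changed", "cloud", supersedes="d1")]

history = merge_additive(local, remote)
live = current(history)
```

After the merge, history holds both records, each with its origin, while live contains only d2. The superseded decision is still there to read; it just isn't the current answer.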

Same memory, whichever agent you're in

Spelunk has no opinion about which coding agent you use. It's a tool an agent shells out to, not an agent of its own, so the same why-layer is there whether you're working in one harness today and a different one next quarter. We test 0.9 across several of the common ones. Switching tools shouldn't mean starting from amnesia.

A sharper brain for local code search

Semantic search is how the why-layer reaches code you didn't know to look for: ask for "the retry logic" and get the place that backs off for 800ms, even when the word "retry" never appears. For 0.9 we upgraded the model that powers it, and we did it without changing the deal. That search still runs locally, on your machine. There is no external embedding API, no per-query cost, and your code never leaves the box.
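For readers who haven't seen embedding search before, the general idea fits in a few lines. This is a toy sketch of the common approach (rank stored vectors by cosine similarity to the query vector), not a description of how Spelunk ranks results. The file paths and three-dimensional vectors are made up; real embeddings come from the model and have hundreds of dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_vec, index, k=1):
    """Return the k paths whose vectors are closest to the query."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [path for path, _ in ranked[:k]]


# Hypothetical index: path -> embedding of the code at that path.
index = {
    "net/backoff.py": [0.9, 0.1, 0.0],
    "ui/theme.py": [0.0, 0.2, 0.9],
}

# Stand-in for the embedding of the query "the retry logic".
query = [0.8, 0.2, 0.1]

top = search(query, index)
```

The match is made on direction in vector space, not on shared words, which is why a query can land on code that never uses the query's vocabulary.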

The upgrade is a better embedder for code. We moved from Nomic Embed Text v1.5 (about 137M parameters, 768 dimensions) to F2LLM-v2-330M, a Qwen3-based, code-capable model at 896 dimensions. In our retrieval tests it finds the right code more often. That is the reason for the change, and the only one that would justify it.

We should be straight about the trade. This model is bigger than the one it replaces, not smaller. Nomic Embed Text v1.5 was roughly 150 MB on disk; F2LLM-v2-330M is about 339 MB after Q8 quantization, a little over double. We judged the retrieval gain worth the extra space, and most of that space comes back at the index: the on-disk vector store is now int8, roughly 4× smaller than before, so a real codebase lands close to where it started.
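The arithmetic behind the index saving is easy to check. The sketch below assumes the earlier store kept one 32-bit float per dimension, which this post doesn't state, and it ignores any per-vector metadata. The quantizer is a generic symmetric int8 scheme for illustration, not necessarily the one Spelunk uses.

```python
def quantize_int8(vec):
    """Symmetric per-vector quantization: map [-peak, peak] onto [-127, 127]."""
    peak = max(abs(x) for x in vec)
    scale = peak / 127 if peak > 0 else 1.0
    return [round(x / scale) for x in vec], scale


def dequantize(q, scale):
    """Approximate reconstruction of the original floats."""
    return [v * scale for v in q]


# Bytes per stored vector.
old_bytes = 768 * 4   # 768 dimensions, assumed float32 (4 bytes each)
new_bytes = 896 * 1   # 896 dimensions, int8 (1 byte each)
ratio = old_bytes / new_bytes

q, scale = quantize_int8([0.2, -1.0, 0.6])
approx = dequantize(q, scale)
```

Under that assumption, int8 saves 4× per dimension. Because the new vectors are also wider (896 dimensions against 768), the saving per vector works out to 3072 / 896, or about 3.4×, before counting anything stored alongside each vector such as a scale factor.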

The bigger model is also slower to embed on CPU. On Mac we now run it on the GPU, which is roughly 5× faster than the CPU path, and we are investigating GPU acceleration on other platforms. Semantic search stays the opt-in step, as before: you build the index when you want concept-level search, and the rest of the tool starts instantly without it.

Why "release candidate" and not "1.0"

Calling 0.9 a release candidate is a promise about stability, and a small amount of honesty about what's left.

The stability is real. The CLI is deterministic and fast: it does the local things (memory, the code graph, full-text search) and returns the same answer every time, with no model server required to start. We held the line on keeping language models out of the hot path, and that line is part of why we can call this a candidate at all.

The honesty is that a release candidate exists to be tried before the version number stops moving. If something in the why-layer doesn't survive the way you'd expect, the window to tell us is now, while it's still a candidate.

Try it

Spelunk is open source and code-aware, built to be called from whatever agent you already use. Point it at a repo and it works in the first minute, with sign-in and sync when you want the why-layer to follow you.

Repo, install, and docs: spelunk.cloud.