CASE STUDY

RAG wiring, forward-saving £200 to £800 a month

The maths, the architecture, and the honest limits of what RAG closes.

We are not saying RAG is novel. We are saying the cost-discipline maths is worth showing.

The maths first, then the architecture, then the honest limits.

Worked example

The Via Negativa worked example

The infrastructure was already on disk for three days. The Claude Code sessions did not know to use it. Wiring it in took about three hours of overnight work. The forward saving lands in the £200 to £800 a month bracket at typical session volume.

This is the Via Negativa worked example: clarity by subtraction. The answer was already there; the gap was not building more, it was routing what existed.

The setting

The setting

882 docs in the Grace wiki. 139 docs in the consultancy folder. 1,238 documents and 15,782 chunks indexed across the whole GNL ecosystem (Grace wiki, consultancy, phil-audits, explorer audits, email logs, accounts, manuscripts, gnl-app docs). The mirror runs SQLite plus a sentence-transformer encoder (all-MiniLM-L6-v2), FTS5 lexical search, and cosine fusion. The whole thing is local; zero Anthropic tokens to query.

Component Figure
Grace wiki documents indexed 882
Consultancy folder documents indexed 139
Total ecosystem documents indexed 1,238
Total chunks indexed 15,782
Encoder sentence-transformers, all-MiniLM-L6-v2
Storage SQLite, local-only
Retrieval FTS5 lexical plus cosine fusion

The maths

The maths (projected at GNL session volume)

Per-question saving compared with a Glob + Grep + multi-Read sweep: 85 to 95%. Projected monthly saving at GNL session volume: £200 to £800. The methodology and the working sit in docs/TOKEN_OPTIMISATION_PLAYBOOK.md.

Honest limits

What we are not saying

We are not saying RAG is novel. We are not saying you should build ours. We are saying wiring matters more than building.

If your AI-product team has built any kind of internal knowledge store, audit how often your developer sessions actually query it before they reach for a model call. The gap is almost always discipline and wiring, not capability. The fix is usually small, and the bill response is large.

Scroll to Top