RAG wiring, forward-saving £200 to £800 a month
The maths, the architecture, and the honest limits of what RAG closes.
We are not claiming RAG is novel; we are claiming the cost-discipline maths is worth showing.
The Via Negativa worked example
The infrastructure was already on disk for three days. The Claude Code sessions did not know to use it. Wiring it in took about three hours of overnight work. The forward saving lands in the £200 to £800 a month bracket at typical session volume.
This is the Via Negativa worked example: clarity by subtraction. The answer was already there; the gap was not building more but routing what existed.
The setting
882 documents in the Grace wiki and 139 in the consultancy folder; 1,238 documents and 15,782 chunks indexed across the whole GNL ecosystem (Grace wiki, consultancy, phil-audits, explorer audits, email logs, accounts, manuscripts, gnl-app docs). The mirror runs SQLite with a sentence-transformer encoder (all-MiniLM-L6-v2), FTS5 lexical search, and cosine-similarity fusion. The whole thing is local: zero Anthropic tokens to query.
| Component | Figure |
|---|---|
| Grace wiki documents indexed | 882 |
| Consultancy folder documents indexed | 139 |
| Total ecosystem documents indexed | 1,238 |
| Total chunks indexed | 15,782 |
| Encoder | sentence-transformers, all-MiniLM-L6-v2 |
| Storage | SQLite, local-only |
| Retrieval | FTS5 lexical plus cosine fusion |
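The fusion step above can be sketched in miniature. The toy below pairs an in-memory SQLite FTS5 table with a bag-of-words cosine score standing in for the all-MiniLM-L6-v2 embeddings, so it runs without the model installed; the function names and fusion weights are illustrative, not GNL's actual code.

```python
import math
import sqlite3

def embed(text):
    """Toy embedding: lowercase term counts (a stand-in for MiniLM vectors)."""
    vec = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_index(chunks):
    """Index chunk bodies in an in-memory FTS5 table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE chunks USING fts5(body)")
    db.executemany("INSERT INTO chunks(body) VALUES (?)", [(c,) for c in chunks])
    return db

def search(db, chunks, query, k=3, w_lex=0.5, w_vec=0.5):
    # Lexical pass: FTS5's rank column is BM25, where more negative
    # means a better match, so we negate it to get a positive score.
    lex = {row[0] - 1: -row[1] for row in db.execute(
        "SELECT rowid, rank FROM chunks WHERE chunks MATCH ?", (query,))}
    qv = embed(query)
    # Fuse: weighted sum of lexical score and cosine similarity per chunk.
    fused = sorted(
        ((w_lex * lex.get(i, 0.0) + w_vec * cosine(qv, embed(c)), c)
         for i, c in enumerate(chunks)),
        reverse=True)
    return [c for _, c in fused[:k]]
```

The equal 0.5/0.5 weighting is a placeholder; in practice the lexical and vector weights are a tuning decision, not a given.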
The maths (projected at GNL session volume)
Per-question saving compared with a Glob + Grep + multi-Read sweep: 85 to 95%. Projected monthly saving at GNL session volume: £200 to £800. The methodology and the working sit in docs/TOKEN_OPTIMISATION_PLAYBOOK.md.
Honest limits: what we are not saying
We are not saying RAG is novel. We are not saying you should build ours. We are saying wiring matters more than building.
If your AI-product team has built any kind of internal knowledge store, audit how often your developer sessions actually query it before they reach for a model call. The gap is almost always discipline and wiring, not capability. The fix is usually small, and the effect on the bill is large.
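The wiring fix itself is often just a routing gate: consult the free local index first and escalate to a model call only on a miss. A minimal sketch, with `local_search` and `call_model` as hypothetical stand-ins rather than GNL's actual interfaces:

```python
def answer(question, local_search, call_model, min_hits=1):
    """Route to the local index first; spend model tokens only on a miss."""
    hits = local_search(question)
    if len(hits) >= min_hits:
        return {"source": "local", "context": hits}
    return {"source": "model", "context": call_model(question)}
```

The point is the ordering, not the threshold: the expensive path runs only after the free one has demonstrably failed.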