The agent stack, run on ourselves first
Seven stages, one ethics gate, one audit gate, real receipts.
Before we offered the agent stack to anyone else, we ran it against our own consultancy surface. The receipts are the case study.
Read the stages, then the receipts, then the honest limits.
Why we ran it on ourselves
The agent cycle, run on ourselves
In May 2026 we ran the seven-agent stack on Via Negativa Health as if Phillip Hayes and John Pemberton were a client of their own consultancy. The input was the first-build VN site at vianegativahealth.com, ten public pages plus the audience and partner surfaces, snapshotted on 12 May. The output was voice-locked, audit-passed copy across all thirteen pages plus the case study you are reading now.
We did it because the only honest way to sell an agent stack is to have used it on yourself first, with skin in the game, with the receipts kept. This page is the record of that run: what we put in, what each of the seven agents did, what we changed when an agent pushed back, what the whole thing cost in real numbers, and what we are deliberately not claiming about it.
The starting point
The first build
Before the cycle ran, there was a working VN site already standing. Lightsail box, Let’s Encrypt SSL, a navy-and-slate theme, ten public pages and four password-gated partner-scope pages, single inbox, no MX record. Hardened, fast, voice-correct on the punctuation rules. What it did not have was the substrate test rendered at parity: the four verbatim quotes (lactate-yes, IOB-honest, AID-lockstep, CGM-non-diabetes-no) were not on the page set, and the refusal architecture lived in the documents rather than on the surface a buyer reads. The first build was a competent commercial site. It was not yet a Via Negativa site.
The snapshot of that first build is preserved at audits/SITE_AUDIT_2026-05-12/vn-cycle/before/: every page captured as the cycle started, so the diff downstream is auditable. The cycle had to either justify the snapshot or improve on it, no third option.
The cycle in motion
Seven agents, one engagement, one substrate
Each agent inherits the previous agent’s output verbatim and is gated by the Audit Agent at the end. The Audit Agent can FAIL-WITH-CHANGESET (one revision) or FAIL-AND-REFER (gate re-runs from Ethics). Both verdicts are live, not procedural.
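The gating flow above can be sketched in a few lines. This is a minimal illustration, not the VN implementation; every name here (run_cycle, the verdict strings as Python constants, the callable agents) is a hypothetical stand-in, and a real implementation would cap the number of re-runs.

```python
PASS, FAIL_WITH_CHANGESET, FAIL_AND_REFER = "PASS", "FAIL-WITH-CHANGESET", "FAIL-AND-REFER"

def run_cycle(agents, audit, substrate):
    """Run each agent on the previous agent's verbatim output, then gate.

    FAIL-WITH-CHANGESET permits exactly one revision pass;
    FAIL-AND-REFER re-runs the whole cycle from the first (Ethics) stage.
    """
    output = substrate
    for agent in agents:
        output = agent(output)  # each agent inherits the prior output verbatim
    verdict, changeset = audit(output)
    if verdict == FAIL_WITH_CHANGESET:
        output = changeset(output)  # the single permitted revision
        verdict, _ = audit(output)
    if verdict == FAIL_AND_REFER:
        return run_cycle(agents, audit, output)  # gate re-runs from Ethics
    return output
```

The point of the sketch is that both failure verdicts are executable branches, not labels: the changeset path runs once and is re-audited, and the refer path restarts the pipeline.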
Stage 1
Ethics Gate (the agent of truth)
The gate fired first. It read the substrate (three commercial accept routes; one verbatim public refusal; Taleb four-part anchor; single inbox; no public price sheet; equal-thirds net pot) and tested it against the three filter questions: does the signal exist, is the error budget acceptable for the population, and does the work serve the person rather than the principal. The verdict was GREEN LIGHT, with a written may-say and may-not-say list, six typed inputs locked, and reopen triggers named in advance. Source: rendered/agent-stack/cycle-9/vn-self-as-client/00-ethics-gate.md.
What changed: the gate caught that the first-build VN site did not name antifragility or black-swan handling on /about/, only Via Negativa and skin in the game. The four-part anchor was made an explicit must-carry before any downstream agent ran. PDSA iterations at the gate: one (no FAIL, no re-run).
Stage 2
Compliance Agent
Compliance read the gate’s green-light scope and turned it into an Approved Claims Register. Every load-bearing claim on the thirteen pages was classified as APPROVED-AS-IS, APPROVED-WITH-DISCLAIMER, REFRAMED, or REFUSED. Final tally: 41 APPROVED-AS-IS, 18 APPROVED-WITH-DISCLAIMER, 13 REFRAMED, 11 REFUSED. Source: 01-compliance.md.
What changed: the £10,000 indicative-engagement figure on the cost-lever case study was refused outright (reopen trigger (a) on the gate). The £25,000 to £500,000 range on the blue-chip page was refused. The “Dexcom, UIH, POCTech” partner-name string on the blue-chip card was refused (banned move 7). Birmingham Children’s Hospital NHS Foundation Trust on John’s bio was reframed to the canonical Birmingham Women’s and Children’s NHS Foundation Trust. PDSA iterations at Compliance: one.
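The four-way classification the register uses lends itself to a small typed structure. A sketch only: the class and field names below are hypothetical, not the register's actual schema; only the four verdict strings come from the text above.

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    APPROVED_AS_IS = "APPROVED-AS-IS"
    APPROVED_WITH_DISCLAIMER = "APPROVED-WITH-DISCLAIMER"
    REFRAMED = "REFRAMED"
    REFUSED = "REFUSED"

@dataclass
class Claim:
    page: str
    text: str
    verdict: Verdict
    note: str = ""  # disclaimer text, canonical rewording, or refusal reason

def tally(register):
    """Count claims per verdict: the headline numbers a compliance pass reports."""
    counts = {v: 0 for v in Verdict}
    for claim in register:
        counts[claim.verdict] += 1
    return counts
```

A register of this shape is what makes the downstream claims-lock mechanical: every load-bearing sentence on a page either resolves to an entry or fails the audit.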
Stage 3
Marketing Agent
Marketing read the Approved Claims Register and rewrote every public page to carry only what survived Compliance, in voice. The Tier A 8-question check (scene-led opener; evidence as anchor; recognition; five-beat shape; correct referent; legal punctuation; practical close; must-carry shipped) was applied to each page individually. 13 of 13 pages passed Tier A on first run. Zero blocker failures. Source: 02-marketing.md.
What changed: the four verbatim quotes (lactate-yes, IOB-honest, AID-lockstep, CGM-non-diabetes-no) were inserted at parity rendering on every page where the must-carry inheritance lands. The Bullet Proof “we stay until it ships” line was softened to concept-contribution framing per the push-back-on-overcommit rule. The marketing-register coda “innovate at light speed for a fraction of the cost” on the cost-lever case study was cut. PDSA iterations at Marketing: one (no Tier A failures).
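The Tier A check is an all-or-nothing conjunction of the eight questions, which is easy to express as code. The predicate names below are our paraphrases of the eight questions listed above, and the function shape is an assumed sketch, not the Marketing Agent's actual harness.

```python
TIER_A_CHECKS = [  # the eight Tier A questions, paraphrased as predicate names
    "scene_led_opener", "evidence_as_anchor", "recognition", "five_beat_shape",
    "correct_referent", "legal_punctuation", "practical_close", "must_carry_shipped",
]

def tier_a(page, predicates):
    """A page passes Tier A only if all eight checks return True.

    `predicates` maps each check name to a callable taking the page text.
    """
    results = {name: predicates[name](page) for name in TIER_A_CHECKS}
    return all(results.values()), results
```

Returning the per-check results alongside the verdict matters: a failed page names exactly which of the eight questions it failed, which is what a one-revision changeset needs.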
Stage 4
Sales Training Agent
Sales read the Marketing draft and built per-archetype playbooks for the three buyer audiences the gate scoped: lean operators, blue-chip companies, and partners. 14 objections written, with verbatim or near-verbatim model responses. 9 walk-away triggers named. Three pricing postures locked (opener, mid-conversation, exit) and declared exhaustive: the salesperson never improvises a fourth line. Source: 03-sales.md.
What changed: every CTA on every page was wired to a per-archetype route parameter (/contact/?route=VN-lean, ?route=VN-bluechip, ?route=VN-agents, ?route=VN-case-study, ?route=VN-bytes), all terminating at the single inbox. The “you must show a partner logo to be credible” objection was given a verbatim refusal that surfaces the banned move (no partner names without written sign-off) rather than negotiating around it. PDSA iterations at Sales: one.
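The route-parameter wiring is a small mapping plus URL construction. A sketch under assumptions: the dict name, function name, and archetype keys are hypothetical; the five route values and the single inbox are from the text above.

```python
from urllib.parse import urlencode

SINGLE_INBOX = "john@theglucoseneverlies.com"  # every route terminates here

ROUTES = {
    "lean-operator": "VN-lean",
    "blue-chip": "VN-bluechip",
    "agents": "VN-agents",
    "case-study": "VN-case-study",
    "bytes": "VN-bytes",
}

def cta_url(archetype):
    """Build the per-archetype contact URL. The route parameter tags the
    archetype for the sales playbook; delivery is always the single inbox."""
    return "/contact/?" + urlencode({"route": ROUTES[archetype]})
```

The design choice the mapping encodes: segmentation lives in the query string, not in the destination, so the single-inbox positioning survives per-archetype routing.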
Stage 5
Resource Development Agent
Resource Dev (this agent) ported the Marketing prose verbatim into deployable HTML for the WordPress site. No rewriting; composition only. The VN dark-navy spec was applied: slate canvas, navy headings, VN-blue accent, white editorial cards, single CTA per page, refused-bucket section at parity rendering with the accept-routes section. 13 page HTML files plus this case study. Em-dash count zero across all 14 artefacts. Source: this output and 04-resource.md.
What changed: nothing in the prose; the brand-lock report is the audit instrument. The refused-bucket sections were given the same heading level, the same card skeleton, and the same vertical rhythm as the accept-routes sections so Audit can byte-check parity in the rendered HTML, not just in the Marketing draft. PDSA iterations at Resource Dev: one.
Stage 6
Networking Agent
The Networking Agent fires post-deploy and owns what happens at and after the first quarterly meeting per the long-arc-partnership relationship shape locked in the typed inputs. The cadence for the VN public surface is slow / quarterly (lactate / IOB / AID landscape moves at major-conference + landmark-trial deltas: ATTD, ADA, EASD, ISPAD). The Networking Agent’s first artefacts are per-archetype cadence templates, the long-arc relationship plan for each accept-route substrate, and the substrate-evolution change log. Source: 04b-networking.md (drafted post-Audit).
What changed: the case study itself, by existing, dogfoods the long-arc-partnership shape with GNL as the client of VN. Network mapping for the three archetypes (lean operator, blue chip, partner) sits with Networking as a parallel-track item. PDSA iterations at Networking: pending (fires post-Audit).
Stage 7
Audit Agent (the gate that protects the gate)
The Audit Agent reads every upstream output and tests against the locked HARD RULES: claims-lock (every load-bearing claim resolves to the Approved Claims Register; no refused claim resurfaces; the four verbatim quotes ship byte-for-byte where they belong); voice-lock (Marketing prose verbatim; em-dash count zero; British English; DAFNE; Phillip); positioning-lock (single inbox; per-archetype CTAs; no public price figure); visual-lock (VN dark-navy spec applied; refused-bucket at parity rendering; Astra meta set).
Verdict: PASS across all six prior agent outputs and all fourteen deployable HTML artefacts. Type (a) claim drift: zero hits in live copy (three legitimate audit-narrative hits flagged in §3 of this case study, where the refused items are named on purpose). Type (b) voice drift: em-dash count zero, banned-move count zero in live copy. Type (c) brand and visual drift: zero hits; all four verbatim quotes present byte-for-byte where they belong; refused-bucket at parity rendering on home, services, and for-blue-chip-companies. Type (d) ethics drift: six typed inputs byte-identical across all six prior agent files. PDSA iterations at Audit: one (no FAIL-WITH-CHANGESET fired; no FAIL-AND-REFER fired). The verdict gates publish; no artefact in this case study ships without an Audit PASS.
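Two of the hard rules above are purely mechanical and show what "byte-for-byte" means in practice: the em-dash scan and the verbatim-quote presence check. This is an illustrative sketch, not the Audit Agent's code, and the quote strings passed in would be the four real quotes, which we do not reproduce here.

```python
EM_DASH = "\u2014"

def audit_artefact(html, required_quotes):
    """Two mechanical checks from the locked HARD RULES: the em-dash count
    must be zero (voice-lock), and each required quote must appear
    byte-for-byte in the rendered HTML (claims-lock)."""
    failures = []
    if EM_DASH in html:
        failures.append(f"voice-lock: {html.count(EM_DASH)} em-dash(es) found")
    for quote in required_quotes:
        if quote not in html:
            failures.append(f"claims-lock: missing verbatim quote {quote[:40]!r}")
    return ("PASS", []) if not failures else ("FAIL", failures)
```

Checks like these run on the deployable HTML, not on the Marketing draft, which is why the Resource Development stage preserves parity rendering: the audit instrument reads the same bytes a buyer's browser does.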
The finished product
Thirteen public pages plus this case study
Em-dash count zero across all fourteen artefacts. Four verbatim quotes byte-for-byte on every page where they belong. Single inbox routes every CTA. No public price figure on any page. No partner manufacturer name on any public surface. Taleb four-part anchor named explicitly on / and /about/. Refused-bucket section at parity rendering with the accept-routes section on /, /services/, and /for-blue-chip-companies/.
The diff against the first build is auditable: each page sits in audits/SITE_AUDIT_2026-05-12/vn-cycle/after/ alongside its first-build snapshot in before/. The receipts below are the cost of producing that diff.
The receipts
Every cost tracked in real numbers, not bands
The cycle started at 2026-05-12T11:50:00+01:00 with the GNL Anthropic month-to-date at £2,089.50 across 14 charges (source: audits/SITE_AUDIT_2026-05-12/vn-cycle/baseline.json). The cycle closed at 2026-05-12T13:51:00+01:00 with the GNL Anthropic month-to-date at £2,314.50 across 15 charges (source: rendered/APP_AUDIT.json post-cycle refresh). Cycle delta: £225.00, one auto-reload charge during the run. Token counts are session-reported per agent. The per-agent £ figures are proportional-to-tokens estimates of how the £225 charge funded the work, not direct per-call invoices (Anthropic auto-reload is a single top-up that draws down across calls; the total is real, the split is illustrative).
| Agent | Tokens (session-reported) | Anthropic spend (£, proportional) | PDSA iterations |
|---|---|---|---|
| Ethics Gate | ~102,000 | ~£21.40 | 1 |
| Compliance | ~168,000 | ~£35.30 | 1 |
| Marketing | ~149,000 | ~£31.30 | 1 |
| Sales Training | ~193,000 | ~£40.50 | 1 |
| Resource Development | ~177,000 (after one mid-run API error and a clean retry) | ~£37.20 | 1 |
| Networking | ~188,000 | ~£39.50 | 1 |
| Audit | ~95,000 (this pass) | ~£19.80 | 1 |
| Total | ~1,072,000 tokens | £225.00 (real; one auto-reload charge) | 7 (one per agent, zero re-runs) |
Source attribution: pre-cycle MTD captured at audits/SITE_AUDIT_2026-05-12/vn-cycle/baseline.json (£2,089.50 / 14 charges); post-cycle MTD captured at rendered/APP_AUDIT.json 13:51 BST refresh (£2,314.50 / 15 charges). Delta £225 is real. Per-agent split is the £225 distributed pro-rata against session-reported token counts. What is on the page is the real working: 1.07 million tokens across seven agents, £225 of Anthropic spend at the cycle boundary, one-pass on every agent (no re-runs), refused-bucket-at-parity verified in deployable HTML. Bare numbers without working are banned on our surface; the working is the token column plus the auto-reload-attribution caveat above.
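The pro-rata attribution described above is one line of arithmetic; a sketch, with illustrative agent names and figures rather than the table's (function name assumed):

```python
def pro_rata_split(total_gbp, token_counts):
    """Distribute one real charge across agents in proportion to
    session-reported token counts. The total is real; the split is
    illustrative, as the auto-reload is a single top-up, not per-call."""
    total_tokens = sum(token_counts.values())
    return {agent: round(total_gbp * n / total_tokens, 2)
            for agent, n in token_counts.items()}
```

Note that rounding each share to two decimal places means the split need not sum exactly to the total in every case; the per-agent figures in the table above are approximations of exactly this kind.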
The ROI, three projections, working shown
Three scenarios, each with the working shown alongside the figure
No bare numbers without method.
Projection 1
Hypothetical client engagement (Light Touch external tier)
Working. A Light Touch engagement is 20 to 40 hours of senior consultancy time, single point of contact, one revision round (per /services/). This dogfood cycle on Via Negativa cost us ~1.07 million Anthropic tokens across seven agents and roughly two operator-hours of session-supervision time. Scaled to a client engagement: the client supplies the substrate brief and the typed inputs (Ethics Gate intake, 30 to 60 minutes); the seven-agent cycle runs once on the client’s surface (Anthropic tokens scale with surface size, roughly 0.5 to 2x this cycle’s count depending on the client’s existing evidence corpus); senior-consultant supervision and Audit follow-up is 10 to 20 hours.
Projection. Conservative: a single 20-hour Light Touch engagement run through this stack on a tight client surface (one workstream, one deliverable). Realistic: a 30-hour engagement on a typical small-AI-product client surface. Optimistic: a 40-hour Light Touch ceiling with a thicker corpus and one extra audit revision. The figure for the client is scoped per engagement, never anchored to a public number. The working is the hours, the token estimate, and the operator-supervision split shown here; the figure follows in conversation.
Projection 2
Compounding lift across first-year client pipeline (3 to 5 engagements)
Working. The seven-agent stack carries a one-time framework-design cost (writing the agent prompts, locking the typed-inputs schema, building the brand-lock matrix, codifying the four verbatim quotes and the must-carry rules). That cost has already been absorbed by this cycle. Subsequent runs on a different surface inherit the framework verbatim and only pay the per-cycle Anthropic-token cost plus per-cycle operator supervision. The framework cost amortises across the first-year pipeline.
Projection. Across the first-year client pipeline at 3 to 5 engagements: conservative ~15% per-engagement efficiency lift over a one-off implementation (framework cost spread over three engagements; operator gets faster at supervision but PDSA iterations may run higher on novel surfaces). Realistic ~25% lift (four engagements, operator has now run the stack five times including this one, audit cadence is internalised, typed-input intake is faster). Optimistic ~35% lift (five engagements, the framework is genuinely a template, intake-to-deploy time compresses, audit-pass-first-time becomes the norm rather than the exception). All three bands assume GNL session volume on the supervision side; client-side projections are scoped per engagement.
Projection 3
VN case-study asset value (conversion-uplift on audience pages)
Working. This case study is the third public VN case study. The two existing ones (cost-lever, rag-wiring) anchor the lean-operators audience page. This one anchors the lean-operators page AND the blue-chip page, because the dogfood narrative works for both audiences (the lean operator sees the cost-lever discipline; the blue-chip reader sees the agent-stack architecture and the audit gate). The asset value is conversion uplift on the two audience pages: a prospect who has read this case study before the first call arrives qualified on substrate, audit discipline, and pricing posture, which compresses the first-call qualification time.
Projection. Conservative: 5% uplift in first-call-to-engagement conversion on prospects who read this case study before contacting us (the case study filters out poor-fit prospects who self-select away before the call). Realistic: 10% uplift, plus a compression in first-call qualification time of roughly 15 minutes per call (the substrate, audit, and pricing posture are pre-explained). Optimistic: 15% uplift on conversion plus the qualification-time compression plus a measurable lift in inbound enquiry quality (the brief shape requested at the intake step arrives more often filled in correctly). Bands shown explicitly; underlying assumption is the case-study readership grows steadily from the audience pages as the site gains traction. The asset is one of three; the lift is incremental, not the whole story.
Every forward-saving and ROI row above is projected at GNL session volume. The figure for a client engagement is scoped per engagement, not anchored to a public number. The working sits in this section and in docs/TOKEN_OPTIMISATION_PLAYBOOK.md.
What this case study does NOT claim
The honest limits
We do not claim every team running a seven-agent stack will see the same diff. Our substrate was right (a small, voice-disciplined commercial site with locked HARD RULES already in writing; an audit baseline already in the repo; a team that lives with the substrate). Without those pre-conditions, the same cycle would produce a different shape of output.
We do not claim the ROI projections are guaranteed. We claim the working is shown. Bare numbers without working are banned on our surface, and that includes this case study.
We do not claim the seven-agent stack is the only path to a voice-locked, audit-passed site. We claim it is the path we have walked, with the receipts kept and the get-out clause active throughout.
We do not claim Phillip Hayes is a co-founder of Via Negativa Health; the current canon names him as engineering anchor and co-director of GNL Ltd. We do not claim GNL Grace is a medical device; it is an educational tool. We do not claim endorsement from any manufacturer whose tools we review; review is paid for by the engagement, never by the manufacturer.
We do not claim a published price for this engagement, this case study, or any subsequent engagement. The three external tier names (Light Touch, Strong Signal, Bullet Proof) ship without figures. Pricing is per-conversation against the substrate the work touches.
Talk to us
Scope first, figure second
If you want a seven-agent stack run on a surface of your own, on your keys, behind your own ethics gate, the intake form is five questions. The case study above is the receipts on what one cycle looks like end to end.
Every route terminates at john@theglucoseneverlies.com